It’s impossible to imagine the Internal Revenue Service or most other number-crunching agencies or companies working without computers. But when the IRS went to computers (the Automatic Data Processing system), there was an uproar. The agency went so far as to produce a short film on the topic, called Right On The Button, to convince the public that computers were a good thing.
Tag Archives: big data
US seeks information on industry ability to hold bulk phone data
The RFI has been posted to the Federal Business Opportunities site that lists federal government procurement opportunities. The government is looking for information on whether commercially available services can, among other things, provide secure storage and high availability to U.S. telephone metadata records for a sufficient period of time, and ensure that there are no unauthorized queries of the database and no data is provided to the government without proper authorization.
via US seeks information on industry ability to hold bulk phone data | ITworld.
What Hard Drive Should I Buy?
At the end of 2013, we had 27,134 consumer-grade drives spinning in Backblaze Storage Pods. The breakdown by brand looks like this:
Hard Drives by Manufacturer
Brand             Number of Drives   Terabytes   Average Age in Years
Seagate                     12,765      39,576                    1.4
Hitachi                     12,956      36,078                    2.0
Western Digital              2,838       2,581                    2.5
Toshiba                         58         174                    0.7
Samsung                         18          18                    3.7
via Backblaze Blog » What Hard Drive Should I Buy?.
Why do we have the drives we have? Basically, we buy the least expensive drives that will work.
There are a lot of numbers tossed around in this article that are difficult to summarize. The above table shows the data set they worked from.
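One number that is easy to pull out of the table is the average capacity per drive for each brand. The sketch below is only an illustration: the figures are copied from the table above, while the names FLEET and terabytes_per_drive are made up here and do not come from the Backblaze article.

```python
# Figures copied from the table above:
# brand -> (number of drives, terabytes, average age in years).
FLEET = {
    "Seagate": (12765, 39576, 1.4),
    "Hitachi": (12956, 36078, 2.0),
    "Western Digital": (2838, 2581, 2.5),
    "Toshiba": (58, 174, 0.7),
    "Samsung": (18, 18, 3.7),
}


def terabytes_per_drive(fleet):
    """Return the average capacity per drive (in TB) for each brand."""
    return {
        brand: terabytes / drives
        for brand, (drives, terabytes, _age) in fleet.items()
    }
```

Calling terabytes_per_drive(FLEET) gives one figure per brand, which makes it easier to see which brands make up the newer, larger-capacity part of the fleet.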
Facebook Considers Vast Increase in Data Collection
The social network may start collecting data on minute user interactions with its content, such as how long a user’s cursor hovers over a certain part of its website, or whether a user’s newsfeed is visible at a given moment on the screen of his or her mobile phone, Facebook analytics chief Ken Rudin said Tuesday during an interview.
via Facebook Considers Vast Increase in Data Collection – Digits – WSJ.
As the head of analytics, Mr. Rudin is preparing the company’s infrastructure for a massive increase in the volume of its data.
Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It
Scientists like DeDeo and Vespignani make good use of this piecemeal approach to big data analysis, but Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus, which he believes is already underway. It is not sufficient, he argues, to simply collect and store massive amounts of data; they must be intelligently curated, and that requires a global framework.
via Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It – Wired Science.
Among the most notable insights Euler gleaned from the puzzle was that the exact positions of the bridges were irrelevant to the solution; all that mattered was the number of bridges and how they were connected. Mathematicians now recognize in this the seeds of the modern field of topology.
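Euler's observation can be checked in a few lines. The sketch below assumes the classic layout of the puzzle: four land masses joined by seven bridges. The labels N, S, K and L (two river banks and two islands) and the function names are made up for illustration. Only the connections matter, which is exactly Euler's point: a walk that crosses every bridge once exists in a connected layout only if zero or two land masses have an odd number of bridges.

```python
from collections import Counter

# Each pair is one bridge between two land masses.
# Labels are illustrative: N and S are the river banks, K and L the islands.
BRIDGES = [
    ("N", "K"), ("N", "K"),
    ("S", "K"), ("S", "K"),
    ("N", "L"), ("S", "L"),
    ("K", "L"),
]


def odd_degree_count(bridges):
    """Count land masses touched by an odd number of bridges."""
    degree = Counter()
    for a, b in bridges:
        degree[a] += 1
        degree[b] += 1
    return sum(1 for d in degree.values() if d % 2 == 1)


def has_euler_walk(bridges):
    """True if a walk crossing every bridge exactly once can exist.

    Assumes the layout is connected (every land mass is reachable),
    so only the number of odd-degree land masses needs checking.
    """
    return odd_degree_count(bridges) in (0, 2)
```

With the seven bridges above, all four land masses have an odd number of bridges, so has_euler_walk(BRIDGES) is False. Nothing in the code knows where the bridges are, only what they connect.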
Finding Data on the Internet
The following list of data sources has been modified as of 8/19/13. Most of the data sets listed below are free; however, some are not.
via Finding Data on the Internet | inside-R | A Community Site for R.
In ACLU lawsuit, scientist demolishes NSA’s “It’s just metadata” excuse
Storage and data-mining have come a long way in the past 35 years, Felten notes, and metadata is uniquely easy to analyze—unlike the complicated data of a call itself, with variations in language, voice, and conversation style. “This newfound data storage capacity has led to new ways of exploiting the digital record,” writes Felten. “Sophisticated computing tools permit the analysis of large datasets to identify embedded patterns and relationships, including personal details, habits, and behaviors.”
via In ACLU lawsuit, scientist demolishes NSA’s “It’s just metadata” excuse | Ars Technica.
I remember Ed Felten as being one of the leading researchers who uncovered the Sony rootkit fiasco. Many years ago Sony included a rootkit that would install itself whenever someone played one of their CDs on a Windows PC. Felten’s blog at the time covered that situation well.
The evolution of the NSA’s XKeyscore
In the current generation of Narus’ system, the processing systems run on commodity Linux servers and re-assemble network sessions as they’re captured, mining them for metadata, file attachments, and other application data and then indexing and dumping that information to a searchable database.
via Building a panopticon: The evolution of the NSA’s XKeyscore | Ars Technica.
The hot new technology in Big Data is decades old: SQL
Over the past six months, vendors have responded to the demand for more corporate-friendly analytics by announcing a slew of systems that offer full SQL query capabilities with significant performance improvements over existing Hive/Hadoop systems. These systems are designed to allow full SQL queries over warehouse-size data sets, and in most cases they bypass Hadoop entirely (although some are hybrid approaches). Allowing much faster SQL queries at scale makes big data analytics accessible by many more people in the enterprise and fits in with existing workflows.
via The hot new technology in Big Data is decades old: SQL | Ars Technica.
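For anyone who has not written SQL in a while, this is the kind of declarative query the article is talking about. The sketch below is a toy stand-in: it uses Python's built-in sqlite3 module with an in-memory database rather than any of the warehouse-scale systems mentioned, and the page_views table and top_pages function are made up for illustration.

```python
import sqlite3


def top_pages(rows, limit=2):
    """Load (page, views) rows into a scratch table and rank pages by total views."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    # The query states what is wanted, not how to compute it.
    query = """
        SELECT page, SUM(views) AS total
        FROM page_views
        GROUP BY page
        ORDER BY total DESC, page ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result
```

The appeal described in the article is that an analyst who already knows this syntax can point the same sort of query at a far larger data set without writing a custom job.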
U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program
Congress obliged with the Protect America Act in 2007 and the FISA Amendments Act of 2008, which immunized private companies that cooperated voluntarily with U.S. intelligence collection. PRISM recruited its first partner, Microsoft, and began six years of rapidly growing data collection beneath the surface of a roiling national debate on surveillance and privacy. Late last year, when critics in Congress sought changes in the FISA Amendments Act, the only lawmakers who knew about PRISM were bound by oaths of office to hold their tongues.
That will teach people not to put so much trust in the cloud.