U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program

Congress obliged with the Protect America Act in 2007 and the FISA Amendments Act of 2008, which immunized private companies that cooperated voluntarily with U.S. intelligence collection. PRISM recruited its first partner, Microsoft, and began six years of rapidly growing data collection beneath the surface of a roiling national debate on surveillance and privacy. Late last year, when critics in Congress sought changes in the FISA Amendments Act, the only lawmakers who knew about PRISM were bound by oaths of office to hold their tongues.

via U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program – The Washington Post.

That will teach people not to put so much trust in the cloud.

Hacking into the Indian Education System

Technically put, I merely needed to write a script to iterate through the various school IDs, check the different servers, and start with a student ID of 1 yet have a way to detect when there were no more students for a given school. I had to retrieve the resultant HTML files and parse them to extract all the useful information – Name, Date of Birth, ID, School, Marks.

via Hacking into the Indian Education System – On the Stepping Stone – Quora.

Several hours later, I had all the ISC and ICSE results on my very own computer, in a bunch of comma-separated value files. It was truly incredible. 26 megabytes of pure, magnificent data. An Excel file I couldn’t scroll to the bottom of. Just for kicks, I Ctrl+F’d a few names I knew and what do you know? There they were. Line after line of names, subjects and numbers. It was truly mesmerizing.
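
The loop described in the excerpt (walk the school IDs, count student IDs up from 1, stop when a school runs out, then parse and save as CSV) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `FAKE_PAGES`, `fetch_page`, the `<td class='...'>` markup and the `max_misses` stop rule are hypothetical, and the "server" is an in-memory dictionary rather than anything the original author used.

```python
import csv
import io
import re

# Hypothetical stand-in for the results server: maps (school_id, student_id)
# to an HTML fragment. Nothing here touches the network.
FAKE_PAGES = {
    (1, 1): "<td class='name'>A. Rao</td><td class='marks'>91</td>",
    (1, 2): "<td class='name'>B. Sen</td><td class='marks'>78</td>",
    (2, 1): "<td class='name'>C. Das</td><td class='marks'>85</td>",
}

# Assumed markup: one <td class='field'>value</td> cell per field.
FIELD = re.compile(r"<td class='(\w+)'>([^<]*)</td>")


def fetch_page(school_id, student_id):
    """Return the HTML for one student, or None if no such record exists."""
    return FAKE_PAGES.get((school_id, student_id))


def parse_page(html):
    """Extract {field: value} pairs from the assumed markup."""
    return dict(FIELD.findall(html))


def collect(school_ids, max_misses=1):
    """Walk student IDs upward from 1 for each school.

    A school is considered exhausted after `max_misses` consecutive
    missing records (an assumed stop rule, not the author's).
    """
    rows = []
    for school_id in school_ids:
        student_id, misses = 1, 0
        while misses < max_misses:
            html = fetch_page(school_id, student_id)
            if html is None:
                misses += 1
            else:
                misses = 0
                rows.append({"school": school_id, "id": student_id, **parse_page(html)})
            student_id += 1
    return rows


def to_csv(rows):
    """Serialize the collected rows as comma-separated text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["school", "id", "name", "marks"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `to_csv(collect([1, 2, 3]))` walks three schools, one of which has no records at all. Raising `max_misses` would tolerate gaps in the ID sequence at the cost of a few extra lookups per school.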

For Riot Games, Big Data Is Serious Business

Once Riot Games opened up a European base of operations, it couldn’t fit all its data into one instance of MySQL. “So we created a separate instance. That was a bad precedent and we needed to change that,” Livingston added. “We moved quickly to Hadoop as a scalable low-cost storage system. We use Hive to overlay an SQL-type interface on top of the Hadoop File System.” That helped scale up, but “the downside is that it takes a long time to spin up to do your queries, some taking a minute or more to complete, so it is difficult to iterate and build complex queries using Hive.”

via For Riot Games, Big Data Is Serious Business.

Part of the challenge is to maintain a level playing field for all players, yet constantly tweaking game play and game mechanics to make it more interesting for returning players: “We need lots of insight so that competitive play will continue to happen. We don’t want different versions of the game for pros and noobs, for example.”

Welcome to Hive!

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

via Welcome to Hive!.
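
To make "project structure onto this data" a little more concrete, here is a small plain-Python analogy. It is not Hive and uses no Hive API: the raw lines, the `SCHEMA` list and the helpers `read_rows` and `sum_by` are all invented for illustration. The idea it tries to show is only that the files stay untyped and the column names and types are applied at read time, after which a GROUP BY style summary becomes a short query.

```python
from collections import defaultdict

# Raw, untyped lines as they might sit in a file; the layout is an assumption.
RAW_LINES = [
    "eu\t2013-06-01\t120",
    "na\t2013-06-01\t300",
    "eu\t2013-06-02\t80",
]

# The "table definition": column names and types, applied only when reading.
SCHEMA = [("region", str), ("day", str), ("games", int)]


def read_rows(lines, schema, sep="\t"):
    """Project the schema onto each raw line, yielding typed dicts."""
    for line in lines:
        values = line.split(sep)
        yield {name: cast(value) for (name, cast), value in zip(schema, values)}


def sum_by(rows, key, value):
    """Rough equivalent of SELECT key, SUM(value) ... GROUP BY key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)
```

Here `sum_by(read_rows(RAW_LINES, SCHEMA), "region", "games")` plays the role of a per-region total. Swapping in a different `SCHEMA` reinterprets the same lines without rewriting them, which is the part of the design the excerpt is pointing at.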

Everything You Wanted to Know About Data Mining but Were Afraid to Ask

With data mining it is possible to let the data itself determine the groups. This is one of the black-box type of algorithms that are hard to understand. But in a simple example – again with purchasing behavior – we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.

via Everything You Wanted to Know About Data Mining but Were Afraid to Ask – Alexander Furnas – Technology – The Atlantic.
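
One common way to "let the data itself determine the groups" is k-means clustering. The article does not name an algorithm, so treat this as one possible illustration rather than what the author had in mind. The sketch below is a minimal version with a deterministic farthest-point start; the shopper data (spend on gardening, fishing and model kits) is made up.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal k-means over tuples of numbers; returns (labels, centers)."""
    # Deterministic start: take the first point, then repeatedly add the
    # point that is farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centers[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Hypothetical monthly spend per shopper: (gardening, fishing, model kits).
SHOPPERS = [
    (9, 1, 0), (8, 0, 1),   # mostly gardening
    (1, 9, 0), (0, 8, 1),   # mostly fishing
    (0, 1, 9), (1, 0, 8),   # mostly model kits
]
```

Running `kmeans(SHOPPERS, 3)` groups the shoppers without ever being told which hobby each one has. The number of groups `k` is still chosen by hand here; picking it automatically is a separate problem.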

How To Catch a Criminal With Data

The researchers ultimately turned the department onto a predictive software called SPSS, which had for years been used to crunch data in a host of disciplines not necessarily connected to crime. The department launched a pilot program with it to analyze trends, as part of a strategy of fighting crime by real-time data-mining.

via How To Catch a Criminal With Data – Technology – The Atlantic Cities.

IBM acquired SPSS back in 2009, and did the same late last year with Knisley’s software company, i2. On a computer monitor, Knisley had pulled up a program called COPLINK, which sucks into one massive database all that disjointed information that was once scribbled down by hand.