The Enron E-mails’ Immortal Life

This research has had widespread applications: computer scientists have used the corpus to train systems that automatically prioritize certain messages in an in-box and alert users that they may have forgotten about an important message. Other researchers use the Enron corpus to develop systems that automatically organize or summarize messages. Much of today’s software for fraud detection, counterterrorism operations, and mining workplace behavioral patterns over e-mail has been somehow touched by the data set.

via The Enron E-mails’ Immortal Life | MIT Technology Review.

U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program

Congress obliged with the Protect America Act in 2007 and the FISA Amendments Act of 2008, which immunized private companies that cooperated voluntarily with U.S. intelligence collection. PRISM recruited its first partner, Microsoft, and began six years of rapidly growing data collection beneath the surface of a roiling national debate on surveillance and privacy. Late last year, when critics in Congress sought changes in the FISA Amendments Act, the only lawmakers who knew about PRISM were bound by oaths of office to hold their tongues.

via U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program – The Washington Post.

That will teach people not to place so much trust in the cloud.

Hacking into the Indian Education System

Technically put, I merely needed to write a script to iterate through the various school IDs, check the different servers, and start with a student ID of 1 yet have a way to detect when there were no more students for a given school. I had to retrieve the resultant html files and parse them to extract all the useful information – Name, Date of Birth, ID, School, Marks.

via Hacking into the Indian Education System – On the Stepping Stone – Quora.

Several hours later, I had all the ISC and ICSE results on my very own computer, in a bunch of comma-separated value files. It was truly incredible. 26 megabytes of pure, magnificent data. An Excel file I couldn’t scroll to the bottom of. Just for kicks, I Ctrl+F’d a few names I knew and what do you know? There they were. Line after line of names, subjects and numbers. It was truly mesmerizing.
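The loop described in the quote (walk the school IDs, count student IDs up from 1, stop when a school runs out, parse each page, write CSV) can be sketched roughly as below. This is only an illustration, not the author's script: `fetch_page`, `fake_fetch`, `PAGES`, the `max_misses` threshold and the table layout are all hypothetical, and the pages come from an in-memory dictionary rather than any real server.

```python
import csv
import io
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Collect the text of every <td> cell on a result page."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data.strip()


def collect_results(fetch_page, school_ids, max_misses=3):
    """Walk student IDs from 1 for each school; stop after max_misses empty pages in a row."""
    rows = []
    for school_id in school_ids:
        student_id, misses = 1, 0
        while misses < max_misses:
            html = fetch_page(school_id, student_id)
            if html is None:
                misses += 1  # assumed signal that no such student exists
            else:
                misses = 0
                parser = ResultParser()
                parser.feed(html)
                rows.append([school_id, student_id] + parser.cells)
            student_id += 1
    return rows


def to_csv(rows):
    """Serialize the rows as comma-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-in data; no network access is involved.
PAGES = {
    (1, 1): "<table><tr><td>A. Student</td><td>91</td></tr></table>",
    (1, 2): "<table><tr><td>B. Student</td><td>84</td></tr></table>",
    (2, 1): "<table><tr><td>C. Student</td><td>77</td></tr></table>",
}


def fake_fetch(school_id, student_id):
    return PAGES.get((school_id, student_id))


rows = collect_results(fake_fetch, [1, 2])
print(to_csv(rows))
```

Counting consecutive misses is one guess at how to "detect when there were no more students"; the original script may have used a different signal.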

Wolfram Alpha Drills Deep into Facebook Data

At this year’s South by Southwest (SXSW) conference in Austin, Texas, Wolfram Alpha creator Stephen Wolfram offered up some interesting details about his computational engine. Wolfram Alpha contains more than 10 trillion pieces of data cultivated from primary sources, along with tens of thousands of algorithms and equations. Solving complex math problems is one of the system’s key abilities.

via Wolfram Alpha Drills Deep into Facebook Data.

More information from Data Science of the Facebook World

Some of this is rather depressingly stereotypical. And most of it isn’t terribly surprising to anyone who’s known a reasonable diversity of people of different ages. But what to me is remarkable is how we can see everything laid out in such quantitative detail in the pictures above—kind of a signature of people’s thinking as they go through life.

Google BigQuery is now even bigger

BigQuery is a cloud service that lets users analyze terabyte-sized data sets using SQL-like queries. It’s based on Google’s Dremel querying system, which can analyze data where it’s located (i.e., in the Google File System or BigTable) and which Google uses internally to analyze a variety of different data sets.

via Google BigQuery is now even bigger — Tech News and Analysis.

Startup Creates Software to Give Companies an Edge Recruiting Tech Talent

Since launching in beta last March, Gild has profiled four million software developers and has 70 customers, from high-profile Silicon Valley startups such as Palantir Technologies and Box to large IT providers such as Salesforce and EMC.

via Startup Creates Software to Give Companies an Edge Recruiting Tech Talent | MIT Technology Review.

One of Gild’s biggest data sources is GitHub, a software developer collaboration site that hosts the most open-source code in the world. GitHub profiles are already replacing programmers’ résumés in many cases.

Game Studios at the Forefront of Big Data, Cloud

If you want to see the future of Big Data, look no further than the nearest gaming-development studio. It isn’t all fun and first-person-shooting. Game developers are the sentinels of a variety of advanced IT techniques, placing them in front of the general IT population with regard to using real-time analytics and cloud computing, among other areas.

via Game Studios at the Forefront of Big Data, Cloud.

The 5 Commandments Of Data And Why Analytics Efforts Are Still A Big Old Mess

Data has to be a strategic asset. The presence of consultants at a conference like Strata shows how much confusion people still have in realizing how to get the value that vendors promise in such bountiful amounts.

via The 5 Commandments Of Data And Why Analytics Efforts Are Still A Big Old Mess | TechCrunch.

I don’t have the patience to watch people talk, but it sounds like data analytics might be a lucrative field to be in right now.

RSA, IBM Bet On Big Data Analytics To Boost Security

“So think of a host beaconing out to a C2 (command-and-control) site on a regularly scheduled basis,” he tells Dark Reading. “If an analyst can isolate the suspect host, they can eyeball a graph to see that they’re reaching out to this host regularly. But with a big data approach, you can create a rule that computes and analyzes the interval between sessions and determines whether we’re talking about normal human activity, or machine-generated — which is innocuous — or scheduled activity like malware might do.”

via RSA, IBM Bet On Big Data Analytics To Boost Security – Dark Reading.

I recently caught a piece of malware on a PC on my open Wi-Fi network doing something similar.
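The interval rule from the quote can be sketched in a few lines. This is a minimal illustration rather than anything RSA or IBM ships: the function names, the coefficient-of-variation test and the thresholds `max_cv` and `min_sessions` are my own assumptions, and the timestamps are made up. It only separates near-constant gaps from irregular ones; it does not try to tell benign machine traffic apart from malware.

```python
from statistics import mean, pstdev


def session_intervals(timestamps):
    """Return the gaps in seconds between consecutive session start times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_scheduled(timestamps, max_cv=0.1, min_sessions=5):
    """Flag a host whose inter-session gaps are nearly constant.

    max_cv and min_sessions are illustrative thresholds, not tuned values.
    """
    if len(timestamps) < min_sessions:
        return False
    gaps = session_intervals(timestamps)
    average = mean(gaps)
    if average == 0:
        return False
    # Coefficient of variation: spread of the gaps relative to their mean.
    return pstdev(gaps) / average <= max_cv


# Hypothetical session start times, in seconds.
beacon = [0, 300, 601, 899, 1200, 1501]   # checks in about every five minutes
human = [0, 12, 340, 355, 2100, 2130]     # bursty, irregular browsing

print(looks_scheduled(beacon))
print(looks_scheduled(human))
```

Real beacons often add jitter to their schedule, so a fixed cutoff like this would need loosening or a different statistic in practice.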