Test shows big data text analysis inconsistent, inaccurate

Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are “actually very poor, since they are for an exceedingly easy case,” Amaral said in an announcement from Northwestern about the study.

Applied to messy, inconsistently scrubbed data from many sources in many formats – the base of data for which big data is often praised for its ability to manage – the results would be far less accurate and far less reproducible, according to the paper.

via Test shows big data text analysis inconsistent, inaccurate | Computerworld.

Here’s an interesting explanation of how LDA (Latent Dirichlet Allocation) works. From: What is a good explanation of Latent Dirichlet Allocation?

From a 3,000-foot level, as I understand the explanation, LDA seems to be a mechanism for scoring words in order to categorize sets of words, such as paragraphs or entire papers. It is an interesting exercise, but a human has to build the data model first. Any time a program has to estimate or guess like this there will be error; the only question is how much error is acceptable before the results of this kind of analysis stop being worth using.
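
To make that description a little more concrete for myself, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, which is one common way to do it and not necessarily the method used in the study or in the linked explanation. The function name `lda_gibbs`, the `alpha` and `beta` values, and the toy documents are all my own assumptions for illustration. Note that a human still picks the number of topics up front, and the result is an estimate that changes with the random seed.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, words per topic total.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by giving every word occurrence a random topic.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Score each topic: how much this document likes the topic,
                # times how much the topic likes this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # Draw a new topic in proportion to those scores.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


# Hypothetical toy corpus, already tokenized.
toy_docs = [
    "bread flour oven bakery bread".split(),
    "telescope lens orbit telescope light".split(),
    "oven bakery flour bread".split(),
    "light orbit lens telescope".split(),
]
theta, topic_word = lda_gibbs(toy_docs)
for doc, mix in zip(toy_docs, theta):
    print(" ".join(doc), [round(p, 2) for p in mix])
```

Each row printed is the estimated topic mix for one document. The scoring step in the inner loop is the part that matches my "score words to categorize sets of words" reading of the explanation.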

Surviving Data Science “at the Speed of Hype”

A good predictive model requires a stable set of inputs with a predictable range of values that won’t drift away from the training set. And the response variable needs to remain of organizational interest.

via Surviving Data Science “at the Speed of Hype” – John Foreman, Data Scientist.
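
The point about inputs drifting away from the training set is easy to check in a crude way. The sketch below is my own illustration, not anything from Foreman's post: it records the range of each input seen in training and reports what share of new values fall outside it. The function names, the feature names, and the numbers are all hypothetical, and a real drift check would look at more than min and max.

```python
def training_ranges(rows):
    """Return {feature: (min, max)} as observed in the training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges


def out_of_range_share(rows, ranges):
    """Share of new values per feature that fall outside the training range."""
    counts = {name: 0 for name in ranges}
    for row in rows:
        for name, (low, high) in ranges.items():
            if not low <= row[name] <= high:
                counts[name] += 1
    return {name: counts[name] / len(rows) for name in ranges}


# Hypothetical training data and newer production data.
train = [
    {"order_size": 10, "visits": 3},
    {"order_size": 25, "visits": 8},
    {"order_size": 18, "visits": 5},
]
new = [
    {"order_size": 12, "visits": 4},
    {"order_size": 60, "visits": 6},
    {"order_size": 75, "visits": 7},
    {"order_size": 20, "visits": 2},
]

drift = out_of_range_share(new, training_ranges(train))
print(drift)
```

With these made-up numbers, half of the new `order_size` values and a quarter of the new `visits` values sit outside anything the model saw in training, which is the kind of instability the quote warns about.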

If you want to move at the speed of “now, light, big data, thought, stuff,” pick your big data analytics battles. If your business is currently too chaotic to support a complex model, don’t build one. Focus on providing solid, simple analysis until an opportunity arises that is revenue-important enough and stable enough to merit the type of investment a full-fledged data science modeling effort requires.

‘Anonymized’ credit card data not so anonymous, study shows

As an example, the researchers wrote about looking at data from September 23 and 24 and who went to a bakery one day and a restaurant the other. Searching through the data set, they found there could be only one person who fits the bill – they called him Scott. The study said, “and we now know all of his other transactions, such as the fact that he went shopping for shoes and groceries on 23 September, and how much he spent.”

via News from The Associated Press.
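
The bakery-and-restaurant example is simple enough to sketch. This is only my own toy version of the idea, not the researchers' code or data: the records, the pseudonymous ids, and the function name `matching_ids` are all made up. It looks for every id whose history contains the two known visits, and if only one id matches, the rest of that person's transactions come along for free.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: (pseudonymous id, date, shop type, amount).
transactions = [
    ("u1", "09-23", "bakery", 4.50),
    ("u1", "09-24", "restaurant", 32.00),
    ("u1", "09-23", "shoes", 80.00),
    ("u1", "09-23", "groceries", 41.20),
    ("u2", "09-23", "bakery", 3.00),
    ("u2", "09-24", "cinema", 12.00),
    ("u3", "09-24", "restaurant", 55.00),
    ("u3", "09-23", "groceries", 18.75),
]


def matching_ids(records, known_points):
    """Return the ids whose history contains every known (date, shop) point."""
    visits = defaultdict(set)
    for uid, date, shop, _amount in records:
        visits[uid].add((date, shop))
    return sorted(uid for uid, seen in visits.items() if known_points <= seen)


# What an outsider might already know about one person.
known = {("09-23", "bakery"), ("09-24", "restaurant")}
candidates = matching_ids(transactions, known)
print(candidates)

# If exactly one id fits, everything else that id bought is exposed too.
if len(candidates) == 1:
    exposed = [
        record for record in transactions
        if record[0] == candidates[0] and (record[1], record[2]) not in known
    ]
    print(exposed)
```

In this made-up data only `u1` fits both points, so the shoe and grocery purchases and their amounts are revealed even though no name appears anywhere in the records.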

Lensless space telescope could be 1,000 times stronger than Hubble

The Aragoscope is named after French scientist Francois Arago who first noticed how a disk diffracted light waves. The principle is based on using a large disk as a diffraction lens, which bends light from distant objects around the edge of the disk and focuses it like a conventional refraction lens. The phenomenon isn’t very pronounced on the small scale, but if the telescope is extremely large, it not only becomes practical, but also extremely powerful.

via Lensless space telescope could be 1,000 times stronger than Hubble.

Digital music sales on iTunes and beyond are now fading as fast as CDs.

The top 1 percent of bands and solo artists now earn about 80 percent of all revenue from recorded music, as I wrote in “The Shazam Effect.” But the market for streamed music is not so concentrated. The ten most-popular songs accounted for just shy of 2 percent of all streams in 2013 and 2014.

via Digital music sales on iTunes and beyond are now fading as fast as CDs. – The Atlantic.

Introducing WinSCP

WinSCP is an open source free SFTP client, FTP client, WebDAV client and SCP client for Windows. Its main function is file transfer between a local and a remote computer. Beyond this, WinSCP offers scripting and basic file manager functionality.

via Introducing WinSCP :: WinSCP.

This is a very useful program for getting files off a PC and onto a Linux server, which supports these services out of the box. I find Samba too clunky, unreliable, and very noisy on an open network because it broadcasts packets to everyone. Only now did I need something like this; SCP solves my problem and is more secure and easier to use than standard FTP. I still map drives using Samba on my closed network, but I may try the Windows version of sshfs sometime in the future. The user interface of this tool is very intuitive and works well.

Inside Obama’s ambitious plan to make your Internet suck less

“The impact of these laws is that a community that moves forward opens itself up to years of litigation as courts will have to figure out what such poorly conceived laws mean,” Mitchell added. “So the danger isn’t so much the cost of additional dollars but the exposure to years of court room wrangling.”

Here is a map showing all the states with anti-municipal broadband laws Obama wants the FCC to go after, along with brief descriptions of the restrictions in place in each state.

via Inside Obama’s ambitious plan to make your Internet suck less.

About Anousheh Ansari

Anousheh is a serial entrepreneur and co-founder and chairman of Prodea Systems, a company that will unleash the power of the Internet to all consumers and dramatically alter and simplify consumer’s digital living experience. Prior to founding Prodea Systems, Anousheh served as co-founder, CEO and chairman of Telecom Technologies, Inc.  The company successfully merged with Sonus Networks, Inc., in 2000.

via Anousheh Ansari – About Anousheh Ansari.

This is an amazing story of accomplishment. Judging from her Prodea Systems website, the company sells home automation and now Internet of Things products, which is a popular buzzword nowadays. This company made her enough money to buy a trip to the ISS in 2006.

The importance of deleting old stuff—another lesson from the Sony attack

Saving data, especially e-mail and informal chats, is a liability.

It’s also a security risk: the risk of exposure. The exposure could be accidental. It could be the result of data theft, as happened to Sony. Or it could be the result of litigation. Whatever the reason, the best security against these eventualities is not to have the data in the first place.

via The importance of deleting old stuff—another lesson from the Sony attack | Ars Technica.