Mind the Bullshit Asymmetry Principle, articulated by the Italian software developer Alberto Brandolini in 2013: the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it. Or, as Jonathan Swift put it in 1710, “Falsehood flies, and truth comes limping after it.”Plus ça change.
Into this universe comes Airbus SE, the European aerospace conglomerate. Airbus is starting a new data company, called Airbus Aerial, to provide an array of unmanned aerial vehicles (UAV) services, a field the company estimates could increase to more than $120 billion annually as the use of these fleets expands, said Dirk Hoke, CEO of Airbus’s defence and space group. Hoke introduced the new company Wednesday at Xponential.
The data release, part of the company’s Webscope initiative and announced on Yahoo’s Tumblr blog, is intended for researchers to use in validating recommender systems, high-scale learning algorithms, user-behaviour modelling, collaborative filtering techniques and unsupervised learning methods.
Today, we are proud to announce the public release of the largest-ever machine learning dataset to the research community. The dataset stands at a massive ~110B events (13.5TB uncompressed) of anonymized user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015.
Their overview stated that machine learning techniques emphasized causality less than traditional economic statistical techniques, or what’s usually known as econometrics. In other words, machine learning is more about forecasting than about understanding the effects of policy.
That would make the techniques less interesting to many economists, who are usually more concerned about giving policy recommendations than in making forecasts.
Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are “actually very poor, since they are for an exceedingly easy case,” Amaral said in an announcement from Northwestern about the study.
Applied to messy, inconsistently scrubbed data from many sources in many formats – the base of data for which big data is often praised for its ability to manage – the results would be far less accurate and far less reproducible, according to the paper.
Here’s an interesting explanation as to how LDA, Latent Dinchlet Allocation works. From: What is a good explanation of Latent Dirichlet Allocation?
From a 3000 foot level as I understand the explanation of LDA; it seems like a mechanism to score words in order to categorize sets of words like paragraphs or entire papers. Interesting exercise but a human must data model this first. Any time some program has to estimate or guess like this there will be error, the only issue is how much is acceptable to even use the results that this kind of analysis produces.
A good predictive model requires a stable set of inputs with a predictable range of values that won’t drift away from the training set. And the response variable needs to remain of organizational interest.
If you want to move at the speed of “now, light, big data, thought, stuff,” pick your big data analytics battles. If your business is currently too chaotic to support a complex model, don’t build one. Focus on providing solid, simple analysis until an opportunity arises that is revenue-important enough and stable enough to merit the type of investment a full-fledged data science modeling effort requires.
Microsoft this morning announced a deal to buy Revolution Analytics, the top commercial provider of software and services for the open-source R programming language for statistical computing and predictive analytics.
When data is abundant, intelligence will win
Putting the power to publish and consume content into the hands of more people in more places enables everyone to start conversations with facts. With facts, negotiations can become less about who yells louder, but about who has the stronger data. They can also be an equalizer that enables better decisions and more civil discourse. Or, as Thomas Jefferson put it at the start of his first term, “Error of opinion may be tolerated where reason is left free to combat it.”
It then goes on to say this:
The vast majority of computing will occur in the cloud
Within the next decade, people will use their computers completely differently than how they do today. All of their files, correspondence, contacts, pictures, and videos will be stored or backed-up in the network cloud and they will access them from wherever they happen to be on whatever device they happen to hold.
Of course google wants this for everyone will need to use services like google to access their data. Do people really need all their data accessible to them 24/7? Can anyone trust the security of one’s data when placed in the hands of a stranger?
A bird in the hand is worth two in the bush. There is nothing more secure than a hard drive or more (one or more for backups) in a safety deposit box. No one needs to access their tax returns from anywhere at any time just because they can.
The switch from relational hadn’t been too hard because Riak is a key-value store, which made modeling relatively easy. Key value-stores are relatively simple database management systems that store just pairs of keys and values.
McCaul reckoned, too, migration of data had been made possible because the structure of patient records lent themselves to Riak’s key-value mode
It’s impossible to imagine the Internal Revenue Service or most other number-crunching agencies or companies working without computers. But when the IRS went to computers — the Automatic Data Processing system –there was an uproar. The agency went so far as to produce a short film on the topic called Right On The Button, to convince the public computers were a good thing.