Before committing any real money, the researchers tested the idea on 10 years of historical data on the closing odds and results of 479,440 soccer games played between 2005 and 2015. This simulation paid out 44 percent of the time and delivered a yield of 3.5 percent over the 10-year period. “For an imaginary stake of $50 per bet, this corresponds to an equivalent profit of $98,865 across 56,435 bets,” they say.
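The quoted figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes "yield" means profit divided by total amount staked, which is the usual betting convention but is not spelled out in the excerpt; the function and variable names are illustrative only.

```python
# Rough check of the quoted backtest numbers.
# Assumption: yield = profit / total amount staked (not defined in the excerpt).

def yield_pct(profit: float, stake: float, n_bets: int) -> float:
    """Return profit as a percentage of the total amount staked."""
    return 100.0 * profit / (stake * n_bets)

STAKE = 50        # imaginary stake per bet, in dollars (from the quote)
N_BETS = 56_435   # number of simulated bets (from the quote)
PROFIT = 98_865   # quoted equivalent profit, in dollars

total_staked = STAKE * N_BETS
implied_yield = yield_pct(PROFIT, STAKE, N_BETS)

print(f"total staked:  ${total_staked:,}")
print(f"implied yield: {implied_yield:.2f}%")
```

Under that definition the quoted profit works out to roughly the 3.5 percent yield reported, so the numbers in the excerpt are at least internally consistent.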
Mind the Bullshit Asymmetry Principle, articulated by the Italian software developer Alberto Brandolini in 2013: the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it. Or, as Jonathan Swift put it in 1710, “Falsehood flies, and truth comes limping after it.” Plus ça change.
When you open an app or look at a browser page, there’s a very fast auction that happens where different advertisers bid to get to show you an ad. Their bid is based on how valuable they think you are, and to decide that, your phone sends them information about you, including, in many cases, an identifying code (that they’ve built a profile around) and your location information, down to your latitude and longitude.
Uptake’s model is to partner with well-known companies in various industries — from construction to mining to aviation — and create software and special algorithms that help these customers collect and understand huge amounts of data. The company is already producing positive cash flow, according to a person with knowledge of the financials who spoke on the condition of anonymity.
Their overview stated that machine learning techniques emphasized causality less than traditional economic statistical techniques, or what’s usually known as econometrics. In other words, machine learning is more about forecasting than about understanding the effects of policy.
That would make the techniques less interesting to many economists, who are usually more concerned with giving policy recommendations than with making forecasts.
When you look at the evidence, it’s hard to deny that the overwhelming majority of men using Ashley Madison weren’t having affairs. They were paying for a fantasy.
The question is, how do you find fakes in a sea of data? Answering that becomes more difficult when you consider that even real users of Ashley Madison were probably giving fake information at least some of the time. But wholesale fakery still leaves its traces in the profile data. I spoke with a data scientist who studies populations, who told me to compare the male and female profiles in aggregate, and look for anomalous patterns.
This means that the top 13 unique mask structures make up 50% of the passwords from the sample. Over 20 million passwords in the sample have a structure within the top 13 masks.
Based on analyzing the data, there are logical factors that help explain how this is possible. When users are asked to provide a password that contains an uppercase letter, over 90% of the time it is put as the first character. When asked to use a digit, most users will put two digits at the end of their password (graduation year perhaps).
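To make the idea of a "mask structure" concrete, here is a minimal sketch that reduces each password to its character-class pattern and measures how much of a sample the most common patterns cover. It assumes hashcat-style notation (`?u` uppercase, `?l` lowercase, `?d` digit, `?s` anything else), which the excerpt does not specify, and the sample passwords are invented for illustration, not taken from the study.

```python
import string
from collections import Counter

def mask_of(password: str) -> str:
    """Map each character to a class token (hashcat-style, assumed here)."""
    tokens = []
    for ch in password:
        if ch in string.ascii_uppercase:
            tokens.append("?u")
        elif ch in string.ascii_lowercase:
            tokens.append("?l")
        elif ch in string.digits:
            tokens.append("?d")
        else:
            tokens.append("?s")  # symbols and everything else
    return "".join(tokens)

def top_mask_coverage(passwords, k):
    """Fraction of passwords whose mask is among the k most common masks."""
    counts = Counter(mask_of(p) for p in passwords)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(passwords)

# Hypothetical sample, for illustration only.
SAMPLE = ["Summer09", "Winter12", "Dragon99", "password", "abc123", "Hello!23"]

print(mask_of("Summer09"))
print(top_mask_coverage(SAMPLE, 1))
```

In this toy sample, three of the six passwords share the "capital first, two digits last" shape, which is the same kind of concentration the excerpt describes at much larger scale.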
If you have AT&T’s gigabit Internet service and wonder why it seems so affordable, here’s the reason—AT&T is boosting profits by rerouting all your Web browsing to an in-house traffic scanning platform, analyzing your Internet habits, then using the results to deliver personalized ads to the websites you visit, e-mail to your inbox, and junk mail to your front door.
Use https. They may know which sites you visit but they won’t know any of the http fields because that is all encrypted. Most big sites like Google and Facebook use https by default nowadays.
Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are “actually very poor, since they are for an exceedingly easy case,” Amaral said in an announcement from Northwestern about the study.
Applied to messy, inconsistently scrubbed data from many sources in many formats – the kind of data that big data is often praised for its ability to manage – the results would be far less accurate and far less reproducible, according to the paper.
Here’s an interesting explanation of how LDA (Latent Dirichlet Allocation) works. From: What is a good explanation of Latent Dirichlet Allocation?
From a 3000-foot level, as I understand the explanation of LDA, it seems like a mechanism to score words in order to categorize sets of words like paragraphs or entire papers. Interesting exercise, but a human must data model this first. Any time some program has to estimate or guess like this there will be error; the only issue is how much is acceptable to even use the results that this kind of analysis produces.
A good predictive model requires a stable set of inputs with a predictable range of values that won’t drift away from the training set. And the response variable needs to remain of organizational interest.
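One simple way to watch for inputs drifting away from the training set is to count how many new values fall outside the range seen in training. This is only a rough sketch of that idea, not a method described in the excerpt; the function name and the example numbers are hypothetical.

```python
def out_of_range_fraction(train_values, new_values):
    """Fraction of new values outside the [min, max] range seen in training."""
    lo, hi = min(train_values), max(train_values)
    outside = sum(1 for v in new_values if v < lo or v > hi)
    return outside / len(new_values)

# Hypothetical feature values, for illustration only.
train = [1, 2, 3, 4, 5]
incoming = [2, 3, 6, 7]

print(out_of_range_fraction(train, incoming))
```

A fraction that keeps growing over time suggests the model is being asked about inputs it never saw, which is the instability the passage warns about.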
If you want to move at the speed of “now, light, big data, thought, stuff,” pick your big data analytics battles. If your business is currently too chaotic to support a complex model, don’t build one. Focus on providing solid, simple analysis until an opportunity arises that is revenue-important enough and stable enough to merit the type of investment a full-fledged data science modeling effort requires.