The Secret Betting Strategy That Beats Online Bookmakers

Before committing any real money, the researchers tested the idea on 10 years of historical data on the closing odds and results of 479,440 soccer games played between 2005 and 2015. This simulation paid out 44 percent of the time and delivered a yield of 3.5 percent over the 10-year period. “For an imaginary stake of $50 per bet, this corresponds to an equivalent profit of $98,865 across 56,435 bets,” they say.

Source: The Secret Betting Strategy That Beats Online Bookmakers – MIT Technology Review
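The quoted figures hang together if "yield" means net profit divided by the total amount staked, which is the usual betting definition (an assumption here; the article doesn’t spell it out). A quick check, which also assumes flat stakes and decimal odds to back out what the winning bets must have paid on average:

```python
def betting_yield(profit: float, stake_per_bet: float, n_bets: int) -> float:
    """Net profit as a fraction of turnover (stake per bet times number of bets)."""
    total_staked = stake_per_bet * n_bets
    return profit / total_staked


# Figures quoted from the MIT Technology Review piece.
total = 50 * 56_435                    # total amount staked in the simulation
y = betting_yield(98_865, 50, 56_435)

# With flat stakes, yield = hit_rate * (mean decimal odds of winning bets) - 1,
# so a 44% hit rate implies the winners paid roughly (1 + y) / 0.44 on average.
implied_odds = (1 + y) / 0.44

print(f"staked ${total:,}, yield {y:.2%}, implied odds on winners {implied_odds:.2f}")
```

This prints a turnover of $2,821,750 and a yield of 3.50%, matching the quoted 3.5 percent. It also shows why a strategy that loses more often than it wins can still profit: the winners were priced at odds of about 2.35.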

How to Call B.S. on Big Data: A Practical Guide

Mind the Bullshit Asymmetry Principle, articulated by the Italian software developer Alberto Brandolini in 2013: the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it. Or, as Jonathan Swift put it in 1710, “Falsehood flies, and truth comes limping after it.” Plus ça change.

Source: How to Call B.S. on Big Data: A Practical Guide

Company Tracks Iowa Caucusgoers by their Cell Phones

When you open an app or look at a browser page, there’s a very fast auction that happens where different advertisers bid to get to show you an ad. Their bid is based on how valuable they think you are, and to decide that, your phone sends them information about you, including, in many cases, an identifying code (that they’ve built a profile around) and your location information, down to your latitude and longitude.

Source: Company Tracks Iowa Caucusgoers by their Cell Phones – Schneier on Security
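The auction Schneier describes can be pictured with a toy sketch. Everything in it is hypothetical: the BidRequest fields are loosely modeled on the device identifier and latitude/longitude that real bid requests carry, the bidders and prices are invented, and real ad exchanges add timeouts, floor prices, and first- or second-price rules that this ignores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class BidRequest:
    device_id: str  # advertising identifier that bidders may have profiled
    lat: float      # location sent along with the request
    lon: float


Bidder = Callable[[BidRequest], float]


def run_auction(request: BidRequest, bidders: Dict[str, Bidder]) -> Tuple[str, float]:
    """Ask every bidder for a price and return the highest bidder and its bid."""
    bids = {name: bid_fn(request) for name, bid_fn in bidders.items()}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


# Hypothetical profile a bidder has built around a device identifier.
profiles = {"ad-id-123": {"interests": {"politics"}}}


def campaign_bidder(req: BidRequest) -> float:
    # Pays more for a device it has already profiled as interested in politics.
    interests = profiles.get(req.device_id, {}).get("interests", set())
    return 2.50 if "politics" in interests else 0.10


bidders = {"campaign": campaign_bidder, "shoe_store": lambda req: 0.40}

print(run_auction(BidRequest("ad-id-123", 41.59, -93.62), bidders))  # ('campaign', 2.5)
print(run_auction(BidRequest("ad-id-999", 41.59, -93.62), bidders))  # ('shoe_store', 0.4)
```

The point of the sketch is that every bidder, winner or not, receives the identifier and the coordinates just by being asked to bid.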

Midwest Start-Up Achieves Rare $1 Billion Valuation

Uptake’s model is to partner with well-known companies in various industries — from construction to mining to aviation — and create software and special algorithms that help these customers collect and understand huge amounts of data. The company is already producing positive cash flow, according to a person with knowledge of the financials who spoke on the condition of anonymity.

Source: Midwest Start-Up Achieves Rare $1 Billion Valuation – The New York Times

Economics Has a Math Problem

Their overview stated that machine learning techniques emphasized causality less than traditional economic statistical techniques, or what’s usually known as econometrics. In other words, machine learning is more about forecasting than about understanding the effects of policy.

That would make the techniques less interesting to many economists, who are usually more concerned about giving policy recommendations than in making forecasts.

Source: Economics Has a Math Problem – Bloomberg View

Almost None of the Women in the Ashley Madison Database Ever Used the Site

When you look at the evidence, it’s hard to deny that the overwhelming majority of men using Ashley Madison weren’t having affairs. They were paying for a fantasy.

Source: Almost None of the Women in the Ashley Madison Database Ever Used the Site

The question is, how do you find fakes in a sea of data? Answering that becomes more difficult when you consider that even real users of Ashley Madison were probably giving fake information at least some of the time. But wholesale fakery still leaves its traces in the profile data. I spoke with a data scientist who studies populations, who told me to compare the male and female profiles in aggregate, and look for anomalous patterns.
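The advice to compare male and female profiles in aggregate can be sketched as measuring how often each group shows some sign of real activity. The field names (gender, ever_sent_message) and the records below are invented for illustration. They are not the schema of the leaked data.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def activity_rate_by_group(
    profiles: Iterable[Mapping], group_field: str, activity_field: str
) -> Dict[str, float]:
    """Share of profiles in each group whose activity_field is truthy."""
    totals: Counter = Counter()
    active: Counter = Counter()
    for p in profiles:
        group = p[group_field]
        totals[group] += 1
        if p.get(activity_field):
            active[group] += 1
    return {g: active[g] / totals[g] for g in totals}


# Hypothetical records; a real analysis would run over millions of profiles.
sample = [
    {"gender": "male", "ever_sent_message": True},
    {"gender": "male", "ever_sent_message": True},
    {"gender": "male", "ever_sent_message": False},
    {"gender": "female", "ever_sent_message": False},
    {"gender": "female", "ever_sent_message": False},
    {"gender": "female", "ever_sent_message": True},
    {"gender": "female", "ever_sent_message": False},
]

rates = activity_rate_by_group(sample, "gender", "ever_sent_message")
print(rates)
```

A gap like the one in this toy data (two thirds versus a quarter) is the kind of anomalous pattern worth investigating; on its own it doesn’t prove that any profile is fake.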

Statistics Will Crack Your Password

This means that the top 13 unique mask structures make up 50% of the passwords from the sample. Over 20 million passwords in the sample have a structure within the top 13 masks.

via Statistics Will Crack Your Password.

Based on analyzing the data, there are logical factors that help explain how this is possible. When users are asked to provide a password that contains an uppercase letter, over 90% of the time it is put as the first character. When asked to use a digit, most users will put two digits at the end of their password (a graduation year, perhaps).
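The "mask" idea can be sketched in a few lines. Each character is mapped to a class in the hashcat style (?u uppercase, ?l lowercase, ?d digit, ?s everything else, simplified here to ASCII). The code counts the resulting structures and checks how few of them cover half of a password list. The helper names and the five sample passwords are made up and are not the article’s data.

```python
import string
from collections import Counter
from typing import List


def mask(password: str) -> str:
    """Hashcat-style mask; non-ASCII characters are lumped into ?s for simplicity."""
    def char_class(ch: str) -> str:
        if ch in string.ascii_uppercase:
            return "?u"
        if ch in string.ascii_lowercase:
            return "?l"
        if ch in string.digits:
            return "?d"
        return "?s"
    return "".join(char_class(ch) for ch in password)


def masks_to_cover(passwords: List[str], share: float = 0.5) -> int:
    """How many of the most common masks are needed to cover `share` of the list."""
    counts = Counter(mask(p) for p in passwords)
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= share * len(passwords):
            return n
    return len(counts)


def uppercase_first_share(passwords: List[str]) -> float:
    """Among passwords containing an uppercase letter, the share that start with one."""
    with_upper = [p for p in passwords if any(c in string.ascii_uppercase for c in p)]
    if not with_upper:
        return 0.0
    return sum(p[0] in string.ascii_uppercase for p in with_upper) / len(with_upper)


sample = ["Password12", "Summer19", "Monkey99", "dragon", "Letmein1!"]
print(mask("Summer19"))               # ?u?l?l?l?l?l?d?d
print(masks_to_cover(sample))         # 2
print(uppercase_first_share(sample))  # 1.0
```

Run over a real leak, the same counting shows the effect the article reports: a handful of structures such as "capital letter, lowercase letters, two digits" cover a large share of passwords. That is why a cracker tries those masks first.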

AT&T’s plan to watch your Web browsing—and what you can do about it

If you have AT&T’s gigabit Internet service and wonder why it seems so affordable, here’s the reason—AT&T is boosting profits by rerouting all your Web browsing to an in-house traffic scanning platform, analyzing your Internet habits, then using the results to deliver personalized ads to the websites you visit, e-mail to your inbox, and junk mail to your front door.

via AT&T’s plan to watch your Web browsing—and what you can do about it | Ars Technica.

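Use HTTPS. AT&T may still see which sites you visit (the hostname leaks through DNS lookups and the TLS handshake), but it won’t see the page paths, query strings, headers, cookies, or content, because all of that is encrypted. Most big sites, like Google and Facebook, use HTTPS by default nowadays.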
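A rough way to picture the HTTPS advice is to ask, for each URL, what an on-path observer such as an ISP can read. The sketch below uses Python’s standard urllib.parse. It is a simplification. It ignores the destination IP address and assumes the hostname is exposed through DNS and the TLS Server Name Indication, which holds unless Encrypted Client Hello is in use. With plain HTTP, headers, cookies, and bodies are exposed too, not just the path and query.

```python
from urllib.parse import urlsplit


def visible_to_network(url: str) -> dict:
    """Rough view of which URL parts an ISP can read in transit."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Hostname still leaks via DNS and the TLS SNI field; path and query are encrypted.
        return {"host": parts.hostname}
    # Plain HTTP: the request line (and headers and body) travels in the clear.
    return {"host": parts.hostname, "path": parts.path, "query": parts.query}


print(visible_to_network("https://www.example.com/search?q=flu+symptoms"))
# {'host': 'www.example.com'}
print(visible_to_network("http://www.example.com/search?q=flu+symptoms"))
# {'host': 'www.example.com', 'path': '/search', 'query': 'q=flu+symptoms'}
```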

Test shows big data text analysis inconsistent, inaccurate

Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are “actually very poor, since they are for an exceedingly easy case,” Amaral said in an announcement from Northwestern about the study.

Applied to messy, inconsistently scrubbed data from many sources in many formats – the base of data for which big data is often praised for its ability to manage – the results would be far less accurate and far less reproducible, according to the paper.

via Test shows big data text analysis inconsistent, inaccurate | Computerworld.
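Accuracy (agreement with a known answer) and consistency (agreement between repeated runs of the same method) are separate checks. Topic labels from different runs are arbitrary, so topic 0 in one run may be topic 1 in the next. One simple, label-independent way to compare two labelings is the Rand index: the share of item pairs that both labelings group the same way. This is an illustrative choice and not necessarily the metric the Northwestern study used.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of item pairs that labelings a and b both put together or both keep apart."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labelings of the same items, at least two items")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


truth = [0, 0, 1, 1]   # known topics for four documents
run_1 = [1, 1, 0, 0]   # same grouping as the truth, different label names
run_2 = [1, 1, 0, 1]   # a second run that misplaces the last document

print(rand_index(run_1, truth))  # 1.0 (accuracy of run 1)
print(rand_index(run_2, truth))  # 0.5 (accuracy of run 2)
print(rand_index(run_1, run_2))  # 0.5 (consistency between the two runs)
```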

Here’s an interesting explanation of how LDA (Latent Dirichlet Allocation) works. From: What is a good explanation of Latent Dirichlet Allocation?

From a 3,000-foot level, as I understand the explanation, LDA seems like a mechanism for scoring words in order to categorize sets of words such as paragraphs or entire papers. It’s an interesting exercise, but a human still has to model the data first, for example by deciding how many topics to look for. Any time a program has to estimate or guess like this there will be error; the only question is how much error is acceptable before the results this kind of analysis produces are no longer worth using.
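To make the modeling point concrete, here is a minimal collapsed Gibbs sampler for LDA in plain Python. It is one common way to fit the model, not necessarily the approach the linked explanation walks through. A human makes several decisions before any fitting happens: the number of topics, the priors alpha and beta, and how text becomes word ids. The function name, defaults, and toy corpus are illustrative assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              n_iter: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Collapsed Gibbs sampling for LDA.

    docs holds word ids in range(vocab_size). Returns (theta, phi):
    per-document topic mixtures and per-topic word distributions.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # words in doc d assigned to topic k
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                                # total words assigned to topic k
    z = []                                             # current topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zd = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zd):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Resample its topic: (how much doc d uses t) * (how much t uses word w).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
             for row, doc in zip(ndk, docs)]
    phi = [[(c + beta) / (nk[k] + vocab_size * beta) for c in nkw[k]]
           for k in range(n_topics)]
    return theta, phi


# Toy corpus: ids 0-2 stand for one theme's words, ids 3-5 for another's.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
for d, row in enumerate(theta):
    print(d, [round(x, 2) for x in row])
```

On a corpus this clean, the first two documents should lean toward one topic and the last two toward the other. On messy real text the split is far less tidy, and that is exactly where the estimation error comes in.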

Surviving Data Science “at the Speed of Hype”

A good predictive model requires a stable set of inputs with a predictable range of values that won’t drift away from the training set. And the response variable needs to remain of organizational interest.

via Surviving Data Science “at the Speed of Hype” – John Foreman, Data Scientist.

If you want to move at the speed of “now, light, big data, thought, stuff,” pick your big data analytics battles. If your business is currently too chaotic to support a complex model, don’t build one. Focus on providing solid, simple analysis until an opportunity arises that is revenue-important enough and stable enough to merit the type of investment a full-fledged data science modeling effort requires.
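The requirement that inputs not "drift away from the training set" can be monitored with a simple check. The sketch below swaps in one common technique, the Population Stability Index (PSI). PSI compares how a feature’s live values fall into bins cut at training-set quantiles. The 0.1 and 0.25 alert levels in the docstring are widely used rules of thumb, not anything from the article, and the data is synthetic.

```python
import bisect
import math
from typing import List, Sequence


def psi(train: Sequence[float], live: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `live` relative to `train`.

    Rule of thumb (an assumption, tune per use case): < 0.1 stable,
    0.1 to 0.25 worth watching, > 0.25 significant drift.
    """
    ordered = sorted(train)
    # Interior bin edges at training-set quantiles.
    cuts = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(cuts, v)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(train), shares(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


train = [float(v) for v in range(1000)]
same = psi(train, train)                        # identical distributions
shifted = psi(train, [v + 500 for v in train])  # inputs have moved upward
print(f"same: {same:.3f}, shifted: {shifted:.2f}")
```

A check like this can tell you that the inputs have moved. It can’t tell you whether the response variable still matters to the business, which is the other half of the warning.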