Inside Major League Baseball’s “Hypothesis Machine”

Baseball data, over 95% of which has been created over the last five years, will continue to mount—leading MLB decision-makers to invest in more powerful analytics tools. While there are plenty of business intelligence and database options, teams are now looking to supercomputing—or at least, the spawn of HPC—to help them gain the competitive edge.

via Inside Major League Baseball’s “Hypothesis Machine”.

Please. The problem with current baseball analytics isn’t the deluge of data; it’s the deluge of crackpot theories that add more and more irrelevant variables to the mix. Most baseball analytics misuse mathematics and are created by people who are simply selling a website.

Speaking of selling a website, is this a good place to introduce the sister site to  🙂

All data in the above data model was crunched using Perl, awk, and Bash on a standard PC. Baseball is not so complicated that it requires a supercomputer to crunch historical or current-season data. More from the article…

He explained that what teams, just like governments and drug development researchers, are looking for is a “hypothesis machine” that will allow them to integrate multiple, deep data wells and pose several questions against the same data.