Taking Moneyball to the IPL: How the Geeks Can Rule Cricket
by Devanshu Mehta
In hindsight, it would appear that baseball was a sport designed by statisticians. Each pitch is a discrete event with a finite set of possibilities: ball, strike, foul, hit, out. Each possibility has a distinct impact on that batter’s outcome. A ball increases the probability of a walk. A strike and the first two fouls increase the probability of an out. Hits and outs are results in and of themselves. Couple that with the fact that each team plays at least 162 games a season in the MLB, a huge sample, and you get a game where you can hope to predict a player’s performance over a season.
Cricket, however, has the space between. The fact that Dravid plays a cover drive straight to the fielder has no direct impact on his probability of being out. Sure, you can make tortured claims about how “when Dravid plays a cover drive to the swinging ball at the beginning of his innings, he scores 34 runs on average”. But that’s a conditional probability based on a very small sample. What you might be able to do in cricket is what Andy Flower does. More accurately, what his Cambridge mathematician Nathan Leamon does:
The boy’s gone to town and then some. England’s enthusiasm for Hawkeye extends way beyond the DRS – they’ve used it to log and analyse every ball delivered in Test match cricket around the world in the last five years.
That’s where I’d start. Leamon claims that this is how they got Tendulkar in the recent series. I’ll wait for more evidence before I believe it’s working, but they’re on the right track.
The next step is what they already do in baseball. Go back to every game over the past ten years, and track the trajectory of each ball and its predicted point of impact with the ground. Armed with that information, and the truth of what actually happened on each of those balls, you can “predict” what would happen any time in the future when a batsman hits a ball at a particular angle in a particular direction. You can say a 32-degree trajectory hit to 4 o’clock is out 57% of the time. A 12-degree trajectory hit to 7 o’clock is out 9% of the time, but the predicted number of runs is 1.2. This allows you to assign real predicted run values to batsmen.
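To make the idea concrete, here is a minimal sketch in Python of how such a lookup table might be built. Everything in it is an assumption for illustration: the 10-degree angle bands, the function names (`bucket`, `build_table`, `expected_average`) and the handful of invented balls standing in for ten years of tracking data. It is not how England or any baseball team actually does it.

```python
from collections import defaultdict


def bucket(angle_deg, clock_dir, angle_step=10):
    """Group a shot by launch-angle band and clock-face direction.

    The 10-degree band width is an arbitrary choice for this sketch.
    """
    return (int(angle_deg // angle_step) * angle_step, clock_dir)


def build_table(history):
    """Turn historic balls into out probability and expected runs per bucket.

    history: iterable of (angle_deg, clock_dir, runs, was_out) tuples.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # balls, outs, runs
    for angle, direction, runs, was_out in history:
        cell = totals[bucket(angle, direction)]
        cell[0] += 1
        cell[1] += int(was_out)
        cell[2] += runs
    return {
        key: {"balls": n, "p_out": outs / n, "exp_runs": runs / n}
        for key, (n, outs, runs) in totals.items()
    }


def expected_average(shots, table):
    """Predicted batting average: expected runs per expected dismissal.

    shots: iterable of (angle_deg, clock_dir) for one batsman.
    Shots with no historic precedent in the table are skipped.
    """
    exp_runs = exp_outs = 0.0
    for angle, direction in shots:
        cell = table.get(bucket(angle, direction))
        if cell is None:
            continue
        exp_runs += cell["exp_runs"]
        exp_outs += cell["p_out"]
    return exp_runs / exp_outs if exp_outs else float("inf")


# Invented data, for illustration only: (angle, clock direction, runs, out?)
history = [
    (32, 4, 0, True), (35, 4, 0, True), (31, 4, 4, False), (38, 4, 0, False),
    (12, 7, 1, False), (14, 7, 2, False), (11, 7, 0, True), (18, 7, 4, False),
]
table = build_table(history)
print(table[bucket(32, 4)])
print(expected_average([(33, 4), (12, 7), (15, 7)], table))
```

With real tracking data you would want far more balls in each bucket before trusting the numbers, and probably finer buckets that also account for field placings. The point is only that each shot gets valued by what historically happened to shots like it, rather than by what happened to land safely on the day.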
Sure, a young batsman with only a few games behind him may average 75, but you know that if all his shots had been fielded or caught based on historic precedent, he would have averaged only 22. Of course, this model does not account for run outs, which are substantially different in cricket when compared with baseball, but it’s a start. The Old Batsman raises a valid concern:
Moneyball worked for Billy Beane in part because every franchise plays hundreds of games per season and the vast majority aren’t watched by the other coaches and teams. Test matches are much rarer things, and are more closely observed.
Of course, IPL games are not as rare. And there is a significant financial advantage to finding a hidden gem. Predicting a Valthaty is worth a boatload of money. This is what Billy Beane did for years with the always cash-poor Oakland A’s in Major League Baseball. The Old Batsman continues:
And Moneyball only really worked until all of the other teams knew about it and started using the same information. Once they did, the variables of power and money that Beane had overcome reasserted themselves.
Yes, but until then, there is money to be made.