Using BaseRuns to handle cluster luck

One of the main tenets of Joe Peta's kickass work Trading Bases (by which THOME is largely inspired) is that you need to deal with what he calls "cluster luck," or the occurrences of a team stringing together hits in such a way as to score more runs than would otherwise be expected.

Hits turn into runs at an average rate somewhere near 2:1. That is, most teams, most of the time, have about twice as many hits as they have runs scored. But this isn't always the case. In a classic example I'm citing from memory but definitely stole from somewhere else, if a team tallies 9 hits in a game, how many runs will they score? 4? 5?

If the 9 hits came one in each inning, they very likely scored 0 (barring walks, homers, etc.). If all 9 hits came in just one inning, then even 9 singles scored 6-8 runs, and the number could be much higher than even 9 if some runners reached base via something other than a hit.
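To make the two extremes concrete, here's a toy sketch. It assumes a deliberately crude station-to-station model that is mine, not Peta's: every hit is a single, every single moves each runner up exactly one base, and there are no walks, errors, steals, or outs on the bases. The function name and the inning layouts are made up for illustration.

```python
def runs_from_singles(singles_per_inning):
    """Count runs scored when every hit is a single.

    Assumed toy model (not from any real run estimator): each single
    advances every runner exactly one base, and nothing else happens.
    """
    total_runs = 0
    for n_singles in singles_per_inning:
        runners_on = 0  # bases occupied in this inning, 0 through 3
        for _ in range(n_singles):
            if runners_on == 3:
                # Bases are loaded, so the single pushes one runner home.
                total_runs += 1
            else:
                # Still room on the bases, nobody scores yet.
                runners_on += 1
    return total_runs


spread_out = [1] * 9        # one single in each of nine innings
clustered = [9] + [0] * 8   # all nine singles in a single inning

print(runs_from_singles(spread_out))  # 0
print(runs_from_singles(clustered))   # 6
```

Same 9 hits, 0 runs versus 6, and the 6 is the floor: let runners take an extra base now and then and you get to 7 or 8.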

The point being, a hit is not a hit is not a hit. And a run is not a run is not a run. A team that scores 800 runs in a season sounds like they killed it. That's an average of nearly 5 runs a game. That should be a playoff team (unless their pitching gave up many more runs than the offense could score). But what if they scored 20 runs in each of 40 games and were shut out the other 122 times? Now we have a historically pathetic record. Yes, I also realize that's not freakin' possible... but you should get the point.

So, cluster luck. Peta's right, you have to account for it, most especially when evaluating +EV positions on season win total O/U wagers. But he doesn't go into the details of his calculations in Trading Bases. He discusses, at a high level, how you need to deal with it, and how it affected the outlook for certain teams' 2011 seasons based on their 2010 run scoring... but I need specifics!

Enter BaseRuns.

Allow me to allow FanGraphs to describe this sucker:

"BaseRuns is a formula designed to estimate how many runs a team would be expected to score (or allow) given their underlying offensive (or defensive) performance. In other words, BaseRuns is a context-neutral run estimator used to evaluate teams."

FanGraphs, BaseRuns

The important part here is context-neutral. This essentially does away with the variance of cluster luck and gives us an appropriate Runs Scored and Runs Allowed for a team based on their underlying peripherals.
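For a feel of what the calculation looks like, here's a minimal sketch of one commonly cited basic version of the BaseRuns formula (the A, B, C, D form usually credited to David Smyth). This is an assumption on my part about which variant to show: FanGraphs publishes a more detailed version with extra terms, and the coefficients differ between versions. The team stat line below is invented purely for illustration.

```python
def base_runs(ab, h, tb, hr, bb):
    """Estimate runs with a basic BaseRuns variant.

    Assumed coefficients: one widely quoted simple version, not
    necessarily the exact formula FanGraphs or THOME uses.
    """
    a = h + bb - hr                                       # baserunners, not counting homers
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement factor
    c = ab - h                                            # outs (approximated as at-bats minus hits)
    d = hr                                                # homers always score the batter
    # Runners times the share of them expected to score, plus the homers.
    return a * b / (b + c) + d


# Hypothetical season totals for a made-up team.
estimate = base_runs(ab=5500, h=1400, tb=2200, hr=170, bb=500)
print(round(estimate))
```

Notice that nothing in the inputs knows when the hits happened. That's the whole trick: sequencing never enters the formula, so cluster luck can't either.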

Once we have BaseRuns estimates of runs scored and runs allowed for every team, we can work out a true context-neutral (but league- and scoring-environment-sensitive) expected winning percentage using the Pythagenpat variation of Bill James' original Pythagorean Win-Loss estimation.

Well, after dealing with roster changes using projected WAR, that is. But that's a story for another time.
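In the meantime, the Pythagenpat step mentioned above can be sketched in a few lines. The idea is that the Pythagorean exponent isn't fixed at 2 but floats with the run environment: exponent = (runs per game, both teams combined) raised to a small constant. The 0.287 used here is one commonly quoted value and is an assumption; other sources use slightly different constants. The function name and the sample inputs are made up for illustration.

```python
def pythagenpat_win_pct(runs_scored, runs_allowed, games, exponent_constant=0.287):
    """Expected winning percentage via Pythagenpat.

    exponent_constant is an assumed value; figures close to it are
    also in circulation.
    """
    runs_per_game = (runs_scored + runs_allowed) / games  # both teams combined
    x = runs_per_game ** exponent_constant                # environment-sensitive exponent
    return runs_scored ** x / (runs_scored ** x + runs_allowed ** x)


# Hypothetical team: 800 runs scored, 700 allowed, over 162 games.
win_pct = pythagenpat_win_pct(800, 700, 162)
print(round(win_pct, 3), round(win_pct * 162, 1))
```

Feed it BaseRuns estimates instead of actual runs scored and allowed, and you get an expected record with the cluster luck stripped out.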