Making Data Meaningful in Major League Baseball

My father, Willie Kamm, played professional baseball for the Chicago White Sox and Cleveland Indians in the 1920s and 1930s. This was in the days of Ty Cobb and Babe Ruth, long before anyone had heard of computers or even the word “data”. However, that did not mean the ballplayers, coaches, and managers did not try to “make data meaningful”.

My dad had a good understanding of the pitchers and what pitches they liked to throw in what situations. His managers adjusted lineups to take advantage of hitters’ strengths and pitchers’ weaknesses. This was done without computers or mathematical statistical analysis. Does that mean modern data mining and analysis techniques are overrated, overhyped, and oversold? Hardly. What it means is that the quantity of data is a key factor in how much computing power, and how advanced a set of analysis tools and techniques, is needed.

Back in the golden age of baseball, there were 8 teams in each league, and until the World Series you only faced teams in your own league. Each team had 4-5 starting pitchers and 1-3 relief pitchers who, combined, accounted for 90% of the innings pitched. There were 154 games in a season, so you faced each of the 7 opposing teams 22 times and each starting pitcher 4-6 times.

As a batter, you needed to collect and analyze data for approximately 56 pitchers over the course of a season (7 opposing teams with up to 8 pitchers each), and for any given game there were only 1-3 pitchers you might face. With only 7 opposing teams, by mid-season you had already had many at-bats against the few pitchers you would face that week. Even using the “primitive” first computer (the human brain), that does not present an overwhelming challenge.

So what about a current major league player? Are they not as brilliant at data analysis as my father? From a completely subjective, personal perspective, probably not, but objectively there are other factors that come into play.

Let’s compare the data volumes my father had to process to those of a batter today. There are 30 teams (29 opponents), and inter-league play throughout the season. As before, each team has 4-5 starting pitchers, but they also have highly specialized relief pitchers. The typical team has 7 relief pitchers: long relievers, setup pitchers (left- and right-handed), short relievers (left- and right-handed), and a closer. Over the course of a season, a current major league hitter is required to analyze 348 pitchers: 145 starting pitchers, plus 203 relief pitchers. Even in a typical game, instead of facing an average of 2 pitchers per game as in the 1920s and 1930s, the batter of today faces 3-4 pitchers per game. Finally, with the expansion of the number of opponents in a season from 7 to 29, a batter will usually face any given opposing pitcher only 2-3 times in a season instead of 4-6. Familiarity may breed contempt, but not in the case of an MLB hitter trying to decide on the next 90+ mph pitch about to be thrown from 60’ 6” away!
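The pitcher counts quoted above come from simple multiplication. Here is a minimal Python sketch of that arithmetic, assuming the upper end of the ranges given (5 starters per team in both eras, 3 relievers per team in my father's day). The function and variable names are my own, chosen only for illustration.

```python
def pitchers_to_study(opponents, starters_per_team, relievers_per_team):
    """Return (starters, relievers, total) across all opposing teams."""
    starters = opponents * starters_per_team
    relievers = opponents * relievers_per_team
    return starters, relievers, starters + relievers

# 1920s-30s: 7 league opponents; assumes 5 starters and 3 relievers per team.
golden_age = pitchers_to_study(opponents=7, starters_per_team=5, relievers_per_team=3)

# Today: 29 opponents; assumes 5 starters and the 7 relievers described above.
modern = pitchers_to_study(opponents=29, starters_per_team=5, relievers_per_team=7)

print(golden_age)  # (35, 21, 56)
print(modern)      # (145, 203, 348)
print(round(modern[2] / golden_age[2], 1))  # 6.2
```

Under these assumptions, today's hitter has roughly six times as many pitchers to study, and sees each of them about half as often.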

That is a look at data analysis from the batter’s perspective, but if today’s volume of data makes it so difficult for the batter, that must mean pitchers and fielders have an easy time getting them out, right? Au contraire, the same increase in data volumes makes their analysis challenges just as difficult.

My father once said that the biggest difference between the majors and the minor leagues was the “thinking end” that is needed on the major-league diamond. “You have to find out where the hitters usually send a certain kind of pitched ball and remember what you learn. You can make hard-hit balls look easy by being ready and by watching the pitcher and knowing the batter. You can’t afford to go to sleep on a big league diamond.”

Dad was the best fielding third baseman in the major leagues throughout his career. For the seven seasons from 1923 through 1929, he led the league in fielding percentage six times and was second once. He was tops in putouts five times, and assists three times. In five seasons he made 15 or fewer errors. He studied the hitters, knew what his pitchers would throw in certain situations, and adjusted his positioning accordingly. So what did he need to analyze?

Seven opposing teams with 9 starting players each (including the pitchers, who didn’t hit very well) meant studying 63 core players for an entire season. Sure, there were a few pinch-hitters on every team that you might see once per game, but that only brings the total number to around 75-80.

In today’s game, there are 29 opposing teams with 9 starting players. For half the teams (15), the ninth player is a designated hitter, which makes him a much more dangerous threat than a pitcher. There are also specialists who start or pinch hit against only right-handed or left-handed pitchers. This increases the number of hitters a defensive player must study to an average of 12 per team, or 348 per season. Multiply that number by the hitters’ tendencies against each of your team’s pitchers based upon the situation and the type of pitch being thrown, and the data volume becomes intimidating.
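To give a feel for how quickly that multiplication grows, here is a rough Python sketch. The hitter counts (63 and 348) come from the figures above, and the 12-man staff is the 5 starters plus 7 relievers described earlier. The number of pitch types and game situations are purely hypothetical placeholders I picked for illustration, not real scouting categories.

```python
def hitters_to_study(opponents, hitters_per_team):
    """Number of opposing hitters a fielder must know over a season."""
    return opponents * hitters_per_team

core_1920s = hitters_to_study(opponents=7, hitters_per_team=9)     # 63
modern_day = hitters_to_study(opponents=29, hitters_per_team=12)   # 348

# Hypothetical breakdown, for illustration only.
own_pitchers = 12   # 5 starters + 7 relievers on your own staff
pitch_types = 4     # assumed, e.g. fastball, curve, slider, changeup
situations = 3      # assumed, e.g. bases empty, runners on, two strikes

# Each hitter has a separate tendency for every pitcher/pitch/situation mix.
tendencies = modern_day * own_pitchers * pitch_types * situations

print(core_1920s, modern_day)  # 63 348
print(tendencies)              # 50112
```

Even with these modest made-up categories, the number of distinct tendencies runs into the tens of thousands, well beyond what anyone can carry in their head.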

It is no wonder that in the MLB world of today, in addition to managers, coaches, players, and training staff, each team either employs or has access to data analysts who analyze every pitch of every game.

Pete Rose, the all-time MLB hits leader, once said, “See the ball, hit the ball.” And Babe Ruth, in my opinion the greatest baseball player and home run hitter of all time, said, “All I can tell them is pick a good one and sock it. I get back to the dugout and they ask me what it was I hit and I tell them I don’t know except it looked good.”

They were both amazing at doing just that. But can you imagine what they might have been able to do if they knew what pitch was coming? With the data analysis tools of today, the results could have been incredible.


