Key takeaways:
- Data analytics is transforming sports, from team strategies to measuring player value
- A player’s monetary value can be estimated by calculating their overall financial contribution to their team’s success
- We use a so-called ‘information ratio’ (IR) to quantify the predictability of scoring across sports. The IR reflects scoring consistency: high-IR sports have more predictable outcomes, while outcomes in low-IR sports are more variable, reflecting competitive imbalances
Those familiar with the 2011 film Moneyball will know that, when building a winning sports team, the age-old choice between strength and intelligence is a false dichotomy: you need both. The Pythagorean Won-Loss formula, introduced by Bill James in the 1980s for baseball, predicts a team’s winning percentage from the points it scores (PS) and the points it allows (PA). This formula has become central to sports analytics, helping assess team performance, managerial decisions, and player value. So why did we choose to revisit such well-trodden ground in our analysis?
The formula (explained in more detail below) relies on a sport-specific component, λ. Most studies of James’s formula we’ve read have focused on individual sports. [1] As a result, λ is not well understood beyond the fact that each sport has its own specific value that makes the model ‘work’. [2] We examined multiple sports simultaneously to shed light on what exactly the λ coefficient is capturing. Our proprietary tool, ArcticDB, helped us manage and analyse the large data volumes involved and uncover patterns. You can read the full research paper on our ArcticDB blog.
Data collection and ArcticDB
The study covered nine sports, with data collected over decades:
- Baseball: Major League Baseball (MLB) since 1871
- Basketball: National Basketball Association (NBA) since 1946
- Football: English top-flight football, covering the English Premier League (EPL) and its Football League predecessors, since 1888
- American football: National Football League (NFL)
- Ice hockey: National Hockey League (NHL)
- Rugby: Super Rugby Pacific (SUP)
- Cricket: Indian Premier League (IPL)
- Lacrosse: National Collegiate Athletic Association Men’s Division I (NCAA)
- Australian Rules Football: Australian Football League (AFL)
ArcticDB, a high-performance data platform, was used to store, process, and analyse over 853,000 rows of game-level data. Its schema-less structure, automatic versioning, and speed made it ideal for this large-scale analysis.
The formula’s core
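Put simply, the formula for a team’s expected winning percentage is:

$$\text{Win\%} = \frac{PS^{\lambda}}{PS^{\lambda} + PA^{\lambda}}$$

where PS is points scored, PA is points allowed, and λ is the sport-specific exponent.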
If points scored equals points allowed (PS=PA), the team is expected to win 50% of its games. This formula serves as a benchmark, revealing whether teams over- or under-perform expectations. For example, the 2020-21 Milwaukee Bucks scored 8,649 points and conceded 8,225, suggesting a higher winning percentage than their actual 63.9% record.
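As a rough illustration of the benchmark idea, the sketch below plugs the Bucks' totals into the standard Pythagorean form, PS^λ / (PS^λ + PA^λ). The exponent used here (λ = 14) is an assumption chosen for illustration, in the range commonly quoted for basketball; it is not the value estimated in the study, and the function and variable names are hypothetical.

```python
def pythagorean_win_pct(ps: float, pa: float, lam: float) -> float:
    """Expected winning percentage from points scored (ps) and allowed (pa)."""
    return ps**lam / (ps**lam + pa**lam)


# Season totals quoted in the text for the 2020-21 Milwaukee Bucks.
BUCKS_PS = 8649
BUCKS_PA = 8225
ACTUAL_WIN_PCT = 0.639

# ASSUMPTION: illustrative basketball exponent, not the study's fitted value.
ASSUMED_NBA_LAMBDA = 14.0

expected = pythagorean_win_pct(BUCKS_PS, BUCKS_PA, ASSUMED_NBA_LAMBDA)
gap = expected - ACTUAL_WIN_PCT

print(f"expected: {expected:.1%}, actual: {ACTUAL_WIN_PCT:.1%}, gap: {gap:+.1%}")

# Sanity check: equal scoring and conceding gives 50% for any exponent.
assert abs(pythagorean_win_pct(100, 100, ASSUMED_NBA_LAMBDA) - 0.5) < 1e-12
```

Under this assumed exponent the expected figure comes out at roughly 67%, a few points above the actual record, which is the sense in which a team can be said to have under-performed its scoring margin. A different λ would shift the size of the gap but not its direction, since the Bucks outscored their opponents.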
Beyond wins: player valuation
The formula also underpins metrics like Wins Above Replacement (WAR), which estimate a player’s contribution to the team’s success. For instance, Aaron Judge of the New York Yankees added 108.5 runs above a replacement-level player in 2024. Using the formula's derivative, we find that:
At US$8 million per win (based on free-agency market rates), Judge generated US$88-89 million in value, significantly exceeding his US$40 million salary.
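For readers who want to see where a runs-to-wins conversion of this kind can come from, here is a sketch of the derivative of the standard Pythagorean form. The symbols W (expected winning percentage), G (games per season) and ΔPS (extra runs or points attributed to the player) are notation introduced here for illustration; the study's exact inputs are not reproduced.

```latex
\begin{aligned}
W &= \frac{PS^{\lambda}}{PS^{\lambda} + PA^{\lambda}} \\[4pt]
\frac{\partial W}{\partial PS}
  &= \frac{\lambda\, PS^{\lambda-1}\, PA^{\lambda}}{\left(PS^{\lambda} + PA^{\lambda}\right)^{2}}
   = \frac{\lambda\, W\,(1 - W)}{PS} \\[4pt]
\left.\frac{\partial W}{\partial PS}\right|_{PS = PA}
  &= \frac{\lambda}{4\, PS} \\[4pt]
\Delta \text{Wins} &\approx G \cdot \frac{\lambda}{4\, PS} \cdot \Delta PS
\end{aligned}
```

The last line evaluates the slope for an average team (PS = PA) and scales it by the number of games. As a consistency check on the figures quoted above, US$88-89 million at US$8 million per win corresponds to roughly 11 wins, so 108.5 runs translates to a little under 10 runs per win.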
Investigating λ: a scaling factor across sports
Each sport has its own λ, reflecting its scoring dynamics. We linked λ to each sport’s IR:
The IR reflects scoring consistency: higher-IR sports (e.g., basketball) have more predictable outcomes, while lower-IR sports (e.g., football in the EPL) have greater variability. The analysis shows that λ scales linearly with IR, serving as a ‘conversion factor’ that adjusts for different scoring systems.
Regression and results
To explore λ, the formula was transformed into a linear model for regression:
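One common way to linearise the Pythagorean form is to take log-odds: since W / (1 − W) = (PS / PA)^λ, it follows that ln(W / (1 − W)) = λ · ln(PS / PA), so λ is the slope of a regression through the origin. The sketch below assumes that transformation and a zero intercept; the study's exact specification may differ. The data are synthetic, generated from a known exponent purely to show that the fit recovers it, and all names are hypothetical.

```python
import math


def pythagorean_win_pct(ps: float, pa: float, lam: float) -> float:
    """Expected winning percentage from points scored (ps) and allowed (pa)."""
    return ps**lam / (ps**lam + pa**lam)


def fit_lambda(seasons) -> float:
    """Least-squares slope through the origin of log-odds(W) on ln(PS/PA).

    `seasons` is an iterable of (win_pct, points_scored, points_allowed).
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for win_pct, ps, pa in seasons:
        if not 0.0 < win_pct < 1.0:
            continue  # log-odds are undefined for perfect or winless seasons
        x = math.log(ps / pa)
        y = math.log(win_pct / (1.0 - win_pct))
        sum_xy += x * y
        sum_xx += x * x
    return sum_xy / sum_xx


# SYNTHETIC team-seasons built from a known exponent (not real results).
TRUE_LAMBDA = 2.0
scoring = [(700, 650), (620, 710), (800, 640), (655, 700)]
synthetic = [(pythagorean_win_pct(ps, pa, TRUE_LAMBDA), ps, pa) for ps, pa in scoring]

lam_hat = fit_lambda(synthetic)
print(f"estimated lambda: {lam_hat:.6f}")
```

Because the synthetic win percentages contain no noise, the estimate matches the exponent used to generate them. With real team-season data the points would scatter around the fitted line, and the same function could be applied to successive windows of seasons to track how the slope moves over time.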
This approach was applied across decades of team-season data. Key findings include:
- Variation in λ: Sports with higher IR (e.g., NBA) have tighter scoring distributions and steeper regression slopes.
- Linear relationship with IR: λ scales linearly with IR, highlighting its role as a universal factor across diverse sports.
- Time variability: Rolling 10-year windows reveal that λ changes over time due to evolving game styles and rule changes. For example:
  - The NBA’s introduction of the 24-second shot clock in 1954-55 increased scoring rates
  - The NFL’s scoring has trended upward since its inception
Implications
The Pythagorean formula’s robustness across sports makes it a versatile tool for:
- Player valuation: Metrics like WAR depend on λ to quantify players' contributions
- Competitive balance: Low-IR sports, such as football in the EPL, exhibit greater variability, reflecting competitive imbalance
- Evolving strategies: Long-term trends in λ reveal how rule changes (e.g., shot clocks) impact scoring dynamics
ArcticDB’s role
ArcticDB played a critical role in this analysis through its:
- Ability to handle large datasets: Scaling from modest datasets like the 853,000-row sports data through to production-grade datasets with billions of rows and hundreds of thousands of columns
- Versioning: Automatically tracking changes for reproducibility
- Flexibility: Simplifying data integration without traditional database overhead
Conclusion: Bridging theory and practice
By applying the Pythagorean formula across sports, we bridged theory and practice, offering new insights into team performance and scoring dynamics. The linear relationship between λ and IR demystifies the formula’s mechanics and showcases its adaptability. ArcticDB enabled efficient, scalable analysis, highlighting its potential for future sports analytics projects.
1. See the sports listed here for examples https://en.wikipedia.org/wiki/Pythagorean_expectation
2. An important caveat – we are not necessarily experts in this field and given the vast literature written by professionals, academics, and enthusiasts, there is a very high likelihood we’ve missed something along the way.