STAT 4610X – Sports Analytics
Description
Sports analytics refers to the use of statistical, quantitative, and graphical techniques to analyze sports data. The course focuses on head-to-head sports and on calculating relevant statistics, e.g., plus-minus and adjusted plus-minus, which attempt to quantify an individual's contribution to a team's performance. It also covers rating and ranking systems, their relationship to statistical models, and how these ratings are estimated.
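To make plus-minus and adjusted plus-minus concrete, here is a minimal Python sketch. It assumes a hypothetical stint format in which each stint lists the players on the floor for each side and side A's point margin during that stint. Raw plus-minus credits every player on the floor with the margin. Adjusted plus-minus instead regresses the margins on indicator variables (+1 for side A, -1 for side B), so each player's coefficient is estimated while controlling for teammates and opponents. The ridge penalty lam is an assumption borrowed from regularized APM variants. Real implementations also typically normalize by possessions and include a home-court term, both of which this sketch omits.

```python
# Minimal sketch: raw and ridge-adjusted plus-minus from hypothetical stint data.
# A stint is (players_side_a, players_side_b, margin), where margin is side A's
# points minus side B's points while exactly those players were on the floor.


def raw_plus_minus(stints, players):
    """Credit every player on the floor with the stint margin (negated for side B)."""
    pm = {p: 0.0 for p in players}
    for side_a, side_b, margin in stints:
        for p in side_a:
            pm[p] += margin
        for p in side_b:
            pm[p] -= margin
    return pm


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_plus_minus(stints, players, lam=1.0):
    """Ridge regression of stint margins on +1/-1 player indicators (no intercept)."""
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for side_a, side_b, margin in stints:
        x = [0.0] * n
        for p in side_a:
            x[idx[p]] = 1.0
        for p in side_b:
            x[idx[p]] = -1.0
        for i in range(n):
            xty[i] += x[i] * margin
            for j in range(n):
                xtx[i][j] += x[i] * x[j]
    for i in range(n):
        xtx[i][i] += lam  # ridge penalty keeps the system solvable when lineups overlap
    beta = solve(xtx, xty)
    return {p: beta[idx[p]] for p in players}


# Hypothetical stints for illustration only.
players = ["A1", "A2", "A3", "B1", "B2", "B3"]
stints = [
    (("A1", "A2"), ("B1", "B2"), 4),
    (("A1", "A3"), ("B1", "B2"), -2),
    (("A2", "A3"), ("B1", "B3"), 3),
]
print(raw_plus_minus(stints, players))
print(adjusted_plus_minus(stints, players, lam=1.0))
```

Raw plus-minus rewards a player for sharing the floor with strong teammates; the regression separates those effects, and the ridge penalty shrinks estimates for players with few or highly collinear minutes toward zero.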
Prerequisite
A course covering multiple regression, e.g. STAT 3010, 3260, or 5870.
Recent Instructors
Textbooks
Here is a list of possible textbooks:
- Analytic Methods in Sports (2nd edition) by Thomas Severini. Chapman and Hall/CRC
- Regression Models for Data Science in R by Brian Caffo
- R for Data Science by Hadley Wickham and Garrett Grolemund
- Improving Your NCAA Bracket with Statistics by Tom Adams
Offerings
The course was most recently offered in Spring 2025. We hope to offer it every other spring.
Learning Outcomes
By the end of the course, students will be able to
- Classify sports data types, e.g. head-to-head vs point- or time-based competitions.
- Construct appropriate visualizations for sports data.
- Calculate player contribution statistics, e.g. plus-minus and adjusted plus-minus.
- Apply machine learning techniques to construct win probability calculators.
- Build statistical models to construct a team or player rating system.
Topics
Sports analytics encompasses a wide array of topics that leverage data to improve performance, strategy, and decision-making in sports. Key topics include player performance analysis, where individual statistics are scrutinized to evaluate and predict athlete contributions and career trajectories. Injury prediction and prevention use data to identify risk factors and develop training programs to mitigate injuries. Team strategy optimization involves analyzing game footage and in-game statistics to refine tactics and improve team performance. Another significant area is fan engagement analytics, which studies fan behavior and preferences to enhance marketing and engagement strategies. Additionally, sports economics focuses on financial aspects, such as salary cap management, ticket pricing, and revenue generation. Advanced metrics like Player Efficiency Rating (PER) in basketball or Expected Goals (xG) in soccer provide deeper insights than traditional statistics, helping teams make informed decisions on player acquisitions and game strategies. Overall, sports analytics integrates diverse data sources and methodologies to provide comprehensive insights into virtually every aspect of sports.
Visualizing Sports Data
From https://link.springer.com/article/10.1007/s12650-020-00687-2
Sports analytics visualizations play a crucial role in interpreting and communicating complex data in a clear and actionable manner. These visualizations leverage graphical representations such as bar charts, line graphs, scatter plots, and heat maps to highlight key patterns and trends within sports data. For instance, heat maps can be used to display player movement and shot locations on the field or court, providing insights into player behavior and team strategies. Advanced visualizations like network graphs can illustrate passing patterns in soccer or basketball, revealing the dynamics of team play and areas for improvement. Effective visualizations make it easier for coaches, analysts, and decision-makers to understand performance metrics, identify strengths and weaknesses, and develop strategies based on data-driven insights. By transforming raw data into visual stories, sports analytics visualizations enhance the ability to make informed decisions and drive performance improvements.
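A heat map of shot locations starts from a simple binning step: divide the playing surface into a grid and count the events in each cell. The sketch below assumes shot coordinates in feet on a 50 ft by 47 ft half court (the size of a standard NBA half court) and a hypothetical 5 by 5 grid. The shot list is made up. A plotting library would then colour the resulting matrix, for example with matplotlib's imshow function.

```python
# Minimal sketch: bin (x, y) event locations into a grid of counts for a heat map.
# Assumption: coordinates are in feet on a 50 ft x 47 ft half court.


def bin_locations(points, width=50.0, height=47.0, nx=5, ny=5):
    """Return an ny-by-nx grid where grid[row][col] counts events in that cell."""
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (0 <= x <= width and 0 <= y <= height):
            continue  # ignore locations outside the playing surface
        col = min(int(x / width * nx), nx - 1)  # x == width falls in the last column
        row = min(int(y / height * ny), ny - 1)
        grid[row][col] += 1
    return grid


# Hypothetical shot locations for illustration only.
shots = [(1, 1), (2, 2), (25, 10), (49, 46), (60, 10)]
for row in bin_locations(shots):
    print(" ".join(f"{count:2d}" for count in row))
```

Coarser grids smooth out noise from small samples; finer grids show more spatial detail but need many more events per cell to be reliable.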
Player contributions
Wins Above Replacement (WAR) is a comprehensive statistic used in sports analytics, most prominently in baseball, to measure a player's total contribution to their team as the number of additional wins they provide compared with a replacement-level player. A replacement-level player is typically defined as a marginal player who is readily available at minimal cost, for example from the minor leagues or free agency. WAR combines batting, baserunning, fielding, and pitching contributions into a single metric. This allows comparisons across positions and eras, making WAR a valuable tool for evaluating player value and informing decisions about player acquisition, retention, and salary negotiations. The calculation integrates traditional statistics with advanced metrics, and the exact computation differs among organizations such as FanGraphs and Baseball-Reference (for pitchers, for example, FanGraphs builds WAR on fielding-independent pitching, whereas Baseball-Reference starts from runs allowed). The core concept is the same, however: a single, standardized measure of player performance relative to a replacement-level baseline.
From https://library.fangraphs.com/misc/war/
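As a rough illustration of how the components combine, the FanGraphs WAR documentation describes position-player WAR as batting, baserunning, and fielding runs plus positional, league, and replacement-level adjustments, all divided by a runs-per-win conversion. The sketch below follows that structure only. The component values are made up. The runs_per_win default of 10 is an assumed round figure; the actual conversion changes from season to season.

```python
# Minimal sketch of the FanGraphs-style position-player WAR structure.
# All inputs are in runs; runs_per_win=10.0 is an assumed round value.


def wins_above_replacement(batting_runs, baserunning_runs, fielding_runs,
                           positional_adj, league_adj, replacement_runs,
                           runs_per_win=10.0):
    """Convert a player's runs above replacement into wins."""
    runs_above_replacement = (batting_runs + baserunning_runs + fielding_runs
                              + positional_adj + league_adj + replacement_runs)
    return runs_above_replacement / runs_per_win


# Hypothetical season line for illustration only.
print(wins_above_replacement(20.0, 2.0, 5.0, 2.5, 1.0, 19.5))  # 5.0
```

Working in runs first and converting to wins only at the end lets every component share a common currency before the final division.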
Win probability
From https://lolesports.com/en-US/news/dev-diary-win-probability-powered-by-aws-at-worlds
Win probability calculators are analytical tools used in sports to estimate the likelihood of a team winning a game at any given point, based on the current state of the game and historical data. These calculators consider various factors such as the score, time remaining, possession, down and distance in football, or inning and base runner situation in baseball. By integrating real-time data with statistical models derived from past game outcomes, win probability calculators provide a dynamic and intuitive way to understand how specific events, like a touchdown or a home run, affect a team's chances of winning. They are particularly valuable for coaches, analysts, and broadcasters, offering insights into strategic decisions and game management. For fans, win probability graphs can enhance the viewing experience by visually representing the ebb and flow of the game's momentum. Overall, these calculators help quantify the impact of in-game events and support more informed decision-making by illustrating the probabilistic nature of sports outcomes.
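One common way to build such a calculator is logistic regression on game-state features. The sketch below is only an illustration under stated assumptions. It trains on simulated random-walk games rather than historical data. It uses a hypothetical feature set: score margin scaled by 10, the fraction of the game remaining, and their interaction, so that a lead can count for more late in the game. It fits the model by plain gradient descent. A real calculator would be trained on historical game states and would typically use more variables.

```python
# Minimal sketch: a logistic-regression win probability model fit by gradient descent
# on simulated (not real) games. Features and simulation settings are assumptions.
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(margin, remaining):
    """Hypothetical features: intercept, margin/10, fraction of game left, interaction."""
    m = margin / 10.0
    return [1.0, m, remaining, m * (1.0 - remaining)]


def simulate_states(n_games=150, steps=48, every=4, seed=0):
    """Random-walk games; returns (features, won) pairs sampled every few steps."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_games):
        margins = [0]
        for _ in range(steps):
            margins.append(margins[-1] + rng.choice([-3, -2, -1, 0, 0, 1, 2, 3]))
        final = margins[-1]
        if final == 0:
            continue  # drop tied games to keep the label binary
        for k in range(0, steps, every):
            data.append((features(margins[k], (steps - k) / steps), 1.0 if final > 0 else 0.0))
    return data


def fit_logistic(data, lr=0.5, iters=500):
    """Full-batch gradient ascent on the average log-likelihood."""
    w = [0.0] * len(data[0][0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, y in data:
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def win_probability(w, margin, remaining):
    """Estimated probability that the team with the given margin goes on to win."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(margin, remaining))))


w = fit_logistic(simulate_states())
for remaining in (0.9, 0.5, 0.1):
    print(f"up 10 with {remaining:.0%} left: {win_probability(w, 10, remaining):.2f}")
```

The interaction term is what lets the same 10-point lead translate into a higher win probability as time runs out. Without it, a plain logistic model would assign the lead the same effect at every point in the game.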
Rating systems
Rating systems are essential tools in sports analytics that evaluate and compare the performance of different players or teams using various metrics and algorithms. Depending on the system, the inputs may include win-loss records, strength of schedule, and margin of victory, which are combined into an overall assessment of a player's or team's quality. Popular examples include the Elo rating system, originally developed for chess and since adapted for many other sports, and power rankings, which sports analysts and media outlets update frequently to reflect current team performance. By placing every team on a common scale, these systems make comparisons more consistent, helping analysts predict outcomes, identify strengths and weaknesses, and make data-driven decisions about strategy and player management. Team rating systems can also enhance fan engagement by giving accessible, quantifiable insight into how teams stack up against each other throughout the season.
From https://www.reddit.com/r/chess/comments/1662z59/fide_elo_percentiles/
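The core of the Elo system is a logistic expected-score formula plus an update proportional to how surprising the result was. The sketch uses the standard chess scale of 400 points, so a 400-point rating gap corresponds to 10-to-1 expected odds. The K-factor of 20 is an illustrative choice. Sport-specific adaptations tune K and sometimes add home advantage or margin of victory, none of which is modelled here.

```python
# Minimal sketch of an Elo rating update for a single game between A and B.


def expected_score(r_a, r_b, scale=400.0):
    """Expected score for A: 1 / (1 + 10^((R_B - R_A) / scale))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a, r_b, score_a, k=20.0):
    """Return new (r_a, r_b); score_a is 1 for an A win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta  # zero-sum: B loses what A gains


print(elo_update(1500, 1500, 1.0))           # (1510.0, 1490.0)
print(round(expected_score(1900, 1500), 3))  # 0.909
```

Because the expected score is a logistic function of the rating difference, Elo can be read as an online estimate of a paired-comparison model, which is one way rating systems connect to the statistical models discussed in this course.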