Analysis of winning papers at the NFL's Big Data Bowl, and how they might influence the use of data in football.
Adam Minthorne, Mar. 20th 2020
As collegiate stars tested their strength, speed, and yes, height, at the NFL combine a month ago, you may have overlooked one part of the combine that did not make it onto television: the results of the NFL’s inaugural Big Data Bowl, the league’s open source data competition.
The structure of the Big Data Bowl was simple. For the first time, the NFL released a large player tracking dataset to the public and asked analysts and fans to submit a project on three themes: understanding speed, rule changes, and receiver route combinations. Eight finalists presented their papers at the NFL combine, and two grand prize winners were selected. The NFL released these papers after the competition, and I was eager to read them with an eye toward finding ways to improve Tacklytics™ at Atavus.
2019: the NFL's inaugural Big Data Bowl
Crowd-Sourcing: The Birth of Sports Analytics
One reason the Big Data Bowl is significant is that open and crowdsourced work has a history of creating a positive impact in sport through analytics. Bill James published his first Baseball Abstract in 1977 after reviewing box scores by hand; that work led him to coin the term “sabermetrics” and laid the foundation for Major League Baseball’s analytics revolution. In more dynamic sports like basketball and soccer, tracking data like the kind released for the Big Data Bowl has led to analytics booms as well. For example, research paper finalists at the MIT Sloan Sports Analytics Conference have used player tracking data for applications in basketball, soccer, and football. In the NFL, player tracking data has been available to teams since 2016 and has been used to enhance the fan experience since 2017, but a public data release on this scale is a significant milestone in what will eventually be football’s analytics revolution.
A Brief Reflection on the State of Analytics in Football
Why has football been the last of the major sports to adopt analytics as a mainstream part of its front offices? While there may be organizational or philosophical pushback from the old guard of the NFL (something that has happened in every sport that adopted analytics), there are also analytical challenges inherent to the game of football. Players all move independently, but each player’s actions depend on the actions of opponents and teammates and on the play call. In addition, these actions occur in a confined space where players may have very different roles on the team. These factors and more make it difficult to discern patterns in the data and to isolate responsibilities and metrics. For example, when a running back runs for 200 yards, did he alone have a great game? Or was it the offensive line? Or did the offensive coordinator find a weakness in the defensive scheme? Any coach will tell you that it’s a combination of all of the above, but as analysts we want to know how to weight these different inputs because, at least in the NFL, we want to be able to advise the general manager or head coach on whether to hire the offensive coordinator or sign the running back in free agency.
Regrettably, most of the current work I have seen in football analytics is written from an offensive perspective, which limits our ability to understand how analytics might be used in tackling. There are several reasons for the offensive skew, but my guess is that isolating individual players is a lot harder for defenses. Many defenses are designed to be flexible enough to defend a multitude of offensive plays, so determining the effectiveness of a cornerback’s coverage in Cover 3 might be hard if Cover 3 looks like man-to-man half of the time. Despite these challenges, there are ways we can look at these papers from a defensive perspective and learn about the potential applications to tackling.
Paper Review:
#1: “Expected Hypothetical Completion Probability” by Sameer Deshpande and Katherine Evans
The purpose of this paper was to develop a way to estimate the chance of a pass completion, a quantity the authors call Expected Hypothetical Completion Probability (EHCP). One of the reasons I liked this paper is that the authors acknowledge the challenges of the game of football and attempt to address them. From the paper:
“We are essentially trying to deduce what might have happened in a counterfactual world where the quarterback had thrown the ball to a different player at a different time, with the defense reacting differently. On such a counterfactual pass, we do not observe many factors that are predictive of completion probability.”
What Deshpande and Evans are describing in technical terms is that a good defense reacts not only to the receivers’ routes but also to the quarterback and to the defenders around them. Take, for example, a smash concept where the defense is playing a soft cover two (the paper illustrates a similar concept). The cornerback might float in between the hitch and the corner, partially covering both receivers. When the quarterback throws the corner route, Deshpande and Evans can observe the receiver’s separation from the corner and the safety both at the time of the pass and when the pass arrives. But what about the hitch? Since the quarterback did not actually throw the hitch, the separation between that receiver and the defensive backs when the ball would have arrived is not observed, which makes calculating the completion probability for the hitch more difficult. EHCP looks to overcome these challenges.
From a defensive perspective, we can take the complement of these completion probabilities to get an incompletion probability. For example, if the completion probability at a given time is 40%, the incompletion probability is 60%. Building off these ideas, it could be possible to use Deshpande and Evans’ framework to calculate a tackle probability after the catch. Then we could understand which tacklers tackle well because of their proximity to the ball carrier and which tackle well because of their technique.
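The arithmetic above, and the idea of comparing tackles made against tackles expected, can be sketched in a few lines of Python. This is only an illustration: the function names, the notion of a per-play expected tackle probability, and the sample numbers are all assumptions of mine, not something taken from the paper or from Tacklytics.

```python
from typing import Iterable, Tuple


def incompletion_probability(completion_probability: float) -> float:
    """Return the complement of a completion probability."""
    if not 0.0 <= completion_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - completion_probability


def tackles_over_expected(plays: Iterable[Tuple[float, bool]]) -> float:
    """Sum of (actual outcome - expected tackle probability) over plays.

    Each play is (expected_tackle_probability, tackle_made). The expected
    probability is assumed to come from a hypothetical EHCP-style model
    that accounts for things like proximity to the ball carrier.
    """
    return sum((1.0 if made else 0.0) - expected for expected, made in plays)


# Illustrative numbers only, not real tracking data.
print(incompletion_probability(0.40))
sample_plays = [(0.9, True), (0.5, True), (0.3, False)]
print(round(tackles_over_expected(sample_plays), 2))
```

A positive total would suggest a defender who finishes more tackles than his positioning alone predicts, which is one possible way to separate technique from proximity.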
#2: “Routes to Success” by Matthew Reyers, Dani Chu, Lucas Wu, and James Thomson @ Simon Fraser University
This paper by the Simon Fraser University (SFU) Analytics Club used the NFL’s tracking data to recognize routes, determine the most successful combinations, and then analyze and visualize how effective the routes were at creating control over certain areas of the field. As one of the grand prize winners, this paper does a lot of things well. One of the best things it does is build off the concept of Expected Points Added (EPA) to make sense of the mountain of tracking data provided. EPA is not a new concept in football analytics, but using EPA to visualize zones of control via tracking data is something new. Essentially, it allows us to view where the offense is favored and where the defense is favored on any given passing play.
This paper is interesting because it provides a roadmap for how Atavus may be able to generalize the effectiveness of an individual tackle. For example, Tacklytics uses Yards After Contact (YAC) as the primary metric for tackling performance. Building off the work of the SFU team, there is a chance to look at individual tackles from an EPA perspective. This would allow Atavus to calculate the proportion of defensive performance linked to tackling and pinpoint how tackling contributes to winning football games.
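As a rough sketch of what an EPA view of a single tackle might look like, the Python below converts yards after contact into expected points allowed. Everything here is an assumption for illustration: the linear `toy_expected_points` curve is a placeholder, not a fitted model and not the SFU method, and the function names are hypothetical.

```python
from typing import Callable


def toy_expected_points(yards_to_goal: float) -> float:
    """Placeholder expected-points curve (an assumption, not a fitted model).

    Real expected-points models also depend on down, distance, and time.
    """
    return 6.0 - 0.07 * yards_to_goal


def epa_allowed_after_contact(
    contact_yards_to_goal: float,
    yards_after_contact: float,
    expected_points: Callable[[float], float] = toy_expected_points,
) -> float:
    """Expected points the offense gained between first contact and the tackle."""
    tackle_yards_to_goal = max(contact_yards_to_goal - yards_after_contact, 0.0)
    return expected_points(tackle_yards_to_goal) - expected_points(contact_yards_to_goal)


# Illustrative values: contact at midfield, ball carrier gains 5 more yards.
print(round(epa_allowed_after_contact(50.0, 5.0), 2))
```

Passing the expected-points model in as a function keeps the tackle calculation separate from whichever model is eventually used, so the placeholder can be swapped for a real one without changing the rest.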
#3: “Exploratory Data Analysis of Passing Plays Using NFL Tracking Data” by Adam Vonder Haar
This paper was an excellent exploratory look at the data, and it presented a couple of methods that apply well to understanding defenses pre-snap and post-snap. The first is exploring where defensive players line up pre-snap and how that might affect their area of influence. This analysis could potentially allow us to recognize the pre-snap coverages of a defense without consulting a defensive coordinator. From a tackling perspective, I would be interested to know which of these pre-snap alignments gives defenders the best chance to tackle, or whether pre-snap alignment has very little to do with the probability of completing a tackle. When paired with EHCP from Deshpande and Evans’ paper, these pre-snap alignments could inform defensive decisions from a tackling perspective.
Moving to post-snap, Vonder Haar used the concept of a “convex hull” to understand how offensive passing concepts stretch defensive coverage. From the paper, a convex hull “can be thought of as a rubber band stretched around the outer edges of the defensive pass coverage”. The larger the space inside the rubber band, the more stress there is on a defense (see the paper for an excellent visualization of this concept). After the ball is caught, however, I believe the same idea could be used to measure effective pursuit to the ball carrier. How quickly a defense shrinks this area may be an indicator of a superior tackling defense, which would allow us to grade the tackling ability of a defense as a whole rather than that of an individual tackler.
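The rubber-band idea can be sketched with a small amount of Python. The version below is my own illustration, not Vonder Haar's code: it uses the standard monotone chain algorithm for the hull and the shoelace formula for its area, and the `closing_rate` function and the sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors OA and OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points: Sequence[Point]) -> float:
    """Area inside the 'rubber band' around the points (shoelace formula)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def closing_rate(area_at_catch: float, area_later: float, seconds: float) -> float:
    """Square yards of hull area closed per second (hypothetical pursuit metric)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (area_at_catch - area_later) / seconds


# Made-up defender positions (x, y) in yards at the catch and one second later.
at_catch = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0)]
one_second_later = [(2.0, 2.0), (8.0, 2.0), (8.0, 8.0), (2.0, 8.0), (5.0, 5.0)]
print(closing_rate(hull_area(at_catch), hull_area(one_second_later), 1.0))
```

With real tracking data, the two frames would come from the defenders' coordinates at the catch and at some fixed interval afterward; whether the resulting rate actually tracks tackling quality is something that would need to be tested.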
Conclusion: Next Steps for Tacklytics
As analytics becomes more prevalent across all levels of football (I would be surprised if it is not adopted in some way by the NCAA soon), I hope that the NFL continues to leverage the power of open source work to contribute to the analytics movement in football. At Atavus, we will be following the latest news in football analytics closely so that we can further understand the cost of the tackle and help your players become better and safer tacklers.