Technology and open knowledge in sports statistics


This presentation discusses technological change and open knowledge, and how both have contributed to the explosion in the types and availability of sports data open to the public. It covers advanced and sabermetric stats, APIs, and applied open-knowledge concepts in the modern-day sports media world.

Published in: Sports
  1. 1. By David Wiederman
  2. 2. Old ◦ Simple statistics – easily computable ◦ Limited number of stats ◦ Stats only seen in newspapers ◦ Information available even during sports broadcasts was limited New ◦ Extremely complex statistics ◦ The number of “advanced statistics” has exploded ◦ The internet has made any stat available in seconds ◦ Broadcasters have all of this information at their fingertips for any situation that comes up during a live event
  3. 3. Technological change – anything the average person wants to know is available online for free ◦ Free databases ◦ Most people with televisions can pay for services to watch every game in whatever sport they choose ◦ Growing popularity of and access to fantasy sports ◦ The increased visibility and spotlight due to technology have created a need to analyze athletes' value in new and exciting ways, and these new ways are now measurable because of the amount of data available
  4. 4. Sabermetrics is defined as the search for objective knowledge in baseball, but it has become well known as advanced statistics that go beyond basic statistics in order to measure the true value of a player. [i] ◦ An example would be adjusting a baseball player’s hitting statistics based on the dimensions of their home stadium (some stadiums are easier to hit in). ◦ Largest open database is
  5. 5. Made famous through “Moneyball” ◦ Billy Beane, GM of the Oakland Athletics, had enormous success using sabermetrics to build a winning team with a low payroll ◦ Last 15 years: payroll .81 standard deviations below the average (bottom 1/3), but a 54.8% win rate (top 3). ◦ In 10 of the past 15 seasons, they have had 10+ more wins than would be expected based on payroll ◦ Beane's general concept is to favor batters who take a lot of pitches (patient and selective), tend to walk a lot (get on base often even if they don’t have a great batting average), run the bases well, and play solid defense. [ii]
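The two payroll measures on this slide (standard deviations from the average payroll, and wins beyond what payroll would predict) can be sketched in a few lines. This is only an illustration under stated assumptions: the team names and figures are invented, and a simple least-squares line of wins on payroll stands in for whatever expectation model the original analysis used.

```python
from statistics import mean, pstdev

# Hypothetical single-season figures: (payroll in $ millions, wins).
# These numbers are invented for illustration and are not real team data.
TEAMS = {
    "Team A": (200, 95),
    "Team B": (150, 88),
    "Team C": (120, 81),
    "Team D": (90, 74),
    "Team E": (60, 92),  # low payroll, high win total
}


def payroll_z_scores(data):
    """Standard deviations above (+) or below (-) the average payroll."""
    payrolls = [payroll for payroll, _ in data.values()]
    mu, sigma = mean(payrolls), pstdev(payrolls)
    return {name: (payroll - mu) / sigma for name, (payroll, _) in data.items()}


def wins_above_expected(data):
    """Actual wins minus wins predicted by a least-squares line on payroll."""
    xs = [payroll for payroll, _ in data.values()]
    ys = [wins for _, wins in data.values()]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return {
        name: wins - (intercept + slope * payroll)
        for name, (payroll, wins) in data.items()
    }


if __name__ == "__main__":
    z = payroll_z_scores(TEAMS)
    extra = wins_above_expected(TEAMS)
    for name in TEAMS:
        print(f"{name}: payroll z = {z[name]:+.2f}, wins above expected = {extra[name]:+.1f}")
```

A team with a negative payroll z-score and a large positive residual is the pattern the slide describes: spending well below average while winning more than payroll alone would suggest.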
  6. 6. Technology used in Formula One racing to monitor physical well-being is being applied to performance across other sports. [iii] ◦ Can measure speed, efficiency of movement, and fatigue – can help pinpoint how athletes are wasting energy (e.g. a hockey player's performance during a 45-second shift). ◦ Possible end result is “smart jerseys” – these would allow real-time data to be relayed to coaches and would give fans even more access to information for analysis.
  7. 7. An API is a software intermediary that makes it possible for application programs to interact with each other and share data. There are several APIs for sports, including ESPN, SportsData, and Yahoo. Together they create an enormous database with comprehensive data coverage of all major sports and leagues all over the world. The public benefits through apps and online databases, both of which are mostly open sources. ◦ Real-time reporting and analysis are becoming the norm in APIs.
  8. 8. Data collected includes live play-by-play, scores/times, results/boxscores, standings, player stats, team stats, leaders, rosters, depth charts, profiles, transactions, splits (such as home vs away or day vs night), and historical stats. ◦ ESPN includes podcasts, headlines, draft data, and feeds from SportsCenter. The result is a meeting of geeks and sports freaks
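To make the idea of an application consuming API data concrete, the sketch below parses a JSON game log and computes a home vs away batting-average split. The payload shape and field names (`games`, `venue`, `at_bats`, `hits`) are assumptions made up for this example; they do not describe the actual response format of ESPN, SportsData, or Yahoo.

```python
import json
from collections import defaultdict

# Hypothetical payload: the field names are assumptions for this sketch
# and do not describe any real provider's response format.
SAMPLE_RESPONSE = """
{
  "player": "Example Player",
  "games": [
    {"venue": "home", "at_bats": 4, "hits": 2},
    {"venue": "away", "at_bats": 5, "hits": 1},
    {"venue": "home", "at_bats": 3, "hits": 1},
    {"venue": "away", "at_bats": 4, "hits": 0}
  ]
}
"""


def batting_average_splits(raw_json, split_key="venue"):
    """Group a game log by split_key and return the batting average per group."""
    payload = json.loads(raw_json)
    totals = defaultdict(lambda: {"at_bats": 0, "hits": 0})
    for game in payload["games"]:
        bucket = totals[game[split_key]]
        bucket["at_bats"] += game["at_bats"]
        bucket["hits"] += game["hits"]
    # Skip groups with no at-bats to avoid dividing by zero.
    return {
        key: round(t["hits"] / t["at_bats"], 3)
        for key, t in totals.items()
        if t["at_bats"] > 0
    }


if __name__ == "__main__":
    print(batting_average_splits(SAMPLE_RESPONSE))
```

The same grouping would work for a day vs night split if each game record carried a field for it; only `split_key` changes.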
  9. 9. ESPN – 3 tiers [iv] ◦ Internal – ESPN employees and contractors using the API to build ESPN apps ◦ Strategic partners – working with ESPN to include ESPN content in their products/services ◦ Public – independent, pre-approved developers using ESPN content. The code is not open source in the way Android or Mozilla is, but the information generated gets distributed to the public ◦ Push notifications from the SportsCenter mobile application with breaking news ◦ Real-time scores on and the SportsCenter application for almost any league in the world
  10. 10. SportsData ◦ Major customers include Bleacher Report, Bloomberg Sports, Google, IBM, and NBC Sports. ◦ These customers use SportsData in broadcasts, media, and fantasy sports scouting and, in the case of Google, as accurate and dependable data for Google’s vast customer base and search demands. [v] ◦ The cost to these companies can be tens of thousands of dollars per month, but the benefits can easily outweigh the costs. SportsData was instrumental in NBC’s online coverage of the Super Bowl.
  11. 11. These APIs are really only available to insiders, developers, strategic partners, and wealthy corporations with media & data needs. The general public, however, also receives the benefits through information availability and through media. Not truly “open” in that the full use is not public and is not free, but
  12. 12. Citizen Journalism ◦ The availability of statistics and information has translated into the average person tweeting, blogging, and facebooking about sports. ◦ Many individuals with full-time jobs can even get part-time gigs writing for larger blogs. ◦ People with knowledge of advanced statistics are in demand because they provide interesting viewpoints on teams and players, which can often captivate readers and bring the measures into the public eye.
  13. 13. Data and Databases ◦ Massive amounts of information are available. Any person can now see, for example, how possession stats and attempted shots correlate to wins and player efficiency in hockey. Information Overload and Filters ◦ Is it too much? For example, most people will not know the majority of the stats on Baseball Reference unless they take the time to read and comprehend the breakdown of each calculation ◦ It is important to be able to filter the information so that we find what we are seeking ◦ Most of the time a question is simple, such as whether a player is getting unlucky or is simply performing poorly ◦ Managing your time to find this answer is important, or else it can take a long time.
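As a rough illustration of checking how attempted shots correlate to wins, the sketch below computes a Pearson correlation coefficient by hand. The per-team figures are invented for the example, and Pearson correlation is one reasonable choice of measure rather than the method used in the presentation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical season totals for five hockey teams (invented numbers):
# average shot attempts per game and season wins.
SHOT_ATTEMPTS = [28, 30, 31, 33, 35]
WINS = [35, 38, 44, 43, 50]

if __name__ == "__main__":
    r = pearson(SHOT_ATTEMPTS, WINS)
    print(f"correlation between shot attempts and wins: {r:.2f}")
```

A value near 1 means teams that attempt more shots also tend to win more in this sample; it does not by itself show that one causes the other.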
  14. 14. Equity – The benefits can be gained only with internet access and a mobile network. ◦ Similar to most other open knowledge sources in that access depends on how privileged the individual is.
