Rallying Around Standards

The case for XML standardization of sports statistics, from teamXML

Published in: Business, Technology

  1. Rallying Around Standards: A Sports Data Processing Primer
       Alan Karben, XML Team Solutions, Inc.
       December 6, 2005
       alan (at) xmlteam (dot) com
  2. Why Use Standards?
       • Makes everyone’s lives easier
       • Allows for faster improvements
  3. Why Use Open Standards?
       • Take advantage of the work of others
       • Allow more vendors to supply more tools
       • The Sport-Agnostic Approach
           - Allows tool providers to amortize work across sports markets
           - Facilitates cooperation with other leagues and events around the world, baseball and non-baseball
           - Obviously must include all needed baseball details
  4. Common Ground Across Sports
       • Concept of schedules and games
       • Player stats and team stats
       • Stats broken down by season/game/context
       • Tournament structures
       • Standings
       • Personal player history of leagues, teams, injuries
  5. Three Standards Worth Evaluating
       • SportsML
           - For document normalization and interchange
       • RoSIN
           - For detailed play descriptions
       • XTOSS
           - For storage in SQL database
  6. SportsML
       • Under the banner of the IPTC
           - Global media technical trade association
           - http://www.sportsml.org / http://www.iptc.org
       • Has a Core Schema for the Sport-Agnostic content
           - Plug-in Schemas for 8 sports (so far), including baseball
       • SportsML Example: Phillies In-Progress Box Score
  7. RoSIN
       • RetroSheet Intra-play Notation
       • A codified subset of Retrosheet’s play syntax
           - Yacc grammar
           - Full functionality of Retrosheet syntax, but with fewer ways to skin the same cat
           - Validatable and well-documented
       • XML Team hired Ted Turocy to build it out
           - Retrosheet guru who wrote Chadwick (an open-source version of the Retrosheet file parser)
       • In the process of packaging up a release for Retrolist and for Open Source
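
To make the idea of a codified play syntax concrete, here is a minimal Python sketch that splits a Retrosheet-style event string into its basic play, modifiers, and runner advances. It is not the RoSIN grammar (the slides do not reproduce it); the function name `split_play` and the returned field names are assumptions made for illustration, and only the common `basic/modifier.advance;advance` layout is handled.

```python
from typing import Dict, List, Union


def split_play(event: str) -> Dict[str, Union[str, List[str]]]:
    """Split a Retrosheet-style event into basic play, modifiers, and advances.

    Simplified illustration only: the part before the first "." describes
    the play, and the part after it lists runner advances separated by ";".
    """
    description, _, advances = event.partition(".")
    basic, *modifiers = description.split("/")
    return {
        "basic": basic,                # e.g. "S7": single fielded by the left fielder
        "modifiers": modifiers,        # e.g. ["L"]: line drive
        "advances": advances.split(";") if advances else [],  # e.g. ["1-3"]
    }


print(split_play("S7/L.1-3"))
print(split_play("64(1)3/GDP"))
```

A real grammar would go further and reject strings that are not valid plays, which is what makes a notation validatable.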
  8. XTOSS
       • XML Team Open Sports Schema
           - Under Development, currently at Draft 5
       • A database schema designed to be parallel to SportsML
           - Uses same terminology, same basic models
           - Not every SportsML attribute / RoSIN substring merits indexing
       • Released as GNU GPL at http://www.xtoss.org
  9. The Data Processing Lifecycle
       • Challenges of Incoming Feeds:
           - Many different formats
           - Internal and External suppliers
           - Data often overlaps in coverage of games, players, teams
  10. Normalization Tips (1)
       • Get data into XML ASAP
       • Assign unique Doc-IDs
       • Save all incoming data in their original formats, using the same filename convention as the normalized content
       • In normalized file, store path/filename to original file
           - /archive/source_name/yyyy/mm/dd or some such
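
The tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element name `normalized-doc`, the attribute names, the Doc-ID layout, and the sample source and file names are all hypothetical; only the `/archive/source_name/yyyy/mm/dd` path convention comes from the slide.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import PurePosixPath


def archive_path(source_name: str, received: date, filename: str) -> PurePosixPath:
    """Build /archive/source_name/yyyy/mm/dd/filename for an incoming file."""
    return PurePosixPath(
        "/archive", source_name,
        f"{received:%Y}", f"{received:%m}", f"{received:%d}", filename,
    )


def normalized_envelope(source_name: str, received: date, filename: str) -> ET.Element:
    """Create a normalized-document root carrying a unique ID and a pointer
    back to the archived original (element and attribute names are assumed)."""
    root = ET.Element("normalized-doc")
    root.set("doc-id", f"{source_name}-{received:%Y%m%d}-{uuid.uuid4().hex}")
    root.set("original-file", str(archive_path(source_name, received, filename)))
    return root


doc = normalized_envelope("feed_a", date(2005, 12, 6), "boxscore_001.txt")
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the pointer inside the normalized file means any downstream problem can be traced back to the exact bytes the supplier sent.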
  11. Normalization Tips (2)
       • Don’t have independent programs that load various incoming formats right into your database
           - Databases evolve
           - Other cool indexing tools come along
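
One way to read this tip is: give each incoming format its own small normalizer, and keep exactly one loader that only understands the normalized form. The Python sketch below assumes two made-up supplier formats and a made-up table; plain dictionaries stand in for the normalized XML documents to keep the example short.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List

Record = Dict[str, object]


def normalize_pipe_feed(raw: str) -> List[Record]:
    """Hypothetical supplier A: one 'player|team|hits' line per player."""
    rows = []
    for line in raw.strip().splitlines():
        player, team, hits = line.split("|")
        rows.append({"player": player, "team": team, "hits": int(hits)})
    return rows


def normalize_csv_feed(raw: str) -> List[Record]:
    """Hypothetical supplier B: CSV with a 'team,player,hits' header row."""
    return [
        {"player": r["player"], "team": r["team"], "hits": int(r["hits"])}
        for r in csv.DictReader(io.StringIO(raw))
    ]


NORMALIZERS: Dict[str, Callable[[str], List[Record]]] = {
    "supplier_a": normalize_pipe_feed,
    "supplier_b": normalize_csv_feed,
}


def load(conn: sqlite3.Connection, records: List[Record]) -> None:
    """The only code that knows the database layout."""
    conn.executemany(
        "INSERT INTO player_game_stats (player, team, hits) "
        "VALUES (:player, :team, :hits)",
        records,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_game_stats (player TEXT, team TEXT, hits INTEGER)")
feeds = [
    ("supplier_a", "Player A|Team X|2\nPlayer B|Team X|0"),
    ("supplier_b", "team,player,hits\nTeam Y,Player C,1"),
]
for source, raw in feeds:
    load(conn, NORMALIZERS[source](raw))
```

When the database changes or another indexing tool comes along, only `load` needs rewriting; the per-supplier normalizers stay untouched.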
  12. Data Processing Architecture
       • Data Acquisition
       • ASCII Extraction
       • Normalization & Validation
       • Monitoring & Quality Control
       • SQL Storage
       • Search and Retrieval
       • Conversion and Formatting
       • Report Generation
       • Interactivity Generation
       (Diagram labels: Normalize, Store, Format, Feed Providers, Output)
  13. Fundamentals & Derivables
       • Incoming Data Best Scenario:
           - RoSIN-esque for every play
           - Substitution records in context
           - Player-keys, team-keys, event-keys, etc.
       • More Commonly:
           - Cumulative game stats for each player
           - Ambiguous substitutions
           - No unique keys / IDs
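
The distinction matters because cumulative stats can be derived from keyed play-by-play records, but not the other way around. A minimal Python sketch, assuming a hypothetical record layout with a `player_key` and a plain-word `result` (not RoSIN codes), and covering only at-bats and hits:

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-play records, each tagged with a unique player key.
plays = [
    {"player_key": "p.1", "result": "single"},
    {"player_key": "p.1", "result": "strikeout"},
    {"player_key": "p.2", "result": "home_run"},
    {"player_key": "p.1", "result": "walk"},
]

HITS = {"single", "double", "triple", "home_run"}
NOT_AT_BAT = {"walk", "hit_by_pitch", "sacrifice"}  # simplified list


def derive_batting_lines(plays: List[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Roll play-level fundamentals up into per-player cumulative stats."""
    lines: Dict[str, Dict[str, int]] = defaultdict(lambda: {"at_bats": 0, "hits": 0})
    for play in plays:
        line = lines[play["player_key"]]
        if play["result"] not in NOT_AT_BAT:
            line["at_bats"] += 1
        if play["result"] in HITS:
            line["hits"] += 1
    return dict(lines)


print(derive_batting_lines(plays))
```

A feed that only supplies the cumulative line, with no keys, leaves no way to recover which play produced which hit.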
  14. About XML Team
       • Sports Data Integration Specialists
       • Sports Feed Distributor
           - All Sources
           - Variety of clients: Fantasy, Portals, Wireless
       • Sports Data Consulting
           - Hired by NYTimes for new print production system for sports section
           - Hired by Associated Press to process IOC feed for Turin 2006 (likewise for Athens 2004)
  15. About Me
       • 12 Years in data publishing and syndication
           - WSJ.com founding member
           - ScreamingMedia head of Product Development
           - Founded XML Team 3 years ago
       • Founding chairman of SportsML
           - Active in other news industry standards
       • XML junkie / Spec writer / Other biz responsibilities, as I’m forced