Rallying Around Standards
case for XML standardization of sports statistics from teamXML

Published in: Business, Technology

Transcript

  • 1. Rallying Around Standards: A Sports Data Processing Primer. Alan Karben, XML Team Solutions, Inc. December 6, 2005. alan (at) xmlteam (dot) com
  • 2. Why Use Standards?
    • Makes everyone’s lives easier
    • Allows for faster improvements
  • 3. Why Use Open Standards?
    • Take advantage of the works of others
    • Allow more vendors to supply more tools
    • The Sport-Agnostic Approach
      • Allows tool providers to amortize work across sports markets
      • Facilitates cooperation with other leagues and events around the world, baseball and non-baseball
      • Obviously must include all needed baseball details
  • 4. Common Ground Across Sports
    • Concept of schedules and games
    • Player stats and team stats
    • Stats broken down by season/game/context
    • Tournament structures
    • Standings
    • Personal player history of leagues, teams, injuries
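The shared concepts on this slide can be pictured as a small sport-agnostic data model. The following is only a sketch: every class and field name here (`Event`, `Team`, `Player`, `StatLine`, `scope`, and the `*_key` fields) is a hypothetical illustration, not a name taken from SportsML or XTOSS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StatLine:
    """Stats for one player or team, scoped to a season, game, or other context."""
    scope: str  # hypothetical labels, e.g. "season" or "game"
    values: Dict[str, float] = field(default_factory=dict)


@dataclass
class Player:
    player_key: str
    name: str
    stats: List[StatLine] = field(default_factory=list)


@dataclass
class Team:
    team_key: str
    name: str
    players: List[Player] = field(default_factory=list)
    stats: List[StatLine] = field(default_factory=list)


@dataclass
class Event:
    """A scheduled game or match; sport-specific detail would plug in separately."""
    event_key: str
    sport: str
    start: str  # ISO 8601 date-time string
    teams: List[Team] = field(default_factory=list)


def stats_for(entity, scope):
    """Return the stat lines of a player or team that match the given scope."""
    return [line for line in entity.stats if line.scope == scope]
```

Nothing in these classes is specific to baseball; sport-specific stat names live only in the `values` dictionary, which loosely mirrors the core-plus-plug-in idea.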
  • 5. Three Standards Worth Evaluating
    • SportsML
      • For document normalization and interchange
    • RoSIN
      • For detailed play descriptions
    • XTOSS
      • For storage in SQL database
  • 6. SportsML
    • Under the banner of the IPTC
      • Global media technical trade association
      • http://www.sportsml.org / http://www.iptc.org
    • Has a Core Schema for the Sport-Agnostic content
      • Plug-in Schemas for 8 sports (so far), including baseball
    • SportsML Example: Phillies In-Progress Box Score
  • 7. RoSIN
    • RetroSheet Intra-play Notation
    • A codified subset of Retrosheet’s play syntax
      • Yacc grammar
      • Full functionality of Retrosheet syntax, but with fewer ways to skin same cats
      • Validatable and well-documented
    • XML Team hired Ted Turocy to build it out
      • Retrosheet guru – wrote Chadwick (open-source version of RetroSheet File parser)
    • In the process of packaging up a release for Retrolist and for Open Source
  • 8. XTOSS
    • XML Team Open Sports Schema
      • Under Development – currently at Draft 5
    • A database schema designed to be parallel to SportsML
      • Uses same terminology, same basic models
      • Not every SportsML attribute / RoSIN substring merits indexing
    • Released under the GNU GPL at http://www.xtoss.org
  • 9. The Data Processing Lifecycle
    • Challenges of Incoming Feeds:
      • Many different formats
      • Internal and External suppliers
      • Data often overlaps in coverage of games, players, teams
  • 10. Normalization Tips (1)
    • Get data into XML ASAP
    • Assign unique Doc-IDs
    • Save all incoming data in its original format, using the same filename convention as the normalized content
    • In normalized file, store path/filename to original file
      • /archive/source_name/yyyy/mm/dd or some such
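The archiving tips above can be sketched roughly as follows. This is an illustration under stated assumptions: the `/normalized` root, the `normalized-doc` element, and its `doc-id` and `original-file` attributes are hypothetical names; only the `/archive/source_name/yyyy/mm/dd` layout comes from the slide.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import PurePosixPath

ARCHIVE_ROOT = PurePosixPath("/archive")        # layout taken from the slide
NORMALIZED_ROOT = PurePosixPath("/normalized")  # hypothetical sibling tree


def dated_dir(root, source_name, received):
    """Build root/source_name/yyyy/mm/dd for the day the file arrived."""
    return (root / source_name / f"{received:%Y}"
            / f"{received:%m}" / f"{received:%d}")


def archive_path(source_name, received, filename):
    """Where the untouched original file is kept."""
    return dated_dir(ARCHIVE_ROOT, source_name, received) / filename


def normalized_path(source_name, received, filename):
    """Same naming convention as the original, but with an .xml suffix."""
    xml_name = PurePosixPath(filename).with_suffix(".xml").name
    return dated_dir(NORMALIZED_ROOT, source_name, received) / xml_name


def wrap_normalized(source_name, received, filename, doc_id=None):
    """Create a normalized-document root that points back at its original."""
    root = ET.Element("normalized-doc")
    root.set("doc-id", doc_id or uuid.uuid4().hex)
    root.set("original-file", str(archive_path(source_name, received, filename)))
    return root
```

Because both paths are derived from the same source name, date, and base filename, either file can be located from the other without a lookup table.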
  • 11. Normalization Tips (2)
    • Don’t have independent programs that load various incoming formats right into your database
      • Databases evolve
      • Other cool indexing tools come along
  • 12. Data Processing Architecture
    • Data Acquisition
    • ASCII Extraction
    • Normalization & Validation
    • Monitoring & Quality Control
    • SQL Storage
    • Search and Retrieval
    • Conversion and Formatting
    • Report Generation
    • Interactivity Generation
    • (Diagram: Feed Providers → Normalize → Store → Format → Output)
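One way to read the architecture above, together with the advice not to load each incoming format straight into the database, is that every feed gets its own small normalizer while validation and storage are written once. The sketch below assumes two made-up supplier formats and a made-up `team-score` element and `team_score` table; none of these names come from SportsML or XTOSS.

```python
import sqlite3
import xml.etree.ElementTree as ET


def normalize_csv_line(line):
    """Hypothetical supplier A sends 'team_key,score'."""
    team_key, score = line.strip().split(",")
    return ET.Element("team-score", {"team-key": team_key, "score": score})


def normalize_pipe_line(line):
    """Hypothetical supplier B sends 'score|team_key'."""
    score, team_key = line.strip().split("|")
    return ET.Element("team-score", {"team-key": team_key, "score": score})


def validate(elem):
    """Reject anything that is not a well-formed normalized record."""
    if elem.tag != "team-score" or not elem.get("team-key"):
        raise ValueError("not a valid team-score record")
    int(elem.get("score"))  # raises ValueError if the score is not numeric
    return elem


def store(conn, elem):
    """The only code that knows about the database layout."""
    conn.execute(
        "INSERT INTO team_score (team_key, score) VALUES (?, ?)",
        (elem.get("team-key"), int(elem.get("score"))),
    )


def run(conn, feeds):
    """feeds is a list of (normalizer, lines) pairs, one per supplier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS team_score (team_key TEXT, score INTEGER)"
    )
    for normalizer, lines in feeds:
        for line in lines:
            store(conn, validate(normalizer(line)))
    conn.commit()
```

If the database changes, or another indexing tool comes along, only `store` needs to change; the per-supplier normalizers stay as they are.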
  • 13. Fundamentals & Derivables
    • Incoming Data Best Scenario:
      • RoSIN-esque for every play
      • Substitution records in context
      • Player-keys, team-keys, event-keys, etc.
    • More Commonly:
      • Cumulative game stats for each player
      • Ambiguous substitutions
      • No unique keys / IDs
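The slide title suggests why play-level data is the better starting point: cumulative stats can be derived from keyed play records, but not the other way around. A minimal sketch of that derivation follows, assuming each play has already been parsed into a dictionary with hypothetical `player_key` and `outcome` fields; it does not parse RoSIN or Retrosheet syntax.

```python
from collections import defaultdict


def cumulative_stats(plays):
    """Roll per-play records up into per-player totals.

    Each play is a dict with 'player_key' and 'outcome' (hypothetical
    field names). Returns {player_key: {'plays': n, outcome: count, ...}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for play in plays:
        line = totals[play["player_key"]]
        line["plays"] += 1
        line[play["outcome"]] += 1
    # Convert nested defaultdicts to plain dicts for easier comparison.
    return {key: dict(line) for key, line in totals.items()}


game_plays = [
    {"player_key": "p1", "outcome": "single"},
    {"player_key": "p2", "outcome": "strikeout"},
    {"player_key": "p1", "outcome": "strikeout"},
]
game_totals = cumulative_stats(game_plays)
```

Going in the reverse direction, from cumulative totals back to individual plays, is not possible, and without unique keys even the roll-up has to guess which records belong to the same player.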
  • 14. About XML Team
    • Sports Data Integration Specialists
    • Sports Feed Distributor
      • All Sources
      • Variety of clients: Fantasy, Portals, Wireless
    • Sports Data Consulting
      • Hired by NYTimes for new print production system for sports section
      • Hired by Associated Press to process IOC feed for Turin 2006 (likewise for Athens 2004)
  • 15. About Me
    • 12 Years in data publishing and syndication
      • WSJ.com founding member
      • ScreamingMedia head of Product Development
      • Founded XML Team 3 years ago
    • Founding chairman of SportsML
      • Active in other news industry standards
    • XML junkie / Spec writer / Other biz responsibilities, as I’m forced
