Sports and-semantic-tech-v.public



A survey of issues and the state of the art in sports information and the semantic web.

Presented at the IPTC Spring Meeting in Dubai, March 10, 2011.

Published in: Technology


  1. Sports and Semantic Tech
     Paul Kelly
     XML Team Solutions
     Chair, SportsML Working Party (IPTC)
     Spring Meeting, IPTC
     Dubai, UAE / 9th March 2011
  2. Let's Talk About This
     • Exploratory, not a didactic presentation
     • Purpose
       – gauge interest among members
       – brainstorm
       – guide SWP agenda
     • Explore
       – set of problems
       – possible solutions
       – or do we have that backwards?
       – business cases?
     © 2010 IPTC ( All rights reserved
  3. Why Sports?
     • easy? a no-brainer?
       – Silver Oliver, BBC
         • "Silver says the BBC has started with sport, because it is simpler. The events and the actors taking part in those events are known in advance. For example, even this far ahead you know the fixture list, venues, teams and probably the majority of the players who are going to take part in the 2010 World Cup."
       –
       – relationships easy to understand
         • hierarchical
         • sport/league/event/team/player
  4. Sports News Biz
     • Business products
       – team rosters
       – schedules
       – pre-event reports (text and statistical)
       – live updates
       – post-event reports (text and statistical)
       – standings/tables
       – stat reports
       – injury reports
       – general news
       – wagering
       – multimedia
       – etc.
  5. What are the issues?
     • ID resolution or acquisition
     • data availability
     • what to capture?
       – everything rdfable?
       – permanent metadata
       – narrative
       – perishable metadata
     • implementing/architecture
     • marketing scenarios
  6. IDs, Concepts and relationships
     • IDs
       – player, team, event, league, etc.
     • concepts
       – player, team, event, league, etc.
       – also tournament-stage, season-type, etc.
       – goals-scored, shots-missed, shots-on-net, etc.
     • relationships
       – isCompetitiveSportingOrganisationOf
       – isGroupOf
       – isMatchOf
       – hasStat
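One way to picture how the IDs, concepts and relationships on this slide fit together is as subject/predicate/object triples. The sketch below is illustrative only: the predicate names come from the slide, but every identifier (`team:england`, `group:c`, and so on) is hypothetical, and the direction of each relationship is an assumption rather than something the slide defines.

```python
# Each triple is (subject ID, relationship, object ID).
# Predicate names are taken from the slide; all IDs are made up for illustration,
# and the subject/object direction of each predicate is an assumption.
TRIPLES = {
    ("team:england", "isCompetitiveSportingOrganisationOf", "league:world-cup-2010"),
    ("group:c", "isGroupOf", "league:world-cup-2010"),
    ("match:eng-usa", "isMatchOf", "group:c"),
    ("player:lampard", "hasStat", "stat:lampard-goals-scored"),
}


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`, sorted."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


if __name__ == "__main__":
    # Which matches belong to the hypothetical group:c?
    print(subjects("isMatchOf", "group:c"))
    # Which stats are attached to the hypothetical player:lampard?
    print(objects("player:lampard", "hasStat"))
```

A real deployment would more likely hold these statements in a triple store and use full URIs; plain tuples are used here only to keep the sketch self-contained.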
  7. Data domains
     • within sports domain
       – e.g. resolving player IDs between providers
       – player page with Wikipedia content
     • within entire news domain
       – when news and sport intersect
         • doping, Beckhams, etc.
         • multi-domain events like Olympics
         • event management
     • broader marketing domain
       – personal data
         • location
         • favourite team
         • favourite gin
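Resolving player IDs between providers, the first example on this slide, can be sketched as a crosswalk table from each provider's local ID to one shared ID. Everything named below is hypothetical: the provider names, the local IDs and the shared `player:` identifiers are invented for illustration, and in practice the hard part is building and maintaining such a table, not looking things up in it.

```python
# Crosswalk: (provider name, provider-local ID) -> shared ID.
# All providers and IDs here are hypothetical examples.
CROSSWALK = {
    ("provider-a", "P1001"): "player:frank-lampard",
    ("provider-b", "fl-08"): "player:frank-lampard",
    ("provider-a", "P1002"): "player:david-beckham",
}


def resolve(provider, local_id):
    """Return the shared ID for a provider-local ID, or None if unmapped."""
    return CROSSWALK.get((provider, local_id))


def same_entity(ref_a, ref_b):
    """True only when both (provider, local_id) refs resolve to the same shared ID."""
    shared_a = resolve(*ref_a)
    shared_b = resolve(*ref_b)
    return shared_a is not None and shared_a == shared_b


if __name__ == "__main__":
    print(resolve("provider-b", "fl-08"))
    print(same_entity(("provider-a", "P1001"), ("provider-b", "fl-08")))
```

Unmapped references deliberately resolve to `None` and never compare equal, so two unknown players are not mistaken for the same person.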
  8. What's Out There?
     • Linked Data state of the art
       – DBpedia and Freebase
         • compare rosters for Miami Heat
       – Google Calendar
         • schedules
       – Guardian medals spreadsheet
       –
         • code resolver
         • originally thought of as strictly external
         • but ties in with
           – internal metadata management
           – other apps that produce and consume metadata
       – Did I miss anything?
  9. Ontologies
     • BBC sport ontology
       –
         • The Sport Ontology is a simple lightweight ontology for publishing data about competitive sports events. The terms in this ontology allow data to be published about:
           – The structure of sports tournaments as a series of events
           – Agents competing in a competition
           – The type of discipline an event involves
           – The award associated with the competition
           – ...etc
  10. BBC Site
     • BBC World Cup Site
       – built on top of triple-store; dynamically produced via inference
       – Jem Rayfield: "The BBC World Cup 2010 site features 700-plus team, group and player pages, which are powered by a high-performance dynamic semantic publishing (DSP) architecture. Previously, BBC Sport would never have considered creating this number of indices in the CPS, as each index would need an editor to keep it up to date with the latest stories, even where automation rules had been set up. To put this scale of task into perspective, the World Cup site has more index pages than the rest of the BBC Sport site."
  11. BBC Site
     • "This framework facilitates the publication of automated metadata-driven web pages that are light-touch, requiring minimal journalistic management, as they automatically aggregate and render links to relevant stories."
     • "The foundation of these dynamic aggregations is a rich ontological domain model. The ontology describes entity existence, groups and relationships between the things/concepts that describe the World Cup. For example, "Frank Lampard" is part of the "England Squad" and the "England Squad" competes in "Group C" of the "FIFA World Cup 2010"."
       • bbc_world_cup_2010_dynamic_sem.html
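The aggregation idea in the quote can be illustrated with a small sketch: a story tagged with one entity is also listed on the pages of every broader entity reachable from it. This is not the BBC's implementation, which the slides describe as a triple store with inference; the `BROADER` table, the function names and the story titles below are assumptions made for illustration, using only the entities named in the quote.

```python
# Broader-entity links, using the entities named in the quote.
# The table itself and the single "broader" link type are simplifying assumptions.
BROADER = {
    "Frank Lampard": ["England Squad"],
    "England Squad": ["Group C"],
    "Group C": ["FIFA World Cup 2010"],
}


def pages_for(entity):
    """Return the entity followed by everything reachable via BROADER links."""
    seen, stack = [], [entity]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(BROADER.get(current, []))
    return seen


def build_indexes(stories):
    """Map each page name to the story titles that should be listed on it.

    `stories` maps a story title to the list of entities it is tagged with.
    """
    indexes = {}
    for title, tags in stories.items():
        for tag in tags:
            for page in pages_for(tag):
                listed = indexes.setdefault(page, [])
                if title not in listed:
                    listed.append(title)
    return indexes


if __name__ == "__main__":
    # Hypothetical stories and tags.
    demo = {"Story A": ["Frank Lampard"], "Story B": ["Group C"]}
    for page, titles in build_indexes(demo).items():
        print(page, titles)
```

The point of the design is that a journalist tags a story once, at the most specific entity, and the index pages for the squad, group and tournament pick it up without an editor maintaining each page.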
  12. BBC Site
     • John O'Donovan: "Another way to think about all this, is that we are not publishing pages, but publishing content as assets which are then organised by the metadata dynamically into pages"
     • "We believe this is the first large scale, mass media site to be using concept extraction, RDF and a Triple store to deliver content."
       – the_world_cup_and_a_call_to_ac.html
     • entire BBC sports site will cut over to this architecture for 2012 Olympics.
  13. What to Capture?
     • everything in RDF?
       – where to draw the line between flat and deep data?
         • vertical (SportsML) and horizontal (RDF)
     • kinds of data
       – stable metadata
         • permanent
           – player, team, event, league
         • fixed
           – schedules
       – unpredictable permanent (meta)data
         • historical post-event results
           – scores
           – highlights
           – outcome
             » historical interest, such as last time England won the World Cup
       – the 0-goals, 0-assists guy?
  14. Perishable Metadata
     • perishable metadata
       – the pre-event narrative
         • why should I follow this game?
         • where should I watch it?
         • who should I watch it with?
         • more of a marketing opportunity?
  15. Pre-event significance
     • What makes a sports event significant?
       – decisive game
         • Cup Final
         • avoid relegation
       – top teams
       – matchup history
       – rivalries
       – top players
       – streaks
         • winning
         • scoring
         • losing
       – interesting players
       – news intersection (New Orleans Saints @ Super Bowl)
  16. Pre-Event Metadata
     • These are all narratives
       – all of it would be in the prose of a match preview
     • Contrast
       – structure and predictability of schedule
       – unpredictability of narrative --> essential
       – Winter Olympics narrative
         • controllable?
           – Georgian Luger
           – "Own the Podium"
  17. Next Steps
     • What should SportsML Working Party do?
       – just SportsML?
       – what about codes, concepts and ontologies
         • map SportsML to ontologies
       – rename to Sports News and Data Management?
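As a rough idea of what "map SportsML to ontologies" could involve, the sketch below walks a small XML fragment and emits triples. Treat all of it as assumption: the fragment is a simplified sample written for this example and not validated against the SportsML schema, and the `ex:` class and property names are hypothetical placeholders, not terms from any published ontology.

```python
import xml.etree.ElementTree as ET

# Simplified sample modelled loosely on SportsML; not validated against the schema.
SAMPLE = """
<sports-content>
  <sports-event>
    <team>
      <team-metadata team-key="team.eng"><name full="England"/></team-metadata>
      <player>
        <player-metadata player-key="player.lampard"><name full="Frank Lampard"/></player-metadata>
      </player>
    </team>
  </sports-event>
</sports-content>
"""


def sportsml_to_triples(xml_text):
    """Turn teams and their players into (subject, predicate, object) tuples.

    The ex: names are hypothetical placeholders for ontology terms.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for team in root.iter("team"):
        team_meta = team.find("team-metadata")
        team_id = team_meta.get("team-key")
        triples.append((team_id, "rdf:type", "ex:Team"))
        triples.append((team_id, "rdfs:label", team_meta.find("name").get("full")))
        for player in team.findall("player"):
            player_meta = player.find("player-metadata")
            player_id = player_meta.get("player-key")
            triples.append((player_id, "rdf:type", "ex:Player"))
            triples.append((player_id, "rdfs:label", player_meta.find("name").get("full")))
            # The vertical XML nesting becomes an explicit horizontal link.
            triples.append((player_id, "ex:isPlayerOf", team_id))
    return triples


if __name__ == "__main__":
    for triple in sportsml_to_triples(SAMPLE):
        print(triple)
```

The sketch shows the general shape of such a mapping: relationships that SportsML expresses through element nesting have to be named explicitly once the data is flattened into statements.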