IPTC and the Semantic Web: Two Paths and Seven Lessons

IPTC is exploring the use of the Semantic Web for the news industry.

This is the report I gave to the IPTC's 2010 AGM in San Francisco. We decided that there are two paths into the Semantic Web - creating Linked Data (using SKOS) and creating a news ontology based on NewsML-G2 (using OWL). Here are the seven lessons we've learned so far in our exploration.

Published in: Technology

  1. IPTC and the Semantic Web: Two Paths and Seven Lessons
     Stuart Myles, Associated Press
     29th June 2010
  2. Semantic Web News Vocabularies
     • IPTC decided to experiment with the Semantic Web and Linked Data
     • The best-known RDF vocabularies are:
       • FOAF = Friend of a Friend (http://xmlns.com/foaf/spec/)
       • DCMI Terms = Dublin Core Metadata Initiative Terms (http://dublincore.org/)
       • Other examples at http://vocab.org/
     • The New York Times, Dow Jones and others have identified a need for a news vocabulary
     • We held a series of teleconferences to make rapid progress
     © 2010 IPTC (www.iptc.org) All rights reserved
  3. Two Paths to the Semantic Web
     We identified two paths into the Semantic Web world:
     • Create a news ontology, based on NewsML-G2
       • Formal semantics for news, specified using OWL
       • “RDFization” of IPTC’s family of news standards
     • Turn IPTC subject codes into Linked Data
       • Connect related data across the web using URIs, HTTP & RDF
       • A set of principles from Tim Berners-Lee: http://www.w3.org/DesignIssues/LinkedData.html
     We decided to pursue the Linked Data path first
  4. Following the Linked Data Path
     • The Linked Data principles, as specified by Tim Berners-Lee (TBL):
       • Use URIs as names for things
       • Use HTTP URIs so that people can look up those names
       • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
       • Include links to other URIs, so that they can discover more things
     • Apply the principles to IPTC’s subject codes
       • Already published as XML (G2 Knowledge Items)
       • And as HTML
     • The plan: convert the XML into RDF
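To make "convert XML into RDF" concrete, here is a minimal sketch in Python using only the standard library. The XML below is a deliberately simplified, hypothetical record, not the real G2 Knowledge Item schema; the base URI is a placeholder, and the sample codes and labels are illustrative and should be checked against the published subject codes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified concept records; NOT the real G2 Knowledge Item schema.
SAMPLE_XML = """
<concepts>
  <concept code="15000000">
    <name lang="en">sport</name>
  </concept>
  <concept code="15001000" broader="15000000">
    <name lang="en">aero and aviation sport</name>
  </concept>
</concepts>
"""

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/subjectcode/"  # placeholder base URI (assumption)


def concept_triples(xml_text):
    """Return (subject, predicate, object) triples for every concept element."""
    triples = []
    root = ET.fromstring(xml_text)
    for concept in root.findall("concept"):
        subject = BASE + concept.get("code")
        triples.append((subject, RDF_TYPE, SKOS + "Concept"))
        for name in concept.findall("name"):
            # Literals are kept as (text, language) pairs to tell them apart from URIs.
            triples.append((subject, SKOS + "prefLabel", (name.text, name.get("lang"))))
        broader = concept.get("broader")
        if broader is not None:
            triples.append((subject, SKOS + "broader", BASE + broader))
    return triples


TRIPLES = concept_triples(SAMPLE_XML)
for triple in TRIPLES:
    print(triple)
```

Each concept gets an HTTP URI as its name and links to its parent by URI, which covers the first, second and fourth principles; serving useful data at those URIs is a hosting question.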
  5. Lesson #1: One Model, Multiple Vocabularies
     • RDF is a single model: Subject, Predicate, Object
     • With multiple syntaxes
       • We selected RDF/XML and RDF/Turtle
     • And multiple “vocabularies”
       • Such as SKOS, Dublin Core
       • SKOS = Simple Knowledge Organization System (http://www.w3.org/2004/02/skos/)
       • Designed for representing thesauri and classification schemes
     • The Semantic Web “way” is:
       • Use existing vocabularies as much as possible
       • When you invent a new term, link it to existing terms
     • We decided to use SKOS and DC as the main vocabs
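The "one model, multiple syntaxes" point can be sketched by writing the same two triples out in two different notations. This is a toy serializer for illustration only: it skips literal escaping, the `ex:` prefix and all `example.org` URIs are placeholders, and a real project would use an RDF library instead.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"
PREFIXES = {
    "skos": SKOS,
    "dcterms": DCTERMS,
    "ex": "http://example.org/subjectcode/",  # placeholder namespace (assumption)
}

# A str term is a URI; a (text, lang) tuple is a language-tagged literal.
TRIPLES = [
    ("http://example.org/subjectcode/15000000", SKOS + "prefLabel", ("sport", "en")),
    ("http://example.org/subjectcode/15000000", DCTERMS + "creator", "http://example.org/iptc"),
]


def to_ntriples_term(term):
    """Write a term in full: <uri> or "text"@lang (no escaping in this sketch)."""
    if isinstance(term, tuple):
        text, lang = term
        return '"%s"@%s' % (text, lang)
    return "<%s>" % term


def to_ntriples(triples):
    """One triple per line, every URI spelled out in full."""
    return "\n".join(
        "%s %s %s ." % tuple(to_ntriples_term(t) for t in triple) for triple in triples
    )


def to_turtle_term(term):
    """Abbreviate a URI with a declared prefix when the local part is alphanumeric."""
    if isinstance(term, tuple):
        return to_ntriples_term(term)
    for prefix, namespace in PREFIXES.items():
        local = term[len(namespace):]
        if term.startswith(namespace) and local.isalnum():
            return "%s:%s" % (prefix, local)
    return "<%s>" % term


def to_turtle(triples):
    """Prefix declarations first, then the same triples in abbreviated form."""
    lines = ["@prefix %s: <%s> ." % (p, ns) for p, ns in PREFIXES.items()]
    lines.append("")
    lines.extend(
        "%s %s %s ." % tuple(to_turtle_term(t) for t in triple) for triple in triples
    )
    return "\n".join(lines)


print(to_ntriples(TRIPLES))
print()
print(to_turtle(TRIPLES))
```

The two outputs look different but carry identical subject, predicate, object statements, and both reuse SKOS and Dublin Core terms rather than inventing new ones.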
  6. Lesson #2: Tool Support
     • The approach:
       • Use RDF in general
       • Reuse existing vocabularies in particular
     • The benefit:
       • Tools “just work”
     • We learnt that this is mostly true…
       • We played with Protégé, TopBraid, Sesame
       • Most things worked well in all tools
       • But the “transitive” versions of SKOS broader and narrower aren’t supported well
       • They were late additions to the SKOS standard
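In SKOS, `skos:broader` is not itself transitive; `skos:broaderTransitive` is the transitive super-property. Where a tool does not infer it, the closure can be computed directly. A minimal sketch, assuming the hierarchy is already loaded into a plain dictionary (the `ex:` identifiers are made up):

```python
def broader_transitive(broader):
    """Map each concept to every ancestor reachable through direct broader links."""
    closure = {}
    for concept in broader:
        seen = set()
        stack = list(broader.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue  # already visited; also guards against cycles
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
        closure[concept] = seen
    return closure


# Hypothetical three-level hierarchy: ex:c -> ex:b -> ex:a
BROADER = {
    "ex:c": ["ex:b"],
    "ex:b": ["ex:a"],
    "ex:a": [],
}

CLOSURE = broader_transitive(BROADER)
print(sorted(CLOSURE["ex:c"]))
```

The same function run over the inverted dictionary would give the narrower-transitive view.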
  7. Lesson #3: Basics Well Documented
     • In general, IPTC KnowledgeItems map well to RDF
       • SKOS concepts
       • Dublin Core properties
     • Certain KI properties don’t have a direct mapping
       • Created and updated timestamps of KnowledgeItem properties
     • Difficult to determine more advanced mappings
       • SKOS wiki had some documentation: http://esw.w3.org/SkosCoreGuideToc/SectionVersioning
       • SKOS email list seems dormant
       • SemanticOverflow is a great way to get questions answered: http://www.semanticoverflow.com/questions/902/adding-created-modified-properties-to-skos-do-i-need-to-reify
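The question linked above asks whether timestamps on individual properties require reification. As an illustration of what that would involve, here is a sketch using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) with Dublin Core `created` and `modified`. It shows one possible option, not the mapping the IPTC chose; all `example.org` URIs and the dates are invented.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def reify(statement_uri, triple, created, modified):
    """Describe one triple as an rdf:Statement and attach timestamps to it."""
    subject, predicate, obj = triple
    return [
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", subject),
        (statement_uri, RDF + "predicate", predicate),
        (statement_uri, RDF + "object", obj),
        (statement_uri, DCTERMS + "created", created),
        (statement_uri, DCTERMS + "modified", modified),
    ]


# Hypothetical label statement and timestamps, for illustration only.
LABEL = ("http://example.org/subjectcode/15000000", SKOS + "prefLabel", "sport")
EXAMPLE = reify("http://example.org/statement/1", LABEL, "2008-01-01", "2010-06-29")

for triple in EXAMPLE:
    print(triple)
```

One timestamped label becomes six triples, which is the cost of keeping property-level detail; attaching `created` and `modified` to the concept as a whole is simpler but coarser.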
  8. Lesson #4: Pull is Better than Push
     • One possibility is to “push” our model into RDF
       • Try to preserve all the original semantics
       • But you don’t gain as much in out-of-the-box tool support
     • The other possibility is to “pull” the model into RDF
       • May lose some nuances
       • But you gain in reuse of modeling patterns, vocabularies and tool support
     • (In fact, there was some dispute over the intended model of the IPTC KnowledgeItem properties)
  9. Lesson #5: Linking and Mapping
     • “Include links to other URIs, so that they can discover more things”
     • Linking is the heart of Linked Data
     • But linking is more like mapping
       • owl:sameAs seems to have unintended consequences
       • SKOS’s mapping properties offer a range of options: closeMatch, exactMatch, broadMatch, narrowMatch, relatedMatch (http://www.w3.org/TR/skos-reference/#mapping)
     • We decided to map the 17 top-level IPTC subject codes to DBpedia
       • Some top-level terms are really “umbrella” terms, which are difficult to map to a single equivalent
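A short sketch of how the SKOS mapping properties give more room than a single "same as" link. The source URIs are placeholders and the specific pairings with DBpedia resources are illustrative guesses, not the mappings the IPTC approved.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DBPEDIA = "http://dbpedia.org/resource/"
BASE = "http://example.org/subjectcode/"  # placeholder base URI (assumption)

# The five mapping properties defined in the SKOS Reference.
MAPPING_PROPERTIES = {
    "closeMatch", "exactMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_triple(source, relation, target):
    """Build one mapping triple, rejecting anything that is not a SKOS mapping property."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError("not a SKOS mapping property: %s" % relation)
    return (source, SKOS + relation, target)


# Illustrative only: a term with an obvious counterpart gets a single match...
SPORT = mapping_triple(BASE + "15000000", "exactMatch", DBPEDIA + "Sport")

# ...while an umbrella term points at several narrower targets instead.
UMBRELLA = [
    mapping_triple(BASE + "01000000", "narrowMatch", DBPEDIA + target)
    for target in ("Art", "Culture", "Entertainment")
]

print(SPORT)
for triple in UMBRELLA:
    print(triple)
```

Using `narrowMatch` for an umbrella term says that each target is narrower than the source, which avoids claiming an equivalence that does not hold.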
  10. Lesson #6: There’s More to be Done
     • Although we rapidly produced a Linked Data prototype, it is incomplete
       • Content negotiation requires work from the APA hosting
       • We need to think through and approve the details of the mapping
     • The other path remains unexplored
       • Building a news ontology, based on NewsML-G2
       • Can we leverage the work that EBU have already done?
     • What about other formats?
       • Particularly RDFa
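Content negotiation means the server inspects the HTTP `Accept` header and returns HTML, RDF/XML or Turtle from the same URI. The following is a simplified sketch of the selection step only. It ignores partial wildcards such as `text/*` and the specificity rules of the HTTP specification, and the list of supported types is an assumption about what a host might offer.

```python
# Representations a hypothetical host could serve; the first is the default.
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]


def choose_media_type(accept_header, supported=SUPPORTED):
    """Return the supported media type with the highest q-value, or None if none fits."""
    if not accept_header.strip():
        return supported[0]  # no preference stated: serve the default
    best_type, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # q defaults to 1 when the client gives no value
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type == "*/*":
            candidate = supported[0]
        elif media_type in supported:
            candidate = media_type
        else:
            continue
        if q > best_q:
            best_type, best_q = candidate, q
    return best_type


print(choose_media_type("application/rdf+xml;q=0.9, text/html;q=0.5"))
```

A server would typically answer `None` with a 406 response or fall back to HTML; which is appropriate depends on the hosting setup.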
  11. Lesson #7: There’s a Lot of Interest
     • High attendance at the Semantic Web IPTC calls
       • Even though the topic is a bit complex and unfamiliar to most
     • Participation was brisk
       • We rapidly developed RDF/XML and RDF/Turtle representations
     • Occasional mentions on Twitter generated a lot more retweets and replies than other IPTC-related tweets
     • There’s a lot of interest inside and outside the IPTC
  12. IPTC and Semantic Web: Next Steps
     • Complete Linked Data mapping of IPTC Subject Codes and Media Codes
     • Explore creating a News Ontology
       • Find out more about EBU’s work
     • Start RDFa representation of news metadata
     • Reach out to the broader Semantic Web and news communities for feedback and collaboration
     • REQUEST to Standards Chair: Can we formalize this effort into an official IPTC Working Group?
