An overview of rNews, a new standard for embedding publishing metadata into HTML documents using RDFa.
These slides are from a presentation given by Evan Sandhaus, Lead Architect, Semantic Platforms at The New York Times, on behalf of The International Press Telecommunications Council, at The Lotico New York Semantic Web Meetup on April 21, 2011.
Video of this presentation is available at: http://vimeo.com/22891051
1. rNews
Embedded Data For
The News Industry
Prepared By
Evan Sandhaus
For
The New York
Semantic Web Meetup
2. Agenda
Why we need semantic markup.
The Key Technologies
• RDF - Resource Description Framework
• RDFa - Resource Description Framework in Attributes
rNews
• Class Diagram
• Classes & Properties
• Simple Example
Call To Action
Discussion
3. Why we need
Semantic
Markup
The Burning Question
4. The Problem of Structured Data
Modern Web Sites Built with 3 Tier Architecture
• Data Tier: Database Where Content Lives.
• Presentation Tier: HTML Document that is sent to user.
• Logic Tier: Software that reads from the Data Tier and outputs the Presentation Tier.
[Diagram: Display Tier, Logic Tier, Data Tier]
5. Linked Data
[Diagram: Data Tier, Logic Tier, Display Tier]
Data Tier:
Label      Type     Value
id         number   1248069162607
Headline   text     New Web Code Draws Concern...
Byline     text     By TANZINA VEGA
Date       date     20101010
Body       text     In the next few years, a powerful...
Length     number   1123
Tag        text     Privacy
Tag        text     Computers and the Internet
Tag        text     Web Browsers
Display Tier:
<html>
<head>
<title>
New Web Code Draws Concern...
</title>
</head>
<body>
<div>
New Web Code Draws Concern...
</div>
<div>
By TANZINA VEGA
</div>
<div>
October 10, 2010
</div>
<div>
In the next few years, a powerful...
</div>
</body>
</html>
Content very well structured on Data Tier, but all of this structure is lost in translation to presentation tier.
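The loss of structure described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any real site's code: the `record` dictionary, the `render` function and the `format_date` helper are hypothetical names, and the field values are copied from the slide.

```python
from datetime import datetime
from html import escape

# A record as it might sit in the Data Tier (values taken from the slide).
record = {
    "id": 1248069162607,
    "headline": "New Web Code Draws Concern...",
    "byline": "By TANZINA VEGA",
    "date": "20101010",
    "body": "In the next few years, a powerful...",
    "tags": ["Privacy", "Computers and the Internet", "Web Browsers"],
}


def format_date(yyyymmdd: str) -> str:
    """Turn '20101010' into 'October 10, 2010' for display."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d")
    return f"{d.strftime('%B')} {d.day}, {d.year}"


def render(rec: dict) -> str:
    """Hypothetical Logic Tier: typed fields go in, anonymous <div>s come out."""
    parts = [rec["headline"], rec["byline"], format_date(rec["date"]), rec["body"]]
    divs = "\n".join(f"<div>{escape(p)}</div>" for p in parts)
    return (
        "<html>\n<head><title>" + escape(rec["headline"]) + "</title></head>\n"
        "<body>\n" + divs + "\n</body>\n</html>"
    )


page = render(record)
print(page)
```

Nothing in the output says which `<div>` is the byline or the date, and the id, length and tags never reach the page at all.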
6. Linked Data
[Diagram: Display Tier = ?]
<html>
<head>
<title>
New Web Code Draws Concern...
</title>
</head>
<body>
<div>
New Web Code Draws Concern...
</div>
<div>
By TANZINA VEGA
</div>
<div>
October 10, 2010
</div>
<div>
In the next few years, a powerful...
</div>
</body>
</html>
Search engines, social networks, aggregators and other sites only see the Display Tier, and cannot leverage the underlying structure of the data.
7. Linked Data
With Structured
Data
No Structured
Data
Without structured data, search engines, social
networks and other sites cannot attractively format
links back to our site, potentially decreasing referral
traffic.
8. The Case of the Missing Structured
Data
How do we solve it?
Two major approaches:
• Microformats - Tomorrow
• RDF/RDFa - Coming Right up
10. RDF
The Resource Description Framework is a data
model for expressing (almost) any concept in the
world.
Developed to Facilitate
• Data portability and interoperability
• Deep reasoning
W3C Recommendation
Represents concepts as “graphs” of triples
13-20. [Diagram built up across slides: a graph centered on Stuart Myles. It starts with the nodes Person and Deputy Director Schema Standards, then adds in turn The Associated Press; Manhattan, New York; Northeastern United States; xxx@ap.org; @mr_awesome; and http://facebook.com/stuart_myles.]
21-24. RDF - The True Nature of A Triple
Subject: Stuart Myles = http://www.iptc.org/authority/per/stuart_myles
Verb: http://www.iptc.org/demo/ns/twitterHandle
Object: @mr_awesome = “@mr_awesome”
26-29. RDF - The True Nature of A Triple
Subject: Stuart Myles = http://iptc.org/authority/per/stuart_myles
Verb: demo:twitterHandle
Object: @mr_awesome = “@mr_awesome”
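The two versions of the triple differ only in how the verb is written: once as a full URI, once as the prefixed name demo:twitterHandle. A minimal Python sketch of that relationship follows. The `Triple` type and the `expand` function are hypothetical names, and the prefix table is an assumption inferred from the two slide versions rather than something the slides state outright.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    verb: str  # the slides call the predicate the "verb"
    obj: str


# Assumed prefix table: the slides show the full verb URI and the short
# form demo:twitterHandle, so "demo" is taken to stand for this namespace.
PREFIXES = {"demo": "http://www.iptc.org/demo/ns/"}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name like 'demo:twitterHandle' to a full URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name  # a full URI or an unknown prefix: leave unchanged


triple = Triple(
    subject="http://iptc.org/authority/per/stuart_myles",
    verb=expand("demo:twitterHandle"),
    obj="@mr_awesome",
)
print(triple)
```

Full URIs pass through `expand` untouched because their scheme ("http") is not a registered prefix.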
30. [Diagram: the full graph with every edge labeled. Each triple has the subject http://www.iptc.org/authority/per/stuart_myles]
• rdf:type: demo:person
• demo:isA: Deputy Director Schema Standards
• demo:worksFor: The Associated Press
• demo:officeLocatedIn: Manhattan, New York
• demo:residesIn: Northeastern United States
• demo:emailAddress: xxx@ap.org
• demo:twitterHandle: @mr_awesome
• owl:sameAs: http://facebook.com/stuart_myles
32. RDF - Writing it All Down
Remember - RDF is a data model not a file format.
Several file formats for serializing RDF
• Turtle
• N-Triples
• RDF/XML
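To make the "data model, not file format" point concrete, here is a minimal Python sketch that writes the Twitter-handle triple from the earlier slides as one line of N-Triples. The function names are hypothetical, and the sketch covers only the simplest case (URI subject, URI verb, plain string object); it is not a complete serializer.

```python
def ntriples_literal(text: str) -> str:
    """Quote a plain string literal, escaping backslash, quote, LF and CR."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(subject_uri: str, verb_uri: str, literal: str) -> str:
    """Serialize one triple whose object is a plain literal."""
    return f"<{subject_uri}> <{verb_uri}> {ntriples_literal(literal)} ."


line = to_ntriples(
    "http://www.iptc.org/authority/per/stuart_myles",
    "http://www.iptc.org/demo/ns/twitterHandle",
    "@mr_awesome",
)
print(line)
```

The same triple could equally be written in Turtle or RDF/XML; only the syntax on disk changes, not the graph.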
35. RDFa - Resource Description
Framework in Attributes
In theory RDF is a flexible extensible format for
expressing and sharing knowledge.
Problem with the theory - The Web shares
“knowledge” using HTML.
Solution: RDFa - a W3C standard for embedding
RDF triples within standard HTML documents.
RDF + <HTML/> = RDFa
36. RDFa - Facts
Embeds structured data into HTML by extending the
HTML standard: it overloads existing attributes and
introduces new ones.
• overloaded attributes: src, href, rel, rev
• new attributes: about, content, typeof
Format used by Facebook for Open Graph.
RDFa distillers extract structured data from HTML
as RDF.
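A toy version of what a distiller does can be sketched with Python's standard `html.parser` module. This is an illustration under heavy simplifying assumptions, not a conforming RDFa processor: the `ToyDistiller` class is a hypothetical name, it takes the subject from the nearest enclosing `about`, reads values only from `content`, and ignores prefixes, element text, `typeof`, `rel`, `rev`, `href` and `src`. The `property` attribute it looks for is part of RDFa although it is not listed on the slide.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are not pushed on the stack.
VOID = {"meta", "link", "img", "br", "hr", "input"}


class ToyDistiller(HTMLParser):
    """Collect (subject, property, value) triples from `property` + `content`."""

    def __init__(self):
        super().__init__()
        self.stack = [("", None)]  # (tag, subject in scope); first entry is a sentinel
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Inherit the enclosing subject unless this element sets `about`.
        subject = a.get("about") or self.stack[-1][1]
        if a.get("property") and a.get("content") is not None:
            self.triples.append((subject, a["property"], a["content"]))
        if tag not in VOID:
            self.stack.append((tag, subject))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, never past the sentinel.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


html_doc = """
<div about="http://www.iptc.org/authority/per/stuart_myles">
  <span property="demo:twitterHandle" content="@mr_awesome">Stuart on Twitter</span>
</div>
"""

distiller = ToyDistiller()
distiller.feed(html_doc)
print(distiller.triples)
```

A browser renders only "Stuart on Twitter", while the distiller recovers the triple from the attributes.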
53. rNews - Timeline
September 2010 - rNews proposed to IPTC at fall meeting.
March 2011 - rNews draft version 0.1 approved by IPTC at spring meeting.
March - May 2011 - IPTC solicits feedback on draft standard.
June 2011 - IPTC to vote on revised standard at summer meeting.
June - September 2011 - Implementation testing of rNews.
September 2011 - Final vote on rNews.
54. rNews - Class Diagram
[Class diagram]
Classes: NewsItem, Article, Media, Comment, Headline, Party, Person, Organization, Concept, Location, TickerSymbol
Relationships shown: subclassOf, createdBy, providedBy, copyrightedBy, hasSource, contributedBy, hasAccountableParty, taggedBy, discussedBy, depictedBy, headlinedBy, hasTickerSymbol
55. rNews - News Item
NewsItem
dateCreated
copyrightNoticeUri
dateModified
genre
description
genreUri
language
guid
thumbnailUri
version
title
commentCount
usageTerms
commentCountURI
usageTermsUri
discussionUri
copyrightNotice
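A minimal Python sketch of how a page template might attach two of these properties to the article from the earlier slides. The `annotate` function is hypothetical, and the namespace URI is a placeholder assumption, not the namespace published in the rNews draft. The class name Article and the properties title and dateCreated come from the slides; the ISO date format in `content` is an assumption. Prefixes are declared with `xmlns:`, as in RDFa 1.0.

```python
from html import escape

# Placeholder namespace URI: an assumption for illustration only,
# not the namespace published in the rNews draft.
RNEWS_NS = "http://example.org/rnews#"


def annotate(title: str, date_created: str, display_date: str) -> str:
    """Return an HTML fragment whose elements carry RDFa attributes."""
    return (
        f'<div xmlns:rnews="{RNEWS_NS}" typeof="rnews:Article">\n'
        f'  <h1 property="rnews:title">{escape(title)}</h1>\n'
        f'  <span property="rnews:dateCreated" content="{escape(date_created)}">'
        f"{escape(display_date)}</span>\n"
        "</div>"
    )


fragment = annotate("New Web Code Draws Concern...", "2010-10-10", "October 10, 2010")
print(fragment)
```

The reader still sees "October 10, 2010", while the `content` attribute carries a machine-readable value for the same field.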
60. rNews - Tag, Party, Person, Organization, Location
[Class diagram]
Classes shown: Person, Organization, Location, Concept (linked by subclassOf)
Person: additionalName, givenName, honorificPrefix, honorificSuffix, lastName, title
Location: latitude, longitude, altitude
Other properties shown: countryName, email, fax, locality, postalCode, region, streetAddress, tel, url, name
62. rNews - Design Goals & Principles
Guiding vision - Einstein’s Corollary to Occam’s Razor:
• “Everything should be kept as simple as possible, but no simpler.”
Goals:
• Decision makers should see rNews implementation as an extremely
minor time commitment (1-2 days).
• Developers should be able to implement rNews without becoming
semantic web experts.
• Semantic Web Experts should be able to easily leverage rNews-
annotated documents.
Strategy:
• Unified Namespace
• Reuse existing IPTC standards
• Use controlled vocabularies to minimize number of objects and
properties.
66. rNews Design Principle: Unified
Namespace
Problems With Multiple Namespaces
• Dramatically increases learning curve.
• Dramatically increases probability of implementation errors.
• Negative impact on implementation time.
Problems With Single Namespace
• Reduces utility of rNews-annotated documents to many
existing semantic web tools.
Solution
• Single namespace, but next version will include a machine-
readable mapping from rNews objects and properties to
external vocabularies.
67. rNews Design Principle: Reuse
Existing IPTC Standards
rNews is designed with extensive reference to existing
IPTC standards.
That is why, for instance, we use the term copyrightNotice
(from NewsML-G2) instead of rights (from Dublin Core).
IPTC Standards are widely deployed in the online
publishing world
• NewsML-G2
• EventsML-G2
• SportsML-G2
• NewsML
• News Industry Text Format
• SportsML
• IPTC 7901
• Familiar to implementors
• Familiar to IPTC
68. rNews Design Principle: Minimize Objects
& Properties With Controlled Vocabularies
Why This
Media
69. rNews Design Principle: Minimize Objects
& Properties With Controlled Vocabularies
Instead of this
Audio Image Video
70. rNews Design Principle: Minimize Objects
& Properties With Controlled Vocabularies
Controlled Vocabulary Eliminates Need for Multiple Classes
Media
• MediaType, MediaTypeUri: http://cv.iptc.org/newscodes/mediatype/
• Encoding, EncodingUri: http://cv.iptc.org/newscodes/format/, http://cv.iptc.org/newscodes/audiocodec/, http://cv.iptc.org/newscodes/videocodec/
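The design choice on these three slides, one Media class plus a controlled vocabulary instead of separate Audio, Image and Video classes, can be sketched in Python. The `Media` dataclass and its `is_kind` method are hypothetical, and the term codes "video" and "audio" appended to the vocabulary URI are illustrative assumptions, not confirmed NewsCodes entries.

```python
from dataclasses import dataclass

# Vocabulary base URI as shown on the slide.
MEDIATYPE_CV = "http://cv.iptc.org/newscodes/mediatype/"


@dataclass(frozen=True)
class Media:
    """One class for every kind of media; the kind is data, not a subclass."""

    media_type_uri: str

    def is_kind(self, code: str) -> bool:
        """True if this item's media type is the given vocabulary term."""
        return self.media_type_uri == MEDIATYPE_CV + code


# "video" and "audio" are hypothetical term codes, used only to show the pattern.
clip = Media(media_type_uri=MEDIATYPE_CV + "video")
print(clip.is_kind("video"), clip.is_kind("audio"))
```

Supporting a new kind of media then means adding a vocabulary term, with no change to the class model.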
72. Call To Action
Threefold Request
• What you can do now.
• What you can do this
summer.
• What you can do this
fall.
73. Call To Action: What you can do Now
Review the draft standard at http://dev.iptc.org/rnews
(ec2 permitting).
Post your feedback to the forum or email us:
• Stuart (smyles@ap.org)
• Andreas (Andreas.Gebhard@gettyimages.com)
• Evan (evan@nytimes.com)
If you want to see a standard like this supported,
encourage your vendors and providers to review
and support rNews.
74. Call To Action: What (else) you can do
Now
Have an idea for an rNews use case? Tell us.
Test your code on sample articles that we will
provide (as soon as ec2 is working again).
75. Call To Action: What (else) you can do
This Summer
• Try implementing the next version of rNews on all or some
of your site.
• Continue to provide feedback about how well the standard
is working for you: what works well, what needs
improvement.
This Fall
• Implement version 1.0 of rNews on your site.