rNews: Embedding Metadata in On-line News Jayson Lorenzen, Business Wire
[email_address]   @jays0n
SemTech June 5-9 2011, San Francisco, CA
Outline
Introduction to the IPTC
Why semantic markup for news is needed
Overview of the rNews standard
How the standard was created
Demos and Use Cases
IPTC: Background
International Press Telecommunications Council (IPTC), http://www.iptc.org
A consortium of the world's major news organizations, associations and industry vendors
There are currently about 70 member organizations in the IPTC
Founded in 1965 to safeguard and develop the telecommunications interests of the press
Creates and maintains standards for the exchange of news
IPTC: 7901
http://www.iptc.org/site/News_Exchange_Formats/IPTC_7901
In the 70s, IPTC 7901 is the first IPTC standard for news
Structured text format, using mainly whitespace delimiters:
REU1143 4 OEC 114 (reecr) F1001481 BC-CUSTOMS-COUNTERFEITING Commission to press Internal Market Council on pirated goods BRUSSELS, Nov 10 (Reuter) - Customs commissioner Fred Bloggs today announced that pirated goods would be available in all European markets by the end of the millennium. The commission's proposal was welcomed by all member states however, customs authorities declined to comment due to the possible implications on staffing levels. REUTER 1548 101193 GMT
IPTC: IIM
http://www.iptc.org/site/News_Exchange_Formats/IIM/
In the 80s and 90s, IIM (Information Interchange Model) is the first IPTC multimedia format
"IPTC fields" / "IPTC header" in digital image files
IPTC: XML Standards
Also in the 90s: NITF (News Industry Text Format), http://www.nitf.org
Structure for independent news articles
<?xml version="1.0"?> <nitf> <head> <title>Norfolk Weather and Tide Updates</title> <tobject tobject.type="news"> ... </body.content>
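As a rough illustration of what "structure for independent news articles" buys a consumer, the sketch below reads the title and object type from a small NITF-style document using Python's standard library. The document is completed by hand from the fragment on the slide: the closed `tobject` element, the `body` wrapper and the placeholder paragraph are assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET

# A minimal NITF-style document. Only the title and the tobject.type
# attribute come from the slide; the rest is filled in for illustration.
nitf_doc = """<?xml version="1.0"?>
<nitf>
  <head>
    <title>Norfolk Weather and Tide Updates</title>
    <tobject tobject.type="news"/>
  </head>
  <body>
    <body.content><p>Placeholder body text.</p></body.content>
  </body>
</nitf>"""

root = ET.fromstring(nitf_doc)

# Because the format is structured, fields are addressed by name
# rather than guessed from layout.
title = root.findtext("head/title")
tobject_type = root.find("head/tobject").get("tobject.type")

print(title)
print(tobject_type)
```

The point of the sketch is only that a named element or attribute can be located directly, which is the property that gets lost once the same article is rendered as a web page.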
IPTC: XML Standards
NewsML, http://www.newsml.org
Structure for packages of news items containing text, photo, graphics, and video components
<NewsML Version="1.2"> ... <NewsItem> <Identification> <NewsIdentifier> <ProviderId>businesswire.com</ProviderId> <NewsItemId>20100809006755 ...
IPTC: XML Standards
G2 family of standards
XML Schema based components shared by the entire family (NewsML-G2, EventsML-G2 & SportsML-G2)
<newsItem guid="urn:newsml:acmenews.com:20081125T1205:US-FINANCE-PAULSON" version="3" xmlns="http://iptc.org/std/nar/2006-10-01/" standard="NewsML-G2"
http://iptc.org/site/News_Exchange_Formats/IPTC_G2-Standards_-_about_which_one_do_you_want_to_know_more
IPTC: NewsCodes
http://www.newscodes.org
Taxonomies for the news industry
Linkable data available in RDF/XML & Turtle
rNews:  Why we need semantic markup for news
rNews: why
The Problem of Structured Data
Modern web sites are built with a 3-tier architecture:
Data Tier: database where content lives
Logic Tier: software that reads from the data tier and outputs the presentation tier
Presentation Tier: HTML document sent to the user
rNews: why Though the data is carried forward, most of the structure is lost
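The loss of structure can be illustrated with a minimal Python sketch. The record fields and the `render` function are hypothetical and do not describe any real site's schema: the logic tier copies every value into the HTML, but the field names that gave those values meaning do not survive.

```python
from html import escape

# Data tier: a structured record (field names are illustrative assumptions).
record = {
    "headline": "Retail Report on April 2011 Retail Sales Figures",
    "dateline": "May 5, 2011 12:52 UTC",
    "provider": "businesswire.com",
}


def render(rec):
    """Logic tier: turn the record into presentation-tier HTML."""
    return (
        "<div>"
        f"<h1>{escape(rec['headline'])}</h1>"
        f"<div>{escape(rec['dateline'])}</div>"
        f"<span>{escape(rec['provider'])}</span>"
        "</div>"
    )


page = render(record)

# Every value is carried forward into the page...
values_kept = all(escape(value) in page for value in record.values())
# ...but none of the field names are.
names_kept = any(name in page for name in record)

print(values_kept, names_kept)
```

A consumer of `page` sees three strings inside generic tags and has to guess which one is the headline, which is the gap semantic markup is meant to close.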
rNews: why
With structured data, search engines, social networks and aggregators can better link to news
(Figure panels: With structured data / No structured data)
rNews: why
<body>
<div id="bwNews">
<div id="bwNewsRelease">
<div id="bwNewsBody">
<div id="bwCompanyLogos"> <img... </div>
<div id="bwStory">
<div class="bwStoryDateline">May 5, 2011 12:52 UTC</div>
<h1>Retail Report on April 2011 Retail Sales Figures</h1>
<div id="bwStoryBody">...
Scrape?
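To make the cost of scraping concrete, here is a minimal sketch of a scraper written against the markup above, using Python's standard `html.parser`. The class and function names are hypothetical, and the "redesign" at the end is an invented scenario: it shows how a scraper tied to one site's class names silently loses a field when the presentation changes.

```python
from html.parser import HTMLParser


class StoryScraper(HTMLParser):
    """Pull the headline and dateline out of site-specific markup."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._current = "headline"
        elif attrs.get("class") == "bwStoryDateline":
            self._current = "dateline"

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def scrape(html):
    parser = StoryScraper()
    parser.feed(html)
    parser.close()
    return parser.fields


page = (
    '<div id="bwStory">'
    '<div class="bwStoryDateline">May 5, 2011 12:52 UTC</div>'
    '<h1>Retail Report on April 2011 Retail Sales Figures</h1>'
    '</div>'
)
fields = scrape(page)

# The same page after a hypothetical redesign renames one class.
redesigned = page.replace("bwStoryDateline", "story-date")
fields_after = scrape(redesigned)

print(fields)
print(fields_after)
```

The scraper has to be written per site and rewritten per redesign, which is why scraping does not scale for search engines and aggregators.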
rNews: why
APIs?
API and technology are different for each company
A contractual agreement might be required
May not always be an option for search engines and social networks
rNews: How do we solve these problems?
rNews:  Three major approaches
rNews:  Microformats: hNews  (microformats.org)
rNews:  Microdata: schema.org/NewsArticle
rNews:  RDF / RDFa:
rNews:  http://www.rnews.org rNews A model for news-specific metadata in HTML
rNews: Design Goals
Decision makers should see rNews implementation as a minor time commitment
Developers should not have to be Semantic Web experts to implement rNews
Semantic Web experts should be able to leverage rNews-annotated documents
rNews: Design Strategy
Unified namespace, instead of:
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterm="http://purl.org/dc/terms/"
xmlns:ctag="http://commontag.org/ns#"
xmlns:c="http://s.opencalais.com/1/pred/"
xmlns:v="http://rdf.data-vocabulary.org/#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:sioc="http://rdfs.org/sioc/ns#"
xmlns:sioctypes="http://rdfs.org/sioc/types#"
xmlns:fb="http://developers.facebook.com/schema/"
xmlns:og="http://opengraphprotocol.org/schema/"
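For contrast with the list above, the sketch below shows what a single-namespace annotation might look like and how little a consumer needs to know to read it. It is written under loud assumptions: the namespace URI is a placeholder rather than the official rNews namespace, the property names `headline` and `dateline` are illustrative, and the reader class is a toy, not a conformant RDFa processor.

```python
from html.parser import HTMLParser

# Placeholder namespace URI: an assumption for this sketch, not the
# official rNews namespace.
RNEWS_NS = "http://example.org/rnews#"
PREFIX = "rnews:"


class PropertyReader(HTMLParser):
    """Collect the text of elements carrying an RDFa-style `property`.

    Not a conformant RDFa processor: it ignores nesting, `content`,
    `about` and `typeof`, and resolves only the single `rnews:` prefix.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._pending = None  # full property URI awaiting its text

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property") or ""
        if prop.startswith(PREFIX):
            self._pending = RNEWS_NS + prop[len(PREFIX):]

    def handle_data(self, data):
        if self._pending and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


# One prefix declaration covers every annotated field on the page.
annotated = (
    '<div xmlns:rnews="http://example.org/rnews#">'
    '<div property="rnews:dateline">May 5, 2011 12:52 UTC</div>'
    '<h1 property="rnews:headline">'
    'Retail Report on April 2011 Retail Sales Figures</h1>'
    '</div>'
)

reader = PropertyReader()
reader.feed(annotated)
reader.close()
properties = reader.properties

for uri, value in properties.items():
    print(uri, "=", value)
```

Unlike a scraper keyed to layout, this reader keeps working when the surrounding tags or CSS classes change, as long as the `property` attributes stay in place.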
rNews: Design Strategy
Reuse existing IPTC standards
IPTC standards are widely used in the industry
Familiar to implementors
Familiar to the IPTC
rNews: Design Strategy
Use controlled vocabularies to minimize the number of objects and properties
cv.iptc.org/newscodes/format/
cv.iptc.org/newscodes/audiocodec/
cv.iptc.org/newscodes/videocodec/
cv.iptc.org/newscodes/mediatype/
rNews:  Classes & Properties http://dev.iptc.org/rnewsowl
rNews: Class Diagram
rNews: Classes
rNews:  how rNews was created
rNews: how it was created
The IPTC have done this before: IIM, IPTC 7901

Editor's Notes

  • #8 to #12
    - The IPTC (International Press Telecommunications Council) is a non-profit consortium of the world's major news agencies and news industry vendors. It creates and maintains standards for the exchange of news (and for embedding metadata in photos) that are used by virtually every major news organization in the world.
    - Established in 1965 by a group of news organisations to safeguard the telecommunications interests of the world's press. The group included the Alliance Européenne des Agences de Presse, ANPA (now NAA), FIEJ (now WAN) and the North American News Agencies (a joint committee of Associated Press, Canadian Press and United Press International).
    - IPTC 7901 is an internationalized version of ANPA 1312 from the Newspaper Association of America (NAA), formerly the American Newspaper Publishers Association (ANPA).
    - Since the 1990s, IPTC's standardization work has been based on open standards (first SGML, then the XML family of standards, MIME, Unicode, and so on).
    - NITF (News Industry Text Format): the first XML format to exchange news (started as SGML, Standard Generalized Markup Language).
    - 2001: NewsML, IPTC's first standard to exchange multimedia news (XML).
  • #13 to #14
    - IPTC 7901 is an internationalized version of ANPA 1312 from the Newspaper Association of America (NAA), formerly the American Newspaper Publishers Association (ANPA).
    - IPTC 7901 is for the transmission of text content to newspapers, news agencies and other recipients. It was initially released in the early eighties and last updated in 1995, though it is still in use by many news organizations around the world.
    - Structured text format, using mainly whitespace delimiters (space and CR LF characters).
  • #15 to #18
    - Since the 1990s, IPTC's standardization work has been based on open standards (first SGML, then the XML family of standards, MIME, Unicode, and so on).
    - NITF (News Industry Text Format): the first XML format to exchange news (started as SGML, Standard Generalized Markup Language).
  • #21 to #26
    - IPTC 7901 is for the transmission of text content to newspapers, news agencies and other recipients. It was initially released in the early eighties and last updated in 1995, though it is still in use by many news organizations around the world.
    - Structured text format, using mainly whitespace delimiters (space and CR LF characters).
    - The format is composed of four sections: preheader information, message header, message text, and post-text information.
  • #39 If the IPTC decided to use metadata from different namespaces, we would have to ask which ones to choose. The W3C (Media Annotations group) is working on a news ontology and lists twenty (20) different existing metadata schemas, and there are others not listed there. That was another good reason to start with only a single namespace; some mapping would be required in this case, but also in a multi-namespace case. So... Q: Why haven't we sub-classed or aligned to existing vocabularies? A: We want to, but felt that this work would best be carried out in collaboration with the broader semantic web community: you. If anybody would like to propose such an alignment, we'd be happy to consider it.
  • #55 rNews started in or around September 2010 via conference calls (20 participants on the 1st call, including from AP, BBC, BW, EBU, Getty, IFRA, NYT, PA, TR and Xinhua). Work was begun to make the IPTC controlled vocabularies linkable and to align them with other sources of linked data (DBpedia). Hosting and formats for the data were discussed. An effort was begun to create an ontology for news, using NewsML-G2, the latest IPTC standard for the exchange of news, as a starting point. RDFa was chosen as the vehicle.
  • #65, #66, #71, #72 TinyMCE WYSIWYG Editor: an open source software project. The Ontos rNews solution is based on TinyMCE and adds rNews tagging.