AP Metadata Services

Amy Sweigert
SemTechBiz
June 6, 2012
About the Associated Press
– AP is a not-for-profit news cooperative, owned by US
  newspaper and broadcast members, founded in 1846

– AP news content is seen by half the world’s
  population on any given day

– We process and deliver 100k+ content items daily

  – AP, member and third-party content

  – Text, photos, audio, multimedia interactives, and
    broadcast and online quality video

  – Primarily B2B
Evolution of AP Metadata Services
2006
• Initial taxonomy and rule development starts

2007
• Automated tagging of Subjects, People, Companies starts

2008
• Automated tagging of Companies, Organizations, Geography, Events starts

2009-2010
• Scope and depth of coverage increases
• Platform stabilized

2011
• RDF modeling
• API development
• Pilot offering

2012
• AP Metadata Services Launch
Introducing AP Metadata Services
– Semantic Web services to drive the next generation of
  news delivery and consumption:

  – AP News Taxonomy

  – AP Tagging Service

– B2B service with continuing investment and human
  curation

  – Ongoing and frequent updates to tagging
    rules, entities, concepts and their semantic relationships

– Designed to meet AP’s exacting needs for its own content
What Does Rich Metadata Do for Publishers?
– Connect customers with more relevant content through:

  – Improved search and discovery

  – Automated aggregation, syndication and distribution of related
    content

  – Richer and more relevant content products and services

  – Reduced time to market for new products and services

  – Reduced editorial workload and greater efficiency

  – Content interoperability
• Site delivered ~5,000
  articles and ~20,000 photos
  over 2 months
• Routing and display of
  content by team and
  conference is automated
• Editorial resources are
  focused on curating only
  the most important parts of
  the site
• Enables user experience
  that would not be possible
  without automated,
  standard metadata
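
A minimal sketch of how tag-driven routing of this kind could work, assuming each article arrives with a list of taxonomy tag names. The `TAG_TO_SECTION` mapping, the article records and the `route_articles` function are hypothetical illustrations, not AP's implementation.

```python
from collections import defaultdict

# Hypothetical mapping from a taxonomy tag to the site section it feeds.
TAG_TO_SECTION = {
    "Miami Heat": "teams/miami-heat",
    "NBA Eastern Conference": "conferences/nba-east",
}


def route_articles(articles):
    """Group article IDs by site section, based only on their tags."""
    sections = defaultdict(list)
    for article in articles:
        for tag in article["tags"]:
            section = TAG_TO_SECTION.get(tag)
            if section is not None and article["id"] not in sections[section]:
                sections[section].append(article["id"])
    return dict(sections)


# Hypothetical tagged articles.
articles = [
    {"id": "a1", "tags": ["Miami Heat", "NBA Eastern Conference"]},
    {"id": "a2", "tags": ["NBA Eastern Conference"]},
    {"id": "a3", "tags": ["NBA Finals"]},  # no matching section: not routed
]
routed = route_articles(articles)
```

Because routing depends only on the tags, adding a new team or conference page in this sketch means adding one mapping entry rather than changing editorial workflow.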
The AP News Taxonomy
– Breadth and depth to support news and current events

– Defines rich semantic metadata specific to news

  – Generic subjects and hierarchy

  – Named entities

  – Relationships, synonyms, additional entity data

– Delivers automated notifications of taxonomy changes
  – New terms, deprecated terms, name changes, etc.
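
A rough sketch of how a subscriber might apply such change notifications to a local copy of the taxonomy. The record fields (`action`, `id`, `label`) and the action names are assumptions made for illustration; the slides do not show the actual change log layout.

```python
def apply_changes(labels, changes):
    """Return (updated_labels, deprecated_ids) after applying the changes.

    labels maps a term ID to its preferred label. Deprecated IDs are
    returned separately so callers can flag content that still carries them.
    """
    updated = dict(labels)  # leave the caller's copy untouched
    deprecated = set()
    for change in changes:
        action, term_id = change["action"], change["id"]
        if action in ("new", "rename"):
            updated[term_id] = change["label"]
        elif action == "deprecate":
            updated.pop(term_id, None)
            deprecated.add(term_id)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return updated, deprecated


# Hypothetical local taxonomy copy and change records.
labels = {"t1": "Basketball", "t2": "Pro basketball"}
changes = [
    {"action": "new", "id": "t3", "label": "NBA Finals"},
    {"action": "rename", "id": "t2", "label": "Professional basketball"},
    {"action": "deprecate", "id": "t1"},
]
updated, deprecated = apply_changes(labels, changes)
```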
The AP Tagging Service
– Software as a Service

  – Leverages AP investment and expertise

– Tags concepts; more than entity extraction

– Automated tagging tied to AP News Taxonomy ensures
  more consistent, comprehensive results
Coverage
– 4,200 Subjects
– 2,100 Geographic locations
– 1,200 Organizations
– 91,000 People
– 41,000 Publicly-traded Companies
– Supports English language content

Top Level Subject Areas:
• Arts and Entertainment
• Business
• Demographic groups
• Environment and Nature
• Events
• General News
• Government and Politics
• Health
• Lifestyle
• Living Things
• Media
• Science
• Social Affairs
• Sports
• Technology
A Foundation of Semantic Web Standards
– URIs for all entities and topics
– Taxonomy modeled in RDF
– SKOS Ontology
  – Supplemented with other ontologies
    (Dublin Core, DBpedia, FOAF, GeoNames, etc.)
  – Some AP-specific properties
– Taxonomy and Tagging Service accessible via
  RESTful APIs
– Using a SPARQL endpoint internally to provide
  views of the taxonomy
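
To illustrate what modeling a taxonomy in RDF with SKOS makes possible, here is a small sketch that walks `skos:broader` links upward from a concept. The triples and their `example.org` URIs are placeholders, not AP data, and the function assumes one broader parent per concept, which SKOS itself does not require.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical (subject, predicate, object) triples; URIs are placeholders.
TRIPLES = [
    ("http://example.org/id/nba-finals", SKOS_BROADER, "http://example.org/id/nba-basketball"),
    ("http://example.org/id/nba-basketball", SKOS_BROADER, "http://example.org/id/basketball"),
    ("http://example.org/id/basketball", SKOS_BROADER, "http://example.org/id/sports"),
]


def broader_chain(uri, triples):
    """Follow skos:broader links upward from uri, nearest ancestor first.

    Simplification: assumes each concept has at most one broader concept.
    """
    parents = {s: o for s, p, o in triples if p == SKOS_BROADER}
    chain, seen = [], {uri}
    while uri in parents:
        uri = parents[uri]
        if uri in seen:  # guard against cycles in bad data
            break
        seen.add(uri)
        chain.append(uri)
    return chain


ancestors = broader_chain("http://example.org/id/nba-finals", TRIPLES)
```

A production system would more likely run this kind of traversal as a query against a triple store; the plain-Python version only shows the shape of the data.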
Supported Formats
AP Tagging Service
– Input formats
  – Plain Text
  – Simple XML: XML encoded content,
    e.g. XHTML, NITF, NewsML, NewsML-G2
– Output formats
  – RDF/XML
  – RDF/JSON
  – RDF/Turtle
  – Simple XML
  – NewsML-G2

AP Taxonomy
– Taxonomy Output Format
  – RDF/XML
  – RDF/Turtle
  – RDF/JSON
  – NewsML-G2
– Taxonomy Change Log Output formats
  – XML
  – CSV
Metadata Services in AP’s Content Lifecycle

[Diagram. Labeled elements: AP Editorial Content (Input); 3rd party content; AP Tagging Service applies standard values and related data; Standard AP News Taxonomy values; Content Repository; Products defined based on rich metadata; Distribution methods: Internet syndication, Web portals, APIs]

Metadata Services
• Taxonomy fed to editorial tools
• Automated tagging applies subject and entity metadata from taxonomy
• Rich relationships between subjects, entities
• Metadata used to deliver targeted feeds, auto-publish and
  improve search and browse experience
RDF/XML representation of Scott Walker, Governor of Wisconsin
<skos:Concept rdf:about="http://cv.ap.org/id/11AD96CF0A5149C5B3909F5BE9A5494A">
  <skos:prefLabel xml:lang="en">Scott Walker</skos:prefLabel>
  <ap:associatedState rdf:resource="http://cv.ap.org/id/1BC1BC3082C81004896CDF092526B43E" />
  <ap:entryTerm xml:lang="en">Scott K. Walker</ap:entryTerm>
  <ap:entryTerm xml:lang="en">Scott Kevin Walker</ap:entryTerm>
  <ap:isPlaceholder rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</ap:isPlaceholder>
  <dbpedia-owl:party rdf:resource="http://cv.ap.org/id/BF6E2E80760D10048F8AE6E7A0F4673E" />
  <dbprop:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1967-11-02</dbprop:birthdate>
  <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-11-01T10:23:29-05:00</dcterms:created>
  <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-02-26T10:14:13-05:00</dcterms:modified>
  <rdf:type rdf:resource="http://cv.ap.org/c/Politician" />
  <skos:altLabel xml:lang="en">Scott K. Walker</skos:altLabel>
  <skos:altLabel xml:lang="en">Scott Kevin Walker</skos:altLabel>
  <skos:broader rdf:resource="http://cv.ap.org/id/C9D7FA107E4E1004847ADF092526B43E" />
  <skos:definition xml:lang="en">45th Governor of Wisconsin. Milwaukee, Wisconsin County Executive. US Republican member of the Wisconsin State Assembly.</skos:definition>
  <skos:inScheme rdf:resource="http://cv.ap.org/a#person" />
</skos:Concept>
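
As a rough illustration of consuming a record like this, the sketch below reads the SKOS labels and broader link with Python's standard `xml.etree.ElementTree`. The sample is a trimmed copy of the concept above, wrapped in an `rdf:RDF` element that declares the standard `rdf` and `skos` namespaces so the prefixes resolve; that wrapper and the `read_concept` helper are assumptions for the example. A dedicated RDF library would be the more robust choice, since RDF/XML allows several equivalent serializations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Trimmed copy of the slide's concept; the rdf:RDF wrapper is added here
# only so that the namespace prefixes are declared.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://cv.ap.org/id/11AD96CF0A5149C5B3909F5BE9A5494A">
    <skos:prefLabel xml:lang="en">Scott Walker</skos:prefLabel>
    <skos:altLabel xml:lang="en">Scott K. Walker</skos:altLabel>
    <skos:altLabel xml:lang="en">Scott Kevin Walker</skos:altLabel>
    <skos:broader rdf:resource="http://cv.ap.org/id/C9D7FA107E4E1004847ADF092526B43E" />
  </skos:Concept>
</rdf:RDF>"""


def read_concept(xml_text):
    """Extract the URI, labels and broader links of the first skos:Concept."""
    root = ET.fromstring(xml_text)
    concept = root.find(f"{{{SKOS}}}Concept")
    return {
        "uri": concept.get(f"{{{RDF}}}about"),
        "pref_label": concept.findtext(f"{{{SKOS}}}prefLabel"),
        "alt_labels": [e.text for e in concept.findall(f"{{{SKOS}}}altLabel")],
        "broader": [e.get(f"{{{RDF}}}resource") for e in concept.findall(f"{{{SKOS}}}broader")],
    }


concept = read_concept(DOC)
```

The alternate labels are what let a search or tagging system match "Scott K. Walker" in text back to the single concept URI.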
Subset of tags returned for article about NBA Finals game, in Simple XML format

<ClassificationResults>
  <DocumentId>C495D353258440B487279767F9A16D02</DocumentId>
  <DocumentDate>2012-06-06T15:59:46-05:00</DocumentDate>
  <Entities>
    <Entity>
      <Authority>AP Person</Authority>
      <AuthorityVersion>3420</AuthorityVersion>
      <Name>LeBron James</Name>
      <Id>http://cv.ap.org/id/7c05129d1a1741af8bcc326c9459545c</Id>
      <Properties>
        <PersonType>Professional Athlete</PersonType>
        <PersonType>Sports Figure</PersonType>
        <Team>Miami Heat</Team>
      </Properties>
    </Entity>
    <Entity>
      <Authority>AP Organization</Authority>
      <AuthorityVersion>3412</AuthorityVersion>
      <Name>Miami Heat</Name>
      <Id>http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391</Id>
    </Entity>
    <Entity>
      <Authority>AP Organization</Authority>
      <AuthorityVersion>3412</AuthorityVersion>
      <Name>NBA Eastern Conference</Name>
      <Id>http://cv.ap.org/id/4a653a1806bc49518c5e667120a283e3</Id>
    </Entity>
  </Entities>
  <Subjects>
    <Subject>
      <Authority>AP Subject</Authority>
      <AuthorityVersion>3415</AuthorityVersion>
      <Name>NBA basketball</Name>
      <Id>http://cv.ap.org/id/6c01c3e08c8010048288a13d9888b73e</Id>
    </Subject>
    <Subject>
      <Authority>AP Subject</Authority>
      <AuthorityVersion>3415</AuthorityVersion>
      <Name>NBA Finals</Name>
      <Id>http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c</Id>
    </Subject>
  </Subjects>
</ClassificationResults>
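
A short sketch of how a client might flatten a Simple XML result like this into a list of tags, using Python's standard `xml.etree.ElementTree`. The `RESPONSE` string is a shortened copy of the sample, with the `Subjects` and `ClassificationResults` elements closed here so that it parses on its own; the `extract_tags` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample result; closing tags for Subjects and
# ClassificationResults are supplied so the excerpt is well-formed.
RESPONSE = """<ClassificationResults>
  <Entities>
    <Entity>
      <Authority>AP Person</Authority>
      <Name>LeBron James</Name>
      <Id>http://cv.ap.org/id/7c05129d1a1741af8bcc326c9459545c</Id>
    </Entity>
    <Entity>
      <Authority>AP Organization</Authority>
      <Name>Miami Heat</Name>
      <Id>http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391</Id>
    </Entity>
  </Entities>
  <Subjects>
    <Subject>
      <Authority>AP Subject</Authority>
      <Name>NBA Finals</Name>
      <Id>http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c</Id>
    </Subject>
  </Subjects>
</ClassificationResults>"""


def extract_tags(xml_text):
    """Return one dict per Entity or Subject, entities first."""
    root = ET.fromstring(xml_text)
    elements = root.findall("./Entities/Entity") + root.findall("./Subjects/Subject")
    return [
        {
            "kind": element.tag,  # "Entity" or "Subject"
            "authority": element.findtext("Authority"),
            "name": element.findtext("Name"),
            "id": element.findtext("Id"),
        }
        for element in elements
    ]


tags = extract_tags(RESPONSE)
```

Keying downstream logic on the `id` URI rather than the display name keeps it stable when a term is renamed in the taxonomy.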
Thank You!
http://developer.ap.org/AP_Metadata_Services

apmetadata@ap.org

AP Metadata Services, SemTechBiz 2012

  • 1. AP Metadata Services Amy Sweigert SemTechBiz June 6, 2012
  • 2. About the Associated Press – AP is a not-for-profit news cooperative, owned by US newspaper and broadcast members, founded in 1846 – AP news content is seen by half the world’s population on any given day – We process and deliver 100k+ content items daily – AP, member and third-party content – Text, photos, audio, multimedia interactives, and broadcast and online quality video – Primarily B2B
  • 3. Evolution of AP Metadata Services 2011 • RDF modeling • API development • Pilot offering 2008 • Automated tagging of Companies, Organizations, Geography, Events starts 2012 • AP Metadata 2009-2010 Services Launch • Scope and depth of 2006 coverage increases • Initial taxonomy • Platform stabilized and rule 2007 development • Automated tagging of starts Subjects, People, Compani es starts
  • 4. Introducing AP Metadata Services – Semantic Web services to drive the next generation of news delivery and consumption: – AP News Taxonomy – AP Tagging Service – B2B service with continuing investment and human curation – Ongoing and frequent updates to tagging rules, entities, concepts and their semantic relationships – Designed to meet AP’s exacting needs for its own content
  • 5. What Does Rich Metadata Do for Publishers? – Connect customers with more relevant content through: – Improved search and discovery – Automated aggregation, syndication and distribution of related content – Richer and more relevant content products and services – Reduced time to market for new products and services – Reduces editorial workload, creates efficiencies – Content interoperability
  • 6. • Site delivered ~5,000 articles and ~20,000 photos over 2 months • Routing and display of content by team and conference is automated • Editorial resources are focused on curating only the most important parts of the site • Enables user experience that would not be possible without automated, standard metadata
  • 7. The AP News Taxonomy – Breadth and depth to support news and current events – Defines rich semantic metadata specific to news – Generic subjects and hierarchy – Named entities – Relationships, synonyms, additional entity data – Delivers automated notifications of taxonomy changes – New terms, deprecated terms, name changes, etc.
  • 8. The AP Tagging Service – Software as a Service – Leverages AP investment and expertise – Tags concepts; more than entity extraction – Automated tagging tied to AP News Taxonomy ensures more consistent, comprehensive results
  • 9. Top Level Subject Areas: • Arts and Entertainment Coverage • Business • Demographic groups – 4200 Subjects • Environment and Nature • Events – 2100 Geographic locations • General News • Government and Politics – 1200 Organizations • Health • Lifestyle – 91,000 People • Living Things • Media – 41,000 Publicly-traded • Science Companies • Social Affairs • Sports • Technology – Supports English language content
  • 10. A Foundation of Semantic Web Standards – URIs for all entities and topics – Taxonomy modeled in RDF – SKOS Ontology – Supplemented with other ontologies (Dublin Core, DBPedia, FOAF, GeoNames, etc.) – Some AP-specific properties – Taxonomy and Tagging Service accessible via RESTful APIs – Using a SPARQL end-point internally to provide views of the taxonomy
  • 11. Supported Formats AP Tagging Service AP Taxonomy – Input formats – Taxonomy Output Format – Plain Text – RDF/XML – Simple XML: XML encoded content – RDF/Turtle  e.g. XHTML, NITF, NewsML, NewsML-G2 – RDF/JSON – Output formats – NewsML-G2 – RDF/XML – Taxonomy Change Log – RDF/JSON Output formats – RDF/Turtle – XML – Simple XML – CSV – NewsML-G2
  • 12. Metadata Services in AP’s Content Lifecycle Content Repository 3rd party content Products defined based on rich Distribution methods: metadata Internet syndication Web portals APIs AP Editorial Content (Input) AP Tagging Service applies standard values Metadata Services and related data • Taxonomy fed to editorial tools • Automated tagging applies subject and entity metadata from taxonomy • Rich relationships between Standard AP subjects, entities News • Metadata used to deliver targeted Taxonomy feeds, auto-publish and values improve search and browse experience
  • 13. RDF/XML representation of Scott Walker, Governor of Wisconsin <skos:Concept rdf:about="http://cv.ap.org/id/11AD96CF0A5149C5B3909F5BE9A5494A"> <skos:prefLabel xml:lang="en">Scott Walker</skos:prefLabel> <ap:associatedState rdf:resource="http://cv.ap.org/id/1BC1BC3082C81004896CDF092526B43E" /> <ap:entryTerm xml:lang="en">Scott K. Walker</ap:entryTerm> <ap:entryTerm xml:lang="en">Scott Kevin Walker</ap:entryTerm> <ap:isPlaceholder rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">false</ap:isPlaceholder> <dbpedia-owl:party rdf:resource="http://cv.ap.org/id/BF6E2E80760D10048F8AE6E7A0F4673E" /> <dbprop:birthdate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1967-11-02</dbprop:birthdate> <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-11-01T10:23:29- 05:00</dcterms:created> <dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-02-26T10:14:13- 05:00</dcterms:modified> <rdf:type rdf:resource="http://cv.ap.org/c/Politician" /> <skos:altLabel xml:lang="en">Scott K. Walker</skos:altLabel> <skos:altLabel xml:lang="en">Scott Kevin Walker</skos:altLabel> <skos:broader rdf:resource="http://cv.ap.org/id/C9D7FA107E4E1004847ADF092526B43E" /> <skos:definition xml:lang="en">45th Governor of Wisconsin. Milwaukee, Wisconsin County Executive. US Republican member of the Wisconsin State Assembly.</skos:definition> <skos:inScheme rdf:resource="http://cv.ap.org/a#person" /> </skos:Concept>
  • 14. Subset of tags returned for article about NBA Finals game, in Simple XML format
      <ClassificationResults>
        <DocumentId>C495D353258440B487279767F9A16D02</DocumentId>
        <DocumentDate>2012-06-06T15:59:46-05:00</DocumentDate>
        <Entities>
          <Entity>
            <Authority>AP Person</Authority>
            <AuthorityVersion>3420</AuthorityVersion>
            <Name>LeBron James</Name>
            <Id>http://cv.ap.org/id/7c05129d1a1741af8bcc326c9459545c</Id>
            <Properties>
              <PersonType>Professional Athlete</PersonType>
              <PersonType>Sports Figure</PersonType>
              <Team>Miami Heat</Team>
            </Properties>
          </Entity>
  • 15. Subset of tags returned for article about NBA Finals game, in Simple XML format, cont.
          <Entity>
            <Authority>AP Organization</Authority>
            <AuthorityVersion>3412</AuthorityVersion>
            <Name>Miami Heat</Name>
            <Id>http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391</Id>
          </Entity>
          <Entity>
            <Authority>AP Organization</Authority>
            <AuthorityVersion>3412</AuthorityVersion>
            <Name>NBA Eastern Conference</Name>
            <Id>http://cv.ap.org/id/4a653a1806bc49518c5e667120a283e3</Id>
          </Entity>
        </Entities>
  • 16. Subset of tags returned for article about NBA Finals game, in Simple XML format, cont.
        <Subjects>
          <Subject>
            <Authority>AP Subject</Authority>
            <AuthorityVersion>3415</AuthorityVersion>
            <Name>NBA basketball</Name>
            <Id>http://cv.ap.org/id/6c01c3e08c8010048288a13d9888b73e</Id>
          </Subject>
          <Subject>
            <Authority>AP Subject</Authority>
            <AuthorityVersion>3415</AuthorityVersion>
            <Name>NBA Finals</Name>
            <Id>http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c</Id>
          </Subject>
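
As an illustration of how a publisher might consume the Simple XML tagging response from slides 14 to 16, here is a minimal Python sketch using only the standard library. It is not AP's client code. The sample is a trimmed copy of the slide content, the closing `</Subjects>` and `</ClassificationResults>` tags are assumed because the slides cut off before them, and the function name `extract_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown on the slides. The closing </Subjects>
# and </ClassificationResults> tags are an assumption; the slides end early.
SAMPLE = """\
<ClassificationResults>
  <DocumentId>C495D353258440B487279767F9A16D02</DocumentId>
  <Entities>
    <Entity>
      <Authority>AP Person</Authority>
      <AuthorityVersion>3420</AuthorityVersion>
      <Name>LeBron James</Name>
      <Id>http://cv.ap.org/id/7c05129d1a1741af8bcc326c9459545c</Id>
      <Properties>
        <PersonType>Professional Athlete</PersonType>
        <PersonType>Sports Figure</PersonType>
        <Team>Miami Heat</Team>
      </Properties>
    </Entity>
    <Entity>
      <Authority>AP Organization</Authority>
      <AuthorityVersion>3412</AuthorityVersion>
      <Name>Miami Heat</Name>
      <Id>http://cv.ap.org/id/8a85be975bf94cd18836b6eb5a1f6391</Id>
    </Entity>
  </Entities>
  <Subjects>
    <Subject>
      <Authority>AP Subject</Authority>
      <AuthorityVersion>3415</AuthorityVersion>
      <Name>NBA Finals</Name>
      <Id>http://cv.ap.org/id/fd862c8beea14e189c9a5617cf5c379c</Id>
    </Subject>
  </Subjects>
</ClassificationResults>
"""


def extract_tags(xml_text):
    """Return one dict per Entity or Subject element in the response."""
    root = ET.fromstring(xml_text)
    tags = []
    for kind, path in (("entity", "Entities/Entity"),
                       ("subject", "Subjects/Subject")):
        for node in root.findall(path):
            # Properties can repeat (two PersonType values), so collect lists.
            props = {}
            for prop in node.findall("Properties/*"):
                props.setdefault(prop.tag, []).append(prop.text)
            tags.append({
                "kind": kind,
                "authority": node.findtext("Authority"),
                "name": node.findtext("Name"),
                "id": node.findtext("Id"),
                "properties": props,
            })
    return tags


if __name__ == "__main__":
    for tag in extract_tags(SAMPLE):
        print(tag["authority"], "|", tag["name"], "|", tag["id"])
```

Keying downstream logic on the `Id` URI rather than on `Name` is probably the safer choice, since the slides present the URI as the stable identifier for each term while labels can have synonyms.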

Editor's Notes

  1. Historically, AP content had minimal descriptive metadata. Starting in 2005, AP began working on applying standard metadata across all content in order to improve access and enable new product development. The system needed to provide high accuracy and a high degree of control, scale to handle large volumes of content of different types, and not slow down editorial’s ability to get the news out quickly. We built a standard set of taxonomies and a rules-based automated classification system. As the service evolved, members asked us about the possibility of using AP’s systems for their own content. By 2011, the platform was mature enough that we started the work to make our internal service available externally, using Semantic Web standards.
  2. Targeted search and granular products: Allows customers to follow a company, person, or topic over time. Enables us to deliver specific products, e.g. green technology, the Royal Wedding, 2012 Olympics hometown athletes, etc. Aggregates AP and member content.
  3. Really a lightweight ontology – we model hierarchy, synonyms, relationships between entities, additional entity properties.
  4. Rules-based system – each term in the taxonomy has an associated rule. People and Company tagging is based on mention, with significant disambiguation rules to ensure accuracy. Subject, Geo and Organization tagging is based on more complex rules, and strives for “aboutness.”
  5. Entity coverage continues to grow. Results in ~1.7 million triples.
  6. High-level view of how services are integrated into AP’s pipeline. Taxonomy data is exposed in editorial tools and used for web site curation. Automated tagging happens downstream of editorial. Taxonomy data (synonyms and other data) is integrated into search and browse within our portals. Because the services are available as APIs, it’s easy for publishers to integrate them into their own workflows in whatever way makes the most sense – APIs offer flexibility.
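
Notes 3 and 6 describe the taxonomy as a lightweight ontology whose hierarchy and synonyms feed search and browse. The sketch below shows one way a consumer could read those pieces out of an RDF/XML concept like the one on slide 13, using Python's standard library. It is an illustration under assumptions: the `rdf:RDF` wrapper element is added here so the fragment parses on its own, only the standard `rdf:` and `skos:` properties are kept because the slide does not give the namespace URIs for the other prefixes, and the function names `read_concepts` and `synonym_index` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
NS = {"rdf": RDF, "skos": SKOS}

# Subset of the slide 13 concept. The rdf:RDF wrapper is an assumption.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://cv.ap.org/id/11AD96CF0A5149C5B3909F5BE9A5494A">
    <skos:prefLabel xml:lang="en">Scott Walker</skos:prefLabel>
    <rdf:type rdf:resource="http://cv.ap.org/c/Politician" />
    <skos:altLabel xml:lang="en">Scott K. Walker</skos:altLabel>
    <skos:altLabel xml:lang="en">Scott Kevin Walker</skos:altLabel>
    <skos:broader rdf:resource="http://cv.ap.org/id/C9D7FA107E4E1004847ADF092526B43E" />
    <skos:inScheme rdf:resource="http://cv.ap.org/a#person" />
  </skos:Concept>
</rdf:RDF>
"""


def read_concepts(rdf_xml):
    """Map each concept URI to its label, synonyms, parents and types."""
    root = ET.fromstring(rdf_xml)
    resource = f"{{{RDF}}}resource"
    concepts = {}
    for node in root.findall("skos:Concept", NS):
        uri = node.get(f"{{{RDF}}}about")
        concepts[uri] = {
            "label": node.findtext("skos:prefLabel", namespaces=NS),
            "synonyms": [n.text for n in node.findall("skos:altLabel", NS)],
            "broader": [n.get(resource) for n in node.findall("skos:broader", NS)],
            "types": [n.get(resource) for n in node.findall("rdf:type", NS)],
        }
    return concepts


def synonym_index(concepts):
    """Build a lowercase name-to-URI lookup from labels and synonyms."""
    index = {}
    for uri, concept in concepts.items():
        for name in [concept["label"], *concept["synonyms"]]:
            index[name.lower()] = uri
    return index


if __name__ == "__main__":
    concepts = read_concepts(SAMPLE)
    lookup = synonym_index(concepts)
    print(lookup.get("scott k. walker"))
```

A lookup like this resolves any synonym to the single concept URI, which is roughly what search integration needs. A production consumer would more likely use a full RDF library and handle name collisions between entities, which this sketch ignores.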