The document discusses the challenges of extracting value from vast amounts of content created quickly by The Press Association. It proposes using a content and metadata pattern to simplify abstracting the different types of content, like text, images, video and data. Standardizing identifiers for entities and using "wild" data could help increase the sustainability of the Semantic Web by furthering standards development within communities like the W3C and IPTC.
ABSTRACT: To address the demand for 'fast, fair and accurate' content and data, the Press Association committed to a large-scale project with semantic web technologies at its core. By adopting established standards such as IPTC's G2 and RDF, preferring existing ontologies such as geonames, FOAF and Dublin Core, and using technologies such as cloud-based computing, open and RESTful approaches, XML-driven databases and triple stores, the Press Association addresses real business needs: managing enormous, disparate and heterogeneous sets of data, reducing silos, and streamlining content and data interchange within the organisation and between PA and its customers. The ontologies PA uses, driven by a business need, must be able to deal with the complexity of knowledge in the real world without recreating the difficulties of the earlier Cyc project. Where previous Linked Data successes have been limited to a single domain such as the pharmaceutical industry, the encyclopedic nature of news datasets and ontologies makes it likely they will have an impact beyond the industry. Because a national wire service is central to the wider news industry, the impact of one committing to semantic technologies should not be underestimated.
PA sends out approximately million news stories each year. Our photographers take around 350,000 photos per year. The archive holds 50 million assets (text, image and video). The planned triple store is scoped for 6 billion RDF triples.
Our clients are taken to include journalists as well as wire customers. Talk about the importance of integrating new approaches without disturbing existing editorial workflows, i.e. editors are not employed to do metadata entry. Get them to tag, and exploit an ontological approach to get the added value.
Content and metadata should be separate. Otherwise you muddle up your content and metadata, and neither of them is generic.
The best way to do this is to focus your:
  Content stores on storing content
  Metadata stores on storing metadata
You may have more than one content store, but only one metadata store. The best type of metadata store is a semantic one.
Benefits:
  No lock-in to content or data products
  Future-proofed ability to adapt the content model to different requirements
  Metadata focussed on how you want to structure it
Your separate metadata abstracts your content, and links it together, wherever it is stored.
Abstracting out your metadata means...
  You can reference content wherever it is
  You can reorganise and package content semantically
  You can change content products and are not locked in: your content value is stored separately
Benefits:
  Lower risk
  Manage migration more gracefully
  Keep systems that provide value but are inflexible
  Deliver value earlier, rather than waiting to migrate
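The content and metadata pattern can be sketched in a few lines of Python. This is a minimal illustration only, not PA's implementation: the class names, the "store:id" reference format, the "about" predicate and the sample items are all assumptions made up for the example, and a plain in-memory set of statements stands in for a real semantic store.

```python
from dataclasses import dataclass, field


@dataclass
class ContentStore:
    """Hypothetical content store: holds content bodies and nothing else."""
    name: str
    items: dict = field(default_factory=dict)

    def put(self, local_id, body):
        self.items[local_id] = body
        # Return a store-qualified reference the metadata store can point at.
        return f"{self.name}:{local_id}"

    def get(self, local_id):
        return self.items[local_id]


class MetadataStore:
    """Hypothetical single metadata store of (subject, predicate, object) statements."""

    def __init__(self):
        self.statements = set()

    def add(self, subject, predicate, obj):
        self.statements.add((subject, predicate, obj))

    def subjects(self, predicate, obj):
        # All content references that carry the given predicate/object pair.
        return sorted(s for (s, p, o) in self.statements if p == predicate and o == obj)


def resolve(ref, stores):
    """Fetch a content body from whichever store the reference names."""
    store_name, local_id = ref.split(":", 1)
    return stores[store_name].get(local_id)


# Several content stores, one metadata store (sample data is invented).
text = ContentStore("text")
images = ContentStore("images")
stores = {s.name: s for s in (text, images)}
meta = MetadataStore()

story = text.put("s1", "Flood warnings issued for the town.")
photo = images.put("p9", b"placeholder image bytes")

meta.add(story, "about", "place/example-town")
meta.add(photo, "about", "place/example-town")

# Package content semantically, regardless of where it is stored.
package = meta.subjects("about", "place/example-town")
bodies = [resolve(ref, stores) for ref in package]
```

Because the statements only hold references, replacing one content store means re-pointing references rather than rebuilding the metadata, which is the "not locked in" benefit listed above.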
These slides address point (3) from the Industry Track call for presentations: describe the innovative plans for their products and services that lead to the adoption of Semantic Web standards. We are leading the W3C community group for ‘Semantic News’. We are involved with the IPTC semantic web working group. We are hosting data.press.net, initially a public repository for our ontologies, which will also host instance data about newsworthy people, places, organisations and events. We are working with government and non-government activists around open data initiatives.
Helps news providers to refer unambiguously to newsworthy entities: people, places and organisations. Ontologies are just instructions that describe the world to computers. It creates
We utilise ‘wild’ data sets such as geonames, because we’ve designed the ontology to be able to bring in these types of data reliably and predictably.
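As a rough sketch of what bringing in ‘wild’ data reliably and predictably might look like, the Python below turns one externally sourced place record into RDF-style statements, rejecting records that lack the fields the mapping depends on. The record layout, the example.org class URI, the function name and the sample place are assumptions for illustration and do not describe PA's ontology; the geonames and WGS84 namespaces and the sws.geonames.org URI pattern are the published ones as far as we know.

```python
GN = "http://www.geonames.org/ontology#"
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/ontology/"  # hypothetical ontology namespace

# Fields the mapping needs; anything else in the record is ignored.
REQUIRED = ("geonameId", "name", "lat", "lng")


def place_to_statements(record):
    """Map one place record (a dict) to (subject, predicate, object) tuples."""
    missing = [key for key in REQUIRED if key not in record]
    if missing:
        # Fail loudly rather than load a half-described entity.
        raise ValueError(f"record is missing fields: {missing}")

    # Assumed identifier pattern: one stable URI per geonames id.
    subject = f"http://sws.geonames.org/{record['geonameId']}/"
    return [
        (subject, RDF_TYPE, EX + "Place"),
        (subject, GN + "name", str(record["name"])),
        (subject, WGS84 + "lat", float(record["lat"])),
        (subject, WGS84 + "long", float(record["lng"])),
    ]


# Invented sample record; coordinates arrive as strings and are normalised.
sample = {"geonameId": 123, "name": "Exampleton", "lat": "51.5", "lng": "-0.1"}
statements = place_to_statements(sample)
```

Every statement about the place hangs off the same subject URI, so any story or photo tagged with that URI refers to the same entity without ambiguity.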
Newspapers don’t compete on paper size or structural metadata standards (e.g. IPTC NewsML). Ontologies are a standard for representing the ‘newsworthy’ world. They facilitate exchange. Spend money on content, not the transfer of it.
These slides will demonstrate the ontologies used for representing news assets and real world entities.
An example of what can be done. If you want to do more for less...
  Make the most of all the content, data and other assets you have
  No need to buy an analytics or data mining product to find out what happened to your stuff after the event
  You can reduce costs in managing content overall
  You can create more high-value roles in creating content