Open Access Publishing on the Semantic Web
Slideshow given at the San Francisco Meetup in August 2009. A review of PLoS, the Ambra open source publishing platform, the Mulgara RDF triple store, and future features.

Published in: Technology, Business
  1. Open Access Publishing on the Semantic Web
  2. the Public Library of Science (PLoS)
       • non-profit, Open Access STM (scientific, technical and medical) publisher focused on life sciences
       • mission: open the doors to the world's library of scientific knowledge by giving any scientist, physician, patient, or student - anywhere in the world - unlimited access to the latest scientific research
       • all research articles are published under the Creative Commons Attribution License
  3. why Open Access?
       • taxpayers pay for research, but print and online journals are available only to subscribers
       • traditional publishers own the copyright to all of the researchers' published materials
       • licensing is complex and restrictive
       • libraries are struggling to provide access to all required journals because of subscription fees
  4. PLoS Journals
       • publish seven peer-reviewed journals
           • PLoS Biology, PLoS Medicine (flagship)
           • PLoS Pathogens, PLoS Computational Biology, PLoS NTDs, PLoS Genetics (community)
           • PLoS ONE (disruptive force)
       • largest journal is PLoS ONE
           • high volume, very efficient workflow
           • ~6500 articles as of July 24, 2009
           • publish >400 articles a month (and growing)
       • using semantic platform since December ’06
           • PLoS ONE first journal on new platform
           • all journals migrated to platform as of May 12, ’09
           • ~13,000 articles published on semantic platform
  5. state of STM publishing platforms
       • publishing platforms are proprietary or hosted by a third party (PLoS)
       • most publishers treat online journals as digital repositories for research articles
           • “end of the road” for research articles
           • online takes a back seat to print journals ($$)
       • the internet changes everything
           • cheap and fast
           • global
           • quick search and retrieval
       • open source solutions exist today (e.g. Open Journal Systems/Drupal, Rhaptos/Zope) but had limited features in 2006
  6. big ideas for transforming journal publishing
       • open source publishing platform
       • semantic repository to mine the unknown (semantic) relationships in research articles
       • a “Web 2.0” user interface
       • provide features for post-publication annotation and discussion allowing for a “living” document
           • notes inline with the content
           • comments and discussions
           • ratings
       © by wales.nhs.uk
  7. … embarked down the path
       • Topaz non-profit development team funded by the Moore Foundation
       • intended as a journal publishing system for many types of publishing
           • scholarly communications / Open Access
           • eScience / eScholarship
           • education
           • libraries / museums
       • semantic publishing platform based on the Fedora institutional repository and the Mulgara triple-store
       • Topaz (back-end glue)
           • Object to Triple Mapping (OTM)
           • Object Query Language (OQL)
       • Ambra journal publishing system (front-end user interface)
       © by Michael James
  8. Ambra / Topaz journal publishing platform
       [diagram labels: Apache, Ambra, Fedora + Mulgara RDF Store, Topaz OTM, Topaz Files, CAS]
       • Fedora is used to store digital objects (XML, PDF, images, etc.)
       • article metadata, annotations (Annotea) and user information (FOAF) are stored as triples in Mulgara
       • Topaz is used for storage and retrieval of the digital objects and triple stores through the Objects to Triples Mapping (OTM)
       • Ambra (user interface)
       • CAS single sign-on service
       • Apache webhead
  9. under the hood of Topaz (1)
       • an Object-Triples-Mapping (OTM) library
           • modeled after Hibernate Object-Relational Mapping (ORM)
           • except that the data lives in a store of RDF triples instead of a relational database
       • provides a query language based on objects (OQL)
           • an "object" based query syntax
           • makes life a bit easier for developers
       • OQL example
           • select all articles with a given title:
             select a.id, a.author from Article a where a.title = 'Hello Dolly';
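To make the OTM idea concrete, here is a minimal Python sketch of mapping an object's fields to RDF-style triples and answering the OQL example against them. It is not Topaz's Java API: the `Article` class, the `PREDICATES` table, the sample identifiers and the helper names are all hypothetical, and the Dublin Core predicate URIs are used purely for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical field-to-predicate table; Topaz declares its mappings on Java classes.
PREDICATES = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
}


@dataclass
class Article:
    id: str                                   # subject URI of the article
    title: str
    author: list = field(default_factory=list)


def to_triples(article):
    """Flatten one object into (subject, predicate, object) triples."""
    triples = [(article.id, PREDICATES["title"], article.title)]
    triples += [(article.id, PREDICATES["author"], name) for name in article.author]
    return triples


def select_by_title(triples, title):
    """Rough equivalent of:
    select a.id, a.author from Article a where a.title = '<title>';
    """
    ids = [s for s, p, o in triples if p == PREDICATES["title"] and o == title]
    return [
        (subject, [o for s, p, o in triples if s == subject and p == PREDICATES["author"]])
        for subject in ids
    ]


# Two made-up articles stored as one flat list of triples.
store = to_triples(Article("info:doi/10.1371/example.1", "Hello Dolly", ["A. Author", "B. Author"]))
store += to_triples(Article("info:doi/10.1371/example.2", "Other Title", ["C. Author"]))
print(select_by_title(store, "Hello Dolly"))
```

The point of the sketch is the division of labour: application code works with objects and an object-shaped query, while the store only ever sees triples.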
  10. why Objects to Triples Mapping (OTM)?
       • don’t walk a tree to retrieve objects (slow)
       • instead, retrieve collections of objects with one query (fast)
       • as an online-only publisher, we need fast
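The cost difference between walking a tree and issuing one query can be sketched with an in-memory stand-in that counts round trips. Everything here is an assumption made for illustration: `CountingStore`, the `hasArticle` and `title` predicates, and the sample data are not part of Topaz or Mulgara.

```python
class CountingStore:
    """In-memory stand-in for a triple store that counts round trips."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.round_trips = 0

    def query(self, subject=None, predicate=None):
        """Return triples matching the pattern; each call is one round trip."""
        self.round_trips += 1
        return [
            (s, p, o) for s, p, o in self.triples
            if subject in (None, s) and predicate in (None, p)
        ]

    def titles_for_issue(self, issue):
        """One round trip: the store joins membership and title triples itself."""
        self.round_trips += 1
        members = {o for s, p, o in self.triples if s == issue and p == "hasArticle"}
        return {s: o for s, p, o in self.triples if p == "title" and s in members}


def titles_by_walking(store, issue):
    """Follow issue -> article links, then fetch each article separately."""
    articles = [o for _, _, o in store.query(subject=issue, predicate="hasArticle")]
    return {a: store.query(subject=a, predicate="title")[0][2] for a in articles}


store = CountingStore(
    [("issue:1", "hasArticle", f"article:{n}") for n in range(3)]
    + [(f"article:{n}", "title", f"Title {n}") for n in range(3)]
)

walked = titles_by_walking(store, "issue:1")
walk_trips = store.round_trips          # 1 membership query + 1 per article

store.round_trips = 0
joined = store.titles_for_issue("issue:1")
join_trips = store.round_trips          # a single query

print(walk_trips, join_trips, walked == joined)
```

Both approaches return the same titles, but the walking version pays one extra round trip per article, which grows with the size of the collection.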
  11. under the hood of Topaz (2)
       • defines Java classes and maps the classes into RDF
           • Ambra defines models which are mapped into sets of triples in various graphs
           • models such as “article”, “annotation”, etc. are defined in Ambra
       • provides support for storing files to a separate blob store (Fedora and/or Akubra)
       • provides storage and retrieval of files and triples in a single transaction
           • necessary to render an article with associated metadata (e.g. notes, ratings, etc.)
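The "files and triples in a single transaction" idea can be sketched as a unit of work that stages both kinds of writes and applies them together or not at all. This is an in-memory illustration under stated assumptions, not Topaz's implementation: the `Transaction` class and its method names are hypothetical, and a real system coordinating two separate stores needs a proper commit protocol.

```python
class Transaction:
    """Stage blob and triple writes; apply both on success, neither on error."""

    def __init__(self, blobs, triples):
        self.blobs, self.triples = blobs, triples
        self.pending_blobs, self.pending_triples = {}, []

    def put_blob(self, key, data):
        self.pending_blobs[key] = data

    def add_triple(self, s, p, o):
        self.pending_triples.append((s, p, o))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:          # commit only if the block finished cleanly
            self.blobs.update(self.pending_blobs)
            self.triples.extend(self.pending_triples)
        return False                  # never swallow the caller's exception


blobs, triples = {}, []

# Successful ingest: the file and its metadata become visible together.
with Transaction(blobs, triples) as tx:
    tx.put_blob("article.xml", b"<article/>")
    tx.add_triple("info:doi/10.1371/example.1", "rdf:type", "topaz:Article")

# Failed ingest: nothing staged in this block reaches either store.
try:
    with Transaction(blobs, triples) as tx:
        tx.put_blob("broken.pdf", b"")
        tx.add_triple("info:doi/10.1371/example.2", "rdf:type", "topaz:Article")
        raise ValueError("ingest failed")
except ValueError:
    pass

print(sorted(blobs), len(triples))
```

Keeping the two writes together matters for rendering: a page should never show an article file without its metadata, or metadata pointing at a missing file.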
  12. Ambra
       • first application built on Topaz
       • journal publishing platform with “Web 2.0” features
           • uses the FreeMarker templating engine to display the content received from the Topaz service
           • uses the Dojo JavaScript toolkit to handle complex user interactions like annotations, ratings, etc.
           • provides social networking features (in-line notes, comments, trackbacks)
           • turns a reader of scientific articles into a knowledge contributor whose contributions can be used by other users
           • living document!
  13. Ambra features
       [diagram labels, in the order extracted]
       • Ambra
           • article ingestion
           • search
           • annotations
           • discussions
           • security mgmt
           • ratings
           • user profile / preferences
           • atom feeds
           • multiple journals
           • trackbacks
       • SignOn Server
       • article publication
       • CrossRef registration
       • DOI resolver
       • Cache for web content and digital objects
       • CAS single sign-on
  14. Ambra <-> Mulgara interaction
       • Ambra inserts data into Mulgara in the following cases
           • article ingest
           • post-publication annotations (comment, note, rating, trackback)
           • admin actions (volume and issue collections, annotation moderation, etc.)
           • user actions (create or edit a user profile)
       • Ambra uses OTM to pull data from Fedora and Mulgara
           • Ambra transforms XML to HTML
           • displays notes, comments, ratings, etc.
  15. article ingest (1)
       • Ambra expects an article package that contains an XML file in NLM-DTD format (http://dtd.nlm.nih.gov/publishing/)

       <?xml version="1.0" encoding="UTF-8"?>
       <!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
       <article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" dtd-version="2.0" xml:lang="EN">
         <front>
           <journal-meta>
             <journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
             <journal-id journal-id-type="publisher-id">plos</journal-id>
             <journal-id journal-id-type="pmc">plosone</journal-id>
             <journal-title>PLoS ONE</journal-title>
             <issn pub-type="epub">1932-6203</issn>...
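As a rough illustration of the first ingest step, the journal metadata in an NLM fragment like the one above can be read with Python's standard `xml.etree.ElementTree`. This is a sketch, not Ambra's ingest code: the `read_journal_meta` helper is hypothetical, and the sample is trimmed (no XML declaration, DOCTYPE or namespaces) and closed by hand so that it parses.

```python
import xml.etree.ElementTree as ET

# Trimmed from the slide's fragment and closed by hand so that it parses;
# a real article package carries far more metadata.
SAMPLE = """
<article article-type="research-article" dtd-version="2.0">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
      <journal-id journal-id-type="publisher-id">plos</journal-id>
      <journal-id journal-id-type="pmc">plosone</journal-id>
      <journal-title>PLoS ONE</journal-title>
      <issn pub-type="epub">1932-6203</issn>
    </journal-meta>
  </front>
</article>
"""


def read_journal_meta(xml_text):
    """Pull journal ids, title and electronic ISSN out of <journal-meta>."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/journal-meta")
    # Pick the ISSN whose pub-type is "epub"; None if the article has none.
    eissn = next(
        (e.text for e in meta.findall("issn") if e.get("pub-type") == "epub"),
        None,
    )
    return {
        "article_type": root.get("article-type"),
        "journal_ids": {e.get("journal-id-type"): e.text for e in meta.findall("journal-id")},
        "journal_title": meta.findtext("journal-title"),
        "eissn": eissn,
    }


meta = read_journal_meta(SAMPLE)
print(meta)
```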
  16. article ingest (2)
       • Ambra transforms the XML into an OTM object that Topaz pushes into Mulgara.

       <info:doi/10.1371/journal.pone.0000000> <rdf:type> <http://rdf.plos.org/RDF/articleType/Research%20Article>
       <info:doi/10.1371/journal.pone.0000000> <rdf:type> <http://rdf.plos.org/RDF/articleType/research-article>
       <info:doi/10.1371/journal.pone.0000000> <rdf:type> <topaz:Article>
       <info:doi/10.1371/journal.pone.0000000> <rdf:type> <topaz:ObjectInfo>
       <info:doi/10.1371/journal.pone.0000000> <http://prismstandard.org/namespaces/1.2/basic/eIssn> '1932-6203'
       <info:doi/10.1371/journal.pone.0000000> <dc:creator> 'Bonnie Real'
       <info:doi/10.1371/journal.pone.0000000> <dc:creator> 'Richard Cave'
       <info:doi/10.1371/journal.pone.0000000> <dc:creator>...
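The shape of the triples listed above can be reproduced with a few lines of Python. This is only a sketch of the output format as it appears on the slide, not the way Topaz generates or serializes triples; the `article_triples` helper is hypothetical and it does no escaping of quotes inside literals.

```python
PRISM_EISSN = "<http://prismstandard.org/namespaces/1.2/basic/eIssn>"


def article_triples(doi, eissn, creators):
    """Render article metadata as subject / predicate / object lines."""
    subject = f"<info:doi/{doi}>"
    lines = [
        f"{subject} <rdf:type> <topaz:Article>",
        f"{subject} {PRISM_EISSN} '{eissn}'",
    ]
    # One dc:creator triple per author, in the order given.
    lines += [f"{subject} <dc:creator> '{name}'" for name in creators]
    return lines


# Values taken from the slide's own example.
rendered = article_triples(
    "10.1371/journal.pone.0000000", "1932-6203", ["Bonnie Real", "Richard Cave"]
)
print("\n".join(rendered))
```

Note that a multi-valued property such as the author list becomes several triples sharing the same subject and predicate, which is why retrieving an object means collecting a set of triples rather than reading one row.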
  17. Ambra – future development
       • article level metrics
           • impact of the article above and beyond citations
       • RDFa
       • automatic article relationships
       • semantic enhancement
       • REST-based API
       • ingest and publish many types of content / data
           • structured and unstructured
       • tags
       • enhance search and browse
       • direct access to Mulgara’s triple store
           • SPARQL endpoint, RDFa
  18. semantic enhancement of content
       • add value to the content of a research article
       • highlight text for selected terms
           • protein names
           • genus / species
           • disease
           • location / habitat
           • etc.
       • provide links to external sources to create new user interactions
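Term highlighting of this kind can be sketched as a dictionary lookup over the article text. The example below is an assumption-laden illustration, not a planned Ambra feature: the `TERMS` dictionary, the `example.org` link targets and the `enhance` helper are hypothetical, and real systems would use curated vocabularies and entity recognition rather than plain string matching.

```python
import html
import re

# Hypothetical term dictionary: surface form -> (category, link target).
TERMS = {
    "Plasmodium falciparum": ("species", "https://example.org/taxon/plasmodium-falciparum"),
    "malaria": ("disease", "https://example.org/disease/malaria"),
}


def enhance(text, terms=TERMS):
    """Wrap each known term in a link tagged with its category."""
    # Longest terms first so multi-word names win over shorter overlaps;
    # word boundaries keep "malaria" from matching inside "antimalarial".
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    lookup = {t.lower(): v for t, v in terms.items()}

    def link(match):
        category, url = lookup[match.group(0).lower()]
        return f'<a class="{category}" href="{url}">{match.group(0)}</a>'

    # Escape the raw text first so the only markup in the result is ours.
    return pattern.sub(link, html.escape(text))


enhanced = enhance("Malaria is caused by Plasmodium falciparum.")
print(enhanced)
```

The category stored with each term is what lets a reader interface style protein names, species and diseases differently or toggle them on and off.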
  19. © by David Shotton
  20. system requirements
       • minimum - single server (Linux) with 8 GB RAM
       • … better (based on PLoS journals):
           • 1 server for Fedora and Mulgara with 8 GB RAM
           • 1 server for Ambra and Topaz with 8 GB RAM
           • 1 server for Apache and CAS with 4 GB RAM
       • PLoS journals on Ambra / Topaz
           • 800k visits / month
           • ~2 million pageviews / month
       • Amazon AMI to test Ambra / Topaz available
  21. resources
       • Ambra website
           • http://www.ambraproject.org/
       • Ambra mailing lists:
           • http://lists.topazproject.org/mailman/listinfo/ambra-users
           • http://lists.topazproject.org/mailman/listinfo/ambra-dev
       • Topaz website
           • http://www.topazproject.org/
       • Fedora Commons website
           • http://fedoracommons.org/
       • Richard Cave – rcave at plos.org
