Introduction to the Topaz OTM framework and the Ambra publishing system

This presentation is an introduction to Topaz, an Open Source content modeling and storage framework that uses the Fedora Service Framework and Mulgara semantic technology as the core engine, and Ambra, a publishing application built on the Topaz framework.

Published in: Technology, Education

  1. 1. Introduction to the Topaz OTM framework and the Ambra publishing platform Richard Cave – rcave at plos.org Russell Uman – ruman at plos.org
  2. 2. the Public Library of Science (PLoS) <ul><li>non-profit, Open Access publisher </li></ul><ul><li>mission: open the doors to the world's library of scientific knowledge by giving any scientist, physician, patient, or student - anywhere in the world - unlimited access to the latest scientific research </li></ul><ul><li>currently publish seven journals </li></ul><ul><ul><li>PLoS Biology, PLoS Medicine, PLoS Pathogens, PLoS Computational Biology, PLoS NTDs, PLoS Genetics, PLoS ONE </li></ul></ul><ul><li>largest journal is PLoS ONE </li></ul><ul><ul><li>5532 articles as of May 15, 2009 </li></ul></ul><ul><li>using Ambra / Topaz since December ‘06 </li></ul><ul><ul><li>PLoS ONE first journal on Ambra / Topaz </li></ul></ul><ul><ul><li>all journals migrated to Ambra / Topaz as of May 12, ‘09 </li></ul></ul><ul><ul><li>~13,000 articles published on Ambra / Topaz platform </li></ul></ul>
  3. 3. journey down the yellow Topaz road… <ul><li>originally intended to be an end-to-end online publishing system </li></ul><ul><ul><li>peer review system </li></ul></ul><ul><ul><li>composition system </li></ul></ul><ul><ul><li>journal publishing system </li></ul></ul><ul><li>open source platform for many types of publishing </li></ul><ul><ul><li>scholarly communications / Open Access </li></ul></ul><ul><ul><li>eScience / eScholarship </li></ul></ul><ul><ul><li>education </li></ul></ul><ul><ul><li>libraries / museums </li></ul></ul>
  4. 4. big ideas for transforming journal publishing <ul><li>open source! </li></ul><ul><li>provide features for post-publication annotation and discussion allowing for a “living” document </li></ul><ul><li>mine the unknown (semantic) relationships in research articles </li></ul><ul><li>journal publishing system as first application </li></ul><ul><ul><li>publish a high volume of research articles </li></ul></ul><ul><ul><li>stability </li></ul></ul><ul><ul><li>high performance </li></ul></ul>© by wales.nhs.uk
  5. 5. freedom of choice <ul><li>open source “publishing” platforms available in 2006 </li></ul><ul><ul><li>- Rhaptos / Connexions (based on Zope/Plone) </li></ul></ul><ul><ul><li>- Open Journal System (based on Drupal) </li></ul></ul><ul><ul><li>- DPubS (based on Fedora) </li></ul></ul><ul><ul><li>- ePrints </li></ul></ul><ul><ul><li>- Apache Lenya </li></ul></ul><ul><li>© by David Pescovitz </li></ul><ul><li>but no system offered a high performance semantic repository with a “Web 2.0” user interface… </li></ul>
  6. 6. behind door #1 – Fedora + Kowari (Mulgara) <ul><li> </li></ul><ul><li>© by That Guy Called Ben </li></ul>
  7. 7. … at the end of the road <ul><li>semantic publishing platform based on Fedora and Mulgara </li></ul><ul><li>Topaz (back-end glue) </li></ul><ul><ul><li>Object to Triple Mapping (OTM) </li></ul></ul><ul><ul><li>Object Query Language (OQL) </li></ul></ul><ul><ul><ul><li>© by Michael James </li></ul></ul></ul><ul><li>Ambra journal publishing system (front-end user interface) </li></ul>
  8. 8. Ambra / Topaz journal publishing platform <ul><li>diagram components: Apache, Ambra, Fedora + Mulgara RDF Store, Topaz OTM, Topaz Files, CAS </li></ul><ul><li>Fedora is used to store digital objects (XML, PDF, images, etc.) </li></ul><ul><li>article metadata, annotations (annotea) and user information (foaf) are stored as triples in Mulgara </li></ul><ul><li>Topaz is used for storage and retrieval of the digital objects and triple stores through the Objects to Triples Mapping (OTM) </li></ul><ul><li>Ambra (user interface) </li></ul><ul><li>CAS single sign-on service </li></ul><ul><li>Apache webhead </li></ul>
  9. 9. under the hood of Topaz (1) <ul><li>an Object-Triples-Mapping (OTM) library </li></ul><ul><ul><li>modeled after Hibernate Object-Relational Mapping (ORM) </li></ul></ul><ul><ul><li>except the database is made of RDF triples instead of a relational database. </li></ul></ul><ul><li>provides a query language based on objects (OQL) </li></ul><ul><ul><li>an "object" based query syntax </li></ul></ul><ul><ul><li>makes life a bit easier for developers </li></ul></ul><ul><li>OQL example </li></ul><ul><ul><li>select all articles with a given title: </li></ul></ul><ul><ul><li>select a.id, a.author from Article a where a.title = 'Hello Dolly'; </li></ul></ul>
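To make the object-to-triples idea concrete, here is a minimal Python sketch. It is not the Topaz API (Topaz is a Java library); the `Article` class, the `PREDICATES` table, the `urn:example:` identifiers and the in-memory list used as a store are all assumptions made for illustration. It flattens objects into (subject, predicate, object) triples and then answers the same question as the OQL example above.

```python
from dataclasses import dataclass, fields

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical mapping from field names to RDF predicates (not Topaz's own).
PREDICATES = {"title": DC + "title", "author": DC + "creator"}


@dataclass
class Article:
    id: str      # used as the subject of every triple for this object
    title: str
    author: str


def to_triples(article):
    """Flatten one object into (subject, predicate, object) triples."""
    return [(article.id, PREDICATES[f.name], getattr(article, f.name))
            for f in fields(article) if f.name in PREDICATES]


def select_by_title(triples, title):
    """Toy counterpart of the OQL example: (id, author) for a given title."""
    subjects = {s for s, p, o in triples
                if p == PREDICATES["title"] and o == title}
    return sorted((s, o) for s, p, o in triples
                  if s in subjects and p == PREDICATES["author"])


# An in-memory list stands in for the triple store.
store = []
store += to_triples(Article("urn:example:article/1", "Hello Dolly", "A. Author"))
store += to_triples(Article("urn:example:article/2", "Other Title", "B. Writer"))

print(select_by_title(store, "Hello Dolly"))
# [('urn:example:article/1', 'A. Author')]
```

The point of the mapping layer is that application code works with `Article` objects and object-style queries, while the triples and predicates stay an implementation detail.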
  10. 10. under the hood of Topaz (2) <ul><li>defines Java classes and maps the classes into RDF </li></ul><ul><ul><li>Ambra defines models which are mapped into sets of triples in various graphs </li></ul></ul><ul><ul><li>for example the “article” and “annotation” models defined in Ambra </li></ul></ul><ul><li>provides support for storing files to a separate blob store (Fedora and/or Akubra) </li></ul><ul><li>provides storage and retrieval of files and triples in a single transaction </li></ul><ul><ul><li>necessary to render an article with associated metadata (e.g. notes, ratings, etc.) </li></ul></ul>
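The "files and triples in a single transaction" point can be sketched as follows. This is a toy illustration in Python, not how Topaz implements it: the `Session` class, its two in-memory stores and the `urn:example:` identifiers are assumptions, and a real system coordinating two separate back ends needs considerably more care than staging writes in memory.

```python
from contextlib import contextmanager


class Session:
    """Toy session: stages blob and triple writes, then applies both or neither."""

    def __init__(self):
        self.blobs = {}        # committed files: id -> bytes
        self.triples = set()   # committed (subject, predicate, object) triples

    @contextmanager
    def transaction(self):
        staged_blobs, staged_triples = {}, set()
        yield staged_blobs, staged_triples
        # Reached only if the with-body raised no exception: commit both stores.
        self.blobs.update(staged_blobs)
        self.triples |= staged_triples


session = Session()

# Both writes succeed, so the file and its metadata become visible together.
with session.transaction() as (blobs, triples):
    blobs["urn:example:article/1"] = b"<article>...</article>"
    triples.add(("urn:example:article/1", "dc:title", "Hello Dolly"))

# A failure part-way through discards the staged file as well.
try:
    with session.transaction() as (blobs, triples):
        blobs["urn:example:article/2"] = b"<article>...</article>"
        raise ValueError("metadata failed validation")
except ValueError:
    pass

print(len(session.blobs), len(session.triples))  # 1 1
```

Committing both kinds of data together means a reader never sees an article file without its metadata, or the reverse.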
  11. 11. why Objects to Triples Mapping (OTM)? <ul><li>allows for retrieving collections of objects with one query (fast) instead of a single object at a time (slow) </li></ul><ul><li>as an online-only publisher, we need fast </li></ul>
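The difference between the two access patterns can be shown by counting round trips. The sketch below is a Python illustration under stated assumptions: `CountingStore` is a hypothetical stand-in for a remote store, not a Topaz or Mulgara class, and it measures only the number of queries issued, not real latency.

```python
class CountingStore:
    """Toy stand-in for a remote store that counts query round trips."""

    def __init__(self, rows):
        self._rows = rows      # article id -> title
        self.queries = 0

    def get_one(self, article_id):
        self.queries += 1      # one round trip per object
        return self._rows[article_id]

    def get_many(self, article_ids):
        self.queries += 1      # one round trip for the whole collection
        return {i: self._rows[i] for i in article_ids}


rows = {f"urn:example:article/{n}": f"Title {n}" for n in range(100)}
ids = list(rows)

slow = CountingStore(rows)
one_by_one = {i: slow.get_one(i) for i in ids}

fast = CountingStore(rows)
batched = fast.get_many(ids)

print(slow.queries, fast.queries)  # 100 1
```

Both approaches return the same data; the collection query simply pays the per-query overhead once instead of once per article.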
  12. 12. Ambra <ul><li>first application built on Topaz application framework </li></ul><ul><li>“Web 2.0” features </li></ul><ul><ul><li>uses the FreeMarker templating engine to display the content received from the Topaz service. </li></ul></ul><ul><ul><li>uses the Dojo JavaScript toolkit to handle complex user interactions like annotations, ratings, etc. </li></ul></ul><ul><ul><li>provides social networking features (in-line notes, comments, trackbacks) </li></ul></ul><ul><ul><li>turns a reader of scientific articles into a contributor of knowledge that other users can draw on </li></ul></ul><ul><ul><li>living document! </li></ul></ul>
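  13. 13. Ambra features <ul><li>Ambra </li></ul><ul><ul><li>article ingestion </li></ul></ul><ul><ul><li>search </li></ul></ul><ul><ul><li>annotations </li></ul></ul><ul><ul><li>discussions </li></ul></ul><ul><ul><li>security mgmt </li></ul></ul><ul><ul><li>ratings </li></ul></ul><ul><ul><li>user profile / preferences </li></ul></ul><ul><ul><li>atom feeds </li></ul></ul><ul><ul><li>multiple journals </li></ul></ul><ul><ul><li>trackbacks </li></ul></ul><ul><li>SignOn Server </li></ul><ul><li>article publication </li></ul><ul><li>CrossRef registration </li></ul><ul><li>DOI resolver </li></ul><ul><li>Cache for web content and digital objects </li></ul><ul><li>CAS single sign-on </li></ul>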
  14. 14. Ambra – what it’s not <ul><li>cms </li></ul><ul><ul><li>only NLM DTD XML </li></ul></ul><ul><li>workflow engine </li></ul><ul><li>peer-review system </li></ul><ul><li>scientific social site </li></ul><ul><li>out-of-the box solution for journal publishing </li></ul>© by roflrazzi.com
  15. 15. Ambra – new development <ul><li>article level metrics </li></ul><ul><li>article usage data (COUNTER) </li></ul><ul><li>tags and better discoverability </li></ul><ul><li>semantic enhancement </li></ul><ul><li>automatic file transfer to external sources </li></ul><ul><ul><li>PubMed Central </li></ul></ul><ul><ul><li>other repositories </li></ul></ul>
  16. 16. article level metrics <ul><li>“impact” of an article outside of citations </li></ul><ul><ul><li>notes, comments, ‘star ratings’ and trackbacks </li></ul></ul><ul><li>in March ‘09, we launched: </li></ul><ul><li>1. number of Citations </li></ul><ul><ul><li>PubMed Central and Scopus </li></ul></ul><ul><li>2. amount of Blog coverage </li></ul><ul><ul><li>Postgenomic, Nature Blogs and Bloglines </li></ul></ul><ul><li>3. number of Social Bookmarks </li></ul><ul><ul><li>CiteULike and Connotea </li></ul></ul>
  17. 18. article usage data (COUNTER) <ul><li>in mid-2009, we will add usage data to every article </li></ul><ul><ul><li>HTML Page Views </li></ul></ul><ul><ul><li>PDF Downloads </li></ul></ul><ul><ul><li>XML Downloads </li></ul></ul><ul><li>article usage data to be displayed numerically and graphically </li></ul><ul><ul><li>includes historical data </li></ul></ul><ul><ul><li>in the context of other articles within the journal </li></ul></ul>
  18. 20. semantic enhancement of content <ul><li>add value to the content of a research article </li></ul><ul><li>highlight text for selected terms </li></ul><ul><ul><li>protein names </li></ul></ul><ul><ul><li>genus / species </li></ul></ul><ul><ul><li>disease </li></ul></ul><ul><ul><li>location / habitat </li></ul></ul><ul><ul><li>etc. </li></ul></ul><ul><li>provide links to external sources to create new user interactions </li></ul>
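One way to picture "highlight text for selected terms" with "links to external sources" is a dictionary-driven pass over the article text. The Python sketch below is only an illustration: the `TERMS` dictionary, its categories and the `example.org` URLs are hypothetical, matching is exact and case-sensitive, and it says nothing about how Ambra itself would do this.

```python
import html
import re

# Hypothetical term dictionary: surface form -> (category, external link).
TERMS = {
    "Plasmodium falciparum": ("species", "https://example.org/taxon/plasmodium-falciparum"),
    "malaria": ("disease", "https://example.org/disease/malaria"),
}

# Longest terms first so multi-word names win over shorter overlapping ones;
# \b keeps "malaria" from matching inside a longer word.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True)) + r")\b"
)


def enhance(text):
    """Return HTML in which each known term is wrapped in a categorised link."""
    out, pos = [], 0
    for match in PATTERN.finditer(text):
        out.append(html.escape(text[pos:match.start()]))  # untouched text, escaped
        category, url = TERMS[match.group(0)]
        out.append(f'<a class="{category}" href="{url}">{html.escape(match.group(0))}</a>')
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(enhance("malaria & Plasmodium falciparum"))
```

The `class` attribute carries the term category (species, disease, and so on), so a stylesheet can colour each kind of term differently.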
  19. 21. © by David Shotton
  20. 22. future development…? © by David Shotton <ul><li>ingest and publish many types of content / data </li></ul><ul><ul><li>structured and unstructured </li></ul></ul><ul><li>access triple store </li></ul><ul><ul><li>sparql endpoint </li></ul></ul><ul><li>REST-based API </li></ul>
  21. 23. future development…? © by David Shotton
  22. 24. system requirements <ul><li>minimum - single server (Linux) with 8 GB RAM </li></ul><ul><li>… better (based on PLoS journals): </li></ul><ul><ul><li>1 server for Fedora and Mulgara with 8 GB RAM </li></ul></ul><ul><ul><li>1 server for Ambra and Topaz with 8 GB RAM </li></ul></ul><ul><ul><li>1 server for Apache and CAS with 4 GB RAM </li></ul></ul><ul><li>PLoS journals on Ambra / Topaz </li></ul><ul><ul><li>800k visits / month </li></ul></ul><ul><ul><li>1.8 million pageviews / month </li></ul></ul><ul><li>Amazon AMI to test Ambra / Topaz available soon! </li></ul>
  23. 25. where are they now? <ul><li>funding for Topaz project has concluded but it’s still an active project </li></ul><ul><li>(not a deathstar) </li></ul><ul><li>Topaz moved to Fedora Commons </li></ul><ul><ul><li>Paul Gearon, Fedora Commons, technical lead of semantic technologies projects </li></ul></ul><ul><li>Ambra stewardship moved to PLoS </li></ul><ul><ul><li>PLoS fully committed to Ambra / Topaz platform </li></ul></ul><ul><ul><li>small development team working on new features </li></ul></ul>© by nydailynews.com
  24. 26. interested? <ul><li>Ambra project site launched </li></ul><ul><ul><li>www.ambraproject.org </li></ul></ul><ul><ul><li>documentation in progress </li></ul></ul><ul><ul><li>we need your input! </li></ul></ul><ul><ul><li>we need models for other types of content </li></ul></ul><ul><li>Open Access Publishing solution community at Fedora Commons </li></ul><ul><ul><li>“steward” = Richard Cave, PLoS </li></ul></ul><ul><ul><li>“knowledgebase gardener” = Chris Freeland, Biodiversity Heritage Library </li></ul></ul>
  25. 27. resources <ul><li>Topaz website </li></ul><ul><li>http://www.topazproject.org/ </li></ul><ul><li>Topaz manual </li></ul><ul><li>http://www.topazproject.org/trac/wiki/Topaz/Manual </li></ul><ul><li>Ambra website </li></ul><ul><li>http://www.ambraproject.org/ </li></ul><ul><li>Ambra mailing lists: </li></ul><ul><li>http://lists.topazproject.org/mailman/listinfo/ambra-users </li></ul><ul><li>http://lists.topazproject.org/mailman/listinfo/ambra-dev </li></ul><ul><li>Richard Cave – rcave at plos.org </li></ul>
