RDAP 033111


Presentation on the RCSB PDB at the 2nd Research Data Access and Preservation (RDAP) Summit, Denver, Colorado, March 31, 2011.

  1. A Few RDAP Thoughts Based on Experience with the RCSB Protein Data Bank
     Philip E. Bourne, UCSD, [email_address]
     RDAP Summit 2011, 3/31/11
  2. Disclaimer
     • I am not an expert in institutional repositories.
     • I happen to have helped develop and oversee a resource that I use for my own research.
  3. What Is the Protein Data Bank (PDB)?
     • The single, community-owned, worldwide repository of publicly accessible structures of biological macromolecules
     • A resource used by ~200,000 individuals per month
     • A resource that distributes worldwide the equivalent of ¼ of the Library of Congress each month
     • A bicoastal resource
     • 1 TB
  4. PDB Total Contents by Year
     [chart: number of released entries per year]
  5. Why Do We Think We Are Successful?
     • The number of visits and page views is growing faster than the number of unique visitors.
  6. Metric of Success: A Research Tool for Influenza
     [chart: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]
     • 3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
     • 1RUZ: 1918 H1 Hemagglutinin
  7. Looking Back Over the Past 12 Years: In General
     • Everything was harder and took longer than we thought.
     • There are a lot of politics associated with data.
     • The emphasis has shifted from an archive, to an archive plus an analytical tool, to all of that plus an educational tool.
     • Consequently, outreach is today our most important yet least understood activity.
     • Our staffing needed to change accordingly.
     • Policy has changed as well, with some support for non-generic tools.
     • Prorated, our budget has decreased.
  8. Looking Back Over the Past 12 Years: Infrastructure
     • It took about 5 years to achieve and subsequently sustain 99.99% uptime.
     • We have gone through 3 distinct architectural changes:
       - Object model / Perl CGI
       - Object-relational model / Enterprise Java
       - Redesign: same model, widget-based UI
     (Bluhm et al. 2011, Quality Assurance, doi: 10.1093/database/bar003)
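To put the 99.99% uptime figure in perspective, the allowed downtime is simply (1 - availability) × length of the period. The Python sketch below (the function name is illustrative, not from the talk) shows that 99.99% leaves roughly 53 minutes of downtime per year, or about 4.4 minutes per month, compared with almost 9 hours per year at 99.9%.

```python
# Convert an availability target into the downtime it permits.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, including leap days


def allowed_downtime_minutes(availability: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime allowed by an availability fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    for target in (0.999, 0.9999):
        per_year = allowed_downtime_minutes(target)
        print(f"{target:.2%} uptime: {per_year:.1f} min/year, {per_year / 12:.1f} min/month")
```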
  9. Looking Back Over the Past 12 Years: Data & Data Management
     • About 25% of our budget has been spent on data remediation.
     • We support yearly snapshots and versioning.
     • Our ontology/data model has been a critical component of our workflow and of data accuracy.
     • The same model is too complex to facilitate wide adoption by others who use our data.
     • Our data are such that we can retain redundant copies.
     • Data objects are discrete, and we assign DOIs.
     • We constantly strive to help users distinguish raw from derived data.
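Because each entry is a discrete data object, it can carry its own DOI. A minimal sketch, assuming the wwPDB pattern 10.2210/pdb<id>/pdb with a lowercase four-character ID; the helper names are hypothetical.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Build an entry DOI, assuming the 10.2210/pdb<id>/pdb pattern (lowercase ID)."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(doi: str) -> str:
    """Turn a DOI into a resolvable link through the doi.org proxy."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    for entry in ("3B7E", "1RUZ"):
        doi = pdb_doi(entry)
        print(entry, doi, doi_url(doi))
```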
  10. Trends Today
     • Constant demand for better performance
     • Use of Web services (SOAP and now RESTful) is increasing.
     • Uptake of widgets has been slower than I hoped.
     • Users are hankering after additional annotation of the data; we are working on database-literature integration.
     • Mobile use is increasing.
     • Web 2.0 services are in demand.
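As an illustration of RESTful access, the sketch below retrieves one entry's metadata as JSON over plain HTTP GET. It assumes the present-day RCSB Data API endpoint https://data.rcsb.org/rest/v1/core/entry/<ID>, which postdates this 2011 talk and is not the SOAP or REST service described here; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: today's RCSB Data API, not the 2011 services from the talk.
ENTRY_ENDPOINT = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def entry_url(pdb_id: str) -> str:
    """Each entry is a resource addressed by its own URL."""
    return ENTRY_ENDPOINT.format(pdb_id=pdb_id.upper())


def fetch_entry(pdb_id: str, timeout: float = 10.0) -> dict:
    """GET one entry's metadata and decode the JSON body (needs network access)."""
    request = urllib.request.Request(entry_url(pdb_id),
                                     headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

# Usage (requires network): entry = fetch_entry("3B7E"); print(sorted(entry))
```

Key names in the returned JSON depend on the live schema, so the usage line prints the top-level keys rather than assuming any of them.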
  11. Website Performance Improvements
     • Back end
       - Back-end tuning and multilevel caching of searches, query results, explorer pages and hierarchical views
       - Better performance and a more robust, scalable system
     • Front end
       - Cleaner JavaScript and CSS
       - Inline image data
       - Compressed content (Gzip + Base64)
       - Result: 25% to 40% increase in render performance
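The slide does not say how the caching layers were built. As a generic illustration of multilevel caching only, this sketch puts a small in-process LRU cache (level 1) in front of a larger store (level 2, here a plain dict standing in for something like a shared cache server), and runs the expensive query only when both levels miss. All class names are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable

_MISSING = object()  # sentinel so a cached None is not mistaken for a miss


class LRUCache:
    """Level 1: small and fast, evicts the least recently used key when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key not in self._items:
            return _MISSING
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key


class TwoLevelCache:
    """Check level 1, then level 2, then run the expensive loader."""

    def __init__(self, loader: Callable[[Hashable], Any], l1_capacity: int = 128) -> None:
        self.loader = loader
        self.l1 = LRUCache(l1_capacity)
        self.l2: Dict[Hashable, Any] = {}  # stand-in for a larger shared cache
        self.loads = 0  # number of times the expensive loader actually ran

    def get(self, key: Hashable) -> Any:
        value = self.l1.get(key)
        if value is not _MISSING:
            return value
        if key in self.l2:
            value = self.l2[key]  # level-2 hit: promoted into level 1 below
        else:
            value = self.loader(key)  # miss at both levels
            self.loads += 1
            self.l2[key] = value
        self.l1.put(key, value)
        return value


if __name__ == "__main__":
    cache = TwoLevelCache(lambda query: f"results for {query}", l1_capacity=2)
    for query in ["kinase", "kinase", "protease", "lectin", "kinase"]:
        cache.get(query)
    # The final "kinase" was evicted from level 1 but is still served from level 2.
    print(cache.loads)  # prints 3
```

Level 1 stays small so per-process memory is bounded; level 2 absorbs repeated searches, so the expensive loader runs once per distinct query.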
  12. Literature Integration: The Dream
     • The user clicks on content.
     • Metadata and web services to the data provide an interactive view that can be annotated.
     • Selecting features provides a data/knowledge mashup.
     • Analysis leads to new content I can share.
     [figure: The Knowledge and Data Cycle, PLoS Comp. Biol. 2005 1(3) e34]
       0. Full text of PLoS papers is stored in a database.
       1. A link brings up figures from the paper.
       2. Clicking the paper figure retrieves data from the PDB, which are analyzed.
       3. A composite view of journal and database content results.
       4. The composite view has links to pertinent blocks of literature text and back to the PDB.
  13. Example of Interoperability: The Database View (BMC Bioinformatics 2010 11:220)
  14. Example of Interoperability: The Literature View (from Anita de Waard, Elsevier)
  15. Semantic Tagging and Widgets Are a Powerful Tool for Integrating Data with Knowledge of That Data, but as Yet They Are Not Used Much
     [figure: "Will Widgets and Semantic Tagging Change Computational Biology?" PLoS Comp. Biol. 6(2) e1000673]
  16. Semantic Tagging of Database Content in the Literature or Elsewhere (PLoS Comp. Biol. 6(2) e1000673)
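One simple form of semantic tagging is recognizing database identifiers in free text and linking them back to the database. The sketch below finds candidate PDB IDs with a regular expression and wraps them in HTML links. The regex is a heuristic assumption: it requires at least one letter so that years such as 1918 or 2011 are not tagged, which also means it would skip an all-digit ID. The link template https://www.rcsb.org/structure/<ID> is assumed from today's RCSB site.

```python
import re

# Heuristic: digit 1-9 plus three alphanumerics, at least one of them a letter.
PDB_ID_IN_TEXT = re.compile(r"\b[1-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b")

# Assumed link target for an entry page.
ENTRY_PAGE = "https://www.rcsb.org/structure/{pdb_id}"


def find_pdb_ids(text: str) -> list:
    """Return candidate PDB IDs in order of appearance, uppercased."""
    return [match.group(0).upper() for match in PDB_ID_IN_TEXT.finditer(text)]


def tag_pdb_ids(text: str) -> str:
    """Wrap each candidate PDB ID in an HTML link to its entry page."""
    def link(match):
        pdb_id = match.group(0).upper()
        return f'<a href="{ENTRY_PAGE.format(pdb_id=pdb_id)}">{match.group(0)}</a>'
    return PDB_ID_IN_TEXT.sub(link, text)


if __name__ == "__main__":
    sentence = "The 1918 neuraminidase (3B7E) and hemagglutinin (1RUZ) were viewed in 2009."
    print(find_pdb_ids(sentence))  # prints ['3B7E', '1RUZ']
    print(tag_pdb_ids(sentence))
```

A pattern match only proposes candidates; a production tagger would also confirm each ID exists in the database before linking it.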
  17. PDBMobile
     Objective: PDB data access on the go
     • Fast, low-bandwidth data access
     • First version supports iPhone OS
     • Future versions will support Android, BlackBerry OS 6 and others
     • HTML5-based web application
     • Client-side database stores data for offline access
     • Tight integration with MyPDB
  18. PDBMobile: Tight Integration with MyPDB
     • Access to saved queries
     • Add/delete queries
     • Flag interesting entries
     • Add personal structure annotations
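PDBMobile itself was an HTML5 application backed by a client-side browser database. Purely to illustrate the offline-storage idea, this Python sketch keeps saved queries and flagged entries in a local SQLite database; the table, column and method names are hypothetical, and synchronization with MyPDB is not shown.

```python
import sqlite3

# Hypothetical schema for a local, offline copy of a user's MyPDB data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS saved_query (name TEXT PRIMARY KEY, query TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS flagged_entry (pdb_id TEXT PRIMARY KEY, note TEXT);
"""


class OfflineStore:
    """Saved queries and flagged entries that remain usable without a network."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def save_query(self, name: str, query: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT OR REPLACE INTO saved_query VALUES (?, ?)",
                              (name, query))

    def delete_query(self, name: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM saved_query WHERE name = ?", (name,))

    def list_queries(self) -> list:
        return self.conn.execute(
            "SELECT name, query FROM saved_query ORDER BY name").fetchall()

    def flag_entry(self, pdb_id: str, note: str = None) -> None:
        with self.conn:
            self.conn.execute("INSERT OR REPLACE INTO flagged_entry VALUES (?, ?)",
                              (pdb_id.upper(), note))

    def flagged(self) -> list:
        return self.conn.execute(
            "SELECT pdb_id, note FROM flagged_entry ORDER BY pdb_id").fetchall()
```

Passing a file path instead of ":memory:" would persist the data between sessions, which is what makes offline access possible.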
  19. Future
     • New views on the data for subclasses of users
     • A new data deposition system that increases speed and accuracy while reducing costs
     • New types of analysis
  20. Acknowledgements
     Funding agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK