RDAP 033111

Presentation on the RCSB PDB at the 2nd Research Data Access and Preservation (RDAP) Summit, Denver, Colorado, March 31, 2011.

RDAP 033111 Presentation Transcript

  • A Few RDAP Thoughts Based on Experience with the RCSB Protein Data Bank
    www.rcsb.org
    Philip E. Bourne, UCSD, [email_address]
    3/31/11 RDAP Summit 2011
  • Disclaimer
    • I am not an expert in institutional repositories
    • I happen to have helped develop and oversee a resource that I use for my own research
  • What is the Protein Data Bank (PDB)?
    • The single community-owned worldwide repository containing structures of publicly accessible biological macromolecules
    • A resource used by ~200,000 individuals per month
    • A resource distributing worldwide, each month, the equivalent of ¼ of the Library of Congress
    • A bicoastal resource
    • 1TB
  • PDB Total Contents by Year
    [Chart: number of released entries per year]
  • Why Do We Think We Are Successful?
    • Number of visits and page views is growing faster than number of unique visitors
  • Metric of Success: A Research Tool for Influenza
    [Chart: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]
    • 3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
    • 1RUZ: 1918 H1 Hemagglutinin
    * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
  • Looking Back Over the Past 12 Years – In General
    • Everything was harder and took longer than we thought
    • There are a lot of politics associated with data
    • Emphasis has shifted from archive, to archive + analytical tool, to archive + analytical tool + educational tool
    • Consequently, outreach is our most important yet least understood activity today
    • Staff needed to change accordingly
    • Policy has changed as well – some support for non-generic tools
    • In prorated terms, our budget has decreased
  • Looking Back Over the Past 12 Years – Infrastructure
    • It took about 5 years to achieve and subsequently sustain 99.99% uptime
    • We have gone through 3 distinct architectural changes
      • Object model / Perl CGI
      • Object-relational model / Enterprise Java
      • Redesign: same model, widget-based UI
    Bluhm et al. 2011, Quality Assurance, doi: 10.1093/database/bar003
  • Looking Back Over the Past 12 Years – Data & Data Management
    • About 25% of our budget has been spent on data remediation
    • Support yearly snapshots and versioning
    • Our ontology/data model has been a critical component of our workflow and data accuracy
    • However, the same model is too complex for wide adoption by others who use our data
    • Our data are such that we can retain redundant copies
    • Data objects are discrete and we assign DOIs
    • Constantly striving to have the user distinguish raw from derived data
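The slide states only that yearly snapshots and versioning are supported, not how. As a minimal sketch, assuming each entry keeps a list of versions with release dates (`EntryVersion` and `version_in_snapshot` are hypothetical names, not part of any PDB software), a snapshot cut on a given date would include the latest version released on or before that date:

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class EntryVersion:
    version: str    # hypothetical version label, e.g. "1.0"
    released: date  # date this version entered the archive


def version_in_snapshot(history: List[EntryVersion],
                        snapshot_date: date) -> Optional[EntryVersion]:
    """Return the latest version released on or before snapshot_date, or None."""
    ordered = sorted(history, key=lambda v: v.released)
    dates = [v.released for v in ordered]
    # bisect_right counts how many versions were released on or before the date.
    i = bisect_right(dates, snapshot_date)
    return ordered[i - 1] if i else None
```

An entry first released after the snapshot date simply does not appear in that snapshot (the function returns `None`), which keeps older snapshots reproducible as new versions are added.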
  • Trends Today
    • Constant demand for better performance
    • Use of Web services (SOAP and now RESTful) is increasing
    • The uptake on the use of widgets has been slower than I hoped
    • Users are hankering after additional annotations of the data; we are working on database-literature integration
    • Mobile use is increasing
    • Web 2.0 services are in demand
  • Website Performance Improvements
    • Back End
      • Back-end tuning and use of multilevel caching in the areas of searches, query results, explorer pages and hierarchical views
      • Better performance and a more robust and scalable system
    • Front End
      • Cleaner JavaScript and CSS
      • Inline Image Data
      • Compressed Content (Gzip + Base64)
      • Result: 25% to 40% increase in render performance
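The slide does not describe the cache tiers themselves. A minimal sketch of the general multilevel caching pattern, assuming a small per-process LRU (level 1) in front of a larger shared store (level 2; a plain dict stands in here for something like a shared cache server), with hypothetical names `TwoLevelCache` and `loader`:

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable, Optional


class TwoLevelCache:
    """Small in-process LRU (level 1) in front of a larger shared store (level 2)."""

    def __init__(self, loader: Callable[[Hashable], Any], l1_size: int = 128,
                 shared: Optional[Dict[Hashable, Any]] = None) -> None:
        self.loader = loader            # expensive call, e.g. running a search
        self.l1_size = l1_size
        self.l1: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.l2 = shared if shared is not None else {}
        self.loads = 0                  # how many times the loader actually ran

    def get(self, key: Hashable) -> Any:
        if key in self.l1:
            self.l1.move_to_end(key)    # mark as most recently used
            return self.l1[key]
        if key in self.l2:
            value = self.l2[key]        # level-2 hit: no recomputation
        else:
            value = self.loader(key)    # miss on both levels
            self.loads += 1
            self.l2[key] = value
        self.l1[key] = value
        if len(self.l1) > self.l1_size:
            self.l1.popitem(last=False)  # evict the least recently used entry
        return value
```

A repeated query is answered from level 1; one that has been evicted from level 1 is still served from level 2 without re-running the expensive loader.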
  • Literature Integration – The Dream
    • User clicks on content
    • Metadata and webservices to data provide an interactive view that can be annotated
    • Selecting features provides a data/knowledge mashup
    • Analysis leads to new content I can share
    The Knowledge and Data Cycle (PLoS Comp. Biol. 2005 1(3) e34)
    0. Full text of PLoS papers stored in a database
    1. A link brings up figures from the paper
    2. Clicking the paper figure retrieves data from the PDB, which is analyzed
    3. A composite view of journal and database content results
    4. The composite view has links to pertinent blocks of literature text and back to the PDB
  • Example of Interoperability: The Database View
    www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
    BMC Bioinformatics 2010 11:220
  • Example of Interoperability: The Literature View
    From Anita de Waard, Elsevier
  • Semantic Tagging & Widgets Are a Powerful Tool to Integrate Data and Knowledge of That Data, but as Yet Not Used Much
    Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
  • Semantic Tagging of Database Content in the Literature or Elsewhere
    http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
    PLoS Comp. Biol. 6(2) e1000673
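The slides do not say how the RCSB widgets locate PDB references in a page. As a hypothetical, regex-based sketch of semantic tagging (not the RCSB widget code), text is scanned for identifiers of the form "digit 1 to 9 followed by three alphanumerics" and each hit is wrapped in a link to the literature view URL shown on the database-view slide. Requiring at least one letter is an assumption made here to avoid tagging years such as 2011, at the cost of missing any all-digit identifier; `tag_pdb_ids` and `BASE_URL` are illustrative names.

```python
import html
import re

# Assumed link target, taken from the literature-view URL on the slides.
BASE_URL = "http://www.rcsb.org/pdb/explore/literature.do?structureId="

# Digit 1-9, then three alphanumerics, at least one of which is a letter.
PDB_ID = re.compile(r"\b[1-9](?=[0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b")


def tag_pdb_ids(text: str) -> str:
    """Escape plain text as HTML and link every likely PDB ID it contains."""
    escaped = html.escape(text)

    def link(match: "re.Match[str]") -> str:
        pdb_id = match.group(0).upper()
        return f'<a href="{BASE_URL}{pdb_id}">{match.group(0)}</a>'

    return PDB_ID.sub(link, escaped)


print(tag_pdb_ids("Compare 1TIM with 3B7E (2011)."))
```

A real tagging widget would also need to handle identifiers split across markup and ambiguous tokens; this sketch only covers plain text.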
  • PDBMobile
    • Fast, low bandwidth data access
    • First version supports iPhone OS
    • Future versions will support Android, BlackBerry OS 6 and others
    • HTML5-based web application
    • Client-side database stores data for offline-access
    • Tight integration with MyPDB
    Objective: PDB Data Access On-The-Go
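PDBMobile itself is described as an HTML5 web application with a client-side database. Purely as an illustration of the read-through, offline-fallback idea, here it is sketched with Python's built-in `sqlite3` standing in for the browser database; `OfflineEntryStore` and the `fetch` callback are hypothetical, and a `fetch` of `None` models being offline.

```python
import sqlite3
from typing import Callable, Optional


class OfflineEntryStore:
    """Local copy of entry summaries so they remain readable without a network."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries "
            "(pdb_id TEXT PRIMARY KEY, summary TEXT NOT NULL)"
        )

    def get(self, pdb_id: str,
            fetch: Optional[Callable[[str], str]] = None) -> Optional[str]:
        key = pdb_id.upper()
        if fetch is not None:
            try:
                summary = fetch(key)
            except OSError:
                pass  # network failure: fall back to the local copy below
            else:
                # Refresh the local copy whenever the remote fetch succeeds.
                self.conn.execute(
                    "INSERT OR REPLACE INTO entries VALUES (?, ?)", (key, summary)
                )
                self.conn.commit()
                return summary
        row = self.conn.execute(
            "SELECT summary FROM entries WHERE pdb_id = ?", (key,)
        ).fetchone()
        return row[0] if row else None
```

Entries viewed while connected are written through to the local database, so they can still be shown later when no connection is available.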
  • PDBMobile
    • Access to saved queries
    • Add/delete queries
    • Flag interesting entries
    • Add personal structure annotations
    Tight Integration with MyPDB
  • Future
    • New views on the data for subclasses of user
    • New data deposition system – increase speed and accuracy while reducing costs
    • New types of analysis
  • Acknowledgements
    Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK