The BioAssay Research Database


Published in: Technology
  1. The BioAssay Research Database
     A Platform to Support the Collection, Management and Analysis of Chemical Biology Data
     http://
     ACS National Meeting, New Orleans
     @AskTheBARD
     April 7, 2013
  2. Direct Contributors
     • NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
     • NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel Southall, Henrike Veith
     • Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil Walzer, Xiaorong Xiang
     • University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea, Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
     • University of Miami – Saminda Abeyruwan, Hande Küküc, Vance Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan Schürer, Uma Vempati, Ubbo Visser
     • Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun Stauffer
     • Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass, Anthony Pinkerton, Derek Stonich
     • Scripps Research Institute – Yasel Cruz, Mark Southern
  3. BARD: BioAssay Research Database
     BARD’s mission is to enable novice and expert scientists to effectively utilize MLP data to generate new hypotheses
     • Unique collaboration among NIH and academic centers with expertise in screening and software development
     • Developed as an open-source, industrial-strength platform to support public translational research
     • Provides an opportunity to address existing cheminformatics barriers
       o Deploy predictive models
       o Foster new methods to interpret chemical biology data
       o Enable private data sharing
       o Develop and adopt an Assay Data Standard with tools to:
         – Annotate assays to minimum standards and definitions
         – Integrate and extend existing ontologies for meaningful experiment descriptions
         – Enable assay creation, registration and modification
       o Provide an easy-to-use portal and an advanced desktop client
  4. Engagement & Milestones
     • Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
     • January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT
     • February 2012: Update on progress: data extraction & annotation, test platform selection, GUI design & test, outreach
     • March 2012: BARD Program Kick-off
     • April 2012: Outreach strategy & tactic session at UNM w/ subteam
     • May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems
     • November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator
     • January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)
     • March 2013: BARD limited beta-release, then transition to enabling science
  5. BARD Technology Components
     Goal: Enable Hypothesis Generation, for users ranging from novice to expert
     • Define & Register Assays
       – Data Dictionary (standard terms)
       – Catalog of Assay Protocols
     • High Quality Data & Result Deposition
       – Calculations & Results
       – Project-experiment association
     • Query & Interpret Information
       – Intuitive Guided Queries
       – Cross-Assay & SAR-centric views
       – Advanced applications
  6. Where Are We Today?
     • CAP, Data Dictionary, and Results Deposition data model created & populated
     • CAP UI with view and basic editing
     • Warehouse loaded with all PubChem AIDs and results
     • Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
     • Dictionary defined as OWL using Protégé
     • Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
     • Manual annotation of AIDs ~70% completed by centers
     • ~95% of PubChem result types mapped to BARD dictionary
     • ~70% of PubChem columns mapped to BARD result types
  7. The BARD Data Warehouse
     • Running on MySQL with replication
     • 0.85 TB of data…
       – 151M result rows
       – 46M compound rows
     • Locally deployed at UNM
     • Planning to build better packaging
       – VM-based deployment
  8. Open Source As Far As Possible
     [Architecture diagram; labels: Jersey webapps deployed on HA application server cluster, Caching Layer, ETL, Database, Text Search Engine, Structure Search Engine]
  9. The BARD Public API
     • Java, REST-like, read-only, deployed on Glassfish cluster
     • Different functionality hosted in different containers
       – Maintenance, security
       – Stability
       – Performance
     • Versioned
     • Fully documented
     [Diagram labels: API, Plugins, Text Search, Struct Search, Data Warehouse]
  10. API Resources
      • Extensive list of resources covering many data types
      • Each resource supports a variety of sub-resources
        – Usually linked to other resources
  11. API Level of Detail
      • Supports different levels of detail
      • Allows clients to trade off detail for speed
      • Good for mobile apps
  12. API Caching & Storage
      • Caching is enabled at resource level
      • The API supports ETags
        – Every response returns an ETag in its header
        – With If-None-Match, supports web caching
      • We also abuse ETags to support persistent references to collections
      • An ETag can refer to other ETags recursively
        – Allows clients to create and store arbitrarily complex collections
      • Not permanent, not infinite!
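
The two ETag uses on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not the BARD implementation: `FakeServer`, `CachingClient`, `make_etag`, and `expand_collection` are hypothetical names, the hash-based tag is just one common way to derive a validator, and the resource paths are made up. The first part shows a conditional request with If-None-Match; the second shows how a tag that refers to other tags can be flattened recursively.

```python
from __future__ import annotations

import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (one common choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


class FakeServer:
    """In-memory stand-in for an HTTP API. Hypothetical; NOT the BARD server."""

    def __init__(self, resources: dict[str, bytes]) -> None:
        self.resources = resources

    def get(self, path: str, if_none_match: str | None = None):
        body = self.resources[path]
        etag = make_etag(body)
        if if_none_match == etag:
            return 304, etag, b""  # Not Modified: no body is sent
        return 200, etag, body


class CachingClient:
    """Remembers (etag, body) per path and revalidates on every request."""

    def __init__(self, server: FakeServer) -> None:
        self.server = server
        self.cache: dict[str, tuple[str, bytes]] = {}

    def get(self, path: str) -> tuple[bytes, bool]:
        """Return (body, served_from_cache)."""
        cached = self.cache.get(path)
        known_tag = cached[0] if cached else None
        status, etag, body = self.server.get(path, known_tag)
        if status == 304 and cached is not None:
            return cached[1], True
        self.cache[path] = (etag, body)
        return body, False


def expand_collection(tag: str, collections: dict[str, list[str]],
                      seen: set[str] | None = None) -> list[str]:
    """Flatten a collection whose members may themselves be collection tags."""
    seen = set() if seen is None else seen
    if tag in seen:  # guard against tags that refer to each other
        return []
    seen.add(tag)
    items: list[str] = []
    for member in collections[tag]:
        if member in collections:
            items.extend(expand_collection(member, collections, seen))
        else:
            items.append(member)
    return items
```

The cycle guard in `expand_collection` is a design choice of this sketch; the slides do not say how the real service handles tags that reference each other.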
  13. Annotating Data
      • To best exploit the current data set, and encourage discoverability, we need to better structure the data
        – Annotate all assays to a minimum standard
        – Integrate and extend existing ontologies to support meaningful experiment descriptions
        – Develop processes and tools to enable assay registration
      [Figure labels: BARD Assay Definition Hierarchy; BARD Dictionary & Term Hierarchy; BioAssay Ontology, Gene Ontology, Uniprot, Chemical Ontology, Entrez, Disease Ontology, Unit Ontology]
  14. (Pseudo) Linked Data
      • Full text search enabled by Solr
        – Enables filtering, faceting, auto-suggest
        – Key entry point for users
        – Type-ahead suggestions provide guidance
      • By virtue of manual associations of data types, we enable “linked data”
        – Allows searches to indicate what matched the query and how
        – Solr supports sophisticated scoring schemes
      • Doesn’t yet take advantage of ontologies
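
As a rough illustration of the filtering and faceting mentioned above, the sketch below assembles a Solr select URL. The query parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`) are standard Solr request parameters, but everything else is an assumption for illustration: the host, the `assays` core, and the field names `detection_method` and `assay_type` are hypothetical and do not come from the BARD schema.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, filters=None, facet_fields=(), rows=10):
    """Build a Solr /select URL with optional filter queries and facets.

    base_url is the URL of a Solr core (hypothetical here).
    filters maps a field name to an exact value to filter on.
    facet_fields lists fields for which Solr should return value counts.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    for field, value in (filters or {}).items():
        # Each fq narrows the result set without affecting relevance scoring.
        params.append(("fq", f'{field}:"{value}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core URL and field names, for illustration only.
url = build_solr_query(
    "http://localhost:8983/solr/assays",
    "kinase inhibitor",
    filters={"detection_method": "fluorescence"},
    facet_fields=["assay_type"],
    rows=5,
)
```

Passing the parameters as a list of pairs rather than a dict lets `fq` and `facet.field` repeat, which Solr allows.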
  15. Desktop Client
      • Support large datasets
      • Merge private & public data
      • Examine SAR
  16. Web Client
      • Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects
      • Amazon-like Query Cart
        – Save items of interest for further analysis
      • Filter on annotations, such as detection method type
  17. Community Engagement
      • Sustained outreach efforts
        – 7 MLPCN sites participating
      • Facilitate access, driven by compelling use cases and stakeholder feedback
        – Assay definition standard is a collaboration with industrial partners in addition to MLPCN
      • Publish APIs for data access, first adopters
      • A ‘BARD App Store’: enabling new approaches to data integration, mining
        – Promiscuity calculations
        – CYP450 prediction
  18. Extending BARD with Plugins
      • BARD supports deployment of external code as part of core API
      • Plugins can access the data warehouse via direct calls
        – No need to go via REST API
      • Plugin resources can accept anything
        – Text, JSON, files, links, …
      • Plugin responses can be anything
        – Plain text, JSON, HTML, SVG, …
  19. BARD Plugin Development
      • Plugins have to be deployable on the JVM
  20. BARD - SMARTCyp
      • Predicts site of metabolism by CYP450 isoforms using 2D structures
      • Developed by Patrik Rydberg and co-workers
      • Released under LGPL
      • BARD plugin exposes two resources
        – Summary HTML view
        – Data view (JSON)
  21. BARD - SMARTCyp
      P. Rydberg et al, http://
  22. BARD - BADAPPLE
      • BioActivity Data Associative Promiscuity Pattern Learning Engine
      • Associations via scaffolds for chemical space navigation

      Example URI*                                   Description
      <base>/badapple/prom/cid/752424                For compound with specified ID, return scaffold IDs and scores.
      <base>/badapple/prom/cid/752424?expand=true    Additional statistics, scaffold smiles, and inDrug flag.
      <base>/badapple/prom/scafid/233                For scaffold with specified ID, return statistics and smiles.
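
The URI patterns in the table can be assembled programmatically. The helper below is a small sketch: `badapple_uri` is a hypothetical function name, and `http://example.org/api` stands in for `<base>`, which the slides do not spell out. The slide only shows `expand=true` on the compound resource; whether the scaffold resource accepts it is not stated, so the sketch restricts it to compounds.

```python
from urllib.parse import urlencode


def badapple_uri(base, kind, identifier, expand=False):
    """Build a BADAPPLE promiscuity URI following the patterns on the slide.

    kind is "cid" (compound ID) or "scafid" (scaffold ID).
    expand adds ?expand=true, shown on the slide for compounds only.
    """
    if kind not in ("cid", "scafid"):
        raise ValueError(f"unknown resource kind: {kind!r}")
    if expand and kind != "cid":
        # Assumption of this sketch: expand is only documented for compounds.
        raise ValueError("expand is only shown for compound (cid) lookups")
    uri = f"{base.rstrip('/')}/badapple/prom/{kind}/{int(identifier)}"
    if expand:
        uri += "?" + urlencode({"expand": "true"})
    return uri


BASE = "http://example.org/api"  # placeholder for <base>, not a real endpoint

compound_uri = badapple_uri(BASE, "cid", 752424)
expanded_uri = badapple_uri(BASE, "cid", 752424, expand=True)
scaffold_uri = badapple_uri(BASE, "scafid", 233)
```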
  23. On the Horizon
      • Reproducibility
        – Be honest with me …
      • Private data in the context of public data
        – Local installs, molecule hashes
      • Mobile
        – Compounds as funny-looking QR tags
  24. Long-Term Path Forward
      • BARD is not just a data store, it’s a platform
        – Seamlessly interact with users’ preferred tools
        – Allows the community to tailor it to their needs
        – Serve as a meeting ground for experimental and computational methods
        – Enhance collaboration opportunities
        – Consider cloud deployment
      • Enhance the ability to translate data from individual experiments to systems-level insight