The BioAssay Research Database
A Platform to Support the Collection, Management and Analysis of Chemical Biology Data
http://bard.nih.gov
ACS National Meeting, New Orleans
@AskTheBARD
April 7, 2013
Direct Contributors
NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea, Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass, Anthony Pinkerton, Derek Stonich
Scripps Research Institute – Yasel Cruz, Mark Southern
BARD: BioAssay Research Database
BARD’s mission is to enable novice and expert scientists to effectively utilize MLP data to generate new hypotheses
• Unique collaboration amongst NIH and academic centers with expertise in screening and software development
• Developed as an open-source, industrial-strength platform to support public translational research
• Provides opportunity to address existing cheminformatics barriers
  o Deploy predictive models
  o Foster new methods to interpret chemical biology data
  o Enable private data sharing
  o Develop and adopt an Assay Data Standard with tools to:
    o Annotate assays to minimum standards and definitions
    o Integrate and extend existing ontologies for meaningful experiment descriptions
    o Enable assay creation, registration and modification
  o Provide an easy-to-use portal and an advanced desktop client
Engagement & Milestones
Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT
February 2012: Update on progress – data extraction & annotation, test platform selection, GUI design & test, outreach
March 2012: BARD Program Kick-off
April 2012: Outreach strategy & tactic session at UNM w/ subteam
May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems
November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator
January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)
March 2013: BARD limited beta-release, then transition to enabling science
BARD Technology Components
Enable Hypothesis Generation (Novice to Expert)
• Define & Register Assays
  – Data Dictionary – std terms
  – Catalog of Assay Protocols
• High Quality Data & Result Deposition
  – Calculations & Results
  – Project-experiment association
• Query & Interpret Information
  – Intuitive Guided Queries
  – Cross Assay & SAR centric views
  – Advanced applications
Where Are We Today?
• CAP, Data Dictionary, and Results Deposition Datamodel created & populated
• CAP UI with View and basic editing
• Warehouse loaded with all PubChem AIDs and results
• Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
• Dictionary defined as OWL using Protégé
• Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
• Manual annotation of AIDs ~70% completed by centers
• ~95% of PubChem result types mapped to BARD dictionary
• ~70% of PubChem columns mapped to BARD result types
The BARD Data Warehouse
• Running on MySQL with replication
• 0.85 TB of data…
  – 151M result rows
  – 46M compound rows
• Locally deployed at UNM
• Planning to build better packaging
  – VM based deployment
Open Source As Far as Possible
http://bard.nih.gov/api
(Architecture diagram: Jersey webapps deployed on an HA application server cluster; caching layer; ETL; database; text search engine; structure search engine)
The BARD Public API
• Java, REST-like, read-only, deployed on Glassfish cluster
• Different functionality hosted in different containers
  – Maintenance, security
  – Stability
  – Performance
• Versioned
• Fully documented
(Diagram labels: API, Plugins, Text Search, Struct Search, Data Warehouse)
API Resources
• Extensive list of resources covering many data types
• Each resource supports a variety of sub-resources
  – Usually linked to other resources
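As a rough illustration of the resource/sub-resource pattern, the sketch below composes paths under the API root given earlier in the deck (http://bard.nih.gov/api). The resource names `compounds` and `assays` and the helper `resource_url` are assumptions made for illustration; the slides do not list the actual resource names.

```python
from urllib.parse import quote

API_ROOT = "http://bard.nih.gov/api"  # API root as given in the deck


def resource_url(resource, entity_id=None, sub_resource=None, root=API_ROOT):
    """Compose <root>/<resource>[/<id>[/<sub_resource>]].

    The resource and sub-resource names passed in are hypothetical;
    the real names would come from the API documentation.
    """
    if sub_resource is not None and entity_id is None:
        raise ValueError("a sub-resource needs an entity id")
    parts = [root.rstrip("/"), quote(str(resource), safe="")]
    if entity_id is not None:
        parts.append(quote(str(entity_id), safe=""))
    if sub_resource is not None:
        parts.append(quote(str(sub_resource), safe=""))
    return "/".join(parts)


if __name__ == "__main__":
    print(resource_url("compounds"))                    # collection
    print(resource_url("compounds", 752424))            # single entity
    print(resource_url("compounds", 752424, "assays"))  # linked sub-resource
```

Treating a sub-resource as a path segment under an entity is one common way to express "linked to other resources"; the deployed API may organize its paths differently.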
API Level of Detail
• Supports different levels of detail
• Allows clients to trade off detail for speed
• Good for mobile apps
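A minimal sketch of the detail-versus-speed trade-off, assuming an `expand` switch like the `expand=true` flag that appears in the BADAPPLE example URIs later in the deck. The record, its field names, and the choice of compact fields are made up for illustration.

```python
import json

# Hypothetical full record; the field names and values are illustrative only.
RECORD = {
    "id": 123,
    "name": "example compound",
    "smiles": "CCO",
    "assay_ids": [1, 2, 3],
}

COMPACT_FIELDS = ("id", "name")  # assumed minimal view


def render(record, expand=False):
    """Return the full record when expand is true, else a compact subset."""
    if expand:
        return dict(record)
    return {key: record[key] for key in COMPACT_FIELDS if key in record}


if __name__ == "__main__":
    compact = json.dumps(render(RECORD))
    full = json.dumps(render(RECORD, expand=True))
    print(len(compact), "bytes compact vs", len(full), "bytes expanded")
```

A mobile client would request the compact form for list views and ask for the expanded form only when a single record is opened.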
API Caching & Storage
• Caching is enabled at resource level
• The API supports ETags
  – Every request returns an ETag in the header
  – With If-None-Match, supports web caching
• We also abuse ETags to support persistent references to collections
• An ETag can refer to other ETags recursively
  – Allows clients to create and store arbitrarily complex collections
• Not permanent, not infinite!
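The two ETag uses above can be sketched in a few lines. This is an in-memory simulation, not BARD's implementation: `conditional_get` mimics the standard If-None-Match exchange (ignoring ETag quoting and weak validators), and `resolve` shows one way a collection ETag could refer to other ETags recursively. The function names, the hash-based ETag, and the `store` layout are all assumptions.

```python
import hashlib
import json


def make_etag(payload):
    """Derive an ETag from the serialized payload (one common approach)."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha1(body).hexdigest()


def conditional_get(payload, if_none_match=None):
    """Return (status, headers, body); answer 304 when the ETag matches."""
    etag = make_etag(payload)
    headers = {"ETag": etag}
    if if_none_match == etag:
        return 304, headers, None  # the client's cached copy is still valid
    return 200, headers, payload


def resolve(etag, store, _seen=None):
    """Flatten a collection whose members may be IDs or other ETags.

    `store` maps an ETag to a list of members. A string member that is
    itself a key of `store` is treated as a nested collection.
    """
    seen = set() if _seen is None else _seen
    if etag in seen:  # guard against collections that refer to each other
        return []
    seen.add(etag)
    items = []
    for member in store[etag]:
        if isinstance(member, str) and member in store:
            items.extend(resolve(member, store, seen))
        else:
            items.append(member)
    return items


if __name__ == "__main__":
    status, headers, _ = conditional_get({"id": 1})
    repeat_status, _, _ = conditional_get({"id": 1}, headers["ETag"])
    print(status, repeat_status)
    store = {"etag-a": [1, 2, "etag-b"], "etag-b": [3, 4]}  # made-up collections
    print(resolve("etag-a", store))
```

Because such a store is a cache rather than a database, entries can expire, which matches the "not permanent, not infinite" caveat.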
Annotating Data
• To best exploit the current data set, and encourage discoverability, we need to better structure the data
  – Annotate all assays to a minimum standard
  – Integrate and extend existing ontologies to support meaningful experiment descriptions
  – Develop processes and tools to enable assay registration
(Diagram: BARD Assay Definition Hierarchy and BARD Dictionary & Term Hierarchy, drawing on BioAssay Ontology, Gene Ontology, Uniprot, Chemical Ontology, Entrez, Disease Ontology, Unit Ontology)
(Pseudo) Linked Data
• Full text search enabled by Solr
  – Enables filtering, faceting, auto-suggest
  – Key entry point for users
  – Type ahead suggestions provide guidance
• By virtue of manual associations of data types, we enable “linked data”
  – Allows searches to indicate what matched the query and how
  – Solr supports sophisticated scoring schemes
• Doesn’t yet take advantage of ontologies
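To make the filtering and faceting concrete, the sketch below builds a query string from standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`). The field name `detection_method_type` and the helper `solr_query` are assumptions; the deck does not describe BARD's Solr schema, and values containing quotes would need escaping that this sketch omits.

```python
from urllib.parse import urlencode


def solr_query(text, facet_fields=(), filters=None, rows=10):
    """Build a Solr select query string with optional facets and filters."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field to count values for.
        params.extend(("facet.field", field) for field in facet_fields)
    for field, value in (filters or {}).items():
        # Each fq narrows the result set without affecting scoring.
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical field name; a real schema would define its own.
    print(solr_query(
        "kinase",
        facet_fields=["detection_method_type"],
        filters={"detection_method_type": "fluorescence"},
    ))
```

The resulting string would be appended to a Solr core's select handler URL; facet counts in the response are what a client would use to offer filters to the user.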
Desktop Client
• Support large datasets
• Merge private & public data
• Examine SAR
Web Client
• Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects
• Amazon-like Query Cart
  – Save items of interest for further analysis
• Filter on annotations, such as detection method type
Community Engagement
• Sustained outreach efforts
  – 7 MLPCN sites participating
• Facilitate access, driven by compelling use-cases and stakeholder feedback
  – Assay definition standard is a collaboration with industrial partners in addition to MLPCN
• Publish APIs for data access, first-adopters
• A ‘BARD App Store’: Enabling new approaches to data integration, mining
  – Promiscuity calculations
  – CYP450 prediction
Extending BARD with Plugins
• BARD supports deployment of external code as part of core API
• Plugins can access the data warehouse via direct calls
  – No need to go via REST API
• Plugin resources can accept anything
  – Text, JSON, files, links, …
• Plugin responses can be anything
  – Plain text, JSON, HTML, SVG, …
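The plugin idea can be sketched independently of the real plugin interface, which the slides do not show. In the illustration below, every name (`Plugin`, `resource`, `handle`, `WAREHOUSE`) is hypothetical, and Python is used only to show the shape: handlers receive the warehouse directly rather than going through REST, and each returns its own content type.

```python
import json

# Stand-in for the data warehouse; the content is made up.
WAREHOUSE = {1: {"name": "compound-1"}, 2: {"name": "compound-2"}}


class Plugin:
    """Hypothetical plugin: maps resource names to handler functions."""

    def __init__(self, name, warehouse):
        self.name = name
        self.warehouse = warehouse  # direct access, no REST round trip
        self.resources = {}

    def resource(self, path):
        """Decorator that registers a handler under a resource name."""
        def register(handler):
            self.resources[path] = handler
            return handler
        return register

    def handle(self, path, request_body):
        """Dispatch to a handler; it returns (content_type, body)."""
        return self.resources[path](self.warehouse, request_body)


demo = Plugin("demo", WAREHOUSE)


@demo.resource("summary")
def summary(warehouse, request_body):
    return "text/plain", f"{len(warehouse)} compounds in the warehouse"


@demo.resource("data")
def data(warehouse, request_body):
    record = warehouse.get(int(request_body), {})
    return "application/json", json.dumps(record)


if __name__ == "__main__":
    print(demo.handle("summary", ""))
    print(demo.handle("data", "1"))
```

Letting each handler choose its content type is what allows a single plugin to expose, say, an HTML summary next to a JSON data view.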
BARD Plugin Development
• Plugins have to be deployable on the JVM
BARD - SMARTCyp
• Predicts site of metabolism by CYP450 isoforms using 2D structures
• Developed by Patrik Rydberg and co-workers
• Released under LGPL
• BARD plugin exposes two resources
  – Summary HTML view
  – Data view (JSON)
BARD - SMARTCyp
P. Rydberg et al., http://www.farma.ku.dk/smartcyp/
BARD - BADAPPLE
• BioActivity Data Associative Promiscuity Pattern Learning Engine
• Associations via scaffolds for chemical space navigation.

Example URI*                                      Description
<base>/badapple/prom/cid/752424                   For compound with specified ID, return scaffold IDs and scores.
<base>/badapple/prom/cid/752424?expand=true       Additional statistics, scaffold smiles, and inDrug flag.
<base>/badapple/prom/scafid/233                   For scaffold with specified ID, return statistics and smiles.
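The example URIs on the BADAPPLE slide follow a regular pattern, which the helper below reproduces. The function name `badapple_uri` and the placeholder base URL are assumptions; the slide leaves `<base>` unspecified, so it stays a parameter here.

```python
from urllib.parse import urlencode


def badapple_uri(base, kind, identifier, expand=False):
    """Build <base>/badapple/prom/<kind>/<id>[?expand=true].

    kind is "cid" (compound) or "scafid" (scaffold), as in the slide.
    The slide shows expand=true only with cid; whether scafid accepts
    it is not stated.
    """
    if kind not in ("cid", "scafid"):
        raise ValueError("kind must be 'cid' or 'scafid'")
    uri = f"{base.rstrip('/')}/badapple/prom/{kind}/{int(identifier)}"
    if expand:
        uri += "?" + urlencode({"expand": "true"})
    return uri


if __name__ == "__main__":
    base = "http://example.org/bard"  # placeholder, not the real <base>
    print(badapple_uri(base, "cid", 752424))
    print(badapple_uri(base, "cid", 752424, expand=True))
    print(badapple_uri(base, "scafid", 233))
```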
On the Horizon
• Reproducibility
  – Be honest with me …
• Private data in the context of public data
  – Local installs, molecule hashes
• Mobile
  – Compounds as funny looking QR tags
Long-Term Path Forward
• BARD is not just a data store – it’s a platform
  – Seamlessly interact with users’ preferred tools
  – Allows the community to tailor it to their needs
  – Serve as a meeting ground for experimental and computational methods
  – Enhance collaboration opportunities
  – Consider cloud deployment
• Enhance the ability to translate data from individual experiments to systems level insight