Pushing	
  Chemical	
  Biology	
  Data	
  
Through	
  the	
  Pipes	
  
Architec(ng	
  and	
  Extending	
  the	
  BARD	
  A...
Outline	
  
MoIvaIons	
  
for	
  the	
  BARD	
  
InteracIng	
  
with	
  the	
  BARD	
  
Extensibility	
  &	
  
Community	
...
The	
  BioAssay	
  Research	
  Database	
  
•  Originated	
  from	
  the	
  NIH	
  Molecular	
  Libraries	
  
Program	
  
...
Goals	
  of	
  the	
  BARD	
  
BARD’s	
  mission	
  is	
  to	
  enable	
  novice	
  and	
  expert	
  
scien7sts	
  to	
  e...
Components	
  of	
  the	
  BARD	
  Pla=orm	
  
CAP, Data Dictionary,
and Results
Deposition Data
model created &
populated...
Interac?ng	
  with	
  the	
  BARD	
  
Searching	
  the	
  BARD	
  
•  Full	
  text	
  search	
  via	
  Lucene/Solr	
  
– Key	
  entry	
  point	
  for	
  new	
  ...
The	
  BARD	
  API	
  
•  A	
  RESTful	
  applicaIon	
  programming	
  interface	
  
•  Access	
  to	
  individual	
  &	
 ...
API	
  Architecture	
  
•  Java,	
  read-­‐only,	
  deployed	
  on	
  Glassfish	
  cluster	
  
•  Different	
  funcIonality	...
ETL Database Text Search Engine Structure Search Engine
Caching Layer
http://bard.nih.gov/api
Jersey Webapps
deployed on H...
API	
  Resources	
  
•  Covers	
  many	
  data	
  types	
  
•  Each	
  resource	
  supports	
  a	
  	
  
variety	
  of	
  ...
API	
  Resources	
  
En7ty	
   Count	
  	
  
/assays! 990	
  
/biology! 1080	
  
/cap! 1942	
  
/compounds! 42,572,799	
  ...
Data	
  Warning	
  
We’re	
  s7ll	
  in	
  the	
  process	
  of	
  data	
  
cura7on	
  and	
  QC	
  so	
  while	
  you	
  ...
Summary	
  Resources	
  
•  A	
  number	
  of	
  enIIes	
  have	
  a	
  /summary sub-­‐
resource	
  (Compounds,	
  Project...
JSON	
  Responses	
  
•  All	
  responses	
  are	
  	
  
currently	
  JSON	
  
•  EnIIes	
  can	
  include	
  	
  
other	
...
Extending	
  the	
  API	
  
•  Concept	
  of	
  plugins	
  
– Expands	
  the	
  resource	
  hierarchy	
  
– Has	
  to	
  b...
Plugins	
  have	
  to	
  	
  
be	
  deployable	
  
	
  on	
  the	
  JVM	
  
Plugin	
  Development	
  Workflow	
  
What	
  Can	
  a	
  Plugin	
  Expect?	
  
•  Direct	
  access	
  to	
  the	
  database	
  via	
  JDBC	
  
•  Faster	
  acc...
Plugin	
  Valida?on	
  
•  Run	
  a	
  series	
  of	
  checks	
  on	
  a	
  plugin	
  before	
  
deployment	
  
– Catches	...
What	
  Plugins	
  are	
  Available?	
  
•  The	
  plugin	
  registry	
  provides	
  
a	
  list	
  of	
  deployed	
  plugi...
Exemplar	
  Plugins	
  
•  Look	
  at	
  the	
  plugin	
  repository	
  on	
  Github	
  
•  Ranges	
  from	
  	
  
– trivi...
P.	
  Rydberg	
  et	
  al,	
  hp://www.farma.ku.dk/smartcyp/	
  
BARD-­‐	
  SMARTCyp	
  
More	
  Than	
  Just	
  Data	
  
BARD is not just a data store – it’s a platform
•  Seamlessly interact with users’ prefer...
Links	
  
@AskTheBARD	
  
hNp://bard.nih.gov	
  
API	
  Source	
  Code	
  Repository	
  
Plugin	
  Source	
  Code	
  Repos...
Upcoming SlideShare
Loading in...5
×

Pushing Chemical Biology Through the Pipes

518

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
518
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Pushing Chemical Biology Through the Pipes

  1. 1. Pushing  Chemical  Biology  Data   Through  the  Pipes   Architec(ng  and  Extending  the  BARD  API   Rajarshi  Guha,  John  Braisted,  Ajit   Jadhav,  Dac-­‐Trung  Nguyen,  Tyler   Peryea,  Noel  Southall     September  9,  2014   Indianapolis,  IN  
  2. 2. Outline   MoIvaIons   for  the  BARD   InteracIng   with  the  BARD   Extensibility  &   Community  
  3. 3. The  BioAssay  Research  Database   •  Originated  from  the  NIH  Molecular  Libraries   Program   •  MoIvated  to  make  the  bioassay  data  generated   by  the  MLP  more  accessible  and  amenable  to   exploraIon  and  hypothesis  generaIon   •  Joint  effort  between  NCGC,  Broad,  UNM,   UMiami,  Vanderbilt,  Scripps,  Burnham  
  4. 4. Goals  of  the  BARD   BARD’s  mission  is  to  enable  novice  and  expert   scien7sts  to  effec7vely  u7lize  MLP  data  to   generate  new  hypotheses     •  Developed  as  an  open-­‐source,    industrial-­‐strength   plaWorm  to  support  public  translaIonal  research   •  Foster  new  methods  to  interpret  &  analyze  chemical   biology  data   •  Develop  and  adopt  an  Assay  Data  Standard   •  Enable  co-­‐loca7on  of  data  and  methods  
  5. 5. Components  of  the  BARD  Pla=orm   CAP, Data Dictionary, and Results Deposition Data model created & populated CAP UI with View and basic editing Dictionary defined as OWL using Protégé Warehouse loaded with all PubChem AIDs and results Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
  6. 6. Interac?ng  with  the  BARD  
  7. 7. Searching  the  BARD   •  Full  text  search  via  Lucene/Solr   – Key  entry  point  for  new  users   – Search  code  runs  queries  in  parallel   – Fast  faceIng,  auto  suggest,  filters   •  EnIIes  are  linked  manually  via  Solr  schema   – Allows  us  to  pick  up  enIIes  related  to  the  query   – Supply  matching  context   •  Custom  NCATS  code  for  fast  structure  searches  
  8. 8. The  BARD  API   •  A  RESTful  applicaIon  programming  interface   •  Access  to  individual  &  collecIons  of  enIIes   •  Documented  via  the  wiki   •  Versioned   •  Hit  a  URL,  get  a  JSON  response   – Every  language  supports  parsing  JSON  documents   – Easy  to  inspect  via  the  browser  or  REST  clients   hp://bard.nih.gov/api/v18/  
  9. 9. API  Architecture   •  Java,  read-­‐only,  deployed  on  Glassfish  cluster   •  Different  funcIonality   hosted  in  different   containers   – Maintenance,  security   – Stability   – Performance     API Text Search Struct Search Data Warehouse Plugins
  10. 10. ETL Database Text Search Engine Structure Search Engine Caching Layer http://bard.nih.gov/api Jersey Webapps deployed on HA Application Server Cluster Open  Source  as  Far  as  Possible  
  11. 11. API  Resources   •  Covers  many  data  types   •  Each  resource  supports  a     variety  of  sub-­‐resources   – Usually  linked  to  other     resources   •  Use  /_info  to  see  what     sub-­‐resources  are  available   /assays/_info!
  12. 12. API  Resources   En7ty   Count     /assays! 990   /biology! 1080   /cap! 1942   /compounds! 42,572,799   /documents! 10,499   /experiments! 1314   /exptdata! 32,754,242   /projects! 144   /substances! 113,751,456  
  13. 13. Data  Warning   We’re  s7ll  in  the  process  of  data   cura7on  and  QC  so  while  you  can   access  all  BARD  data,  keep  in  mind  that   much  of  it  will  be  undergoing  review   and  cura7on  and  so  may  change  
  14. 14. Summary  Resources   •  A  number  of  enIIes  have  a  /summary sub-­‐ resource  (Compounds,  Projects)   •  Aggregates  informaIon,  suitable  for  dashboards  
  15. 15. JSON  Responses   •  All  responses  are     currently  JSON   •  EnIIes  can  include     other  enIIes  (recursively)   •  JSON  Schema  is  available     via    /_schema 
   /assays/_schema!
  16. 16. Extending  the  API   •  Concept  of  plugins   – Expands  the  resource  hierarchy   – Has  to  be  wrien  in  Java   •  What  can  a  plugin  accept?   – Anything   •  What  can  a  plugin  provide?   – Anything  –  plain  text,  XML,  JSON,  HTML,  Flash   •  Plugin  manifest  –  describe  available  resources,   argument  types    
  17. 17. Plugins  have  to     be  deployable    on  the  JVM   Plugin  Development  Workflow  
  18. 18. What  Can  a  Plugin  Expect?   •  Direct  access  to  the  database  via  JDBC   •  Faster  access  to  REST  API  via  co-­‐localizaIon   •  No  local  storage  in  the  BARD  warehouse   – But  the  plugin  can  use  its  own  storage  (such  as  an   embedded  database)   •  Plugins  have  access  to  system  JARs  (e.g.,  XOM,   JChem)  but  should  bundle  their  own  required   dependencies  
  19. 19. Plugin  Valida?on   •  Run  a  series  of  checks  on  a  plugin  before   deployment   – Catches  manifest/resource  errors   – Doesn’t  check  for  correctness  (not  our  job)   – Examines  plugin  Java  class   – Examines  final  plugin  package   •  Run  as  a  command  line  tool  or  from  your  code   •  Could  be  made  into  an  Ant/Maven  plugin  
  20. 20. What  Plugins  are  Available?   •  The  plugin  registry  provides   a  list  of  deployed  plugins     – Path  to  the  plugin   – Version   – Availability   •  Long  term  goal  is  to     have  a  plugin  store   /plugins/registry/list!
  21. 21. Exemplar  Plugins   •  Look  at  the  plugin  repository  on  Github   •  Ranges  from     – trivial  calls  to  external  service   – Incorporate  external  tools/programs   •  Current  set  of  plugins  highlight  plugin   development   •  Plugins  let  us  move  non-­‐essenIal  funcIonality   out  of  the  core  API    
  22. 22. P.  Rydberg  et  al,  hp://www.farma.ku.dk/smartcyp/   BARD-­‐  SMARTCyp  
  23. 23. More  Than  Just  Data   BARD is not just a data store – it’s a platform •  Seamlessly interact with users’ preferred tools •  Allows the community to tailor it to their needs •  Serve as a meeting ground for experimental and computational methods •  Enhance collaboration opportunities
  24. 24. Links   @AskTheBARD   hNp://bard.nih.gov   API  Source  Code  Repository   Plugin  Source  Code  Repository   Development  Wiki  
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×