Exploration of multidimensional biomedical data in PubChem, presented by Lianyi Han at Solr Exchange DC

  1. Exploration of multidimensional biomedical data in PubChem (Lianyi Han)
  2. National Center for Biotechnology Information (NCBI): advances science and health by providing access to biomedical and genomic information. • Literature: PubMed, PMC, PubMed Health, … • Sequences: Proteins, Genes & Expression, Genome & Maps, … • Chemicals & Bioassays: PubChem databases, BioSystems, … • Software & Tools: BLAST, Structure Search, Entrez/E-utils • Structure & Domains: Structure, CDD, …
  3. PubChem provides information on the biological activities of small molecules and beyond. [Diagram: PubChem Substance, Compound, and Bioactivities, with links to Literature, Targets, Patents, and Pathways; 23 million citations]
  4. The Challenge • Variety: heterogeneous documents with many-to-many relationships • Volume: 200M+ bioactivity data points, 40M+ compounds, 600K+ bioassays, 20K+ pathways, 9K targets • Velocity: query wide, query deep, and facet search, all quickly, to deliver answers
  5. The Direction • Existing search systems: ASN.1 and XML schema; RDBMS (SQL); in-house NoSQL search engine; specialized search engine; homebrewed messaging system; queue systems • A new search system: features? scalability? accessibility? maintenance? reusability? extensibility? cost-effectiveness? [Diagram labels: Velocity, Volume, Archive, Analysis]
  6. The feature requirements for the new search system • Full-text search • Highlighting • Faceting • Molecular formula search • 2D similarity search • Molecular superstructure/substructure search • Joins and cascading joins to search wide and deep • Efficient transfer of search results across services
  7. We can make the feature set complete in Solr! • Full-text search (Solr) • Highlighting (Solr) • Faceting (Solr) • Molecular formula search (implement MF search in Solr) • 2D similarity search (implement 2D fingerprint search in Solr) • Molecular superstructure/substructure search (SOLR-5244) • Joins and cascading joins to search wide and deep (SOLR-4787) • Efficient transfer of search results across services (SOLR-4787, SOLR-5244)
  8. Architecture
  9. The Backend • Backend components (Solr + SQL + specialized search engine) – Configuration – Importing pipeline: dumping & importing (SGE farm), DIH (JDBC) – Replication – Warm-up • Web API – Encapsulates the backend implementation – Load balancing and throttling – Generic data model for heterogeneous documents – Query language
  10. The Frontend • Easier to develop or extend with modern web technologies – One backend, multiple frontends – One data model, multiple presentations • UI/UX design – MVC – Reusability – Mobile-browser friendly – Interactivity & accessibility
  11. The Frontends • PubChem widgets (beta) – Reusable UI components • PubChem new search (beta) – A new search system that delivers multiple search features
  12. Briefly on UI architecture • PubChem widgets as an example
  13. Demo: PubChem widget • http://jsfiddle.net/Gtbg7/ • PubChem.widget.CreateGridTable({ gridtabletype: 'pcassay', cid: 2244, renderTo: 'table', width: '90%', height: 400 });
  14. More PubChem widgets
  15. Demo: PubChem Search • https://pubchem.ncbi.nlm.nih.gov/search/ (desktop and mobile)
  16. Brief Summary of the PubChem Search Demo • Faceting • Molecular formula search • Super/substructure search • Full-text search
  17. Thanks • Yu Bo • Renata Geer • Asta Gindulyte • Siqian He • Paul Thiessen • Jiyao Wang • Jeff Zhang • Steve Bryant • Lewis Geer • Evan Bolton • Yanli Wang • NCBI IEB and IRB. This research was supported [in part] by the Intramural Research Program of the NIH, National Library of Medicine.
  18. Questions? • About this talk: hanl@mail.nih.gov • PubChem: https://www.facebook.com/pubchem • NCBI: https://www.facebook.com/ncbi.nlm
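
Slide 4's "query wide, query deep" and slide 7's cascading joins (SOLR-4787) can be illustrated with a small in-memory sketch. It mimics the semantics of Solr's standard join query parser, `{!join from=f to=t}`: collect the `from` values of the documents that matched, then return the documents whose `to` field holds one of those values. The toy records, IDs, and field names (`cid`, `aid`, `target`) are hypothetical and are not PubChem's schema or data; this is not the SOLR-4787 implementation.

```javascript
// Toy, hypothetical records: IDs and field names are illustrative only.
const bioactivities = [
  { cid: 101, aid: 1 },
  { cid: 101, aid: 2 },
  { cid: 102, aid: 2 },
  { cid: 103, aid: 3 },
];
const assays = [
  { aid: 1, target: "T1" },
  { aid: 2, target: "T2" },
  { aid: 3, target: "T3" },
];

// Join modelled on the semantics of Solr's {!join from=f to=t} parser:
// gather the `from` values of the already-matched documents, then keep the
// documents in `toDocs` whose `to` field holds one of those values.
function join(matchedDocs, from, toDocs, to) {
  const keys = new Set(matchedDocs.map((d) => d[from]));
  return toDocs.filter((d) => keys.has(d[to]));
}

// Wide: every assay that tested compound 101 (a many-to-many hop).
const assaysFor101 = join(
  bioactivities.filter((d) => d.cid === 101),
  "aid",
  assays,
  "aid",
);

// Deep (cascaded join): every compound tested in any of those assays.
const related = join(assaysFor101, "aid", bioactivities, "aid");
const relatedCids = [...new Set(related.map((d) => d.cid))];

console.log(assaysFor101.map((a) => a.target), relatedCids);
```

Each hop only needs the set of key values from the previous step, which is what makes chaining joins (wide, then deep) cheap to express.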
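
Slides 6 and 7 list molecular formula search as a feature implemented in Solr but do not say how. One plausible approach, sketched here purely as an assumption, is to parse each formula into per-element atom counts and index them as integer fields, so that a "contains at least these atoms" query becomes a conjunction of Solr range queries. The helper names and the `mf_<Element>` field naming are hypothetical. The parser only handles plain element-count formulas such as aspirin's C9H8O4.

```javascript
// Hypothetical helper: parse a plain molecular formula such as "C9H8O4" into
// per-element atom counts. Parentheses, hydrates, charges and isotopes are
// deliberately unsupported in this sketch and cause an error.
function parseFormula(formula) {
  const counts = {};
  let consumed = 0;
  for (const m of formula.matchAll(/([A-Z][a-z]?)(\d*)/g)) {
    // A gap between matches means an unsupported character was skipped.
    if (m.index !== consumed) {
      throw new Error(`Unsupported formula: ${formula}`);
    }
    const n = m[2] === "" ? 1 : Number(m[2]);
    counts[m[1]] = (counts[m[1]] ?? 0) + n;
    consumed += m[0].length;
  }
  if (consumed === 0 || consumed !== formula.length) {
    throw new Error(`Unsupported formula: ${formula}`);
  }
  return counts;
}

// Hypothetical query builder: "contains at least these atoms" expressed as
// Solr range queries over assumed per-element integer fields (mf_C, mf_H, ...).
function atLeastQuery(counts) {
  return Object.entries(counts)
    .map(([element, n]) => `mf_${element}:[${n} TO *]`)
    .join(" AND ");
}

const aspirin = parseFormula("C9H8O4");
console.log(aspirin); // { C: 9, H: 8, O: 4 }
console.log(atLeastQuery(aspirin)); // mf_C:[9 TO *] AND mf_H:[8 TO *] AND mf_O:[4 TO *]
```

An exact-formula search could instead compare a canonical formula string; the range form is shown because it is the case where per-element fields pay off.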
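
For the 2D similarity search on slides 6 and 7, a common scoring function for binary structural fingerprints is the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The sketch below assumes fingerprints are already computed and packed into `Uint32Array`s and scores a small library by brute force. It does not show how fingerprints are generated or how the talk's Solr implementation indexes them; the library entries are hypothetical.

```javascript
// Count set bits in a 32-bit value (Kernighan's method: each iteration
// clears the lowest set bit).
function popcount32(x) {
  let n = 0;
  while (x !== 0) {
    x &= x - 1;
    n++;
  }
  return n;
}

// Tanimoto coefficient of two equal-length packed bit fingerprints:
// (bits set in both) / (bits set in either).
function tanimoto(fpA, fpB) {
  if (fpA.length !== fpB.length) {
    throw new Error("fingerprint lengths differ");
  }
  let both = 0;
  let either = 0;
  for (let i = 0; i < fpA.length; i++) {
    both += popcount32(fpA[i] & fpB[i]);
    either += popcount32(fpA[i] | fpB[i]);
  }
  // Two all-zero fingerprints share no features; score them 0 by convention here.
  return either === 0 ? 0 : both / either;
}

// Brute-force threshold search over a (hypothetical) in-memory library,
// best matches first.
function similaritySearch(query, library, threshold) {
  return library
    .map(({ id, fp }) => ({ id, score: tanimoto(query, fp) }))
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

const query = Uint32Array.of(0b1111);
const hits = similaritySearch(query, [
  { id: "a", fp: Uint32Array.of(0b0011) }, // 2 shared / 4 total = 0.5
  { id: "b", fp: Uint32Array.of(0b1111) }, // identical = 1.0
  { id: "c", fp: Uint32Array.of(0b110000) }, // disjoint = 0.0
], 0.4);
console.log(hits); // [ { id: 'b', score: 1 }, { id: 'a', score: 0.5 } ]
```

A brute-force scan like this is only a baseline; at PubChem's scale (slide 4) an index or pre-filter would be needed, which the talk attributes to its Solr work without detailing here.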
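
Slide 9 lists load balancing and throttling as Web API responsibilities without naming a technique. A token bucket is one common way to throttle requests; the class below is a generic sketch of that technique, not a description of PubChem's actual throttling. Keying buckets by client is an assumption, and the clock is injected so the behavior is deterministic.

```javascript
// Generic token-bucket throttle: the bucket holds up to `capacity` tokens and
// refills continuously at `refillPerSecond`; each request spends `cost` tokens.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock returning milliseconds
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true if the request may proceed, false if it should be throttled.
  tryRemove(cost = 1) {
    const t = this.now();
    const refill = ((t - this.last) * this.refillPerSecond) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.last = t;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// One bucket per client key (e.g. an API key or IP address; an assumption here).
const buckets = new Map();
function allowRequest(clientKey, now) {
  if (!buckets.has(clientKey)) {
    buckets.set(clientKey, new TokenBucket(2, 1, now));
  }
  return buckets.get(clientKey).tryRemove();
}

// Deterministic demo with a fake clock: burst of 2, then 1 request per second.
let fakeMs = 0;
const clock = () => fakeMs;
console.log(allowRequest("client-1", clock), allowRequest("client-1", clock), allowRequest("client-1", clock)); // true true false
fakeMs = 1000;
console.log(allowRequest("client-1", clock)); // true
```

The capacity sets the tolerated burst size and the refill rate sets the sustained request rate, so the two can be tuned independently.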
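
The faceted, highlighted full-text search summarized on slide 16 maps onto standard Solr request parameters (`q`, `fq`, `rows`, `hl`, `hl.fl`, `facet`, `facet.field`, `facet.mincount`, `wt`). The helper below is a sketch that assembles them with `URLSearchParams`. The core name `compound`, the function name, and all field names are hypothetical placeholders, not PubChem's schema.

```javascript
// Sketch: assemble a Solr /select request combining full-text search,
// highlighting and faceting. Parameter names are standard Solr; the core
// name ("compound") and all field names are hypothetical.
function buildSearchParams({ text, filters = [], facetFields = [], highlightFields = [], rows = 10 }) {
  const p = new URLSearchParams();
  p.set("q", text); // full-text query
  for (const fq of filters) p.append("fq", fq); // filter queries, e.g. selected facet values
  p.set("rows", String(rows));
  if (highlightFields.length > 0) {
    p.set("hl", "true");
    p.set("hl.fl", highlightFields.join(","));
  }
  if (facetFields.length > 0) {
    p.set("facet", "true");
    p.set("facet.mincount", "1"); // hide facet values with zero hits
    for (const f of facetFields) p.append("facet.field", f);
  }
  p.set("wt", "json");
  return p;
}

const params = buildSearchParams({
  text: "aspirin",
  filters: ["doc_type:compound"],
  facetFields: ["target", "source"],
  highlightFields: ["title", "description"],
});
console.log(`/solr/compound/select?${params}`);
```

Keeping request construction in one function mirrors slide 9's idea of a Web API that encapsulates the backend: frontends pass search intent, and only this layer knows Solr's parameter names.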