Published in: Technology

  1. BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plugins. Junjun Zhang, BOSC 2011, Vienna, Austria, July 15, 2011
  2. BioMart: an open source federated data management system
     • Widely used by public and private biological databases
     • Quickly makes in-house data accessible online
     • User-friendly, flexible querying interfaces: web GUI and programmatic access APIs (REST, Perl, biomaRt, etc.)
     • Automated data conversion tool
     • Effortlessly federates in-house datasets with existing public BioMart datasets
     www.biomart.org
  3. BioMart 0.8 new features
     • An integrated Java application (MartConfigurator) makes it possible to build a BioMart data source, configure querying and presentation interfaces, and deploy a BioMart server from a single tool
     • Supports more RDBMSs (MS SQL Server and DB2, in addition to MySQL, PostgreSQL, and Oracle)
     • Creates a ‘virtual mart’ from a 3NF-normalized source database without materialization
     • New, diverse web GUIs and APIs provide added flexibility and ease of use
     • Link indexing and parallel querying optimizations
     • Supports several security features (HTTPS, OpenID, and OAuth protocols) for managing sensitive data
     • Extensible plugin framework for analysis and visualization
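The "parallel querying" item above can be illustrated with a generic sketch: several data sources are queried concurrently and the result sets are joined on a shared link key. This is not BioMart's implementation. The two source functions, the `gene_id` link key, and the sample rows are hypothetical stand-ins for remote marts.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for two remote data sources (not real BioMart calls).
def query_local_mutations():
    return [
        {"gene_id": "GENE_A", "mutations": 12},
        {"gene_id": "GENE_B", "mutations": 3},
    ]


def query_public_pathways():
    return [
        {"gene_id": "GENE_A", "pathway": "DNA repair"},
        {"gene_id": "GENE_C", "pathway": "Apoptosis"},
    ]


def federated_query(sources, link_key):
    """Run every source query in parallel, then inner-join rows on link_key."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source) for source in sources]
        results = [future.result() for future in futures]

    merged = results[0]
    for rows in results[1:]:
        # Index the next result set by the link key so each join is a lookup.
        index = {}
        for row in rows:
            index.setdefault(row[link_key], []).append(row)
        merged = [
            {**left, **right}
            for left in merged
            for right in index.get(left[link_key], [])
        ]
    return merged


joined = federated_query([query_local_mutations, query_public_pathways], "gene_id")
print(joined)
```

Only rows whose link key appears in every source survive the join, so here only the `GENE_A` row is returned.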
  4. Basic BioMart concepts: the power of simplicity. Building or querying a BioMart data source only requires understanding a few basic concepts:
     • DataSource
     • DataMart
     • DataSet
     • Attribute
     • Filter
     • AccessPoint (new)
     • Analysis (new)
     • Parameter (new)
     BioMart hides the complexity of the underlying database schema and federation mechanism.
  5. A BioMart dataset is organized in a reverse star schema
  6. A 3NF-normalized database can be converted to a reverse star schema [Figure: source schema alongside the resulting reverse star schema]
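As a rough illustration of the conversion described above, the following sketch uses SQLite to turn a small normalized schema into a central main table plus dimension tables that point back at the main table's key. All table names, column names, and data are hypothetical, and the SQL only approximates the idea. It is not the output of BioMart's conversion tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical 3NF source schema.
CREATE TABLE chromosome (chromosome_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, chromosome_id INTEGER);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER, biotype TEXT);
CREATE TABLE gene_go (gene_id INTEGER, go_term TEXT);

INSERT INTO chromosome VALUES (1, '17'), (2, '13');
INSERT INTO gene VALUES (10, 'BRCA1', 1), (11, 'BRCA2', 2);
INSERT INTO transcript VALUES
  (100, 10, 'protein_coding'), (101, 10, 'retained_intron'), (102, 11, 'protein_coding');
INSERT INTO gene_go VALUES (10, 'GO:0006281'), (11, 'GO:0006281'), (11, 'GO:0000724');

-- Main table: one row per gene, with 1:1 lookups folded in.
CREATE TABLE gene__main AS
  SELECT g.gene_id AS gene_id_key, g.symbol, c.name AS chromosome
  FROM gene g JOIN chromosome c ON c.chromosome_id = g.chromosome_id;

-- Dimension tables: 1:n data, each row referencing the main table key.
CREATE TABLE gene__transcript__dm AS
  SELECT gene_id AS gene_id_key, transcript_id, biotype FROM transcript;
CREATE TABLE gene__go__dm AS
  SELECT gene_id AS gene_id_key, go_term FROM gene_go;
""")


def genes_with_biotype(biotype):
    """Filter on a dimension table, return attributes from the main table."""
    rows = conn.execute(
        "SELECT DISTINCT m.symbol, m.chromosome "
        "FROM gene__main m "
        "JOIN gene__transcript__dm t ON t.gene_id_key = m.gene_id_key "
        "WHERE t.biotype = ? ORDER BY m.symbol",
        (biotype,),
    )
    return rows.fetchall()


print(genes_with_biotype("protein_coding"))
```

With this layout, every filter or attribute is at most one join away from the main table, which keeps query generation simple regardless of how deep the source schema was.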
  7. BioMart system components [Figure labels: client-side plugin; query engine / plugin]
  8. MartConfigurator: an integrated tool for setting up, configuring, and managing a BioMart server
  9. BioMart 0.8 provides several data querying GUIs: MartForm
  10. BioMart 0.8 provides several data querying GUIs: MartWizard
  11. BioMart 0.8 provides several data querying GUIs: MartExplorer
  12. Programmatic access API query syntax at the click of a button
  13. Special GUI - MartReport. Mutation frequencies from cancer projects with data distributed around the globe. [Figure labels: Ensembl; KEGG; Reactome; COSMIC; Pancreatic Expression Database (PED); Breast Cancer Campaign Tissue Bank (BCCTB)]
  14. Special GUI - MartAnalysis: most affected pathways
  15. Special GUI - MartAnalysis: genomic sequence retrieval tool. The sequence retrieval tool is implemented as a server-side analysis plugin.
  16. New query type: Analysis

      Query against the ‘affected_pathways’ analysis:

          <Query>
            <Analysis name="affected_pathways" dataset="gene_oicrPanc">
              <Parameter name="biotype" value="protein_coding"/>
              <Parameter name="file_type" value="png"/>
              <Parameter name="img_height" value="8000"/>
              <Parameter name="img_width" value="12000"/>
            </Analysis>
          </Query>

      Query against the ‘gene_sequence’ sequence retrieval tool:

          <Query>
            <Analysis name="gene_sequence">
              <Parameter name="seq_type" value="gene_flank"/>
              <Parameter name="upstream_flank" value="500"/>
            </Analysis>
          </Query>
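A query document of the shape shown on this slide could be assembled programmatically along the following lines. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `build_analysis_query` is hypothetical, and the sketch only builds the XML string. How the query is submitted to a BioMart 0.8 server is not covered on the slide and is left out.

```python
import xml.etree.ElementTree as ET


def build_analysis_query(analysis, parameters, dataset=None):
    """Return a <Query><Analysis>...</Analysis></Query> document as a string.

    Hypothetical helper: element and attribute names are copied from the
    slide; nothing here talks to a server.
    """
    query = ET.Element("Query")
    attributes = {"name": analysis}
    if dataset is not None:
        attributes["dataset"] = dataset
    analysis_node = ET.SubElement(query, "Analysis", attributes)
    for name, value in parameters.items():
        ET.SubElement(analysis_node, "Parameter", {"name": name, "value": str(value)})
    return ET.tostring(query, encoding="unicode")


pathways_xml = build_analysis_query(
    "affected_pathways",
    {"biotype": "protein_coding", "file_type": "png",
     "img_height": 8000, "img_width": 12000},
    dataset="gene_oicrPanc",
)
sequence_xml = build_analysis_query(
    "gene_sequence", {"seq_type": "gene_flank", "upstream_flank": 500}
)
print(pathways_xml)
print(sequence_xml)
```

Building the document through an XML library rather than string concatenation avoids quoting mistakes in attribute values.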
  17. Several large collaborative projects are using BioMart for data management
     • BioMart Central Portal (http://central.biomart.org)
     • International Cancer Genome Consortium (http://dcc.icgc.org)
     • POPCURE (collaboration with Pfizer, controlled access)
  18. BioMart Central Portal (central.biomart.org): a first-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, and more
  19. BioMart Portal provides access to a collection of data sources in a “master/slave”-like arrangement
  20. International Cancer Genome Consortium Data Portal
      [Figure: world map of ICGC projects by country and tumor type. Countries labeled: Australia, Canada, China, France, Germany, India, Italy, Japan, Mexico, Spain, United Kingdom, United States; some projects are also marked EU.]
      Goals:
      • To obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes that are of clinical and societal importance across the globe. 500 tumor and matched control samples will be analyzed per tumor type. At present, 12 countries have joined the ICGC. Data will be generated by institutions all over the world.
      • To make the data available rapidly and with minimal restrictions, to accelerate research into the causes and control of cancer.
  21. ICGC Data Portal architecture: “peer-to-peer”-like
  22. (dcc.icgc.org)
  23. Future directions
     • Creating a BioMart Central Registry to improve coordination between BioMart servers. It will be a permanent resource where BioMart data providers can register their data models, data sources, and services.
     • Enhancing the data transformation module to build BioMart databases from non-RDBMS data sources (e.g. flat files and XML files) with high scalability and flexibility.
     • Enhancing the plugin system to allow various forms of data analysis and visualization. Third parties are encouraged to develop plugins to extend the capabilities of the system.
  24. The BioMart team: Joachim Baran, Anthony Cros, Jonathan Guberman, Jack Hsu, Yong Liang, Elena Rivkin, Brett Whitty, Marie Wong-Erasmus, Long Yao, Syed Haider, Junjun Zhang, Arek Kasprzyk. For support: users@biomart.org