B07-GenomeContent-Biomart

  1. BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plugins. Junjun Zhang, BOSC 2011, Vienna, Austria, July 15, 2011
  2. BioMart: an open source federated data management system
     •  Widely used by public and private biological databases
     •  Quickly makes in-house data accessible online
     •  User-friendly and flexible querying interfaces: web GUI and programmatic access API (REST, Perl, biomaRt, etc.)
     •  Automated data conversion tool
     •  Effortlessly federates in-house datasets with existing public BioMart datasets
     www.biomart.org
  3. BioMart 0.8 new features
     •  An integrated Java application (MartConfigurator) makes it possible to build a BioMart data source, configure querying and presentation interfaces, and deploy a BioMart server from a single tool
     •  Supports more RDBMSs (MS SQL Server and DB2, in addition to MySQL, PostgreSQL, and Oracle)
     •  Creates a ‘virtual mart’ from a 3NF normalized source database without materialization
     •  New, diverse web GUIs and APIs provide added flexibility and ease of use
     •  Link indexing and parallel querying optimizations
     •  Supports several security features (HTTPS, OpenID, and OAuth protocols) for managing sensitive data
     •  Extensible plugin framework for analysis and visualization
  4. Basic BioMart Concepts – the Power of Simplicity
     Building or querying a BioMart data source only requires an understanding of a few basic concepts:
     •  DataSource
     •  DataMart
     •  DataSet
     •  Attribute
     •  Filter
     •  AccessPoint (new)
     •  Analysis (new)
     •  Parameter (new)
     BioMart hides the complexity of the underlying database schema and federation mechanism.
  5. A BioMart dataset is organized in a reverse star schema
  6. A 3NF normalized database can be converted to a reverse star schema. Figure panels: Source schema; Reverse star schema.
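
The conversion on slide 6 can be illustrated with a small, self-contained sketch. It assumes the usual reading of a reverse star schema: one main table with a row per central object (here a gene) and its one-to-one attributes, plus dimension tables for one-to-many data that point back to the main table through a shared key. All table names, column names, and rows below are hypothetical toy data, not BioMart's actual naming rules or the output of MartConfigurator.

```python
import sqlite3


def build_reverse_star(conn):
    """Create a toy 3NF source schema and derive a reverse-star layout from it.

    Every table, column, and row here is hypothetical illustration data.
    """
    conn.executescript(
        """
        -- Toy 3NF source schema
        CREATE TABLE biotype (biotype_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, biotype_id INTEGER);
        CREATE TABLE pathway (pathway_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE gene_pathway (gene_id INTEGER, pathway_id INTEGER);

        INSERT INTO biotype VALUES (1, 'protein_coding'), (2, 'pseudogene');
        INSERT INTO gene VALUES (10, 'GENE_A', 1), (11, 'GENE_B', 1), (12, 'GENE_C', 2);
        INSERT INTO pathway VALUES (100, 'pathway_x'), (101, 'pathway_y');
        INSERT INTO gene_pathway VALUES (10, 100), (11, 100), (11, 101);

        -- Main table: one row per gene, with one-to-one lookups folded in
        CREATE TABLE gene__main AS
            SELECT g.gene_id AS gene_id_key, g.symbol, b.name AS biotype
            FROM gene g JOIN biotype b ON b.biotype_id = g.biotype_id;

        -- Dimension table: one-to-many data, keyed back to the main table
        CREATE TABLE gene__pathway__dm AS
            SELECT gp.gene_id AS gene_id_key, p.name AS pathway_name
            FROM gene_pathway gp JOIN pathway p ON p.pathway_id = gp.pathway_id;
        """
    )
    conn.commit()


def genes_in_pathway(conn, pathway_name):
    """Filter on a dimension table, return attributes from the main table."""
    return conn.execute(
        """
        SELECT m.symbol, m.biotype
        FROM gene__main m
        JOIN gene__pathway__dm d ON d.gene_id_key = m.gene_id_key
        WHERE d.pathway_name = ?
        ORDER BY m.symbol
        """,
        (pathway_name,),
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_reverse_star(connection)
    print(genes_in_pathway(connection, "pathway_x"))
```

In this layout a filter on a dimension table needs a single join to reach the main table, whereas the 3NF source needs a join through the link table and each lookup table.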
  7. BioMart system components. Figure labels: Client-side, Plugin, Query Engine / Plugin
  8. MartConfigurator – an integrated tool for setting up, configuring and managing a BioMart server
  9. BioMart 0.8 provides several data querying GUIs: MartForm
  10. BioMart 0.8 provides several data querying GUIs: MartWizard
  11. BioMart 0.8 provides several data querying GUIs: MartExplorer
  12. Programmatic access API query syntax at the click of a button
  13. Special GUI - MartReport. Figure labels:
      •  Ensembl
      •  KEGG
      •  Reactome
      •  Mutation frequencies from cancer projects with data distributed around the globe
      •  COSMIC
      •  Pancreatic Expression Database (PED)
      •  Breast Cancer Campaign Tissue Bank (BCCTB)
  14. Special GUI - MartAnalysis: most affected pathways
  15. Special GUI - MartAnalysis: genomic sequence retrieval tool. The sequence retrieval tool is implemented as a server-side analysis plugin.
  16. New query type - Analysis
      Query against the ‘affected_pathways’ analysis:
          <Query>
            <Analysis name="affected_pathways" dataset="gene_oicrPanc">
              <Parameter name="biotype" value="protein_coding"/>
              <Parameter name="file_type" value="png"/>
              <Parameter name="img_height" value="8000"/>
              <Parameter name="img_width" value="12000"/>
            </Analysis>
          </Query>
      Query against the ‘gene_sequence’ sequence retrieval tool:
          <Query>
            <Analysis name="gene_sequence">
              <Parameter name="seq_type" value="gene_flank"/>
              <Parameter name="upstream_flank" value="500"/>
            </Analysis>
          </Query>
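
An Analysis query of the shape shown on slide 16 can be assembled programmatically instead of by hand, which avoids quoting mistakes in attribute values. The following is a minimal sketch using only Python's standard library. The helper name `build_analysis_query` is hypothetical, and the sketch only builds the XML string: the slides do not give the server endpoint or submission method, so none is assumed here.

```python
import xml.etree.ElementTree as ET


def build_analysis_query(analysis, params, dataset=None):
    """Return an Analysis query as an XML string.

    Hypothetical helper: it mirrors the <Query>/<Analysis>/<Parameter>
    structure shown on the slide and does not contact any server.
    """
    query = ET.Element("Query")
    attrs = {"name": analysis}
    if dataset is not None:
        # The slide's second example omits the dataset attribute entirely
        attrs["dataset"] = dataset
    analysis_node = ET.SubElement(query, "Analysis", attrs)
    for name, value in params.items():
        # Attribute values are always serialized as strings
        ET.SubElement(analysis_node, "Parameter", {"name": name, "value": str(value)})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(
        build_analysis_query(
            "affected_pathways",
            {"biotype": "protein_coding", "file_type": "png",
             "img_height": 8000, "img_width": 12000},
            dataset="gene_oicrPanc",
        )
    )
```

Because the serializer escapes and quotes attribute values itself, a stray typographic quote cannot end up inside the generated markup.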
  17. Several large collaborative projects are using BioMart for data management
      •  BioMart Central Portal (http://central.biomart.org)
      •  International Cancer Genome Consortium (http://dcc.icgc.org)
      •  POPCURE (collaboration with Pfizer, controlled access)
  18. BioMart Central Portal (central.biomart.org): a first-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, and more
  19. BioMart Portal provides access to a collection of data sources ("Master/Slave"-like)
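
The portal arrangement on slide 19, together with the parallel querying mentioned on slide 3, suggests a scatter-gather pattern: one front end forwards a query to several data sources at once and merges what comes back. The sketch below illustrates that general pattern only. It is not BioMart's implementation, and the source names, the query shape, and the in-memory stub sources are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(sources, query):
    """Send one query to every source in parallel and merge the rows.

    sources: dict mapping a source name to a callable(query) -> list of rows.
    Returns a list of (source_name, row) pairs in source insertion order.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Scatter: submit the same query to every source
        futures = {name: pool.submit(run, query) for name, run in sources.items()}
        # Gather: collect results, tagging each row with its origin
        merged = []
        for name, future in futures.items():
            for row in future.result():
                merged.append((name, row))
    return merged


def make_stub_source(rows):
    """Hypothetical in-memory stand-in for a remote data source."""
    def run(query):
        return [row for row in rows if row["gene"] == query["gene"]]
    return run


if __name__ == "__main__":
    stub_sources = {
        "site_a": make_stub_source([{"gene": "GENE_A", "mutations": 3}]),
        "site_b": make_stub_source([{"gene": "GENE_A", "mutations": 5},
                                    {"gene": "GENE_B", "mutations": 1}]),
    }
    print(federated_query(stub_sources, {"gene": "GENE_A"}))
```

Tagging each row with its source keeps provenance visible after merging, which matters when the same gene is reported by several sites.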
  20. International Cancer Genome Consortium Data Portal
      [Figure: world map of ICGC projects by country. Country labels: Canada, EU / United Kingdom, Germany, United States, United Kingdom, China, Japan, EU / France, Italy, Australia, France, India, Mexico, Spain. Tumor types named on the map include pancreatic, breast, prostate, bladder, blood, bone, brain, pediatric brain, gastric, liver, cervical, colon, endometrial, esophageal, head and neck, thyroid, renal, lung, ovarian, oral, rectal, and skin cancers, malignant lymphoma, chronic myeloid disorders, chronic lymphocytic leukemia, and rare pancreatic tumors.]
      GOALS: To obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe. 500 tumor and matched control samples will be analyzed per tumor type. At present, 12 countries have joined ICGC. Data will be generated by institutions all over the world. To make the data available rapidly and with minimal restrictions, in order to accelerate research into the causes and control of cancer.
  21. ICGC Data Portal Architecture ("Peer-to-Peer"-like)
  22. (dcc.icgc.org)
  23. Future Directions
      •  Creation of a BioMart Central Registry to improve coordination between BioMart servers. It will be a permanent resource where BioMart data providers can register their data models, data sources and services.
      •  Enhancing the data transformation module for building BioMart databases from non-RDBMS data sources (e.g. flat data files, XML data files, etc.) with high scalability and flexibility.
      •  Enhancing the plugin system to allow various forms of data analysis and visualization. Third parties are encouraged to develop plugins to extend the capabilities of the system.
  24. The BioMart team: Joachim Baran, Anthony Cros, Jonathan Guberman, Jack Hsu, Yong Liang, Elena Rivkin, Brett Whitty, Marie Wong-Erasmus, Long Yao, Syed Haider, Junjun Zhang, Arek Kasprzyk. For support: users@biomart.org