BioMart 2007 Arek Kasprzyk European Bioinformatics Institute BOSC Vienna, July 2007

Biomart Update


Title: Biomart 2007
Author: Arek Kasprzyk

Published in: Technology

Transcript of "Biomart Update"

  1. BioMart 2007. Arek Kasprzyk, European Bioinformatics Institute. BOSC, Vienna, July 2007
  2. Data Flow (diagram): Source data, Mart, Java, Perl, DAS, Web GUI, Command line, Desktop GUI, Web Service
  3. Data Flow (diagram): Mart, Java, Perl, DAS, Web GUI, Command line, Desktop GUI, Web Service
  4. Admin Tools
  5. Recent developments (0.4-0.6)
       - MartBuilder
       - MartView
       - Web services
       - API
       - DAS
       - Central Server
       - More deployers
  6. Data Flow (diagram): Source data, Mart, Java, Perl, DAS, Web GUI, Command line, Desktop GUI, Web Service
  7. MartBuilder
  8. MartBuilder
  9. MartBuilder
  10. MartView
  11. API

        use strict;
        use warnings;
        use BioMart::Initializer;
        use BioMart::Query;
        use BioMart::QueryRunner;

        # The slide does not show where $confFile comes from.
        # Assumption: the registry file path is passed on the command line.
        my $confFile = shift @ARGV;

        # Load the registry described by the configuration file.
        my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
        my $registry    = $initializer->getRegistry;

        my $query = BioMart::Query->new('registry'          => $registry,
                                        'virtualSchemaName' => 'central_server_1');

        # First dataset: Ensembl human genes on chromosome 1.
        $query->setDataset("hsapiens_gene_ensembl");
        $query->addFilter("chromosome_name", ["1"]);
        $query->addAttribute("ensembl_gene_id");
        $query->addAttribute("ensembl_transcript_id");
        $query->addAttribute("ensembl_peptide_id");

        # Second dataset: MSD structures determined by NMR.
        $query->setDataset("msd");
        $query->addFilter("experiment_type", ["NMR"]);
        $query->addAttribute("pdb_id");
        $query->addAttribute("resolution");
        $query->addAttribute("release_date");
        $query->addAttribute("header");

        # Run the query and print the results.
        my $query_runner = BioMart::QueryRunner->new();
        $query_runner->execute($query);
        $query_runner->printResults();
  12. Web service

        <Query virtualSchemaName="central_server_1">
          <Dataset name="hsapiens_gene_ensembl">
            <Filter name="chromosome_name" value="1"/>
            <Attribute name="ensembl_gene_id"/>
            <Attribute name="ensembl_transcript_id"/>
            <Attribute name="ensembl_peptide_id"/>
          </Dataset>
          <Dataset name="msd">
            <Filter name="experiment_type" value="NMR"/>
            <Attribute name="pdb_id"/>
            <Attribute name="resolution"/>
            <Attribute name="release_date"/>
            <Attribute name="header"/>
          </Dataset>
        </Query>
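
The two-dataset query on this slide can also be generated programmatically. The following is a minimal Python sketch, not part of the talk; the helper name `build_query` and its argument layout are assumptions made for illustration. It only builds the XML string and does not contact any server.

```python
import xml.etree.ElementTree as ET


def build_query(virtual_schema, datasets):
    """Build a BioMart-style query document as a string.

    `datasets` is a list of (dataset_name, filters, attributes) tuples,
    where `filters` maps filter names to values and `attributes` is a
    list of attribute names. (Hypothetical helper, for illustration.)
    """
    query = ET.Element("Query", virtualSchemaName=virtual_schema)
    for name, filters, attributes in datasets:
        dataset = ET.SubElement(query, "Dataset", name=name)
        for filter_name, value in filters.items():
            ET.SubElement(dataset, "Filter", name=filter_name, value=value)
        for attribute_name in attributes:
            ET.SubElement(dataset, "Attribute", name=attribute_name)
    return ET.tostring(query, encoding="unicode")


# Same datasets, filters and attributes as on the slide.
xml_query = build_query(
    "central_server_1",
    [
        ("hsapiens_gene_ensembl",
         {"chromosome_name": "1"},
         ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id"]),
        ("msd",
         {"experiment_type": "NMR"},
         ["pdb_id", "resolution", "release_date", "header"]),
    ],
)
print(xml_query)
```

Building the document with an XML library rather than string concatenation keeps attribute quoting correct if a filter value ever contains characters such as `&` or `"`.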
  13. MartService
       - Meta data (GET)
           - Marts
           - Datasets
           - Configuration
       - Queries (POST)
  14. Meta data: http://www.mycompany.com/mypath/martservice?
       - Marts
           - type=registry
       - Datasets
           - type=datasets&mart=mymart
       - Configuration
           - type=configuration&dataset=mydataset
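
As an illustration of the three meta data requests listed on this slide, here is a short Python sketch that assembles the URLs. The base URL, `mymart` and `mydataset` are the placeholders from the slide; the helper name `meta_url` is an assumption. Nothing is fetched.

```python
from urllib.parse import urlencode

# Placeholder base URL taken from the slide; replace with a real MartService.
BASE = "http://www.mycompany.com/mypath/martservice"


def meta_url(base, **params):
    """Append the given query parameters to the MartService base URL."""
    return base + "?" + urlencode(params)


registry_url = meta_url(BASE, type="registry")
datasets_url = meta_url(BASE, type="datasets", mart="mymart")
configuration_url = meta_url(BASE, type="configuration", dataset="mydataset")

for url in (registry_url, datasets_url, configuration_url):
    print(url)
```

Each URL could then be retrieved with a plain GET request, for example with `urllib.request.urlopen`.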
  15. Query. The XML is passed as the value of the query parameter and the result is written to 5utr.dat:

        wget -q 'http://www.biomart.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE Query>
        <Query virtualSchemaName="default" count="" softwareVersion="0.5">
          <Dataset name="hsapiens_gene_ensembl">
            <Attribute name="ensembl_gene_id"/>
            <Attribute name="ensembl_transcript_id"/>
            <Filter name="chromosome_name" value="1"/>
            <Filter name="band_end" value="p36.33"/>
            <Filter name="band_start" value="q44"/>
          </Dataset>
          <Dataset name="msd">
            <Attribute name="pdb_id"/>
            <Attribute name="experiment_type"/>
            <Filter name="experiment_type" value="NMR"/>
          </Dataset>
        </Query>' -O 5utr.dat
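
The same kind of request can be prepared from Python. The sketch below is an illustration under stated assumptions: it sends the XML as a form-encoded `query` field in a POST body (the MartService slide lists queries as POST), uses a shortened version of the query, and the helper name `prepare_query_request` is hypothetical. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

MARTSERVICE = "http://www.biomart.org/biomart/martservice"

# Shortened version of the query on the slide.
QUERY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" count="" softwareVersion="0.5">
  <Dataset name="hsapiens_gene_ensembl">
    <Attribute name="ensembl_gene_id"/>
    <Filter name="chromosome_name" value="1"/>
  </Dataset>
</Query>"""


def prepare_query_request(service_url, query_xml):
    """Return a POST request carrying the XML in a `query` form field."""
    body = urlencode({"query": query_xml}).encode("ascii")
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return Request(service_url, data=body)


request = prepare_query_request(MARTSERVICE, QUERY_XML)
print(request.get_method(), request.full_url)
```

To run the query, pass `request` to `urllib.request.urlopen` and write the response body to a file, which mirrors the `-O 5utr.dat` option of the wget call.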
  16. Results
       - Ordered according to
           - Datasets
           - Attributes
       - Default format: TSV
           - Can be altered by specifying a formatter
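
Because the default output is tab-separated and the columns follow the order of the requested attributes, a result can be turned into records with a few lines of Python. This is a sketch only: the helper name `parse_results` is an assumption, and the sample rows are made-up placeholders rather than real identifiers.

```python
import csv
import io


def parse_results(tsv_text, attribute_names):
    """Map each TSV row to a dict keyed by the requested attribute names.

    `attribute_names` must be given in the order used in the query,
    since that is the order of the result columns.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(zip(attribute_names, row)) for row in reader if row]


# Placeholder rows for illustration; not real Ensembl identifiers.
sample = "GENE_ID_1\tTRANSCRIPT_ID_1\nGENE_ID_2\tTRANSCRIPT_ID_2\n"
rows = parse_results(sample, ["ensembl_gene_id", "ensembl_transcript_id"])
for row in rows:
    print(row)
```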
  17. Genomic data
  18. Uniprot, MSD, ArrayExpress
  19. Model organism databases
  20. Developmental models
  21. Proteomics
  22. Genetics of Infectious and Autoimmune Diseases, Pasteur Institute, INSERM U730, Paris, France
       - Target SNP selection for the study of type 1 diabetes (T1D), malaria and dengue
       - Data conversion and integration: Ensembl, HapMap, NCBI, UCSC, proprietary data
       - Diabetes-Gene Association DataBase: combined proprietary and public data

        Name  Fragment  Position  Alleles  Strand
        SNP1  AL139258  1659852   T/A       1
        SNP2  NT_25698  2569873   C/T      -1
        SNP3  chr13     1125698   C/G       1
  23. CAPRISA: understanding HIV pathogenesis and epidemiology as well as HIV/AIDS treatment and prevention
       - Clinical Data
       - MID
       - Cellular Immunity
       - Humoral Immunity
       - HLA Typing
       - Sequence & Sequence Related Pipeline
  24. Unilever
       - Human study to evaluate Omics in assessing safety indicators
       - Study of skin inflammation in response to detergent
       - Skin samples taken and analyzed with multiple Omics techniques
           - Blood
           - Skin biopsy
           - Microdialysis
  25. Use Example 1: all genes in the human genome up-regulated in pancreatic adenocarcinomas (PDACs) vs normal pancreas (ND). Steps: 1. Filter, 2. Attributes, 3. Results
  26. Use Example 2: all upstream sequences for all genes on chromosome 1 up-regulated in pancreatic adenocarcinomas (PDACs) vs normal pancreas (ND). Steps: 1. Filter, 2. Attributes, 3. Results
  27. Use Example 3: I have just finished my experiment and would like to get the overlaps between my results and those reported in previous studies. Steps: 1. Filter, 2. Attributes, 3. Results
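
The overlap asked for in Use Example 3 is, at its simplest, a set intersection of two identifier lists. The Python sketch below illustrates that idea outside BioMart; the function name `overlap` and the gene identifiers are placeholders, and both lists are assumed to use the same identifier type.

```python
def overlap(my_genes, published_genes):
    """Return the sorted identifiers that appear in both lists."""
    return sorted(set(my_genes) & set(published_genes))


# Placeholder identifiers for illustration only.
my_results = ["GENE_A", "GENE_B", "GENE_C"]
previous_study = ["GENE_B", "GENE_C", "GENE_D"]

shared = overlap(my_results, previous_study)
print(shared)
```

In the talk this comparison is done through the mart interface by using one's own identifier list as a filter, which avoids downloading the published data first.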
  28. Web service
  29. Perl
  30. DAS
  31. Bioconductor package biomaRt
  32. Galaxy
  33. Taverna
  34. Central Server (www.biomart.org)
  35. www.biomart.org/biomart/martservice
  36. Future plans
  37. New configuration system
       - Normalized
           - Based on a partition table concept
           - Unified pointer system -> relational attribute
           - Configuration merge: implicit federation
           - Write to the db
       - Run-time slice and dice of a registry object rather than combinatorial pre-compilation
  38. New configuration system
       - Scalability
           - Updates and maintenance of large configurations
           - Run-time server scalability (cache and memory)
           - Scalable for multiple mart users (single instance, security)
           - Scalable for alternative configurations (new MartGUI framework)
  39. New MartGUI framework
       - Components
           - Alternative DS Configurations
           - Alternative GUIs (MView, MQForm, MSForm etc.)
           - Alternative Analyzers/Visualizers (optional install)
       - Extensible
           - Custom extensions to the components
       - Common interface
           - Formatters, DAS, Analyzers, Visualizers
           - Importable/Exportable pair interface
  40. New GUI framework
       - Old 'GUI unit':
           - Full registry + MartView + default formatters
           - Customization limited to colors and headers
       - New 'GUI unit':
           - RegistrySlice + MartGUI + Visualizer/Analyzer
           - Combine units into your unique functional environment
           - Functional level customization
  41. New GUI framework (mockup). Site header; Home; Gene Id conversion; Functional annotation; Compare two gene lists; Analyze gene list; Draw distribution; Draw bla bla chart; Full search. Page text: "Welcome to my data mining website"
  42. New GUI framework (mockup). Site header; Home; Gene Id conversion; Functional annotation; Compare two gene lists; Analyze gene list; Draw distribution; Draw bla bla chart; Full search. Form: Hugo, Genbank, Trembl, Uniprot; "paste your ids here"; Submit
  43. New GUI framework (mockup). Home; Gene Id converter; Fu; Full search. Page text: "Welcome to my data mining website"
  44. New GUI framework (mockup). Home; Gene Id conversion; Fu; Full search. Form: Hugo, Genbank, Uniprot, Swissprot; "paste your ids here"; Submit
  45. Cytogenetic distribution of pancreatic cancer genes satisfying my query (histogram)
  46. Cytogenetic distribution of pancreatic cancer genes satisfying my query (ideogram)
  47. Cytogenetic distribution of chromosomal aberrations in pancreatic cancer
  48. New GUI framework
  49. New GUI framework
  50. New configuration tool
       - MartConfigurator
           - Handles a complete registry object
           - Defines GUI units
           - Automated service discovery
           - Manual link override
           - Automated updates for large configurations
           - Improved user interaction
  51. Credits
       - Martians
           - Syed Haider
           - Richard Holland
           - Damian Smedley
       - Contributors
           - Steffen Durinck (NCI, NIH)
           - Eric Just (Northwestern University)
           - Don Gilbert (Indiana University)
           - Darin London (Duke University)
           - Will Spooner (CSHL)
           - Gudmundur Thorisson (CSHL)
           - Benoit Ballester (Universite de la Mediterranee)
           - James Smith (Ensembl)
           - Arne Stabenau (Ensembl)
           - Andreas Kahari (Ensembl)
           - Craig Melsopp (Ensembl)
           - Katerina Tzouvara (EBI)
           - Paul Donlon (Unilever)