Trip Report Seattle



The primary goal of my trip to Seattle was to establish a collaboration with a world-leading group on data integration. Because Seattle is a hub for technology companies, I also learned about synergies between business and research: Ilya Shmulevich from the Institute for Systems Biology makes use of Amazon's "Random Forest" implementation and Google's 600,000-CPU cluster for cancer genomic association discovery. I also met with experts from the University of Washington and Microsoft Research to learn about technological advancements for tackling big data and commoditizing parallelization. Finally, I observed a government-funded research agency invest in solutions geared towards its enterprise structure rather than adopt solutions designed for research institutes without an active computational community. In conclusion, CSIRO has unique properties and skill sets that many collaborators would be interested in benefiting from. In return, such collaborations would quickly propel CSIRO to the forefront of technology, which could be very rewarding, particularly for the analysis of big, unstructured datasets.

Published in: Technology

    1. Seattle Trip Report: Data Integration – Company Engagement – BigData. Denis C. Bauer | Research Scientist | 19 November 2012 | CMIS
    2. About me
        • BSc (Germany) Bioinformatics + Hons (ITEE, UQ) “In Silico Protein Design”: Machine Learning
        • PhD (IMB, UQ) “Quantitative models of Transcriptional regulation”: Optimization
        • PostDoc (IMB, UQ) “Sorting the intranuclear proteome”: Bayesian Networks
        • PostDoc (QBI, UQ) Bioinformatics for the Sequencing Facility Operation
        • Research Scientist (CSIRO) “Data integration of ‘Omics data in CRC”
            • Develop protocols for data generation
            • Develop pipelines for analysis
            • Research ways for data integration
        • pHealth (Garry Hannan)
    3. Seattle: Future hub for life sciences?
       Seattle Trip Report | Denis C. Bauer | Page 3
    4. Primary Goal: Collaboration with William Noble. Bayesian Network for automatic grouping of genomic functional elements (TSS, gene) by learning simultaneously from measured genomic features (histone modifications). Bill Noble; Michael Hoffman.
    5. Segway: predictions. [Figure: binary signal tracks for histone modifications (H2M3, H3M4, H3M4) as input to a Bayesian Network; labeled stages: Train, Segmentation & Classification, Annotation.]
    6. Institute for Systems Biology: case study for BigData (Ilya Shmulevich). TCGA has 20 different cancer types with up to 900 samples each.
        • Faster computers
        • Better approaches
        • Amazon: machine learning method for uncovering multivariate associations from large and diverse data sets.
        • Google: use 10,000 – 600,000 cores and benefit from Google expertise in compute and storage.
    7. ISB App Engine Presentation at Google I/O 2012
    8. Focusing on large scale and tactile interactive experiences that engross and envelop the visitor, Philip Worthington (1977-) created Shadow Monsters, a digital version of the traditional shadow puppet.
    9. Can CSIRO use outline-detection to do cool stuff?
    10. Road Trip to Pacific Northwest National Laboratory
    11. Road Trip to PNNL
    12. Road Trip to PNNL
    13. Road Trip to PNNL
    14. Road Trip to PNNL
    15. Road Trip to PNNL
    16. Enterprise-wide multidisciplinary collaborations (Ian Gorton). PNNL predicts from sensor data if and when radioactive material hits ground water. Mathematical and visual prediction methods of compute-intensive expert systems. Ian’s team develops a framework that allows enterprise-wide collaboration:
        • Data sharing/annotation/provenance
        • Computational expert pipelines -> graphical programming -> domain experts
        • Developed for computer-grid infrastructure
    17. Commoditize parallelization. Computer Science & Engineering, University of Washington (Magdalena Balazinska). Currently: Expert-system if !(embarrassingly parallel)
        • Deciding how to most efficiently bundle for parallel execution and how to resolve
        • The appropriate method can change with the actual load at runtime
        Parallelization needs to become something the compiler works out for us at run time (just like we don’t write assembly code anymore)
        • SciDB
        • SKEWTUNE (better load for Hadoop)
        • HaLoop (Iterative parallel Data Processing)
    18. Commoditize parallelization (and visualization)
        • HDInsight: Hadoop on Windows Server and Azure; integration with Excel
        • PowerView: interactive graphics
    19. Collaboration options
        • GS (Bill): Bayesian Network
        • ISB (Ilya): Variant association
        • CS (Magda): Iterative parallelization
        • PNNL (Ian): Graphical programming framework
        Thank you. CMIS, Denis C. Bauer, Research Scientist. t +61 2 9325 3174, e Denis.Bauer@csiro.au