SlideShare a Scribd company logo
Why HPCs hate biologists
(and what we‟re doing about it)

        C. Titus Brown
      Assistant Professor
    CSE, MMG, BEACON
   Michigan State University
       September 2012
        ctb@msu.edu
Outline
 The basic problem(s) – resequencing and
 assembly

 Data processing & data flow


 Compression approaches


 Some thoughts for the future
Shotgun genomics
 Collect samples;


 Extract DNA;


 Feed into sequencer;


 Computationally analyze.




                      Wikipedia: Environmental shotgun sequencing.p
Analogy: shredding books
        It was the best of times, it was the wor
          , it was the worst of times, it was the
          isdom, it was the age of foolishness
        mes, it was the age of wisdom, it was th



It was the best of times, it was the worst of times, it was
     the age of wisdom, it was the age of foolishness

          …but for lots and lots of fragments!
Sequencers also produce
errors…
         It was the Gest of times, it was the wor
            , it was the worst of timZs, it was the
            isdom, it was the age of foolisXness
           , it was the worVt of times, it was the
         mes, it was Ahe age of wisdom, it was th
          It was the best of times, it Gas the wor
         mes, it was the age of witdom, it was th
             isdom, it was tIe age of foolishness



It was the best of times, it was the worst of times, it was the
         age of wisdom, it was the age of foolishness
Three basic problems
    Resequencing, counting, and assembly.
1. Resequencing analysis
  We know a reference genome, and want to find
  variants (blue) in a background of errors (red)
2. Counting
  We have a reference genome (or gene set) and
  want to know how much we have. Think gene
             expression/microarrays.
3. Assembly
 We don‟t have a genome or any reference, and we
              want to construct one.
  (This is how all new genomes are sequenced.)
Noisy observations <->
information
         It was the Gest of times, it was the wor
            , it was the worst of timZs, it was the
            isdom, it was the age of foolisXness
           , it was the worVt of times, it was the
         mes, it was Ahe age of wisdom, it was th
          It was the best of times, it Gas the wor
         mes, it was the age of witdom, it was th
             isdom, it was tIe age of foolishness



It was the best of times, it was the worst of times, it was the
         age of wisdom, it was the age of foolishness
The scale of the problem is stunning.
 I estimate a worldwide capacity for DNA
  sequencing of 15 petabases/yr (it‟s probably
  larger).
 Individual labs can generate ~100 Gbp in ~1
  week for $10k.
 This sequencing is at a boutique level:
   Sequencing formats are semi-standard.
   Basic analysis approaches are ~80% cookbook.
   Every biological prep, problem, and analysis is
   different.
 Traditionally, biologists receive no training in
  computation
 …and our computational infrastructure is
  optimizing for high performance computing, not
“Three types of data scientists.”
      (Bob Grossman, U. Chicago, at XLDB)

1. Your data gathering rate is slower than Moore‟s
   Law.
2. Your data gathering rate matches Moore‟s Law.
3. Your data gathering rate exceeds Moore‟s Law.
http://www.genome.gov/sequencingcosts/
“Three types of data scientists.”

1. Your data gathering rate is slower than Moore‟s
Law.
                         => Be lazy, all will work out.

2. Your data gathering rate matches Moore‟s Law.
     => You need to write good software, but all will
                                          work out.

3. Your data gathering rate exceeds Moore‟s Law.
                          => You need serious help.
A few use cases
1.   Real-time pathogen analysis
2.   Cancer genome analysis => diagnosis &
     treatment regimen
3.   Evolution of drug resistance in HIV
4.   Gene expression analysis in agricultural animals
5.   Examining microbial community change in
     response to agriculture and/or global climate
     change
6.   Gene discovery & genome sequencing in non-
     model organisms
Three basic problems
    Resequencing, counting, and assembly.
~2 GB – 2 TB of single-chassis RAM


                                                           "Information"
                                                             "Information"
     Raw data                              "Information"
                            Analysis                           "Information"
   (~10-100 GB)                                ~1 GB             "Information"
                                                                    Database &
                                                                    integration




A few interesting computational challenges:

 - How do we store raw data for (1) provenance and (2)
reanalysis?
 - What kind of infrastructure (hardware/software) is the
“right” approach?
 - Can we reduce the cost of analysis?
A few use cases
1.   Real-time pathogen analysis
2.   Cancer genome analysis => diagnosis &
     treatment regimen
3.   Evolution of drug resistance in HIV
4.   Gene expression analysis in agricultural animals
5.   Examining microbial community change in
     response to agriculture and/or global climate
     change
6.   Gene discovery & genome sequencing in non-
     model organisms
What kind of approaches?

 Hardware solutions may not be appropriate
   Everyone in biology/biomedical informatics has or
    will have these data sets; need “commodity”
    solutions (=> ~cloud?)
   …but current commodity hardware is optimized for
    processing power, while memory and I/O is
    expensive.


 Are there algorithmic approaches with which we
 can apply leverage?
"Information"
                                                             "Information"
     Raw data     Compression              "Information"
                                Analysis                       "Information"
   (~10-100 GB)     (~2 GB)                    ~1 GB             "Information"
                                                                    Database &
                                                                    integration




A software & algorithms approach: can we develop
lossy compression approaches that

1. Reduce data size & remove errors => efficient
   processing?
2. Retain all “information”? (think JPEG)

If so, then we canShort answer the compressed data for
                   store only is: yes, we can.
later reanalysis.
My research driven by my
problems…
 Est ~50 Tbp to comprehensively sample the
  microbial composition of a gram of soil.
 Currently we have approximately 2 Tbp spread
  across 9 soil samples, for one project; 1 Tbp
  across 10 samples for another.

 Need 3 TB RAM on single chassis to do
  assembly of 300 Gbp.
 …estimate 500 TB RAM for 50 Tbp of sequence.


As it turns out, if we can solve that problem, we can
                     solve the rest 
My lab at MSU:
Theoretical => applied solutions.




Theoretical advances
                         Practically useful & usable          Demonstrated
in data structures and
                         implementations, at scale.    effectiveness on real data.
      algorithms
1. CountMin Sketch
    To add element: increment associated counter at all hash locales
    To get count: retrieve minimum counter across all hash locales




                       http://highlyscalable.wordpress.com/2012/0
                       5/01/probabilistic-structures-web-analytics-
                       data-mining/
2. Online, streaming, lossy                  (NOVEL)

compression.




 Transcriptomes, microbial genomes incl
 MDA, and most metagenomes can be assembled
 in under 50 GB of RAM, with identical or
 improved results.

 Core algorithm is single pass, “low” memory.
Streaming Twitter analysis.
(NOVEL)

3. Compressible de Bruijn graphs

            1%
                      5%




           10%       15%




                           Jason Pell & Arend Hintze
Concluding thoughts
 Our approaches provide significant and
  substantial practical and theoretical leverage to
  some really challenging problems.
 They provide a path to the future:
   Many-core compatible; distributable?
   Decreased memory footprint => cloud computing
   can be used for many analyses.
 They are in use, ~dozens of labs using digital
  normalization.
 …although we‟re still in the process of publishing
  them.
There is nothing up my sleeves.
Everything discussed here:
 Code: github.com/ged-lab/ ; BSD license
 Blog: http://ivory.idyll.org/blog („titus brown blog‟)
 Twitter: @ctitusbrown
 Grants on Lab Web site:
  http://ged.msu.edu/interests.html
 Preprints: on arXiv, q-bio:
  „diginorm arxiv‟

      …and I welcome e-mails, ctb@msu.edu
Thank you for the invitation!

Acknowledgements
Lab members involved                Collaborators
   Adina Howe (w/Tiedje)            Jim Tiedje, MSU
   Jason Pell
   Arend Hintze                     Billie Swalla, UW
   Rosangela Canino-                Janet Jansson, LBNL
    Koning
   Qingpeng Zhang                   Susannah Tringe, JGI
   Elijah Lowe
   Likit Preeyanon                 Funding
   Jiarong Guo
   Tim Brom                         USDA NIFA; NSF IOS;
   Kanchan Pavangadkar                   BEACON.
   Eric McDonald

More Related Content

What's hot

2013 duke-talk
2013 duke-talk2013 duke-talk
2013 duke-talk
c.titus.brown
 
U Florida / Gainesville talk, apr 13 2011
U Florida / Gainesville  talk, apr 13 2011U Florida / Gainesville  talk, apr 13 2011
U Florida / Gainesville talk, apr 13 2011
c.titus.brown
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grc
c.titus.brown
 
2013 alumni-webinar
2013 alumni-webinar2013 alumni-webinar
2013 alumni-webinar
c.titus.brown
 
2013 talk at TGAC, November 4
2013 talk at TGAC, November 42013 talk at TGAC, November 4
2013 talk at TGAC, November 4
c.titus.brown
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
c.titus.brown
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
c.titus.brown
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Keith Bradnam
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
c.titus.brown
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
mikaelhuss
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
GenomeInABottle
 
2013 stamps-assembly-methods.pptx
2013 stamps-assembly-methods.pptx2013 stamps-assembly-methods.pptx
2013 stamps-assembly-methods.pptx
c.titus.brown
 
The art of good science writing
The art of good science writingThe art of good science writing
The art of good science writing
Keith Bradnam
 
Scaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim Poterba
Scaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim PoterbaScaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim Poterba
Scaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim Poterba
Databricks
 
Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
Adina Chuang Howe
 
2015 pag-metagenome
2015 pag-metagenome2015 pag-metagenome
2015 pag-metagenome
c.titus.brown
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
Adina Chuang Howe
 
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-data
c.titus.brown
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Data Driven Innovation
 
2014 whitney-research
2014 whitney-research2014 whitney-research
2014 whitney-research
c.titus.brown
 

What's hot (20)

2013 duke-talk
2013 duke-talk2013 duke-talk
2013 duke-talk
 
U Florida / Gainesville talk, apr 13 2011
U Florida / Gainesville  talk, apr 13 2011U Florida / Gainesville  talk, apr 13 2011
U Florida / Gainesville talk, apr 13 2011
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grc
 
2013 alumni-webinar
2013 alumni-webinar2013 alumni-webinar
2013 alumni-webinar
 
2013 talk at TGAC, November 4
2013 talk at TGAC, November 42013 talk at TGAC, November 4
2013 talk at TGAC, November 4
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
 
2013 stamps-intro-assembly
2013 stamps-intro-assembly2013 stamps-intro-assembly
2013 stamps-intro-assembly
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
2013 stamps-assembly-methods.pptx
2013 stamps-assembly-methods.pptx2013 stamps-assembly-methods.pptx
2013 stamps-assembly-methods.pptx
 
The art of good science writing
The art of good science writingThe art of good science writing
The art of good science writing
 
Scaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim Poterba
Scaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim PoterbaScaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim Poterba
Scaling Genetic Data Analysis with Apache Spark with Jon Bloom and Tim Poterba
 
Sweden_eemis_big_data
Sweden_eemis_big_dataSweden_eemis_big_data
Sweden_eemis_big_data
 
2015 pag-metagenome
2015 pag-metagenome2015 pag-metagenome
2015 pag-metagenome
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
2013 siam-cse-big-data
2013 siam-cse-big-data2013 siam-cse-big-data
2013 siam-cse-big-data
 
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel WeitschekGenomic Big Data Management, Integration and Mining - Emanuel Weitschek
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek
 
2014 whitney-research
2014 whitney-research2014 whitney-research
2014 whitney-research
 

Viewers also liked

XBRL in Oracle 11i and R12
XBRL in Oracle 11i and R12XBRL in Oracle 11i and R12
XBRL in Oracle 11i and R12
Mahesh Vallampati
 
Private Work: How to Secure a Fair Contract + Get Paid
Private Work: How to Secure a Fair Contract + Get PaidPrivate Work: How to Secure a Fair Contract + Get Paid
Private Work: How to Secure a Fair Contract + Get Paid
Kegler Brown Hill + Ritter
 
Team8presentation
Team8presentationTeam8presentation
Team8presentation
guestc6ac983
 
2011 rwc webquest
2011 rwc webquest2011 rwc webquest
2011 rwc webquest
Takahe One
 
Section 1031 For Legistlative Review 12.16.09
Section 1031 For Legistlative Review 12.16.09Section 1031 For Legistlative Review 12.16.09
Section 1031 For Legistlative Review 12.16.09
Edmund_Wheeler
 
Pagerank
PagerankPagerank
Pagerank
ursistri
 
Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints
Judith Baines
 
Power Point Gov
Power Point GovPower Point Gov
Power Point Gov
arii827
 
Retirement Planning
Retirement PlanningRetirement Planning
Retirement Planning
CamiloSilva
 
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
George Wendleton
 
Buscadores (Fodehum)
Buscadores (Fodehum)Buscadores (Fodehum)
Buscadores (Fodehum)
grupo3fodehum
 
Organic fertilization and microbial dynamics
Organic fertilization and microbial dynamicsOrganic fertilization and microbial dynamics
Organic fertilization and microbial dynamics
juveultra
 
Portage County Health Literacy Coalition
Portage County Health Literacy CoalitionPortage County Health Literacy Coalition
Portage County Health Literacy Coalition
Sarah Halstead
 
RSC &amp; RIRG
RSC &amp; RIRGRSC &amp; RIRG
RSC &amp; RIRG
rockspot
 
2011 Ohio Hispanic Business Summit
2011 Ohio Hispanic Business Summit2011 Ohio Hispanic Business Summit
2011 Ohio Hispanic Business Summit
Kegler Brown Hill + Ritter
 
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn StegerLegal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
Kegler Brown Hill + Ritter
 
Underage Drinking Parties in San Antonio 2016
Underage Drinking Parties in San Antonio 2016Underage Drinking Parties in San Antonio 2016
Underage Drinking Parties in San Antonio 2016
Circles of San Antonio Community Coalition
 
Seniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus UniversitetSeniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus Universitet
Bertel Bolt-Jørgensen
 
Nobody laughed
Nobody laughedNobody laughed
Nobody laughed
Takahe One
 

Viewers also liked (20)

XBRL in Oracle 11i and R12
XBRL in Oracle 11i and R12XBRL in Oracle 11i and R12
XBRL in Oracle 11i and R12
 
Private Work: How to Secure a Fair Contract + Get Paid
Private Work: How to Secure a Fair Contract + Get PaidPrivate Work: How to Secure a Fair Contract + Get Paid
Private Work: How to Secure a Fair Contract + Get Paid
 
Team8presentation
Team8presentationTeam8presentation
Team8presentation
 
2011 rwc webquest
2011 rwc webquest2011 rwc webquest
2011 rwc webquest
 
Section 1031 For Legistlative Review 12.16.09
Section 1031 For Legistlative Review 12.16.09Section 1031 For Legistlative Review 12.16.09
Section 1031 For Legistlative Review 12.16.09
 
Pagerank
PagerankPagerank
Pagerank
 
Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints Experiments in Web 2.0: creative communications and digital footprints
Experiments in Web 2.0: creative communications and digital footprints
 
Power Point Gov
Power Point GovPower Point Gov
Power Point Gov
 
Retirement Planning
Retirement PlanningRetirement Planning
Retirement Planning
 
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09Roll Over Power Point Advanced Edu Safety Gew 6 10 09
Roll Over Power Point Advanced Edu Safety Gew 6 10 09
 
Buscadores (Fodehum)
Buscadores (Fodehum)Buscadores (Fodehum)
Buscadores (Fodehum)
 
Organic fertilization and microbial dynamics
Organic fertilization and microbial dynamicsOrganic fertilization and microbial dynamics
Organic fertilization and microbial dynamics
 
Portage County Health Literacy Coalition
Portage County Health Literacy CoalitionPortage County Health Literacy Coalition
Portage County Health Literacy Coalition
 
RSC &amp; RIRG
RSC &amp; RIRGRSC &amp; RIRG
RSC &amp; RIRG
 
2011 Ohio Hispanic Business Summit
2011 Ohio Hispanic Business Summit2011 Ohio Hispanic Business Summit
2011 Ohio Hispanic Business Summit
 
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn StegerLegal Issues Important for Doing Business in the U.S. | Martijn Steger
Legal Issues Important for Doing Business in the U.S. | Martijn Steger
 
Underage Drinking Parties in San Antonio 2016
Underage Drinking Parties in San Antonio 2016Underage Drinking Parties in San Antonio 2016
Underage Drinking Parties in San Antonio 2016
 
A perfect storm
A perfect stormA perfect storm
A perfect storm
 
Seniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus UniversitetSeniorforsker Uffe Jørgensen; Aarhus Universitet
Seniorforsker Uffe Jørgensen; Aarhus Universitet
 
Nobody laughed
Nobody laughedNobody laughed
Nobody laughed
 

Similar to 2012 hpcuserforum talk

2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
c.titus.brown
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
c.titus.brown
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
c.titus.brown
 
BEACON 101: Sequencing tech
BEACON 101: Sequencing techBEACON 101: Sequencing tech
BEACON 101: Sequencing tech
c.titus.brown
 
2014 naples
2014 naples2014 naples
2014 naples
c.titus.brown
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
c.titus.brown
 
2012 oslo-talk
2012 oslo-talk2012 oslo-talk
2012 oslo-talk
c.titus.brown
 
2013 bms-retreat-talk
2013 bms-retreat-talk2013 bms-retreat-talk
2013 bms-retreat-talk
c.titus.brown
 
2013 hmp-assembly-webinar
2013 hmp-assembly-webinar2013 hmp-assembly-webinar
2013 hmp-assembly-webinar
c.titus.brown
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
Guy Coates
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
c.titus.brown
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
c.titus.brown
 
Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012
c.titus.brown
 
CT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloudCT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloud
Jan Aerts
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes
c.titus.brown
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
Adina Chuang Howe
 
2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona
c.titus.brown
 
2015 pycon-talk
2015 pycon-talk2015 pycon-talk
2015 pycon-talk
c.titus.brown
 
B.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 databaseB.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 database
Rai University
 
2014 mmg-talk
2014 mmg-talk2014 mmg-talk
2014 mmg-talk
c.titus.brown
 

Similar to 2012 hpcuserforum talk (20)

2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
BEACON 101: Sequencing tech
BEACON 101: Sequencing techBEACON 101: Sequencing tech
BEACON 101: Sequencing tech
 
2014 naples
2014 naples2014 naples
2014 naples
 
2014 villefranche
2014 villefranche2014 villefranche
2014 villefranche
 
2012 oslo-talk
2012 oslo-talk2012 oslo-talk
2012 oslo-talk
 
2013 bms-retreat-talk
2013 bms-retreat-talk2013 bms-retreat-talk
2013 bms-retreat-talk
 
2013 hmp-assembly-webinar
2013 hmp-assembly-webinar2013 hmp-assembly-webinar
2013 hmp-assembly-webinar
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
2014 nci-edrn
2014 nci-edrn2014 nci-edrn
2014 nci-edrn
 
Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012
 
CT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloudCT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloud
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes
 
Big data nebraska
Big data nebraskaBig data nebraska
Big data nebraska
 
2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona2012 talk to CSE department at U. Arizona
2012 talk to CSE department at U. Arizona
 
2015 pycon-talk
2015 pycon-talk2015 pycon-talk
2015 pycon-talk
 
B.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 databaseB.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 database
 
2014 mmg-talk
2014 mmg-talk2014 mmg-talk
2014 mmg-talk
 

More from c.titus.brown

2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
c.titus.brown
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
c.titus.brown
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
c.titus.brown
 
2015 msu-code-review
2015 msu-code-review2015 msu-code-review
2015 msu-code-review
c.titus.brown
 
2015 mcgill-talk
2015 mcgill-talk2015 mcgill-talk
2015 mcgill-talk
c.titus.brown
 
2015 opencon-webcast
2015 opencon-webcast2015 opencon-webcast
2015 opencon-webcast
c.titus.brown
 
2015 vancouver-vanbug
2015 vancouver-vanbug2015 vancouver-vanbug
2015 vancouver-vanbug
c.titus.brown
 
2015 osu-metagenome
2015 osu-metagenome2015 osu-metagenome
2015 osu-metagenome
c.titus.brown
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
c.titus.brown
 
2015 pag-chicken
2015 pag-chicken2015 pag-chicken
2015 pag-chicken
c.titus.brown
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streaming
c.titus.brown
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
c.titus.brown
 
2014 abic-talk
2014 abic-talk2014 abic-talk
2014 abic-talk
c.titus.brown
 
2014 wcgalp
2014 wcgalp2014 wcgalp
2014 wcgalp
c.titus.brown
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
c.titus.brown
 
2014 ismb-extra-slides
2014 ismb-extra-slides2014 ismb-extra-slides
2014 ismb-extra-slides
c.titus.brown
 
2014 bosc-keynote
2014 bosc-keynote2014 bosc-keynote
2014 bosc-keynote
c.titus.brown
 

More from c.titus.brown (17)

2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial2015 beacon-metagenome-tutorial
2015 beacon-metagenome-tutorial
 
2015 msu-code-review
2015 msu-code-review2015 msu-code-review
2015 msu-code-review
 
2015 mcgill-talk
2015 mcgill-talk2015 mcgill-talk
2015 mcgill-talk
 
2015 opencon-webcast
2015 opencon-webcast2015 opencon-webcast
2015 opencon-webcast
 
2015 vancouver-vanbug
2015 vancouver-vanbug2015 vancouver-vanbug
2015 vancouver-vanbug
 
2015 osu-metagenome
2015 osu-metagenome2015 osu-metagenome
2015 osu-metagenome
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
2015 pag-chicken
2015 pag-chicken2015 pag-chicken
2015 pag-chicken
 
2014 anu-canberra-streaming
2014 anu-canberra-streaming2014 anu-canberra-streaming
2014 anu-canberra-streaming
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
2014 abic-talk
2014 abic-talk2014 abic-talk
2014 abic-talk
 
2014 wcgalp
2014 wcgalp2014 wcgalp
2014 wcgalp
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
 
2014 ismb-extra-slides
2014 ismb-extra-slides2014 ismb-extra-slides
2014 ismb-extra-slides
 
2014 bosc-keynote
2014 bosc-keynote2014 bosc-keynote
2014 bosc-keynote
 

Recently uploaded

Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
313mohammedarshad
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
SubhamMandal40
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
digitalxplive
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
alexjohnson7307
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
Brian Pichman
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
ssuser1915fe1
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
HackersList
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 

Recently uploaded (20)

Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 

2012 hpcuserforum talk

  • 1. Why HPCs hate biologists (and what we‟re doing about it) C. Titus Brown Assistant Professor CSE, MMG, BEACON Michigan State University September 2012 ctb@msu.edu
  • 2. Outline  The basic problem(s) – resequencing and assembly  Data processing & data flow  Compression approaches  Some thoughts for the future
  • 3. Shotgun genomics  Collect samples;  Extract DNA;  Feed into sequencer;  Computationally analyze. Wikipedia: Environmental shotgun sequencing.p
  • 4. Analogy: shredding books It was the best of times, it was the wor , it was the worst of times, it was the isdom, it was the age of foolishness mes, it was the age of wisdom, it was th It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness …but for lots and lots of fragments!
  • 5. Sequencers also produce errors… It was the Gest of times, it was the wor , it was the worst of timZs, it was the isdom, it was the age of foolisXness , it was the worVt of times, it was the mes, it was Ahe age of wisdom, it was th It was the best of times, it Gas the wor mes, it was the age of witdom, it was th isdom, it was tIe age of foolishness It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness
  • 6. Three basic problems Resequencing, counting, and assembly.
  • 7. 1. Resequencing analysis We know a reference genome, and want to find variants (blue) in a background of errors (red)
  • 8. 2. Counting We have a reference genome (or gene set) and want to know how much we have. Think gene expression/microarrays.
  • 9. 3. Assembly We don‟t have a genome or any reference, and we want to construct one. (This is how all new genomes are sequenced.)
  • 10. Noisy observations <-> information It was the Gest of times, it was the wor , it was the worst of timZs, it was the isdom, it was the age of foolisXness , it was the worVt of times, it was the mes, it was Ahe age of wisdom, it was th It was the best of times, it Gas the wor mes, it was the age of witdom, it was th isdom, it was tIe age of foolishness It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness
  • 11. The scale of the problem is stunning.  I estimate a worldwide capacity for DNA sequencing of 15 petabases/yr (it‟s probably larger).  Individual labs can generate ~100 Gbp in ~1 week for $10k.  This sequencing is at a boutique level:  Sequencing formats are semi-standard.  Basic analysis approaches are ~80% cookbook.  Every biological prep, problem, and analysis is different.  Traditionally, biologists receive no training in computation  …and our computational infrastructure is optimizing for high performance computing, not
  • 12. “Three types of data scientists.” (Bob Grossman, U. Chicago, at XLDB) 1. Your data gathering rate is slower than Moore‟s Law. 2. Your data gathering rate matches Moore‟s Law. 3. Your data gathering rate exceeds Moore‟s Law.
  • 14. “Three types of data scientists.” 1. Your data gathering rate is slower than Moore‟s Law. => Be lazy, all will work out. 2. Your data gathering rate matches Moore‟s Law. => You need to write good software, but all will work out. 3. Your data gathering rate exceeds Moore‟s Law. => You need serious help.
  • 15. A few use cases 1. Real-time pathogen analysis 2. Cancer genome analysis => diagnosis & treatment regimen 3. Evolution of drug resistance in HIV 4. Gene expression analysis in agricultural animals 5. Examining microbial community change in response to agriculture and/or global climate change 6. Gene discovery & genome sequencing in non- model organisms
  • 16. Three basic problems Resequencing, counting, and assembly.
  • 17. ~2 GB – 2 TB of single-chassis RAM "Information" "Information" Raw data "Information" Analysis "Information" (~10-100 GB) ~1 GB "Information" Database & integration A few interesting computational challenges: - How do we store raw data for (1) provenance and (2) reanalysis? - What kind of infrastructure (hardware/software) is the “right” approach? - Can we reduce the cost of analysis?
  • 18. A few use cases 1. Real-time pathogen analysis 2. Cancer genome analysis => diagnosis & treatment regimen 3. Evolution of drug resistance in HIV 4. Gene expression analysis in agricultural animals 5. Examining microbial community change in response to agriculture and/or global climate change 6. Gene discovery & genome sequencing in non- model organisms
  • 19. What kind of approaches?  Hardware solutions may not be appropriate  Everyone in biology/biomedical informatics has or will have these data sets; need “commodity” solutions (=> ~cloud?)  …but current commodity hardware is optimized for processing power, while memory and I/O is expensive.  Are there algorithmic approaches with which we can apply leverage?
  • 20. "Information" "Information" Raw data Compression "Information" Analysis "Information" (~10-100 GB) (~2 GB) ~1 GB "Information" Database & integration A software & algorithms approach: can we develop lossy compression approaches that 1. Reduce data size & remove errors => efficient processing? 2. Retain all “information”? (think JPEG) If so, then we canShort answer the compressed data for store only is: yes, we can. later reanalysis.
  • 21. My research driven by my problems…  Est ~50 Tbp to comprehensively sample the microbial composition of a gram of soil.  Currently we have approximately 2 Tbp spread across 9 soil samples, for one project; 1 Tbp across 10 samples for another.  Need 3 TB RAM on single chassis to do assembly of 300 Gbp.  …estimate 500 TB RAM for 50 Tbp of sequence. As it turns out, if we can solve that problem, we can solve the rest 
  • 22. My lab at MSU: Theoretical => applied solutions. Theoretical advances Practically useful & usable Demonstrated in data structures and implementations, at scale. effectiveness on real data. algorithms
  • 23. 1. CountMin Sketch To add element: increment associated counter at all hash locales To get count: retrieve minimum counter across all hash locales http://highlyscalable.wordpress.com/2012/0 5/01/probabilistic-structures-web-analytics- data-mining/
  • 24. 2. Online, streaming, lossy (NOVEL) compression.  Transcriptomes, microbial genomes incl MDA, and most metagenomes can be assembled in under 50 GB of RAM, with identical or improved results.  Core algorithm is single pass, “low” memory.
  • 26. (NOVEL) 3. Compressible de Bruijn graphs 1% 5% 10% 15% Jason Pell & Arend Hintze
  • 27. Concluding thoughts  Our approaches provide significant and substantial practical and theoretical leverage to some really challenging problems.  They provide a path to the future:  Many-core compatible; distributable?  Decreased memory footprint => cloud computing can be used for many analyses.  They are in use, ~dozens of labs using digital normalization.  …although we‟re still in the process of publishing them.
  • 28. There is nothing up my sleeves. Everything discussed here:  Code: github.com/ged-lab/ ; BSD license  Blog: http://ivory.idyll.org/blog („titus brown blog‟)  Twitter: @ctitusbrown  Grants on Lab Web site: http://ged.msu.edu/interests.html  Preprints: on arXiv, q-bio: „diginorm arxiv‟ …and I welcome e-mails, ctb@msu.edu
  • 29. Thank you for the invitation! Acknowledgements Lab members involved Collaborators  Adina Howe (w/Tiedje)  Jim Tiedje, MSU  Jason Pell  Arend Hintze  Billie Swalla, UW  Rosangela Canino-  Janet Jansson, LBNL Koning  Qingpeng Zhang  Susannah Tringe, JGI  Elijah Lowe  Likit Preeyanon Funding  Jiarong Guo  Tim Brom USDA NIFA; NSF IOS;  Kanchan Pavangadkar BEACON.  Eric McDonald

Editor's Notes

  1. Goal is to do first stage data reduction/analysis in less time than it takes to generate the data. Compression =&gt; OLC assembly.