I’ve got the Big Data Blues
C. Titus Brown
ctb@msu.edu
Microbiology, Computer Science, and BEACON
Outline
1. Genetics 101 and 102 - what you need to know.
2. Marek’s Disease – chicken cancer.
3. Generating lots of data – the sequencing revolution.
4. The problems of data analysis and data integration.
5. Some preliminary results on Marek’s Disease.
6. An apparent digression: chess and computers.
7. My actual research :)
Genetics 101: DNA to RNA to protein to phenotype…
[Diagram: Genome (DNA) → Transcripts (Genes; RNA) → Proteins (Amino acids) → Animal]
Images: http://commons.wikimedia.org/wiki/File:Spombe_Pop2p_protein_structure_rainbow.png; http://commons.wikimedia.org/wiki/File:Protein_CA2_PDB_12ca.png
…plus diploidy (2x each chromosome)
[Same diagram: Genome (DNA) → Transcripts (Genes; RNA) → Proteins (Amino acids) → Animal, with allele labels G, T, A, C]
…plus regulation and interaction.
[Same diagram, with allele labels G, T, A, C, annotated with “Regulation” and “Interaction”]
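As a toy illustration of the first two arrows in the diagram (DNA → RNA → protein), here is a minimal Python sketch. It assumes the standard genetic code, reads a coding-strand sequence from its first base, and ignores everything the later slides add (diploidy, regulation, interaction). The function and table names are hypothetical, chosen for illustration.

```python
# Standard genetic code, listed in UCAG order for the 1st, 2nd and 3rd codon
# positions ("*" marks a stop codon).
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(RNA_BASES)
    for j, second in enumerate(RNA_BASES)
    for k, third in enumerate(RNA_BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA (coding strand) -> mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):  # only complete codons
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCTAA")))  # MA
```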
[Diagram: factors contributing to cancer: physical agents, infectious agents, hormones, radiation, genetic factors, chemical carcinogens, lifestyle factors]
(slide courtesy Suga Subramanian)
Herpesvirus and Cancer
• Epstein-Barr Virus
– Burkitt’s lymphoma
– Hodgkin’s lymphoma
– Nasopharyngeal carcinoma
• Herpes Virus-8
– Kaposi’s sarcoma
– Multicentric lymphoma
• Mardivirus
– Marek’s Disease
• Viral neoplastic disease
• Alpha-herpesvirus
• Model for Burkitt’s lymphoma
(slide courtesy Suga Subramanian)
Clinical signs: asymmetric paralysis
http://partnersah.vet.cornell.edu/avian-atlas/
Visceral lymphoma: liver [Images: normal vs. lymphoma]
Courtesy: John Dunn, USDA
Importance of Marek’s Disease
• Agricultural Impact
– Economic losses (2 billion)
– Viral evolution: Increased virulence
– Current Vaccines: Not enough
– Long term viral persistence
• Model System
– Human herpes viral infections
– Viral induced lymphoma
(slide courtesy Suga Subramanian)
Genetic resistance to Marek’s disease
[Diagram: Marek’s disease virus (MDV) infection of inbred chicken lines: an MD-resistant line (LINE 62) and an MD-susceptible line (LINE 73)]
(slide courtesy Suga Subramanian)
What happens when we infect?
[Same diagram: Genome (DNA) → Transcripts (Genes; RNA) → Proteins (Amino acids) → Animal, with regulation and interaction; “Infect with virus” → ?]
…how does the virus specifically interact with genes?
[Same diagram: “Infect with virus” → ?; “Mechanism of regulation?”]
…and what are the mechanisms of resistance?
[Same diagram: “Infect with virus” → ?; “Mechanism of resistance?”]
Digression: DNA sequencing
• Observation of actual DNA sequence
• Counting of molecules
Image: Werner Van Belle
Fast, cheap, and easy to generate.
Applying sequencing to Marek’s Disease
[Same diagram: Genome (DNA) → Transcripts (Genes; RNA) → Proteins (Amino acids) → Animal, with regulation and interaction, labeled “SEQUENCING”]
Back to Marek’s disease:
• Differentially expressed genes (DEGs) due to infection → Gene GO analysis, IPA pathway analysis
• Are the DEGs found in Md5-infected and not in Md5ΔMeq-infected groups?
– YES: Meq-dependent DEGs
– NO: DEGs not dependent on Meq
• For Meq-dependent DEGs:
– In Line 6 and not in Line 7: Meq-dependent DEGs involved in MD resistance
– In Line 7 and not in Line 6: Meq-dependent DEGs involved in MD susceptibility
– Otherwise: Meq-dependent DEGs common to both lines
(slide courtesy Suga Subramanian)
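The decision tree above is, at heart, set arithmetic on lists of DEGs. Here is a minimal Python sketch that assumes (our reading, not stated on the slide) that each chicken line was infected with both Md5 and Md5ΔMeq, and that the “Line 6 / Line 7” questions are asked of each line’s Meq-dependent DEGs. Gene IDs and dictionary keys are hypothetical.

```python
from typing import Dict, Set

GeneSet = Set[str]  # hypothetical gene IDs from a differential-expression analysis


def meq_dependent(md5_degs: GeneSet, md5_dmeq_degs: GeneSet) -> GeneSet:
    """DEGs seen after Md5 infection but not after Md5ΔMeq infection."""
    return md5_degs - md5_dmeq_degs


def classify_meq_degs(line6: Dict[str, GeneSet],
                      line7: Dict[str, GeneSet]) -> Dict[str, GeneSet]:
    """Split Meq-dependent DEGs by which chicken line they appear in."""
    meq6 = meq_dependent(line6["Md5"], line6["Md5dMeq"])
    meq7 = meq_dependent(line7["Md5"], line7["Md5dMeq"])
    return {
        "resistance": meq6 - meq7,      # in Line 6 and not in Line 7
        "susceptibility": meq7 - meq6,  # in Line 7 and not in Line 6
        "common": meq6 & meq7,          # in both lines
    }


line6 = {"Md5": {"geneA", "geneB", "geneC"}, "Md5dMeq": {"geneC"}}
line7 = {"Md5": {"geneB", "geneD", "geneE"}, "Md5dMeq": {"geneE"}}
print(classify_meq_degs(line6, line7))
# {'resistance': {'geneA'}, 'susceptibility': {'geneD'}, 'common': {'geneB'}}
```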
LINE 6
MD-RESISTANCE: ROLE OF MEQ
[Venn diagram, MDV vs. MDV-no Meq: 1031 genes involved in MD resistance that are regulated by Meq; 1670 genes involved in MD resistance that are not regulated by Meq]
(slide courtesy Suga Subramanian)
Pathway Analysis: MD resistance
(slide courtesy Suga Subramanian)
LINE 7
MD-SUSCEPTIBILITY: ROLE OF MEQ
[Venn diagram, MDV vs. MDV-no Meq: 650 genes involved in MD susceptibility that are regulated by Meq; 540 genes involved in MD susceptibility that are not regulated by Meq]
(slide courtesy Suga Subramanian)
Pathway Analysis: MD susceptibility
(slide courtesy Suga Subramanian)
Next problem: data analysis & integration!
• Once you can generate virtually any data set you want…
• …the next problem becomes finding your answer in the data set!
• Think of it as a gigantic NSA treasure hunt: you know there are terrorists out there, but to find them you have to hunt through 1 bn phone calls a day…
Digression: “Heuristics”
• What do computers do when the answer is either really, really hard to compute exactly, or actually impossible?
• They approximate! Or guess!
• The term “heuristic” refers to a guess, or shortcut procedure, that usually returns a pretty good answer.
There are often explicit or implicit tradeoffs between the amount of computation and the quality of the result.
http://www.infernodevelopment.com/how-computer-chess-engines-think-minimax-tree
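The linked article describes minimax search in chess engines. Below is a minimal Python sketch of the depth-limited idea. To keep it short, it uses a toy game in place of chess: players alternately take 1-3 stones, and whoever takes the last stone wins. `depth` is the compute budget. When the budget runs out, the search falls back to a `heuristic` guess, which here is deliberately crude. All names are illustrative.

```python
from typing import List


def heuristic(stones: int, maximizing: bool) -> float:
    """A deliberately crude guess used when the search stops: 'no idea' (0)."""
    return 0.0


def minimax(stones: int, depth: int, maximizing: bool) -> float:
    """Score a position from the maximizer's view: +1 win, -1 loss.

    Toy game: take 1, 2 or 3 stones per turn; taking the last stone wins.
    """
    if stones == 0:
        # The previous player took the last stone, so the player to move lost.
        return -1.0 if maximizing else 1.0
    if depth == 0:
        return heuristic(stones, maximizing)  # out of budget: guess
    scores: List[float] = [
        minimax(stones - take, depth - 1, not maximizing)
        for take in (1, 2, 3)
        if take <= stones
    ]
    return max(scores) if maximizing else min(scores)


def best_move(stones: int, depth: int) -> int:
    """How many stones the maximizer should take, searching `depth` plies."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: minimax(stones - take, depth - 1, False))


print(minimax(5, 1, True))   # 0.0: too shallow, the search can only guess
print(minimax(5, 10, True))  # 1.0: deep enough to see the forced win
print(best_move(5, 10))      # 1: leave the opponent a multiple of 4
```

Raising `depth` buys a better answer at the cost of exponentially more positions searched. A better `heuristic` is the other lever: it improves the answer without deeper search.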
My actual research focus
What we do is think about ways to get
computers to play chess better, by:
– Identifying better ways to guess;
– Speeding up the guessing process;
– Improving people’s ability to use the chess-playing computer.
Now, replace “play chess” with “analyze biological data”...
My actual research focus…
We build tools that help experimental biologists work
efficiently and correctly with large amounts of data, to help
answer their scientific questions.
This touches on many problems, including:
• Computational and scientific correctness.
• Computational efficiency.
• Cultural divides between experimental biologists and
computational scientists.
• Lack of training (biology and medical curricula devoid of
math and computing).
Not-so-secret sauce: “digital normalization”
• One primary step of one type of data analysis becomes 20-200x faster and 20-150x “cheaper”.
Lossy compression (image: http://en.wikipedia.org/wiki/JPEG)
[Diagram: Raw data (~10-100 GB) → Analysis → “Information” (~1 GB) → Database & integration]
Restated:
Can we use lossy compression approaches to make
downstream analysis faster and better? (Yes.)
~2 GB – 2 TB of single-chassis RAM
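Here is our summary of the core of digital normalization. It is a single streaming pass over the reads. A read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below a coverage cutoff C. Otherwise it is discarded as redundant. This is the “lossy compression”: highly covered regions are downsampled, while rarely seen sequence is kept. Below is a minimal Python sketch under stated assumptions. An exact dictionary stands in for the memory-efficient probabilistic counting used by the khmer software. Reverse complements are ignored. k = 20 and C = 20 are illustrative defaults.

```python
from statistics import median
from typing import Dict, Iterable, Iterator, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads: Iterable[str], k: int = 20,
                          cutoff: int = 20) -> Iterator[str]:
    """Yield only reads whose median k-mer count (so far) is below `cutoff`.

    Simplifications for illustration: exact dict counting, forward strand only.
    """
    counts: Dict[str, int] = {}
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: dropped in this sketch
        if median([counts.get(km, 0) for km in read_kmers]) < cutoff:
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
            yield read
        # else: redundant with reads already kept, so it is discarded


reads = ["AAAACCCCGGGG"] * 5 + ["TTGCA"]
print(list(digital_normalization(reads, k=4, cutoff=2)))
# ['AAAACCCCGGGG', 'AAAACCCCGGGG', 'TTGCA']
```

With C = 2, only two of the five identical reads survive, while the novel read is always kept. Because memory grows with the number of distinct k-mers kept rather than with total reads, the pass fits in far less RAM than the raw data.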
Some diginorm examples:
1. Assembly of the H. contortus parasitic nematode
genome.
2. Assembly of two Midwest soil metagenomes,
Iowa corn and Iowa prairie.
3. Reference-free assembly of the lamprey (P.
marinus) transcriptome.
1. The H. contortus problem
• A sheep parasite.
• ~350 Mbp genome
• Sequenced DNA from 6 individuals after whole-genome amplification; estimated 10% heterozygosity (!?)
• Significant bacterial contamination.
(w/Robin Gasser, Paul Sternberg, and Erich Schwarz)
H. contortus life cycle
Refs.: Nikolaou and Gasser (2006), Int. J. Parasitol. 36, 859-868;
Prichard and Geary (2008), Nature 452, 157-158.
Assembly after digital normalization
• Diginorm readily enabled assembly of a 404
Mbp genome with N50 of 15.6 kb;
• Post-processing led to 73-94% complete
genome.
• Diginorm helped by making analysis possible.
– Highly variable population.
– Lots of contamination from microbes.
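N50, quoted above, is the standard contiguity statistic for an assembly. To compute it, sort contigs longest first and walk down the list. N50 is the length of the contig at which the running total first reaches half of the total assembly size. Below is a minimal Python sketch of that definition. Variants such as NG50 use the estimated genome size instead of the assembly size, which matters when, as here, the assembly (404 Mbp) is larger than the genome estimate (~350 Mbp).

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


print(n50([2, 3, 4, 5, 6]))  # 5
```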
Next steps with H. contortus
• Publish the genome paper 
• Identification of antibiotic targets for
treatment in agricultural settings (animal
husbandry).
• Serving as “reference approach” for a wide
variety of parasitic nematodes, many of which
have similar genomic issues.
2. Soil metagenome assembly
A “Grand Challenge” dataset (DOE/JGI)
[Bar chart: basepairs of sequencing (Gbp, axis 0-600) per site, GAII vs. HiSeq: Iowa, continuous corn; Iowa, native prairie; Kansas, cultivated corn; Kansas, native prairie; Wisconsin, continuous corn; Wisconsin, native prairie; Wisconsin, restored prairie; Wisconsin, switchgrass]
Reference points: Rumen (Hess et al., 2011), 268 Gbp; Rumen k-mer filtered, 111 Gbp; MetaHIT (Qin et al., 2011), 578 Gbp; NCBI nr database, 37 Gbp.
Total: 1,846 Gbp soil metagenome
Putting it in perspective:
Total equivalent of ~1200 bacterial genomes
Human genome ~3 billion bp
Assembly results for Iowa corn and prairie
(2x ~300 Gbp soil metagenomes)
Total assembly | Total contigs (> 300 bp) | % reads assembled | Predicted protein coding
2.5 billion    | 4.5 million              | 19%               | 5.3 million
3.5 billion    | 5.9 million              | 22%               | 6.8 million
Adina Howe
3. Sea lamprey gene expression
• Non-native
• Parasite of medium to large fishes
• Caused populations of host fishes to crash
Li Lab / Y-W C-D
Transcriptome results
• Started with 5.1 billion reads from 50 different tissues.
(4 years of computational research and about 1 month of compute time went into this.)
• Final assembly contains ~95% of genes (est.)
• This is an extra 40% over previous work.
• Enabling studies in –
– Basal vertebrate phylogeny
– Biliary atresia
– Evolutionary origin of brown fat (previously thought to be
mammalian only!) – J Exp Biol. 2013
– Pheromonal response in adults
What are the tissue-level changes in gene expression that support regeneration? Transcriptome analysis of a regenerating vertebrate after spinal cord injury (SCI)
[Diagram: brain and spinal cord; RNA-Seq to determine differential expression profile after injury; sampling >weekly; -/+ Dex]
Ona Bloom
Challenges ahead
• We need more people working at the interface
– “Priesthood” model doesn’t scale!
– Cultural shifts in biology needed…
• We need more data!
– Data often only makes sense in context of other data
– This is a hard sell: “if you give us 1000x as much data,
we might start to develop some idea of what it
means.”
• We actually know very little about biology still!
Open science & sharing
• Science, and biology in particular, is in the
middle of a transition to a “data intensive”
field.
• The sharing ethos is not incentivized properly;
you get more credit for discovering new stuff
than for discoveries resulting from sharing.
• We are focused on sharing: methods,
programs, educational materials…
Being disruptive?
Possible initiative from my lab:
“We will analyze your data for you if we can
make your data openly available in 1 yr.”
Will it work, or sink like a stone? Ask me in a
year 
MSU’s role in my research
• MSU provides nice infrastructure, great
administrative support, and a truly excellent
community (students, profs, and other
researchers).
• MSU is also uniquely interdisciplinary in many
ways; very few “hard” boundaries in biology
research.
Credits
• Marek’s Disease: Suga Subramanian and Hans Cheng (USDA)
• Haemonchus: Erich Schwarz (Caltech/Cornell), Paul Sternberg
(Caltech), Robin Gasser (U. Melbourne)
• Lamprey: Weiming Li (MSU), Ona Bloom (Feinstein), Jen
Morgan (MBL/Woods Hole)
• Great Prairie: Jim Tiedje (MSU), Janet Jansson (LBL), Susanna
Tringe (Joint Genome Inst.)
Funding: MSU; USDA; NSF; NIH.
Drop me a line – ctb@msu.edu
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

2013 alumni-webinar

  • 1. I’ve got the Big Data Blues C. Titus Brown ctb@msu.edu Microbiology, Computer Science, and BEACON
  • 2. Outline 1. Genetics 101 and 102 - what you need to know. 2. Marek’s Disease – chicken cancer. 3. Generating lots of data – the sequencing revolution. 4. The problems of data analysis and data integration. 5. Some preliminary results on Marek’s Disease. 6. An apparent digression: chess and computers. 7. My actual research :)
  • 3. Genetics 101: DNA to RNA to protein to phenotype… Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal http://commons.wikimedia.org/wiki/File:Spombe_Pop2p_protein_structure_rainbow.png; http://commons.wikimedia.org/wiki/File:Protein_CA2_PDB_12ca.png
  • 4. …plus diploidy (2x each chromosome) Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal GT A C
  • 5. …plus regulation and interaction. Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal GT A C Regulation Interaction
  • 7. Herpesvirus and Cancer • Epstein-Barr Virus – Burkitt’s lymphoma – Hodgkin’s lymphoma – Nasopharyngeal carcinoma • Human Herpesvirus 8 – Kaposi’s sarcoma – Multicentric lymphoma • Mardivirus – Marek’s Disease • Viral neoplastic disease • Alpha-herpesvirus • Model for Burkitt’s lymphoma (slide courtesy Suga Subramanian)
  • 8. Clinical Signs Asymmetric Paralysis http://partnersah.vet.cornell.edu/avian-atlas/
  • 10. Importance of Marek’s Disease • Agricultural Impact – Economic losses (2 billion) – Viral evolution: Increased virulence – Current Vaccines: Not enough – Long-term viral persistence • Model System – Human herpes viral infections – Virus-induced lymphoma (slide courtesy Suga Subramanian)
  • 11. MAREK’S DISEASE VIRUS (MDV) INBRED CHICKEN LINES MD-RESISTANT LINE MD-SUSCEPTIBLE LINE LINE 62 LINE 73 GENETIC RESISTANCE TO MAREK’S DISEASE (slide courtesy Suga Subramanian)
  • 12. What happens when we infect? Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal GT A C Regulation Interaction Infect with virus ?
  • 13. …how does the virus specifically interact with genes? Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal GT A C Regulation Interaction Infect with virus ? Mechanism of regulation?
  • 14. …and what are the mechanisms of resistance? Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal GT A C Regulation Interaction Infect with virus ? Mechanism of resistance?
  • 15. Digression: DNA sequencing • Observation of actual DNA sequence • Counting of molecules Image: Werner Van Belle
  • 16. Fast, cheap, and easy to generate. Image: Werner Van Belle
  • 17. Applying sequencing to Marek’s Disease Genome (DNA) Transcripts (Genes; RNA) Proteins (Amino acids) Animal GT A C Regulation Interaction SEQUENCING
  • 18. Back to Marek’s disease: differentially expressed genes (DEGs) due to infection. Flowchart: DEGs → Gene Ontology (GO) analysis and IPA pathway analysis. DEGs in Md5-infected and not in Md5ΔMeq-infected groups? YES → Meq-dependent DEGs; NO → DEGs not dependent on Meq. Meq-dependent DEGs in Line 6 and not in Line 7 → Meq-dependent DEGs involved in MD resistance; in Line 7 and not in Line 6 → Meq-dependent DEGs involved in MD susceptibility; in neither case → Meq-dependent DEGs common to both lines. (slide courtesy Suga Subramanian)
  • 19. Line 6 MD resistance: role of Meq. [Figure: MDV vs. MDV with no Meq; genes involved in MD resistance that are regulated by Meq vs. genes involved in MD resistance that are not regulated by Meq; counts shown: 1031 and 1670] (slide courtesy Suga Subramanian)
  • 20. Pathway Analysis: MD resistance (slide courtesy Suga Subramanian)
  • 21. Line 7 MD susceptibility: role of Meq. [Figure: MDV vs. MDV with no Meq; genes involved in MD susceptibility that are regulated by Meq vs. genes involved in MD susceptibility that are not regulated by Meq; counts shown: 650 and 540] (slide courtesy Suga Subramanian)
  • 22. Pathway Analysis: MD susceptibility (slide courtesy Suga Subramanian)
  • 23. Next problem: data analysis & integration! • Once you can generate virtually any data set you want… • …the next problem becomes finding your answer in the data set! • Think of it as a gigantic NSA treasure hunt: you know there are terrorists out there, but to find them you have to hunt through 1 billion phone calls a day…
  • 24. Digression: “Heuristics” • What do computers do when the answer is either really, really hard to compute exactly, or actually impossible? • They approximate! Or guess! • The term “heuristic” refers to a guess, or shortcut procedure, that usually returns a pretty good answer.
  • 25. Often explicit or implicit tradeoffs between compute “amount” and quality of result http://www.infernodevelopment.com/how-computer-chess-engines-think-minimax-tree
  • 26. My actual research focus What we do is think about ways to get computers to play chess better, by: – Identifying better ways to guess; – Speeding up the guessing process; – Improving people’s ability to use the chess-playing computer. Now, replace “play chess” with “analyze biological data”...
  • 27. My actual research focus… We build tools that help experimental biologists work efficiently and correctly with large amounts of data, to help answer their scientific questions. This touches on many problems, including: • Computational and scientific correctness. • Computational efficiency. • Cultural divides between experimental biologists and computational scientists. • Lack of training (biology and medical curricula devoid of math and computing).
  • 28. Not-so-secret sauce: “digital normalization” • One primary step of one type of data analysis becomes 20-200x faster, 20-150x “cheaper”.
  • 34. Restated: Can we use lossy compression approaches to make downstream analysis faster and better? (Yes.) [Figure: raw data (~10-100 GB) → analysis → "information" (~1 GB); several such "information" sets → database & integration; ~2 GB – 2 TB of single-chassis RAM]
  • 35. Some diginorm examples: 1. Assembly of the H. contortus parasitic nematode genome. 2. Assembly of two Midwest soil metagenomes, Iowa corn and Iowa prairie. 3. Reference-free assembly of the lamprey (P. marinus) transcriptome.
  • 36. 1. The H. contortus problem • A sheep parasite. • ~350 Mbp genome • Sequenced DNA from 6 individuals after whole-genome amplification; estimated 10% heterozygosity (!?) • Significant bacterial contamination. (w/Robin Gasser, Paul Sternberg, and Erich Schwarz)
  • 37. H. contortus life cycle Refs.: Nikolaou and Gasser (2006), Int. J. Parasitol. 36, 859-868; Prichard and Geary (2008), Nature 452, 157-158.
  • 38. Assembly after digital normalization • Diginorm readily enabled assembly of a 404 Mbp genome with N50 of 15.6 kb; • Post-processing led to 73-94% complete genome. • Diginorm helped by making analysis possible. – Highly variable population. – Lots of contamination from microbes.
  • 39. Next steps with H. contortus • Publish the genome paper  • Identification of antibiotic targets for treatment in agricultural settings (animal husbandry). • Serving as “reference approach” for a wide variety of parasitic nematodes, many of which have similar genomic issues.
  • 40. 2. Soil metagenome assembly
  • 41. A “Grand Challenge” dataset (DOE/JGI). [Figure: bar chart of basepairs of sequencing (Gbp; GAII vs. HiSeq) for eight soil samples: Iowa continuous corn, Iowa native prairie, Kansas cultivated corn, Kansas native prairie, Wisconsin continuous corn, Wisconsin native prairie, Wisconsin restored prairie, Wisconsin switchgrass. Total: 1,846 Gbp soil metagenome. For comparison: rumen (Hess et al., 2011), 268 Gbp; rumen k-mer filtered, 111 Gbp; MetaHIT (Qin et al., 2011), 578 Gbp; NCBI nr database, 37 Gbp.]
  • 42. Assembly results for Iowa corn and prairie (2x ~300 Gbp soil metagenomes):
      Total assembly | Total contigs (> 300 bp) | % reads assembled | Predicted protein coding
      2.5 billion    | 4.5 million              | 19%               | 5.3 million
      3.5 billion    | 5.9 million              | 22%               | 6.8 million
    Putting it in perspective: total equivalent of ~1200 bacterial genomes; human genome ~3 billion bp. (Adina Howe)
  • 43. 3. Sea lamprey gene expression • Non-native • Parasite of medium to large fishes • Caused populations of host fishes to crash Li Lab / Y-W C-D
  • 44. Transcriptome results • Started with 5.1 billion reads from 50 different tissues. (4 years of computational research, and about 1 month of compute time, GO HERE) • Final assembly contains ~95% of genes (est.) • This is an extra 40% over previous work. • Enabling studies in: – Basal vertebrate phylogeny – Biliary atresia – Evolutionary origin of brown fat (previously thought to be mammalian only!) – J Exp Biol. 2013 – Pheromonal response in adults
  • 45. What are the tissue level changes in gene expression that support regeneration? Transcriptome analysis of a regenerating vertebrate after SCI brain spinal cord RNA-Seq to determine differential expression profile after injury Sampling >weekly -/+ Dex Ona Bloom
  • 46. Challenges ahead • We need more people working at the interface – “Priesthood” model doesn’t scale! – Cultural shifts in biology needed… • We need more data! – Data often only makes sense in context of other data – This is a hard sell: “if you give us 1000x as much data, we might start to develop some idea of what it means.” • We actually know very little about biology still!
  • 47. Open science & sharing • Science, and biology in particular, is in the middle of a transition to a “data intensive” field. • The sharing ethos is not incentivized properly; you get more credit for discovering new stuff than for discoveries resulting from sharing. • We are focused on sharing: methods, programs, educational materials…
  • 48. Being disruptive? Possible initiative from my lab: “We will analyze your data for you if we can make your data openly available in 1 yr.” Will it work, or sink like a stone? Ask me in a year 
  • 49. MSU’s role in my research • MSU provides nice infrastructure, great administrative support, and a truly excellent community (students, profs, and other researchers). • MSU is also uniquely interdisciplinary in many ways; very few “hard” boundaries in biology research.
  • 50. Credits • Marek’s Disease: Suga Subramanian and Hans Cheng (USDA) • Haemonchus: Erich Schwarz (Caltech/Cornell), Paul Sternberg (Caltech), Robin Gasser (U. Melbourne) • Lamprey: Weiming Li (MSU), Ona Bloom (Feinstein), Jen Morgan (MBL/Woods Hole) • Great Prairie: Jim Tiedje (MSU), Janet Jansson (LBL), Susanna Tringe (Joint Genome Inst.) Funding: MSU; USDA; NSF; NIH. Drop me a line – ctb@msu.edu
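Slides 28 and 34 name digital normalization ("diginorm") and describe it as lossy compression, but do not say how it works. The published approach, implemented in the khmer software, is a streaming filter: estimate each read's coverage as the median abundance of its k-mers among the reads kept so far, and keep the read only if that estimate is below a cutoff C. The sketch below is a simplified, hypothetical illustration, not khmer's code: it uses an exact Python `Counter` where khmer uses a memory-bounded probabilistic counting table, it ignores reverse complements and sequencing errors, and the names `kmers`/`digital_normalize` and the defaults k = 20, C = 20 are illustrative assumptions.

```python
from collections import Counter
from collections.abc import Iterable, Iterator
from statistics import median


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of `seq` (empty if `seq` is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads: Iterable[str], k: int = 20, cutoff: int = 20) -> Iterator[str]:
    """Stream reads, keeping only those whose estimated coverage is below `cutoff`.

    A read's coverage is estimated as the median count of its k-mers among
    the reads kept so far. Reads from already well-covered regions are
    discarded, so the retained data grows with the novelty of the input
    rather than its raw volume (the "lossy compression" step).
    """
    counts: Counter[str] = Counter()  # exact counts; khmer uses a bounded-memory table
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # too short to estimate coverage; drop it
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads contribute to the counts
            yield read


reads = ["ACGTTGCAAG"] * 100 + ["TTTTGGGGCC"]
kept = list(digital_normalize(reads, k=5, cutoff=3))
print(len(reads), "->", len(kept))  # 101 -> 4
```

In the toy run, 100 identical reads collapse to 3 copies (one per unit of the cutoff) while the single novel read survives; because the filter makes one pass and never revisits discarded reads, it can run as the data streams in.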

Editor's Notes

  1. This image depicts numerous lymphoma aggregates in the liver.
  2. Figure 6. IPA pathway analysis for significantly expressed genes that are Meq-dependent and involved in resistance to MD (A) and MD susceptibility (B). P-value < 0.05 and FDR < 0.05 were used as thresholds to select significant canonical pathways.
  3. Goal is to do first stage data reduction/analysis in less time than it takes to generate the data. Compression => OLC assembly.
  4. Larvae/stream bottoms 3-6 years; parasitic adult -> Great Lakes, 12-20 months feeding. 5-8 years. 40 lbs of fish per life as parasite. 98% of fish in the Great Lakes went away!
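Note 2 says significant canonical pathways were selected with P-value < 0.05 and FDR < 0.05, but it does not say how the FDR was computed. One standard way to control the false discovery rate is the Benjamini-Hochberg step-up procedure; the sketch below assumes that procedure (it is not confirmed to be what IPA does internally), and the function name `benjamini_hochberg` is hypothetical.

```python
from collections.abc import Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per input p-value: True if that hypothesis is
    rejected (called significant) while controlling the FDR at `alpha`.
    """
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest 1-based rank r such that p_(r) <= (r / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank

    # Step-up: reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```

With p-values 0.01, 0.04, 0.03 and 0.20, three pass a raw p < 0.05 cut, but only 0.01 survives FDR control at 0.05. This is why requiring both thresholds is stricter than p < 0.05 alone when many pathways are tested at once.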