A survey on metagenomic analysis for gut microbiota
July 21st, 2017
Kazumasa Kaneko
Kobayashi Lab
1
Outline
• Background : Importance of gut microbiota
• Measurement : Metagenomic sequencing
• Preprocessing : From sequences to OTUs by clustering
• Data analysis methods
• Future prospects
2
Importance of gut microbiota
3
Many diseases have been shown to be related to the gut microbiota.
Population : 10^13
Species : 10^3
Autoimmune diseases of the eye
Inflammatory bowel diseases
Diabetes & obesity
Wu, et al., 2011, Science
Horai, et al., 2015, Immunity
Sender, et al., 2016, Cell
Sommer, et al., 2013, Nat. Rev. Microbiol.
Fig : http://www.irasutoya.com/
Measurement : Metagenomic sequencing
4
Q. How can we know the population of the microbiota?
A. Count gene sequences that vary between species.
Environment (microbes) -> Genome -> 16S rRNA gene (varies between species) *1 -> Read by NGS (next-generation sequencing) *2 -> Count
ACGTGG…
AAGTGG…
AAGTCG…
What species? Just an error or different species?
*1 https://www.slideshare.net/AshokSharma53/16s-classifier
*2 http://togotv.dbcls.jp/ja/pics.html
Preprocessing : from sequences to OTUs by clustering
5
Q. How can we define “species” in microbes?
A. There is currently no widely accepted concept.
Instead, use OTUs (Operational Taxonomic Units) [Franzen, et al.,
OTUs : clustered units with <3% dissimilarity of 16S rRNA
Two clustering approaches:
                     Heuristic approach      Hierarchical approach
Trend                ~2010s                  2010s~
Computational cost   Light                   Heavy
Literature           Li, et al., 2006        Sun, et al., 2009
                     Edgar, 2010             Matias, et al., 2014
                     Ghodsi, et al., 2011
Why was the heuristic approach common?
6
Two main problems for OTU clustering:
1. Large number of sequence reads
・100K reads/shot -> 100GB of distance matrix
・Hierarchical clustering algorithm : O(n^2)
2. Computational cost of calculating the distance between sequences
・10^3 bp/seq
・Sequence alignment : O(mn)
Before alignment:
s1: acctggtaaa
s2: acatgcgtata
After alignment:
s1: acctg-gtaaa
s2: acatgcgtata
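The O(mn) cost of alignment comes from filling a dynamic-programming table with one cell per pair of positions. The following is a minimal sketch using plain edit distance with unit costs, which is an assumption: the slide does not state a scoring scheme, and real tools typically use tuned gap penalties. The function name `edit_distance` is illustrative only.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance by dynamic programming: O(m*n) time, O(n) memory."""
    m, n = len(s1), len(s2)
    prev = list(range(n + 1))              # row for the empty prefix of s1
    for i in range(1, m + 1):
        curr = [i] + [0] * n               # first column: i deletions
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion from s1
                          curr[j - 1] + 1,     # insertion into s1
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[n]


s1 = "acctggtaaa"
s2 = "acatgcgtata"
d = edit_distance(s1, s2)
# Normalising by the longer length is one possible dissimilarity (assumption).
dissimilarity = d / max(len(s1), len(s2))
print(d, round(dissimilarity, 3))
```

For the two example sequences on the slide this counts two substitutions and one gap, which matches the alignment shown.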
How does the heuristic approach alleviate computational cost? (1)
7
1. Large size of sequence reads
-> Greedy algorithm
ex) Start clustering from the longest sequence (Li, et al., 2006)
1. Sort by length
ATGCGTGGCAG
TGGCTGGACA
ATGGCATGG
︙
2. Pick the longest as seed
ATGCGTGGCAG
3. Calculate distance & join into cluster
4. Iterate
Problem: wrong cluster depending on clustering order (Franzen, et al., 2
[Figure: seed1, seed2 and seq.i, with d(seed1, seq.i) and d(seed2, seq.i) compared against the threshold, but seq.i belongs to the seed1 cluster]
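The greedy steps (sort by length, pick the longest as seed, join or open a new cluster, iterate) can be sketched as below. This is a simplified illustration, not the implementation of Li, et al.: the `distance` function is a stand-in built on Python's `difflib`, and the names `greedy_cluster` and `threshold` are assumptions made for the example.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    # Stand-in dissimilarity (assumption): 1 - difflib similarity ratio.
    # A real pipeline would use an alignment-based distance here.
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold):
    """Return a list of (seed, members) built in a single greedy pass."""
    clusters = []
    for seq in sorted(seqs, key=len, reverse=True):   # 1. sort by length
        for seed, members in clusters:
            if distance(seed, seq) <= threshold:      # 3. compare with each seed
                members.append(seq)                   #    join the first seed in range
                break
        else:
            clusters.append((seq, [seq]))             # 2. no seed in range: new seed
    return clusters


reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCC", "CCCCCCCCG"]
for seed, members in greedy_cluster(reads, threshold=0.2):
    print(seed, members)
```

Because a read joins the first seed within the threshold rather than the nearest one, the result depends on processing order, which is the problem noted on the slide.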
How does the heuristic approach alleviate computational cost? (2)
8
2. Computational cost of calculating the distance between sequences
-> Filtering (Li, et al., 2006) -> Prefix tree (Ghodsi, et al., 20
Filtering:
ATCTGGCTAGCACCTGAGTTGA
… …
1) Find exactly matching characters
(efficiently, by lookup table)
2) Let sequence length
If no matches found,
lower bound of mismatches is
upper bound of matching rate is
-> Filter by threshold
Prefix tree:
1) Create a prefix tree for the sequences
AAT…
AAG…
AT…
…
2) Reuse the DP matrix for the next leaf
Query vs. AAG… : the shared-prefix part is the same as the matrix calculated for AAT…
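The filtering idea can be illustrated with a short-word (k-mer) lookup table. The sketch below rests on a standard counting argument rather than on the slide's own formula, which did not survive extraction: one edit can destroy at most k of the k-mers of a sequence, so two sequences within `max_edits` edits must share at least (L - k + 1) - k * max_edits k-mers. The function names and the parameters `k` and `max_edits` are hypothetical.

```python
def kmers(seq, k):
    """All overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_kmers(a, b, k):
    """Number of k-mers of a that occur anywhere in b (set used as lookup table)."""
    table = set(kmers(b, k))
    return sum(1 for w in kmers(a, k) if w in table)


def passes_filter(a, b, k, max_edits):
    """False means a and b cannot be within max_edits edits, so alignment is skipped."""
    min_shared = (len(a) - k + 1) - k * max_edits
    return shared_kmers(a, b, k) >= min_shared


a = "ATCTGGCTAGCACCTGAGTTGA"
b = a[:10] + "T" + a[11:]          # one substitution
c = "C" * len(a)                   # unrelated sequence
print(passes_filter(a, b, k=3, max_edits=1), passes_filter(a, c, k=3, max_edits=1))
```

Only pairs that pass the cheap filter need the O(mn) alignment; the filter never rejects a pair that is truly within the edit budget.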
Hierarchical clustering for large data
Sun, et al., 2009, Nucleic Acids Res.
9
Problem : Cannot pass a full distance matrix as input due to its size
Method : Sparse sorted matrix & on-the-fly processing
Algorithm (complete linkage clustering)
0. Create a sparse sorted distance matrix, skipping too-dissimilar pairs by filtering
Index of seq. & distance:
si     1    1    3    3    2    4
sj     2    3    4    5    3    5
dist.  0.1  0.2  0.3  0.4  0.5  0.6
[Figure: Steps 1-6, the clustering state of sequences 1-5 after each column of the sorted matrix is processed]
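One way to realise on-the-fly complete linkage over a sorted sparse pair list is to count, for every pair of clusters, how many cross pairs have been seen so far. When the count reaches |A| x |B|, the current distance is the largest distance between A and B, so the two clusters can be merged. The sketch below follows that idea; it is a simplified illustration and not necessarily the data structure used by Sun, et al., and all names are assumptions.

```python
from collections import defaultdict


def complete_linkage_sparse(pairs, threshold):
    """pairs: iterable of (i, j, dist) sorted by ascending dist."""
    cluster_of = {}            # sequence id -> cluster id
    members = {}               # cluster id -> set of sequence ids
    links = defaultdict(int)   # frozenset({cid_a, cid_b}) -> cross pairs seen
    merges = []
    next_id = 0

    def cid(s):
        nonlocal next_id
        if s not in cluster_of:            # first time seen: singleton cluster
            cluster_of[s] = next_id
            members[next_id] = {s}
            next_id += 1
        return cluster_of[s]

    for i, j, d in pairs:
        if d > threshold:                  # sorted input: nothing closer remains
            break
        a, b = cid(i), cid(j)
        if a == b:
            continue
        key = frozenset((a, b))
        links[key] += 1
        if links[key] == len(members[a]) * len(members[b]):
            del links[key]
            for s in members[b]:           # merge cluster b into cluster a
                cluster_of[s] = a
            members[a] |= members.pop(b)
            for k in [k for k in links if b in k]:
                (other,) = k - {b}         # hand b's link counts over to a
                links[frozenset((a, other))] += links.pop(k)
            merges.append((d, sorted(members[a])))
    return merges, sorted(sorted(m) for m in members.values())


pairs = [(1, 2, 0.1), (1, 3, 0.2), (3, 4, 0.3), (3, 5, 0.4), (2, 3, 0.5), (4, 5, 0.6)]
merges, clusters = complete_linkage_sparse(pairs, threshold=1.0)
print(merges)
print(clusters)
```

The input is the six-pair example from the slide. Pairs that were filtered out of the sparse matrix never arrive, so clusters separated by a missing pair are never merged.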
Outline
• Background : Importance of gut microbiota
• Measurement : metagenomic sequencing
• Preprocessing : from sequences to OTUs by clustering
• Data analysis methods
• Future prospects
10
SGSL : Sparse Group-Subgroup LASSO
Garcia, et al., 2014, Bioinformatics
11
Q. Which factors in the microbiota are key to the objective?
Observed data
      #1  #2  #3  …
y
x1
x2
x3
…
Phylogenetic tree over x1, x2, x3, … (Phylum? Family? Genus?)
→ find the subset of x in the tree structure correlated with y
Sparse regression by LASSO
Tibshirani, 1996, J. R. Statist. Soc. B
12
Q. Estimate the values of parameters with sparse X in linear regression
A. LASSO (Least Absolute Shrinkage and Selection Operator)
Legend : # of data, data, objective variable, explanatory variables, noise
Penalty term with parameter
[Figure: constraint region and sparse result]
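As a hedged illustration of how the penalty term yields a sparse result, the sketch below minimises the usual LASSO objective (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent with soft-thresholding. The symbols `X`, `y`, `b`, `lam` and the function names are assumptions, since the slide's own notation was lost; coordinate descent is one common solver, not necessarily the one the presenter had in mind.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            sq = 0.0    # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                sq += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (sq / n) if sq > 0 else 0.0
    return beta


# Toy data (made up for illustration): y = 2*x0 + 0.05*x1, x2 is irrelevant.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]
print(lasso_cd(X, y, lam=0.1))
```

With `lam=0.1` the weak coefficient on `x1` is driven exactly to zero while the strong coefficient on `x0` is only shrunk, which is the selection behaviour the constraint-region picture describes.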
From LASSO to SGSL
13
Lasso (Tibshirani, 1996) : Sparse
Group Lasso (Yuan & Lin, 2006) : Group
Sparse-group Lasso (Simon, et al., 2012) : Sparse + Group
Sparse group-subgroup Lasso (Garcia, et al., 2014) : Sparse + Group + Subgroup
Estimation of correlation from relative population
14
How to estimate correlation?
[Figure: two scatter plots, unobservable absolute population (left) and observable relative population (right)]
Q. How can we infer the correlation between absolute populations from relative populations?
Data tables : x1, x2, x3, … and n1, n2, n3, …, each over samples #1, #2, #3, …
Estimation of sparse correlation from relative population
Friedman & Alm, 2012, PLOS Comp. Biol.
15
Since , if sparse, neglig
Solved!
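A property that makes relative data usable here is that the ratio of two components is unchanged by normalisation, so the variance of a log-ratio is identical for absolute and relative populations. The sketch below checks this numerically on synthetic data; the data, the seed and the function name `log_ratio_variance` are made up for illustration, and this is only the starting identity, not the full estimation procedure of Friedman & Alm.

```python
import math
import random
from statistics import pvariance

random.seed(0)
n_samples = 50

# Synthetic absolute populations for three microbes (assumption: log-normal).
abs_counts = [[random.lognormvariate(mu, 0.5) for mu in (3.0, 2.0, 1.0)]
              for _ in range(n_samples)]
# Observable relative populations: each sample is divided by its own total.
rel = [[v / sum(row) for v in row] for row in abs_counts]


def log_ratio_variance(table, i, j):
    """Variance over samples of log(component i / component j)."""
    return pvariance([math.log(row[i] / row[j]) for row in table])


t_abs = log_ratio_variance(abs_counts, 0, 1)
t_rel = log_ratio_variance(rel, 0, 1)
print(t_abs, t_rel)   # the two values agree up to rounding error
```

Because the per-sample total cancels in every ratio, quantities built from log-ratio variances can be computed from relative data alone; the sparsity assumption is then what allows correlations to be separated from the individual variances.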
Estimation of interaction in ecosystem
16
Q. Which microbes interact with each other in time-series data?
Equation-based approach
Brunton, et al., 2016, PNAS
Equation-free approach
Deyle, et al., 2016, Proc. R. Soc. B
Suzuki, et al., 2017, Methods Ecol. Evol.
Time-series data : x1, x2, x3, … observed for samples #1 (t=1), #2 (t=2), #3 (t=3), …
[Figure: interaction network among variables (Interaction?), from Suzuki, et al., 2017, Methods Ecol. Evol.]
VAR (vector autoregressive) model
17
Data : time-series data
Model : the value at the next time step is determined by the current values, with a constant interaction from j to i
Fitting : for each i, minimize the residual sum of squares
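A first-order VAR fit can be sketched as one ordinary least-squares problem per target variable i, assuming the model x_i(t+1) = sum_j a_ij * x_j(t) with no intercept. That model form, the symbol `a_ij`, and the helper names are assumptions, since the slide's equations were lost in extraction.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_var1(series):
    """series[t][j] is variable j at time t; returns A with A[i][j] = effect of j on i."""
    X, Y = series[:-1], series[1:]
    p = len(series[0])
    gram = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    A = []
    for i in range(p):   # one least-squares fit per target variable i
        rhs = [sum(x[a] * y[i] for x, y in zip(X, Y)) for a in range(p)]
        A.append(solve(gram, rhs))   # normal equations: (X^T X) a_i = X^T y_i
    return A


# Noise-free toy series generated from a known interaction matrix (made up).
true_A = [[0.9, -0.2], [0.1, 0.8]]
series = [[1.0, 0.5]]
for _ in range(30):
    x = series[-1]
    series.append([sum(true_A[i][j] * x[j] for j in range(2)) for i in range(2)])
print(fit_var1(series))
```

On noise-free data the fit recovers the generating matrix; with real data the coefficients are the same for every time point, which is the "constant interaction" limitation that S-map relaxes.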
S-map for estimation of interaction
Deyle, et al., 2016, Proc. R. Soc. B
18
Data : time-series data
Model : the value at the next time step is determined by the current values, with a manifold-dependent interaction from j to i
Fitting : for each i, minimize the weighted residual sum of squares
Weights : for a given target point, each data point is weighted using a parameter and a normalization term
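The weighting step can be sketched with the exponential kernel commonly used for S-map, w_k = exp(-theta * d_k / d_mean), where d_k is the distance from the target state to data point k and d_mean is the mean of those distances. Treating this as the slide's formula is an assumption, as are the names `smap_weights`, `theta`, `target` and `library`.

```python
import math


def smap_weights(target, library, theta):
    """Locality weights of each library point relative to the target state."""
    dists = [math.dist(target, x) for x in library]
    mean_d = sum(dists) / len(dists)   # normalisation term (assumed nonzero)
    return [math.exp(-theta * d / mean_d) for d in dists]


library = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]   # made-up states
target = (0.0, 0.0)
print(smap_weights(target, library, theta=0.0))  # all weights equal
print(smap_weights(target, library, theta=2.0))  # nearby states dominate
```

With `theta = 0` every point gets the same weight and the weighted fit reduces to a single global linear model, as in VAR; larger `theta` makes the fitted coefficients depend on where the target lies on the manifold.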
Sparse S-map : extension of S-map to sparse interaction
Suzuki, et al., 2017, Methods Ecol. Evol.
19
Sparse S-map = S-map + Stepwise variable selection + Bagging
S-map : limited variable size
[Figure: stepwise variable selection over variables 1, 2, …, N. In Step 1 and Step 2, each candidate is scored by its S-map estimation error with bagging, and the selected variable is carried forward.]
Overview of analysis method for microbiota data & Future prospect
20
Measurement : relative population
Genomic tree structure
Sparse interaction network
Other factors
Dynamics in time (t)
Dynamics in space? (s)
Phenotypic dynamics?
Control?

Editor's Notes

  1. Genomic (pronunciation note: "jinomikku"). I will conclude my presentation with some future prospects.
  2. Ten to the power of 13, ten to the power of 3. They have been attracting more attention these days.
  3. There is a certain environment, and we want to know the composition of microbes in that environment. Each microbe has its genome; in the genome there is a region that is a gene for the ribosome. We can get count data for each sequence. What species do these sequences correspond to? And
  4. Hierarchical (pronunciation note: "haierakikaru")
  5. One hundred thousand. O of n to the power of two. Alignment is an algorithm that arranges the sequences to identify similar regions: how the sequences correspond to each other, and how similar they are to each other.
  6. A over L
  7. Y is
  8. Residual sum of squares. If the scalar field of the cost function is an ellipse like this,
  9. I will show the path from LASSO to Sparse group-subgroup lasso.
  10. The candidate with the smallest estimation error is picked.
  11. Phenotypic (pronunciation note: "fenotipikku"). Become healthier.