Milko Krachunov², Ivan Popov¹, Valeria Simeonova², Irena Avdjieva¹, Paweł Szczęsny³, Urszula Zelenkiewicz³, Piotr Zelenkiewicz³, Dimitar Vassilev¹
¹ Bioinformatics group, AgroBioInstitute, Bulgaria
² Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, Bulgaria
³ Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
Detection and correction of errors in
metagenomic 16S RNA parallel sequencing
NGS errors – common problems
• Errors are introduced into the assembled reads by imperfections of both biological and mathematical origin;
• In metagenomic studies it is impossible to re-sequence the same sample;
• The error rate tends to increase at every step of the process;
• There is no easy way to distinguish a “sequencing error” from a “rare variant”;
• Many existing methods and algorithms address different aspects of the problem, but no unified solution is available;
• Large amounts of data are difficult to process with common software.
Significance of 16S RNA sequencing
• Highly conserved between different species of bacteria and archaea;
• Sequence analysis can be done with universal PCR primers;
• Contains hypervariable regions that can provide species-specific signature sequences;
• Suitable for phylogenetic studies;
• Suitable for metagenomic studies.
General approach in metagenomic biodiversity studies
1. 454 sequencing
2. Filtering / denoising
3. Multiple alignment
4. Distance matrix
5. OTU clusters with abundance count
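The last two steps of this pipeline can be illustrated with a minimal sketch. It assumes the reads are already aligned to equal length with `-` as the gap symbol, uses a simple p-distance (fraction of mismatching columns), and groups reads with a greedy seed-based rule. The function names, the distance measure, the clustering rule and the threshold value are all illustrative assumptions, not the tools or settings used in the project.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatching columns, skipping columns where either read has a gap."""
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    # Reads with no comparable columns are treated as maximally distant (assumption).
    return mismatches / compared if compared else 1.0


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances between aligned reads."""
    n = len(aligned)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = p_distance(aligned[i], aligned[j])
    return dist


def greedy_otus(dist, threshold):
    """Put each read into the first cluster whose seed is within `threshold`;
    otherwise the read becomes the seed of a new cluster."""
    seeds = []
    clusters = {}
    for i in range(len(dist)):
        for s in seeds:
            if dist[i][s] <= threshold:
                clusters[s].append(i)
                break
        else:
            seeds.append(i)
            clusters[i] = [i]
    return clusters


# Toy aligned reads (hypothetical data).
aligned = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTACGGGG"]
clusters = greedy_otus(distance_matrix(aligned), threshold=0.15)
abundance = {seed: len(members) for seed, members in clusters.items()}
print(abundance)  # {0: 3, 3: 1}
```

The abundance count of each OTU is simply the size of its cluster. Every sequencing error that pushes a read past the threshold can create a spurious OTU, which is why denoising comes before this step.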
Our approach:
A. Raw data characteristics and processing
Two separate runs of metagenomic 16S RNA fragments, sequenced on the 454 platform and converted to FASTA format:
• run 02 – 46429 short reads
• run 04 – 41386 short reads
Our task: extract, denoise and correct only the high-quality reads.
[Figure: raw data read-length histograms for Run 02 and Run 04]
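A read-length histogram of this kind can be computed directly from a FASTA file. The sketch below is an assumption about how that might be done, not the script used for the slides: `read_fasta`, `length_histogram` and the bin width are hypothetical, and the demo input is toy data rather than either run.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records, bin_width=50):
    """Count reads per length bin; keys are the lower bounds of the bins."""
    counts = Counter()
    for _, seq in records:
        counts[(len(seq) // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Toy input standing in for a real run file.
demo = io.StringIO(">r1\nACGT\nAC\n>r2\nACGTACGTACGT\n>r3\nAC\n")
print(length_histogram(read_fasta(demo), bin_width=5))  # {0: 1, 5: 1, 10: 1}
```

For a real run, the `io.StringIO` object would be replaced by an opened FASTA file, and reads outside the main length peak could be filtered out before denoising.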
B. Correction with SHREC
C. Correction with our method:
Classification and performance evaluation
ClaMS parameters:
Distance cut-off: 0.05
Signature type: DBC
k-mer length: 3
Existing taxonomy: 4th Level
Aim of the method – idea outline
• To deal with the heterogeneous nature of the data, similar or related sequences are given more importance in the error evaluation.
• The naïve approach: if a base is less common than the sequencer error rate, assume it is likely an error and replace it with the most common base.
• Our modification: calculate the occurrence of the base in reads that are similar in the given region, and either give those reads larger weights or use them exclusively.
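The naïve approach can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's implementation: the reads are assumed to be aligned to equal length, a gap is treated as just another symbol, and `naive_correct` and the error rate of 0.1 are hypothetical.

```python
from collections import Counter


def naive_correct(aligned, error_rate):
    """Replace any base whose column-wide frequency is below `error_rate`
    with the most common base in that column."""
    n = len(aligned)
    length = len(aligned[0])
    corrected = [list(read) for read in aligned]
    for col in range(length):
        counts = Counter(read[col] for read in aligned)
        consensus = counts.most_common(1)[0][0]
        for i, read in enumerate(aligned):
            if counts[read[col]] / n < error_rate:
                corrected[i][col] = consensus
    return ["".join(read) for read in corrected]


# Toy data: 38 reads of a dominant taxon, one of them again with a single
# error at column 4, and 2 reads of a genuinely different, rare taxon.
reads = ["ACGTACGTA"] * 38 + ["ACGTTCGTA"] + ["TTGAACCGG"] * 2
result = naive_correct(reads, error_rate=0.1)
print(set(result))  # {'ACGTACGTA'}
```

In this toy run the single error is fixed, but the two reads of the rare taxon are also rewritten into the dominant sequence. That is the "sequencing error" versus "rare variant" problem in miniature, and it is what restricting the counts to similar reads is meant to avoid.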
Progress so far
• Calculate the occurrence rate of every base among reads that are identical to the evaluated read within a window with a radius of n bases.
• Preliminary results: the first basic implementation leads to an increase in the number of OTUs found with ClaMS.
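A minimal sketch of the window-based variant follows. The slides do not give the details, so several points are assumptions: the reads are aligned to equal length, the evaluated column itself is excluded when testing whether two reads are identical in the window, a base is replaced when its rate among the matching reads falls below the error rate, and the names `window_correct`, `radius` and `error_rate` are hypothetical.

```python
from collections import Counter


def window_correct(aligned, radius, error_rate):
    """For each base, count occurrences only among reads that are identical to
    the evaluated read in the surrounding window (evaluated column excluded)."""
    length = len(aligned[0])
    corrected = [list(read) for read in aligned]
    for i, read in enumerate(aligned):
        for col in range(length):
            lo = max(0, col - radius)
            hi = min(length, col + radius + 1)
            left, right = read[lo:col], read[col + 1:hi]
            # The evaluated read always matches itself, so `peers` is never empty.
            peers = [r for r in aligned
                     if r[lo:col] == left and r[col + 1:hi] == right]
            counts = Counter(r[col] for r in peers)
            if counts[read[col]] / len(peers) < error_rate:
                corrected[i][col] = counts.most_common(1)[0][0]
    return ["".join(r) for r in corrected]


# Toy data: a dominant taxon (one read with an error at column 4) and a rare taxon.
reads = ["ACGTACGTA"] * 38 + ["ACGTTCGTA"] + ["TTGAACCGG"] * 2
result = window_correct(reads, radius=2, error_rate=0.1)
print(Counter(result))  # Counter({'ACGTACGTA': 39, 'TTGAACCGG': 2})
```

On this toy input the isolated error is corrected because 38 reads agree with its window, while the rare taxon is left untouched because its reads are only compared with each other. The loop is quadratic in the number of reads, so a real implementation would need indexing of the windows or a compiled extension to handle tens of thousands of reads.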
Under development
• Choosing a good approach (or approaches) for aligning the reads
• Empirical evaluation of the parameters
• Comparative evaluation of the variants of the approach
Software used in this project:
Python: http://www.python.org/
Cython: http://cython.org/
MEGA (Molecular Evolutionary Genetics Analysis): http://www.megasoftware.net/
Muscle: http://www.drive5.com/muscle/
SHREC (SHort Read Error Correction method): http://ww2.cs.mu.oz.au/~schroder/shrec_www/
ClaMS (Classifier for Metagenomic Sequences): http://clams.jgi-psf.org/
NINJA (modified): http://nimbletwist.com/software/ninja/index.html
R: http://www.r-project.org/
milko@3mhz.net
Thank you


Editor's Notes

  1. Should the last two swap places?
  2. Anything additional?
  3. Def. title!
  4. One more additional slide?