1
T-BioInfo is designed for processing, analysis and integration of multi-omics data. The platform is used in multiple research groups to extract meaningful insights from large multi-omics datasets. Our current effort extends to education, enabling more people to extract meaningful, data-driven insights from omics datasets with biomedical applications. To learn more about the platform and its research and educational features, follow the highlighted links.
T-bio.info | edu.t-bio.info | server.t-bio.info
2
3
4
5
Modeling Precision Medicine
Machine Learning for Transcriptomics Data: Extracting Meaningful Insights from High-Throughput Biomedical Data
6
Clinical Subtypes Molecular Subtypes
7
Diagnosis, Prognosis, Response to Treatment
8
Survival prediction
Treatment Selection
Oncotype DX, PAM50
Daemen et al., 2013, “Modeling precision treatment of breast cancer”: an analysis of over 70 breast cancer cell lines and over 90 therapeutic agents.
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110
9
Files we will use in this session
10
BREAK
11
Q&A
Part 1:
RNA-Seq Processing
from raw reads to a table of expression
12
RNA-Seq: overview
.…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA….
Genome
13
[Figure: Gene A, Gene B and Gene C on the genome, with their transcripts]
14
[Figure: the genome with Gene A, Gene B and Gene C, their transcripts, and sequencing reads]
RNA-Seq: overview
15
[Figure: the genome with Gene A, Gene B and Gene C, their transcripts, and sequencing reads]
RNA-Seq: some details
1. Fragmentation (“shattering”)
2. Adapter ligation
3. PCR amplification
4. Sequencing (“reading”)
Preprocessing:
• Adapter removal plus additional trimming
• Removal of PCR duplicates
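The two preprocessing steps can be illustrated with a toy Python sketch. It assumes single-end reads stored as plain strings and a known 3' adapter (the adapter sequence in the usage example is an illustrative assumption). It is not how Trimmomatic or PCRclean work internally: real adapter trimming tolerates sequencing errors, and real duplicate removal usually also uses mapping positions and read pairs.

```python
from typing import List


def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut a 3' adapter from a read (exact matching only, no mismatches).

    A full adapter occurrence is removed together with everything after it;
    otherwise a partial adapter (its first k bases, k >= min_overlap) at the
    end of the read is removed.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    min_overlap = max(min_overlap, 1)
    # Try the longest possible partial overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def remove_exact_duplicates(reads: List[str]) -> List[str]:
    """Keep the first copy of each identical sequence (a crude proxy for PCR duplicates)."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence (assumption)
    reads = ["ACGTACGTAGATCGGAAGAGC", "ACGTACGTAGATC", "TTTTGGGG", "TTTTGGGG"]
    trimmed = [trim_adapter(r, adapter) for r in reads]
    print(remove_exact_duplicates(trimmed))  # ['ACGTACGT', 'TTTTGGGG']
```

Trimming runs before de-duplication here, mirroring the order in the practice pipeline (Trimmomatic, then PCRclean).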
16
Quantification of expression levels
Mapping
• Mapping to the set of known transcripts
• Mapping to the genome (with potential identification of novel transcripts)
• Combined strategy
RNA-Seq: overview
17
RNA-Seq: basic pipeline
18
Data Processing Practice
Create a pipeline:
1. Upload same SVL files
2. Pre-processing steps: Trimmomatic, PCRclean
3. Mapping to the genome: HiSat2
4. Isoform construction: Cufflinks
5. GTF merging: Cuffmerge
6. Mapping to transcripts: Bowtie2-t
7. Quantification: RSEMExpTable
19
RNA-Seq: extended pipeline
20
Expression Table
Sample Name
Gene ID What is this number?
Standard Measures of RNA Quantification:
• Counts
• FPKM (fragments per kilobase of transcript per million mapped reads):
$$\mathrm{FPKM}_g = \frac{\text{number of reads mapped to gene } g}{(\text{total number of mapped reads, in millions}) \times (\text{length of gene } g \text{, in kilobases})}$$
• TPM (transcripts per million): for one sample, $\mathrm{TPM}_g = C \times \mathrm{FPKM}_g$, where $C$ is chosen so that the TPM values of all genes sum to one million. The constant $C$ differs between samples.
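A small Python sketch of the two normalizations, using made-up counts and gene lengths. It assumes the total number of mapped reads equals the sum of the gene counts; in a real sample the total also includes reads not assigned to any gene.

```python
from typing import Dict


def fpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """FPKM = reads on gene / ((total mapped reads, in millions) x (gene length, in kb)).

    Assumption: total mapped reads = sum of the gene counts.
    """
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / (total_millions * lengths_bp[g] / 1e3) for g in counts}


def tpm_from_fpkm(fpkm_values: Dict[str, float]) -> Dict[str, float]:
    """TPM_g = C * FPKM_g, with C chosen so that the TPM values sum to one million."""
    c = 1e6 / sum(fpkm_values.values())
    return {g: c * v for g, v in fpkm_values.items()}


if __name__ == "__main__":
    counts = {"geneA": 1000, "geneB": 3000}   # toy read counts
    lengths = {"geneA": 1000, "geneB": 2000}  # toy gene lengths in base pairs
    f = fpkm(counts, lengths)
    t = tpm_from_fpkm(f)
    print(f)
    print(t, sum(t.values()))
```

Because C is recomputed for every sample, TPM values always sum to the same total, which makes within-sample proportions comparable across samples.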
21
Linear scale vs Log-scale
Relative differences are biologically more meaningful than absolute ones, and comparisons of relative differences are simplified if log-scaling is performed:
Log-scaled measure = log2(linear-scale measure + shift)
For relatively large values:
a difference of 1 on the log scale is a 2x difference on the linear scale;
a difference of 3 on the log scale is an 8x difference on the linear scale, etc.;
a difference of -1 on the log scale is a 2x difference on the linear scale, but in the opposite direction.
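A short Python check of these rules. The shift of 1 is an assumption (a common choice). The second printed line shows why the rules hold only for relatively large values: for small values the shift distorts the ratio.

```python
import math


def log_scale(value: float, shift: float = 1.0) -> float:
    """Log-scaled measure = log2(linear-scale measure + shift)."""
    return math.log2(value + shift)


def linear_ratio(log_difference: float) -> float:
    """Linear-scale fold change implied by a difference on the log2 scale."""
    return 2.0 ** log_difference


if __name__ == "__main__":
    # Large values: 800 vs 100 is 8x, and the log2 difference is close to 3.
    print(log_scale(800) - log_scale(100))
    # Small values: 8 vs 1 is also 8x, but the shift pulls the difference down to about 2.17.
    print(log_scale(8) - log_scale(1))
    print(linear_ratio(3), linear_ratio(-1))  # 8.0 0.5
```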
22
Preprocessing:
• Adapter removal plus additional trimming
• Removal of PCR duplicates
23
Quantification of expression levels
Mapping
• Mapping to the set of known transcripts
• Mapping to the genome (with potential identification of novel transcripts)
• Combined strategy
RNA-Seq: overview
Comparison: the role of preprocessing
24
Expression estimates for highly expressed genes can be affected by pre-processing steps such as PCRclean and Trimmomatic.
BREAK
25
Q&A
BREAK
26
Q&A
Other Sections:
• Error correction: CORAL, ECHO, RACER, eMER
• Different mappers: HiSat, TopHat, STAR, BWA
• Differential expression: CuffDiff, edgeR, DESeq
• Segmentation: BinS
Part 2:
Machine Learning
Data exploration and classification
27
28
Unsupervised Machine Learning
[Figure: three images labeled Dog and three labeled Cat]
29
[Figure: samples forming Group 1, Group 2 and an outlier]
Unsupervised analysis: PCA
30
Why use Principal Component Analysis?
• Explore data
• Visualize
Considerations:
• Data filtering
• Outliers
• Interpretation
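A minimal PCA sketch in Python (standard library only) for the exploration step above. It assumes a small expression matrix with samples as rows and genes as columns; in the workshop's R workflow a function such as prcomp would normally be used instead. Leading components are found by power iteration with deflation, which is adequate for a few components of a small matrix.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(data: Matrix) -> Matrix:
    """Subtract each gene's (column's) mean; rows are samples."""
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(len(data[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in data]


def covariance(x: Matrix) -> Matrix:
    """Gene-by-gene covariance of a centered matrix."""
    n, p = len(x), len(x[0])
    return [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_components(cov: Matrix, k: int, iters: int = 500,
                       seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Top-k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    cov = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(p)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:
                break
            v = [t / norm for t in w]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append((eigval, v))
        # Remove this component so the next iteration finds the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)] for a in range(p)]
    return components


def pca_scores(data: Matrix, k: int = 2) -> Matrix:
    """Project the centered samples onto the top-k principal components."""
    x = center_columns(data)
    comps = leading_components(covariance(x), k)
    return [[sum(xi * vi for xi, vi in zip(row, v)) for _, v in comps] for row in x]


if __name__ == "__main__":
    # Toy data: 6 samples x 3 genes, two obvious groups.
    data = [[10, 1, 5], [11, 1.2, 5.1], [10.5, 0.9, 4.9],
            [1, 10, 5], [1.2, 11, 5.2], [0.8, 10.5, 4.8]]
    for scores in pca_scores(data):
        print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so only the relative positions of samples in the plot matter; an outlier would show up as a point far from all others.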
31
Unsupervised analysis: PCA
32
Unsupervised analysis: PCA
PCA on 7,000 genes | PCA on PAM50 genes (35)
Normal-like
Basal
Claudin-low
Luminal
33
Unsupervised analysis: Hierarchical Clustering
Why use clustering?
• Identify groups
• Associate samples with groups
Considerations:
• Various methods
• Randomness in some methods (results can differ between runs)
• Interpretation
34
Unsupervised analysis: Hierarchical Clustering
Unsupervised analysis: hierarchical clustering
Dendrogram
35
2 clusters
4 clusters
8 clusters
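Cutting a dendrogram into 2, 4 or 8 clusters can be sketched in Python with agglomerative clustering that stops merging once the requested number of clusters remains. The sketch assumes Euclidean distance and average linkage; other linkages (for example complete linkage, the default of R's hclust) can produce different trees, which is one reason for the "various methods" consideration.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage, stopped at n_clusters.

    Stopping the merges at n_clusters corresponds to cutting the dendrogram
    into n_clusters groups. Returns sorted lists of sample indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    samples = [[0, 0], [0, 1], [10, 10], [10, 11]]  # toy 2-gene profiles
    for k in (1, 2, 4):
        print(k, average_linkage(samples, k))
```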
36
Unsupervised Analysis Practice
• Remove sample IDs
• Mark Group Names as ID
• Run H-clust
CellLines_ExprData_marked.txt
BREAK
37
Q&A
38
[Figure: a training set of labeled images (Dogs, Cats) and a test set of unlabeled images (?????)]
Supervised Machine Learning
39
Step-wise Linear Discriminant Analysis (swLDA)
40
Support Vector Machine (SVM) with Linear Kernel
41
Support Vector Machine (SVM) with Linear Kernel
42
Support Vector Machine (SVM) with Linear Kernel
• Fitting the classifier on the training set and predicting classes on the test set
• Is it possible to tune 7,000 coefficients using only 52 samples?
• Some algorithms perform feature selection: swLDA, random forest
• Other algorithms fail when the number of features is much larger than the number of samples
• Curse of dimensionality
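The train/test logic of a linear SVM can be sketched in Python. In R, the e1071 package (listed in the required installations) provides svm(); the sketch below instead trains a linear SVM with Pegasos sub-gradient steps, a simple swapped-in training method, on made-up two-gene data. The labels must be -1/+1, and lam and epochs are assumed values, not tuned ones.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: List[Sequence[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos sub-gradient steps.

    A constant 1.0 feature is appended, so the last weight acts as an
    intercept (regularized along with the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (L2 penalty)
            if margin < 1.0:  # hinge loss is active: step toward the sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy training set: two genes, two classes (+1 / -1).
    X_train = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-3, -2]]
    y_train = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X_train, y_train)
    X_test = [[2.5, 2.5], [-2.5, -2.5]]
    print([predict(w, x) for x in X_test])  # [1, -1]
```

Only the test-set predictions measure real performance. With 7,000 genes and 52 samples, a linear classifier can separate almost any labeling of the training samples, so perfect training accuracy says little; this is the curse of dimensionality in practice.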
43
Considerations: Supervised Analysis
44
• Extracting 15 highly informative genes from the swLDA classifier
• How other supervised learning algorithms (e.g., SVM) can be applied
• Feature selection can also improve the quality of unsupervised analysis
Step-wise Linear Discriminant Analysis (swLDA)
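The idea of reducing 7,000 genes to a short, informative list can be illustrated in Python. This is not stepwise LDA: it is a simpler univariate filter that scores each gene with a one-way ANOVA F statistic across sample groups and keeps the top genes. The gene names, values and group labels are hypothetical.

```python
from typing import Dict, List, Sequence


def f_statistic(values: Sequence[float], labels: Sequence[str]) -> float:
    """One-way ANOVA F statistic of one gene's expression across sample groups."""
    groups: Dict[str, List[float]] = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand_mean) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_genes(expr: Dict[str, Sequence[float]], labels: Sequence[str], n_top: int) -> List[str]:
    """Rank genes by F statistic (highest first) and keep the first n_top."""
    scores = {gene: f_statistic(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]


if __name__ == "__main__":
    labels = ["Basal"] * 3 + ["Luminal"] * 3
    expr = {  # hypothetical genes: G1 separates the groups, G2 and G3 do not
        "G1": [1, 1.1, 0.9, 5, 5.2, 4.8],
        "G2": [1, 5, 3, 1, 5, 3],
        "G3": [2, 2.1, 1.9, 2, 2.2, 1.8],
    }
    print(top_genes(expr, labels, 1))
```

Unlike swLDA, a univariate filter ignores correlations between genes, so several top-ranked genes may carry redundant information.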
45
Classification Practice
• Organize the table with 15 genes by sample type
• Color expression values (green: low; red: high)
• Which genes stand out?
• Which samples stand out?
• Which groups are hard to detect?
CellLines_15Genes_market.txt
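One common way to color such a table is to standardize each gene across samples (z-scores) before mapping values to colors, so that genes with very different expression levels can be compared on one scale. The Python sketch below assumes row-wise standardization, black for average values and a color scale saturating at ±2 standard deviations; these are illustrative choices, not the platform's settings.

```python
import statistics
from typing import List, Sequence, Tuple


def row_zscores(values: Sequence[float]) -> List[float]:
    """Standardize one gene across samples: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def heat_color(z: float, limit: float = 2.0) -> Tuple[int, int, int]:
    """Map a z-score to RGB: green for low, black for average, red for high."""
    z = max(-limit, min(limit, z))  # saturate beyond +/- limit
    intensity = int(round(255 * abs(z) / limit))
    return (intensity, 0, 0) if z > 0 else (0, intensity, 0)


if __name__ == "__main__":
    gene_row = [1.0, 2.0, 3.0]  # one hypothetical gene across three samples
    zs = row_zscores(gene_row)
    print(zs, [heat_color(z) for z in zs])
```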
46
Classification Practice: PCA of 15 gene table
47
Hierarchical Clustering of 15 gene table
Normal-like (N-like), Basal, Claudin-low (C-low), Luminal: 4 clusters
BREAK
48
Q&A
Part 3:
Interpretation
Annotating and Interpreting Gene Expression
49
Gene annotation: Ensembl gene IDs (ENSG) to gene symbols, plus Gene Ontology (GO) terms
50
51
Annotation Practice
52
http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=23869&path[]=75083
https://www.nature.com/articles/1208329
BREAK
53
Q&A
1. PCA plot using top 15 genes
from differential expression analysis
54
Homework:
Separation of samples from various sources: TCGA and PDX
55
2. New Datasets
56
Part 1: Conventional Machine Learning Approaches for Next Generation Sequencing
Rapid RNA-seq processing for expression quantification, applying logical pipeline construction and pre-processing considerations. In hands-on exercises, participants will explore the expression data using conventional unsupervised machine learning methods and supervised classifiers with and without feature extraction. Using the T-BioInfo platform, participants will learn about the logic and considerations of applying such methods and be prepared for independent downstream analysis and visualization of data using downloaded R scripts produced by the system. The produced/downloaded code will be reviewed, customized and … subsequent session.
T-bio.info | edu.t-bio.info (FREE) | server.t-bio.info (14 days DEMO)
57
58
Required installations:
R >= 3.4
R Studio
gplots
ggfortify
ggplot2
ggpubr
e1071
mda
MASS
klaR
Part 2: Combining custom software with R to
streamline analysis workflows and visualize ‘Omics
data insights.
Differential Gene Expression, Gene Set Enrichment
Analysis
R visualization from scratch: utilize the same dataset for
basic data exploration and visualization in R.
This session will strengthen the participants’ ability to transition to script-based workflows in RNA-seq downstream analysis and visualization. Participants will learn about the downstream capabilities of R-based workflows to transform and manipulate tables and to visualize findings in a meaningful way.
59
Download and Modify R Scripts
60
Differential expression analysis
Quantities related to the degree of differential expression:
• Difference between mean expression levels: fold change (pay attention to the scale)
• Statistical significance: p-value, adjusted p-value (e.g., FDR)
• Level of expression (be cautious with lowly expressed genes; consider excluding them from the analysis)
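The three quantities can be computed with a small Python sketch to show what each number means. Dedicated tools named in this deck (CuffDiff, edgeR, DESeq) model read counts statistically; this sketch instead swaps in an exact permutation test on the difference of means. The log2 fold change uses a shift of 1 (an assumption), and p-values are adjusted with the Benjamini-Hochberg procedure (FDR).

```python
import itertools
import math
from typing import List, Sequence


def log2_fold_change(group_a: Sequence[float], group_b: Sequence[float],
                     shift: float = 1.0) -> float:
    """Difference of mean log2(x + shift) values: group B relative to group A."""
    def mean_log(values: Sequence[float]) -> float:
        return sum(math.log2(v + shift) for v in values) / len(values)
    return mean_log(group_b) - mean_log(group_a)


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Exact two-sided permutation test on the absolute difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(idx_a: set) -> float:
        a = [pooled[i] for i in idx_a]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx_a]
        return abs(sum(b) / len(b) - sum(a) / len(a))

    observed = mean_diff(set(range(n_a)))
    splits = [set(c) for c in itertools.combinations(range(len(pooled)), n_a)]
    extreme = sum(1 for s in splits if mean_diff(s) >= observed - 1e-12)
    return extreme / len(splits)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    normal_like = [1.0, 2.0, 3.0]      # toy values for one gene
    claudin_low = [10.0, 11.0, 12.0]
    print(log2_fold_change(normal_like, claudin_low))
    print(permutation_p_value(normal_like, claudin_low))  # 0.1
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

With only 3 samples per group, the exact permutation test cannot give a p-value below 0.1 (2 of the 20 possible splits are always as extreme as the observed one). This is one reason dedicated tools model counts and share information across genes.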
61
• Differential expression is hard to interpret when there are more than two groups, so we compare the Claudin-low and Normal-like groups.
• Differential expression is a natural and easily interpretable feature-selection procedure.
• Pathway enrichment analysis can be applied to the resulting table.
62
Differential expression analysis
63
Differential expression analysis
64
Differential expression analysis
Gene set / pathway enrichment analysis
• Methods that use only gene lists (thresholding required): one of the standard tools here is the Database for Annotation, Visualization and Integrated Discovery, DAVID (https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/).
• Methods that take into account the level of differential expression, e.g., GAGE.
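For the list-based approach, the core calculation is an over-representation test: are genes from a pathway more frequent among the thresholded genes than expected by chance? The Python sketch below uses a one-sided hypergeometric test. DAVID itself reports a modified Fisher exact test (the EASE score), so its p-values will differ; the gene names in the usage example are hypothetical.

```python
from math import comb
from typing import Set


def overrepresentation_p(universe: Set[str], gene_set: Set[str], selected: Set[str]) -> float:
    """One-sided hypergeometric p-value, P(overlap >= observed overlap).

    universe: all genes tested; gene_set: genes in one pathway;
    selected: genes passing the differential-expression threshold.
    """
    gene_set = gene_set & universe
    selected = selected & universe
    n_universe, n_set, n_sel = len(universe), len(gene_set), len(selected)
    overlap = len(gene_set & selected)
    denominator = comb(n_universe, n_sel)
    numerator = sum(comb(n_set, k) * comb(n_universe - n_set, n_sel - k)
                    for k in range(overlap, min(n_set, n_sel) + 1))
    return numerator / denominator


if __name__ == "__main__":
    universe = {f"gene{i}" for i in range(10)}    # hypothetical tested genes
    pathway = {"gene0", "gene1", "gene2"}         # hypothetical pathway
    de_genes = {"gene0", "gene1", "gene2"}        # hypothetical DE list
    print(overrepresentation_p(universe, pathway, de_genes))
```

When many pathways are tested, the resulting p-values should themselves be adjusted for multiple testing (e.g., FDR).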
65
66
Gene set / pathway enrichment analysis
67
Gene set / pathway enrichment analysis
68
Gene set / pathway enrichment analysis
Regulation of Actin Cytoskeleton | B Cell Receptor Signaling Pathway
69
R Studio
71

May 15 workshop

  • 1. 1
  • 2. T-BioInfo is designed for processing, analysis and integration of multi-omics data. The platform is used in multiple research groups to extract meaningful insights from large multi-omics datasets. Our current effort expands to education, by enabling more people to extract meaningful, data-driven insights from omics datasets with biomedical applications. To learn more about the platform and it’s research and educational features, follow the highlighted links . T-bio.info | edu.t-bio.info | server.t-bio.info 2
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. Modeling Precision Medicine Machine Learning forTranscriptomics Data: Extracting Meaningful insights from high-throughput biomedical data. 6
  • 8. Diagnosis, Prognosis, Response toTreatment 8 Survival prediction Treatment Selection OncotypeDXPAM50
  • 9. Daemen et al., 2013, “Modeling precision treatment of breast cancer”: an analysis of over 70 different Breast Cancer cell lines and over 90 different therapeutic agents. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110 9
  • 10. Files we will use in this session
  • 12. Part 1: RNA-Seq Processing: from raw reads to a table of expression levels
  • 13. RNA-Seq: overview. [Figure: a stretch of genome sequence (…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA…) containing Gene A, Gene B and Gene C, and the transcripts produced from them (e.g., Transcript A, Transcript C).]
  • 14. RNA-Seq: overview. [Figure: Gene A, Gene B and Gene C on the genome sequence, their transcripts, and the short sequencing reads derived from those transcripts.]
  • 15. RNA-Seq: some details. [Figure: genes, transcripts and reads.] Library preparation and sequencing: 1. Fragmentation (“shattering”) 2. Adapter ligation 3. PCR amplification 4. “Reading” (sequencing)
  • 16. RNA-Seq: overview. Preprocessing: • Adapter removal, plus additional steps • Removal of PCR duplicates. Mapping: • Mapping on the set of known transcripts • Mapping on the genome (with potential identification of novel transcripts) • Combined strategy. Quantification of expression levels.
  • 18. Data Processing Practice. Create a pipeline: 1. Upload the same SVL files 2. Preprocessing steps: Trimmomatic, PCRclean 3. Mapping on the genome: HISAT2 4. Isoform construction: Cufflinks 5. GTF merging: Cuffmerge 6. Mapping on transcripts: Bowtie2-t 7. Quantification: RSEMExpTable
  • 21. Standard measures of RNA quantification: • Counts • FPKM (fragments per kilobase per million mapped reads): $\mathrm{FPKM}_g = \frac{\text{number of reads mapped on gene } g}{(\text{total number of mapped reads, in millions}) \times (\text{length of gene } g \text{, in kilobases})}$ • TPM (transcripts per million): for one sample, $\mathrm{TPM}_g = C \times \mathrm{FPKM}_g$, where $C$ is chosen so that $\sum_g \mathrm{TPM}_g = 10^6$. The constant $C$ is different for different samples.
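The FPKM and TPM definitions above can be checked numerically. The following is a minimal Python/NumPy sketch for a single sample, assuming gene lengths in base pairs and, as a simplification, that the total number of mapped reads equals the sum of the per-gene counts (real pipelines may also count reads not assigned to any gene). The function names `fpkm` and `tpm` and the three-gene sample are hypothetical, not part of T-BioInfo.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """FPKM for each gene in one sample.

    Simplification: total mapped reads = sum of per-gene counts."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3  # gene length in kilobases
    total_millions = counts.sum() / 1e6                      # mapped reads, in millions
    return counts / (total_millions * lengths_kb)

def tpm(counts, lengths_bp):
    """TPM = C * FPKM, with the per-sample constant C chosen so that TPMs sum to 1e6."""
    f = fpkm(counts, lengths_bp)
    c = 1e6 / f.sum()
    return c * f

# Hypothetical three-gene sample
counts = [100, 200, 700]
lengths = [1000, 2000, 3500]
print(fpkm(counts, lengths))  # values: 100000, 100000, 200000
print(tpm(counts, lengths))   # values: 250000, 250000, 500000 (sum = 1e6)
```

Because $C$ rescales every gene in a sample by the same factor, TPM preserves the ratios between genes within that sample while making the totals comparable across samples.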
  • 22. Linear scale vs. log scale. Relative differences are biologically more meaningful than absolute ones, and comparing relative differences is simplified if log-scaling is performed: log-scaled measure = $\log_2(\text{linear-scale measure} + \text{shift})$. For relatively large values: a difference of 1 on the log scale is a 2x difference on the linear scale; a difference of 3 on the log scale is an 8x difference on the linear scale, and so on; a difference of -1 on the log scale is a 2x difference on the linear scale, but in the opposite direction.
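A small Python/NumPy sketch of the log-scaling rule above. The slide leaves the shift unspecified; `shift=1.0` here is an assumption (a common pseudocount choice), and the helper names are hypothetical.

```python
import numpy as np

def log_scale(x, shift=1.0):
    """Log-scaled measure = log2(linear-scale measure + shift)."""
    return np.log2(np.asarray(x, dtype=float) + shift)

def linear_ratio(log_difference):
    """Linear-scale fold change corresponding to a log2-scale difference."""
    return 2.0 ** log_difference

a, b = log_scale([1023, 127])   # 1023 + 1 = 2**10 and 127 + 1 = 2**7
print(a - b)                    # a difference of 3 on the log scale ...
print(linear_ratio(a - b))      # ... is an 8x difference on the linear scale
print(linear_ratio(-1.0))       # 0.5: a 2x difference in the opposite direction

# The shift dominates for small values: 0 vs 1 gives a log difference of 1,
# although the linear-scale ratio is undefined. Hence "for relatively large values".
print(log_scale([1]) - log_scale([0]))
```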
  • 24. Comparison: the role of preprocessing. High expression values can be affected by preprocessing steps such as PCRclean and Trimmomatic.
  • 26. BREAK / Q&A. Error correction: CORAL, ECHO, RACER, eMER. Different mappers: HISAT, TopHat, STAR, BWA. Other sections: • Differential expression: Cuffdiff, edgeR, DESeq • Segmentation: BinS
  • 27. Part 2: Machine Learning: data exploration and classification
  • 30. Unsupervised analysis: PCA. Why use Principal Component Analysis? • Explore data • Visualize. Considerations: • Data filtering • Outliers • Interpretation
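A minimal PCA sketch in Python/NumPy, assuming a samples × genes matrix that is already log-scaled. It centers each gene and takes the singular value decomposition, which is one standard way to compute principal components; it is not necessarily how T-BioInfo computes them. The explained-variance ratio it returns helps with the interpretation and outlier considerations above. The data are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a samples x genes matrix via SVD of the gene-centered data.

    Returns (scores, explained_variance_ratio) for the first n_components PCs."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                           # center each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    var = S ** 2                                      # variance captured by each PC
    return scores, (var / var.sum())[:n_components]

# Hypothetical data: 20 samples x 100 genes; gene 0 separates two groups of 10
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
X[:10, 0] += 20.0
scores, ratio = pca(X)
print(ratio)          # PC1 carries the largest share of the variance
print(scores[:, 0])   # the first 10 samples fall on one side of PC1, the rest on the other
```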
  • 32. Unsupervised analysis: PCA. [Figure: PCA plots based on 7,000 genes and on the 35 PAM50 genes; subtype labels: Normal-like, Basal, Claudin-low, Luminal.]
  • 33. Unsupervised analysis: hierarchical clustering. Why use clustering? • Identify groups • Associate samples with groups. Considerations: • Various methods • Random selection in some methods • Interpretation
  • 35. Unsupervised analysis: hierarchical clustering. [Figure: dendrogram cut into 2, 4 and 8 clusters.]
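Cutting a dendrogram at 2, 4 or 8 clusters is equivalent to stopping agglomerative merging once that many clusters remain. Below is a naive Python/NumPy sketch of that idea. It assumes Euclidean distance between samples and average linkage; the slides do not specify either, and other choices (e.g., complete or Ward linkage) can give different trees. `hclust_labels` is a hypothetical helper intended for small sample counts only.

```python
import numpy as np

def hclust_labels(X, k):
    """Average-linkage agglomerative clustering of the rows of X, cut at k clusters.

    Naive O(n^3) sketch: repeatedly merge the two closest clusters until k remain."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between samples
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster sample pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (b > a, so a's index is stable)
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical data: three well-separated groups of 5 samples each
rng = np.random.default_rng(1)
centers = [[0, 0], [10, 0], [0, 10]]
X = np.vstack([rng.normal(c, 0.5, size=(5, 2)) for c in centers])
for k in (2, 3):
    print(k, hclust_labels(X, k))
```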
  • 36. Unsupervised Analysis Practice (file: CellLines_ExprData_marked.txt): • Remove sample IDs • Mark group names as IDs • Run H-clust
  • 38. Supervised Machine Learning. [Figure: a training set of labeled images (Dogs, Cats) and a test set of unlabeled images (?????) whose classes are to be predicted.]
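  • 40. Support Vector Machine (SVM) with Linear Kernel. [Figure: two classes separated by a linear boundary; distances labeled d.]
  • 41. Support Vector Machine (SVM) with Linear Kernel. [Figure: a new point (?) to be classified.]
  • 42. Support Vector Machine (SVM) with Linear Kernel. [Figure: a new point (?) to be classified.]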
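The linear-kernel SVM slides can be complemented by a minimal sketch in Python/NumPy. It fits a linear SVM by Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. This is one of several ways to fit a linear SVM; R's e1071 package, listed in the required installations, wraps LIBSVM instead. Assumptions: labels are in {-1, +1}; the bias is handled as an extra constant feature, so unlike a textbook SVM it is also regularized; `lam`, `epochs` and the toy data are illustrative values, and the function names are hypothetical.

```python
import numpy as np

def _add_bias(X):
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])  # constant column acts as the bias

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM via Pegasos-style stochastic subgradient descent."""
    Xa = _add_bias(X)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            active = y[i] * (Xa[i] @ w) < 1       # point is inside the margin or misclassified
            w *= 1.0 - eta * lam                  # step for the (lam/2)*||w||^2 term
            if active:
                w += eta * y[i] * Xa[i]           # subgradient step for the hinge loss
    return w

def predict_svm(X, w):
    """Class +1 on one side of the linear boundary, -1 on the other."""
    return np.where(_add_bias(X) @ w >= 0, 1, -1)

# Hypothetical training set: two separated groups in two dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.5, size=(20, 2)), rng.normal(3, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w = train_linear_svm(X, y)
print(predict_svm([[4, 4], [-4, -4]], w))  # classify two new points
```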
  • 43. Supervised analysis: considerations • Fit the classifier on the training set and predict classes on the test set • Is it possible to tune 7,000 coefficients with 52 samples? • Some algorithms perform feature selection: swLDA, random forest • Other algorithms won't work if the number of features >> the number of samples • Curse of dimensionality
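The question about tuning 7,000 coefficients with 52 samples can be illustrated with a quick Python/NumPy simulation. The data are hypothetical random numbers shaped like the slide's 52 samples × 7,000 genes, with labels that carry no signal. A minimum-norm least-squares fit (a deliberately plain linear model, not any of the workshop's classifiers) still reproduces every training label exactly, yet predicts fresh samples at chance level.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 52, 7000
X = rng.normal(size=(n_samples, n_genes))      # random "expression" values, no real signal
y = rng.choice([-1.0, 1.0], size=n_samples)    # random class labels

# Minimum-norm least-squares fit: 7,000 coefficients, only 52 equations
w, *_ = np.linalg.lstsq(X, y, rcond=None)
train_acc = np.mean(np.sign(X @ w) == y)

# Fresh samples from the same signal-free distribution
X_new = rng.normal(size=(500, n_genes))
y_new = rng.choice([-1.0, 1.0], size=500)
test_acc = np.mean(np.sign(X_new @ w) == y_new)

print(train_acc)  # perfect fit on the training set ...
print(test_acc)   # ... but roughly 0.5 (chance) on new samples
```

This is why feature selection (or a method that tolerates many more features than samples) matters before trusting a classifier's training accuracy.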
  • 44. Step-wise Linear Discriminant Analysis (swLDA) • Extracting 15 highly informative genes from the swLDA classifier • How other supervised learning algorithms (e.g., SVM) can be applied • Feature selection can also improve the quality of unsupervised learning analysis
  • 45. Classification Practice (file: CellLines_15Genes_market.txt) • Organize the table with 15 genes by sample type • Color expression (green: low; red: high) • Which genes stand out? • Which samples stand out? • Which groups are hard to detect?
  • 47. Hierarchical clustering of the 15-gene table. [Figure: dendrogram with 4 clusters; groups: N-like (Normal-like), Basal, C-low (Claudin-low), Luminal.]
  • 49. Part 3: Interpretation: annotating and interpreting gene expression
  • 50. Gene annotation: Ensembl gene IDs (ENSG) to gene symbols, plus Gene Ontology (GO) terms
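A common practical detail when mapping ENSG identifiers to gene symbols is the version suffix (the part after the dot), which often differs between an expression table and an annotation table. The following standard-library Python sketch strips it before lookup. The mapping dictionary stands in for an annotation table exported from a source such as Ensembl. The helper names, the version number and the second ID are hypothetical; the second ID is made up on purpose.

```python
def strip_version(gene_id):
    """Drop an Ensembl version suffix, e.g. 'ENSG00000141510.16' -> 'ENSG00000141510'."""
    return gene_id.split(".", 1)[0]

def annotate(rows, id_to_symbol):
    """Add a gene-symbol column to (gene_id, values) rows.

    IDs are matched without their version suffix. Unmapped IDs get None, so no
    row is dropped or silently merged when symbols are missing or duplicated."""
    return [(gid, id_to_symbol.get(strip_version(gid)), values) for gid, values in rows]

# Hypothetical mapping table and expression rows
id_to_symbol = {"ENSG00000141510": "TP53"}
rows = [("ENSG00000141510.16", [5.2, 6.1]),
        ("ENSG99999999999.1", [0.3, 0.0])]   # made-up ID with no symbol
for row in annotate(rows, id_to_symbol):
    print(row)
```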
  • 54. Homework: 1. PCA plot using the top 15 genes from differential expression analysis
  • 55. 2. New datasets: separation of samples from various sources (TCGA and PDX)
  • 56. Part 1: Conventional Machine Learning Approaches for Next Generation Sequencing. Rapid RNA-seq processing for expression quantification, applying logical pipeline construction and preprocessing considerations. Through hands-on exercises, participants will explore expression data using conventional unsupervised machine learning methods and supervised classifiers, with and without feature extraction. Using the T-BioInfo platform, participants will learn the logic and considerations of applying such methods and will be prepared for independent downstream analysis and visualization of data using R scripts produced by the system and downloaded. The produced/downloaded code will be reviewed and customized in the subsequent session. T-bio.info | edu.t-bio.info (FREE) | server.t-bio.info (14-day DEMO)
  • 58. Required installations: R >= 3.4, RStudio, and the R packages gplots, ggfortify, ggplot2, ggpubr, e1071, mda, MASS, klaR. Part 2: Combining custom software with R to streamline analysis workflows and visualize 'omics data insights: differential gene expression, gene set enrichment analysis. R visualization from scratch: use the same dataset for basic data exploration and visualization in R. This session will strengthen participants' ability to transition to script-based workflows for RNA-seq downstream analysis and visualization. Participants will learn about the downstream capabilities of R-based workflows to transform and manipulate tables and visualize findings in a meaningful way.
  • 61. Differential expression analysis. Quantities related to the degree of differential expression: • Difference between mean expression levels: fold change (pay attention to the scale, linear or log) • Statistical significance: p-value, adjusted p-value (e.g., FDR) • Level of expression (use caution with lowly expressed genes; consider excluding them from the analysis)
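Two of the quantities above can be sketched in a few lines of Python/NumPy: the fold change, computed as a difference of means on log2-scaled data, and the Benjamini-Hochberg adjustment that turns per-gene p-values into FDR-adjusted values. The sketch assumes the raw p-values already come from some per-gene test (tools such as edgeR or DESeq compute them with count-based models); the helper names and example numbers are hypothetical.

```python
import numpy as np

def log2_fold_change(group_a, group_b):
    """Per-gene difference of mean log2 expression (inputs: genes x samples).

    On log2-scaled data this difference is the log2 fold change."""
    return np.mean(group_a, axis=1) - np.mean(group_b, axis=1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending p-values
    ranked = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Make the adjusted values monotone, working from the largest p-value down
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)         # cap at 1
    return out

print(log2_fold_change([[3.0, 5.0]], [[1.0, 1.0]]))   # one gene: 4 - 1 = 3 (8x on the linear scale)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # adjusted: 0.02, 0.04, 0.04, 0.02
```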
  • 62. Differential expression analysis • Results are hard to interpret when the number of groups is greater than two, so we can compare the Claudin-low and Normal-like groups. • Differential expression is a natural, easy-to-interpret feature selection procedure. • Pathway enrichment analysis can be applied to the resulting table.
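Using differential expression as a feature selection procedure amounts to ranking genes by a two-group statistic and keeping the top k. The following is a Python/NumPy sketch that ranks genes by the absolute Welch t-statistic between two groups of log-scaled samples. Welch's t is a swapped-in, simple choice here, not the count-based statistics of edgeR or DESeq. The two groups stand in for comparisons such as Claudin-low vs. Normal-like, and the gene names and data are hypothetical.

```python
import numpy as np

def welch_t(a, b):
    """Per-gene Welch t-statistic; a and b are genes x samples (log scale)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return diff / se

def top_de_genes(a, b, gene_names, k=15):
    """Names of the k genes with the largest |t| (most differentially expressed)."""
    t = welch_t(a, b)
    idx = np.argsort(-np.abs(t))[:k]
    return [gene_names[i] for i in idx]

# Hypothetical data: 100 genes, 10 samples per group; two genes truly differ
rng = np.random.default_rng(3)
a = rng.normal(size=(100, 10))
b = rng.normal(size=(100, 10))
a[7] += 5.0    # up in group a
a[42] -= 5.0   # down in group a
names = [f"gene{i}" for i in range(100)]
print(top_de_genes(a, b, names, k=2))
```

The selected gene list can then feed a reduced table for PCA or clustering, in the spirit of the 15-gene analyses above.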
  • 65. Gene set / pathway enrichment analysis. • Methods that use only gene lists (thresholding required): one of the standard tools here is the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/). • GAGE: takes into consideration the level of differential expression.
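List-based enrichment (the thresholding approach above) usually asks whether a pathway contains more genes from the differentially expressed list than chance would predict. Below is a standard-library Python sketch of a plain one-sided hypergeometric test, one common form of over-representation analysis. It is not claimed to reproduce DAVID's exact statistic, and the function name and counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe -- number of background genes considered
    n_in_set   -- background genes annotated to the pathway
    n_selected -- genes in the thresholded differential expression list
    n_overlap  -- genes from that list that are in the pathway
    """
    denom = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)
    total = sum(
        comb(n_in_set, i) * comb(n_universe - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return total / denom

# Toy example: 10 background genes, a 3-gene pathway, a 3-gene DE list.
# All 3 DE genes in the pathway has probability 1 / C(10, 3) = 1/120.
print(enrichment_pvalue(10, 3, 3, 3))
print(enrichment_pvalue(10, 3, 3, 0))  # P(X >= 0) is always 1
```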
  • 66. Gene set / pathway enrichment analysis
  • 67. Gene set / pathway enrichment analysis
  • 68. Gene set / pathway enrichment analysis. [Figure: example pathways: Regulation of Actin Cytoskeleton; B Cell Receptor Signaling Pathway.]