T-BioInfo is designed for processing, analysis and
integration of multi-omics data. The platform is used in
multiple research groups to extract meaningful insights
from large multi-omics datasets. Our current effort
expands to education, by enabling more people to
extract meaningful, data-driven insights from omics
datasets with biomedical applications. To learn more
about the platform and its research and educational
features, follow the highlighted links.
T-bio.info | edu.t-bio.info | server.t-bio.info
Modeling Precision Medicine
Machine Learning for Transcriptomics Data: Extracting Meaningful
insights from high-throughput biomedical data.
Clinical Subtypes, Molecular Subtypes
Diagnosis, Prognosis, Response to Treatment
Survival prediction
Treatment Selection
Oncotype DX, PAM50
Daemen et al., 2013, “Modeling precision treatment of breast cancer”: an analysis of over 70 different breast cancer cell lines and over 90 different
therapeutic agents. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110
Files we will use in this session
BREAK
Q&A
Part 1:
RNA-Seq Processing
from raw reads to a table of expression
RNA-Seq: overview
[Figure: a stretch of genome sequence (.…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA….) annotated with Gene A, Gene B and Gene C, the transcripts produced from these genes, and the reads obtained from the transcripts]
RNA-Seq: some details
1. Shattering
2. Adapter ligation
3. PCR amplification
4. “Reading”
RNA-Seq: overview
Preprocessing:
• Adapter removal plus additional
• Removing PCR duplicates
Mapping
• Mapping on the set of known transcripts
• Mapping on the genome (and potential
identification of novel transcripts)
• Combined strategy
Quantification of expression levels
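The "Removing PCR duplicates" step can be illustrated with a minimal Python sketch. It assumes that a duplicate is simply a read with exactly the same sequence as an earlier read; this is an illustrative simplification and not necessarily how the platform's PCRclean module works. The read names and sequences are made up for the example.

```python
def remove_exact_duplicates(reads):
    """Keep the first occurrence of each read sequence, preserving input order.

    `reads` is a list of (name, sequence) pairs. Treating identical sequences
    as PCR duplicates is an assumption made for this sketch.
    """
    seen = set()
    unique = []
    for name, sequence in reads:
        if sequence not in seen:
            seen.add(sequence)
            unique.append((name, sequence))
    return unique


# Hypothetical reads; read3 repeats the sequence of read1.
reads = [
    ("read1", "TCTGAAACAATG"),
    ("read2", "CTTCAATCTAAC"),
    ("read3", "TCTGAAACAATG"),
    ("read4", "TTATCATTCATT"),
]
deduplicated = remove_exact_duplicates(reads)
print([name for name, _ in deduplicated])
```

Identical reads can also come from genuinely abundant transcripts, so a filter of this kind can lower the measured level of highly expressed genes.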
RNA-Seq: basic pipeline
Data Processing Practice
Create a pipeline:
1. Upload same SVL files
2. Pre-processing steps: Trimmomatic, PCRclean
3. Mapping on Genome: HiSat2
4. Isoform Construction: Cufflinks
5. GTF Merging: Cuffmerge
6. Mapping on Transcripts: Bowtie2-t
7. Quantification: RSEMExpTable
RNA-Seq: extended pipeline
Expression Table
Sample Name
Gene ID
What is this number?
Standard Measures of RNA Quantification:
• Counts
• FPKM – fragments per kilobase per million mapped reads:
FPKM_g = (number of reads mapped on gene g) / ((total number of mapped reads, in millions) x (length of gene g, in kilobases))
• TPM – transcripts per million:
For one sample, TPM_g = C x FPKM_g, where C is selected in such a way that the sum of all TPM values in the sample equals one
million. The constants C are different for different samples.
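The two definitions above can be checked on a toy example. The sketch below is a minimal Python illustration; the gene names, counts and lengths are invented, and the total number of mapped reads is taken to be the sum of the counts in the table, which is an assumption of this example.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: reads on the gene / (mapped reads in millions * gene length in kb)."""
    total_millions = sum(counts.values()) / 1e6
    return {
        gene: counts[gene] / (total_millions * (lengths_bp[gene] / 1e3))
        for gene in counts
    }


def tpm_from_fpkm(fpkm_values):
    """TPM_g = C * FPKM_g, with C chosen so that the TPM values sum to one million."""
    c = 1e6 / sum(fpkm_values.values())
    return {gene: c * value for gene, value in fpkm_values.items()}


# Hypothetical sample: read counts and gene lengths in base pairs.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

sample_fpkm = fpkm(counts, lengths_bp)
sample_tpm = tpm_from_fpkm(sample_fpkm)
print(sample_fpkm)
print(sample_tpm)
```

In this example geneB has three times the reads of geneA but is also three times longer, so both genes receive the same FPKM and the same TPM.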
Linear scale vs Log-scale
Relative differences are biologically more meaningful than absolute ones.
Computations are simplified if log-scaling is performed:
Log-scaled measure =
log2 (linear-scale measure + shift)
For relatively large values:
a difference equal to 1 in log-scale is a 2x difference in linear scale;
a difference equal to 3 in log-scale is an 8x difference in linear scale, etc.;
a difference equal to -1 in log-scale is a 2x difference in linear scale, but in the opposite direction.
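A short Python sketch of the log-scaling rule follows. The shift value of 1 is an assumption chosen for illustration (the slide only says "shift"), and the expression values are invented.

```python
import math


def log_scale(value, shift=1.0):
    """Log-scaled measure = log2(linear-scale measure + shift)."""
    return math.log2(value + shift)


# For relatively large values an 8x difference is close to 3 on the log scale.
large_difference = log_scale(8000.0) - log_scale(1000.0)

# For small values the shift dominates, so the same 8x ratio gives a smaller difference.
small_difference = log_scale(8.0) - log_scale(1.0)

print(large_difference, small_difference)
```

The shift keeps zero expression finite (log_scale(0) is 0 with a shift of 1), which is why the "difference of 1 equals 2x" rule holds only for relatively large values.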
Comparison: the role of preprocessing
High expression values can be affected by pre-processing steps such as PCR-clean and Trimmomatic.
BREAK
Q&A
Other Sections:
• Error Correction – CORAL, ECHO, RACER, eMER
• Different Mappers – HiSat, TopHat, STAR, BWA
• Differential Expression – CuffDiff, edgeR, DESeq
• Segmentation – BinS
Part 2:
Machine Learning
Data exploration and classification
Unsupervised Machine Learning
[Figure: unlabeled examples (dogs and cats) are sorted into Group 1 and Group 2, with one outlier]
Unsupervised analysis: PCA
Why use Principal Component Analysis?
• Explore data
• Visualize
Considerations:
• Data Filtering
• Outliers
• Interpretation
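To make the idea of PCA concrete, here is a minimal Python sketch that finds only the first principal component by power iteration on the covariance matrix. It is a teaching sketch, not the platform's implementation; the table layout (rows are samples, columns are genes), the toy values and the starting vector are all assumptions of this example.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (direction, scores) of the first principal component.

    `samples` is a list of rows (one row per sample, one column per gene).
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix of the genes (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Arbitrary non-uniform start; it must not be orthogonal to the answer.
    direction = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        product = [sum(cov[a][b] * direction[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    # Project every centered sample onto the component.
    scores = [sum(r[j] * direction[j] for j in range(p)) for r in centered]
    return direction, scores


# Hypothetical log-scaled expression: three samples per group, three genes.
expression = [
    [1.0, 5.0, 3.0], [1.2, 5.1, 2.9], [0.9, 4.8, 3.1],   # group 1
    [4.0, 1.0, 3.0], [4.2, 1.1, 3.2], [3.9, 0.8, 2.9],   # group 2
]
direction, scores = first_principal_component(expression)
print(direction)
print(scores)
```

In this toy table the two groups fall on opposite sides of zero along the first component, which is the kind of separation a PCA plot is used to reveal.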
[Figure: PCA of the samples using 7,000 genes and PCA using the PAM50 (35) genes; groups shown: Normal-like, Basal, Claudin-low, Luminal]
Unsupervised analysis: Hierarchical Clustering
Why use clustering?
• Identify groups
• Associate sample to group
Considerations:
• Various methods
• Random selection in some methods
• Interpretation
[Figure: dendrogram from hierarchical clustering, cut into 2 clusters, 4 clusters and 8 clusters]
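Cutting a hierarchy into a chosen number of clusters can be sketched in a few lines of Python. The example below uses agglomerative clustering with Euclidean distance and average linkage; both choices are assumptions of this sketch, since the slides do not say which distance or linkage the H-clust module uses. The points are invented.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points, cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clusters(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                distance = average_linkage(points, clusters[a], clusters[b])
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical samples described by two expression values each.
points = [(0, 0), (0, 1), (4, 4), (4, 5), (20, 0), (20, 1)]
three_clusters = agglomerative_clusters(points, 3)
two_clusters = agglomerative_clusters(points, 2)
print(three_clusters)
print(two_clusters)
```

Asking for fewer clusters corresponds to cutting the dendrogram higher: the two nearby pairs merge first, while the distant pair stays separate.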
Unsupervised Analysis Practice
• Remove sample IDs
• Mark Group Names as ID
• Run H-clust
CellLines_ExprData_marked.txt
BREAK
Q&A
Supervised Machine Learning
[Figure: labeled examples (Dogs, Cats) form the Training Set; unlabeled examples (?????) form the Test Set]
Step-wise Linear Discriminant Analysis (swLDA)
Support Vector Machine (SVM) with Linear Kernel
[Figure: linear decision boundary with margin d on each side; unlabeled samples are marked “?”]
Considerations: Supervised analysis
• Fitting a classifier on the training set and predicting classes on the test set
• Is it possible to tune 7000 coefficients with 52 samples?
• Some algorithms do feature selection: swLDA, random forest
• Other algorithms won’t work if the number of features >> number of
samples
• Curse of dimensionality
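The question about tuning thousands of coefficients with a few dozen samples can be explored with a small experiment. The Python sketch below trains a plain perceptron (a simple linear classifier used here as a stand-in, not an SVM) on random data with random labels. The sample and feature counts, the random seed and the data are all assumptions of this illustration.

```python
import random


def train_perceptron(features, labels, max_epochs=1000):
    """Classic perceptron updates; stops early once every training sample is correct."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary)
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                errors += 1
        if errors == 0:
            break
    return weights, bias


def accuracy(weights, bias, features, labels):
    correct = 0
    for x, y in zip(features, labels):
        score = sum(w * v for w, v in zip(weights, x)) + bias
        predicted = 1 if score > 0 else -1
        if predicted == y:
            correct += 1
    return correct / len(labels)


rng = random.Random(0)
n_samples, n_features = 20, 500  # many more features than samples

# Random "expression" values with labels that carry no information at all.
features = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
labels = [rng.choice([-1, 1]) for _ in range(n_samples)]
weights, bias = train_perceptron(features, labels)
train_accuracy = accuracy(weights, bias, features, labels)

# Fresh random data drawn the same way, never seen during training.
new_features = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(200)]
new_labels = [rng.choice([-1, 1]) for _ in range(200)]
test_accuracy = accuracy(weights, bias, new_features, new_labels)
print(train_accuracy, test_accuracy)
```

With far more features than samples, a linear model can fit even meaningless labels perfectly on the training set, so training accuracy alone says little. This is one reason to evaluate on a separate test set and to reduce the number of features first.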
Step-wise Linear Discriminant Analysis (swLDA)
• Extracting 15 highly informative genes from the swLDA classifier
• How other supervised learning algorithms (e.g., SVM) can be applied
• Feature selection can also improve the quality of unsupervised learning
analysis
Classification Practice
• Organize the table with 15 genes by sample type
• Color expression (green – low; red – high)
• Which genes stand out?
• Which samples stand out?
• Which groups are hard to detect?
CellLines_15Genes_market.txt
Classification Practice: PCA of 15 gene table
Hierarchical Clustering of 15 gene table
[Figure: dendrogram cut into 4 clusters: N-like, Basal, C-low, Luminal]
BREAK
Q&A
Part 3:
Interpretation
Annotating and Interpreting Gene Expression
Gene annotation: ENSG to Gene Symbols plus GO
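Converting Ensembl gene IDs (ENSG) to gene symbols is essentially a table lookup. The Python sketch below uses a three-entry mapping typed in by hand for illustration; in practice the mapping would come from an annotation resource. The function name, the version suffix in the usage example and the fallback behavior are assumptions of this sketch, and GO terms are not covered.

```python
# Tiny hand-written lookup table used only for this illustration.
ENSG_TO_SYMBOL = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
    "ENSG00000139618": "BRCA2",
}


def to_symbol(ensembl_id, mapping=ENSG_TO_SYMBOL):
    """Translate an ENSG ID to a gene symbol; unknown IDs are returned unchanged."""
    base_id = ensembl_id.split(".")[0]  # drop a version suffix such as ".16"
    return mapping.get(base_id, ensembl_id)


gene_ids = ["ENSG00000141510.16", "ENSG00000012048", "ENSG00000000000"]
symbols = [to_symbol(gene_id) for gene_id in gene_ids]
print(symbols)
```

Returning the original ID for unmapped genes keeps every row of the expression table, so nothing is silently dropped during annotation.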
Annotation Practice
http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=23869&path[]=75083
https://www.nature.com/articles/1208329
BREAK
Q&A
Homework:
1. PCA plot using top 15 genes
from differential expression analysis
Separation of samples from various sources: TCGA and PDX
2. New Datasets
Part 1: Conventional Machine Learning Approaches for Next
Generation Sequencing
Rapid RNA-seq processing for expression quantification applying
logical pipeline construction and pre-processing considerations.
In hands-on exercises, participants will explore the expression
data using conventional unsupervised machine learning methods and
supervised classifiers with and without feature extraction. Using
the T-BioInfo platform, participants will learn about the logic and
considerations of applying such methods and be prepared for
independent downstream analysis and visualization of data
using downloaded R scripts produced by the system. The
produced/downloaded code will be reviewed, customized and
used in a subsequent session.
T-bio.info | edu.t-bio.info (FREE) | server.t-bio.info (14 days DEMO)
Required installations:
R >= 3.4
RStudio
gplots
ggfortify
ggplot2
ggpubr
e1071
mda
MASS
klaR
Part 2: Combining custom software with R to
streamline analysis workflows and visualize ‘Omics
data insights.
Differential Gene Expression, Gene Set Enrichment
Analysis
R visualization from scratch: utilize the same dataset for
basic data exploration and visualization in R.
This session will strengthen the participants' ability to
transition to script-based workflows in RNA-seq
downstream analysis and visualization. Participants will
learn about downstream capabilities of R-based workflow
to transform and manipulate tables and visualize findings
in a meaningful way.
Download and Modify R Scripts
Differential expression analysis
Quantities related to the degree of differential expression:
• Difference between mean expression levels – fold change
(please pay attention to scale);
• Statistical significance – p-value, adjusted p-value (e.g., FDR);
• Level of expression (treat low-expressed genes with caution in the
analysis).
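Two of the quantities listed above can be sketched in Python: the fold change on a log2 scale and an FDR adjustment of p-values. The sketch uses the Benjamini-Hochberg procedure as one common FDR method; the slides do not say which adjustment the platform applies, and the expression values and raw p-values below are invented.

```python
def log2_fold_change(group_a, group_b):
    """Difference of group means for values that are already on a log2 scale."""
    return sum(group_b) / len(group_b) - sum(group_a) / len(group_a)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical log2 expression of one gene in two groups of three samples.
group_a = [5.0, 5.2, 4.8]
group_b = [7.9, 8.1, 8.0]
lfc = log2_fold_change(group_a, group_b)

# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
print(lfc, adjusted_p)
```

On a log2 scale the fold change is a difference rather than a ratio: a value of 3 here corresponds to an 8x change in linear scale, which is the "pay attention to scale" point above.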
• Differential expression is hard to interpret when the number of groups is greater than two, so we can compare two groups: Claudin-low vs normal-like.
• Differential expression is a natural and easy-to-interpret feature selection procedure.
• Pathway enrichment analysis can be applied to the resulting table.
Gene set / pathway enrichment analysis
• Methods that use only gene lists (thresholding required): one of the standard tools here is The
Database for Annotation, Visualization and Integrated Discovery – DAVID
(https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/).
• GAGE – takes into consideration the level of differential expression
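The list-based approach can be illustrated with a hypergeometric test: given a thresholded gene list, how surprising is its overlap with a pathway? The Python sketch below is a generic textbook calculation and is not claimed to reproduce the statistic used by DAVID or GAGE. All numbers in the example are invented.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_selected, n_overlap):
    """Probability of an overlap of at least n_overlap genes by chance.

    n_background: genes measured in total
    n_in_set:     background genes that belong to the pathway
    n_selected:   genes in the thresholded list
    n_overlap:    selected genes that belong to the pathway
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical case: 20 genes measured, 5 in the pathway,
# 4 genes pass the threshold and 3 of them are in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_value)
```

Because only list membership enters the calculation, the result depends on where the threshold is placed. Methods that use the level of differential expression avoid that dependence.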
[Figures: gene set / pathway enrichment analysis results, including Regulation of Actin Cytoskeleton and B Cell Receptor Signaling Pathway]
RStudio

Machine Learning Insights from Multi-Omics Data

  • 1. 1
  • 2. T-BioInfo is designed for processing, analysis and integration of multi-omics data. The platform is used in multiple research groups to extract meaningful insights from large multi-omics datasets. Our current effort expands to education, by enabling more people to extract meaningful, data-driven insights from omics datasets with biomedical applications. To learn more about the platform and it’s research and educational features, follow the highlighted links . T-bio.info | edu.t-bio.info | server.t-bio.info 2
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. Modeling Precision Medicine Machine Learning forTranscriptomics Data: Extracting Meaningful insights from high-throughput biomedical data. 6
  • 8. Diagnosis, Prognosis, Response toTreatment 8 Survival prediction Treatment Selection OncotypeDXPAM50
  • 9. Daemen et al., 2013, “Modeling precision treatment of breast cancer”: an analysis of over 70 different Breast Cancer cell lines and over 90 different therapeutic agents. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110 9
  • 10. Files we will use in this session 10
  • 12. Part 1: RNA-Seq Processing from raw reads to a table of expression 12
  • 13. RNA-Seq: overview .…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA…. Genome 13 Gene A Gene B Gene C Transcr. ATranscript A Transcr. ATranscript C
  • 14. 14 .…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA….Gene A Gene B Gene C Transcr. ATranscript A Transcr. ATranscript C Reads RNA-Seq: overview
  • 15. 15 .…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA….Gene A Gene B Gene C Transcr. ATranscript A Transcr. ATranscript C Reads RNA-Seq: some details 1. Shattering 2. Adapters ligation 3. PCR amplification 4. “Reading”
  • 16. RNA-Seq: overview. Preprocessing: • Adapter removal plus additional • Removing PCR duplicates. Mapping: • Mapping on the set of known transcripts • Mapping on the genome (and potential identification of novel transcripts) • Combined strategy. Quantification of expression levels.
  • 18. Data Processing Practice. Create a pipeline: 1. Upload same SVL files 2. Pre-processing steps: Trimmomatic, PCRclean 3. Mapping on genome: HiSat2 4. Isoform construction: Cufflinks 5. GTF merging: Cuffmerge 6. Mapping on transcripts: Bowtie2-t 7. Quantification: RSEMExpTable
  • 21. Standard Measures of RNA Quantification: • Counts • FPKM (fragments per kilobase per million mapped reads): $\mathrm{FPKM}_g = \dfrac{\text{number of reads mapped on gene } g}{(\text{total number of mapped reads, in millions}) \times (\text{length of gene } g \text{, in kilobases})}$ • TPM (transcripts per million): for one sample, $\mathrm{TPM}_g = C \times \mathrm{FPKM}_g$, where $C$ is chosen so that the TPM values of all genes sum to one million. The constant $C$ differs from sample to sample.
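The two measures on slide 21 can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the gene names, counts, and lengths are invented, and the total number of mapped reads is assumed to be the sum of the per-gene counts.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: reads / (total mapped reads in millions * gene length in kb)."""
    # Assumption: every mapped read is counted for exactly one gene.
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / (total_millions * (lengths_bp[g] / 1e3)) for g in counts}


def tpm_from_fpkm(fpkm_values):
    """TPM_g = C * FPKM_g, with C chosen so that all TPM values sum to one million."""
    c = 1e6 / sum(fpkm_values.values())
    return {g: c * v for g, v in fpkm_values.items()}


# Hypothetical toy sample: three genes, 10,000 mapped reads in total.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm_from_fpkm(fpkm_values)

for gene in counts:
    print(gene, round(fpkm_values[gene], 1), round(tpm_values[gene], 1))
```

Because C is recomputed for each sample, TPM values always sum to one million within a sample, whereas the sum of FPKM values can differ between samples.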
  • 22. Linear scale vs log scale. Relative differences are biologically more meaningful than absolute ones, and computations are simpler after log-scaling: log-scaled measure = log2(linear-scale measure + shift). For relatively large values: a difference of 1 in log scale is a 2x difference in linear scale; a difference of 3 in log scale is an 8x difference in linear scale, etc.; a difference of -1 in log scale is also a 2x difference in linear scale, but in the opposite direction.
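A short sketch of the log-scaling rule from slide 22 follows. The shift of 1 is an assumed, commonly used value (the slide does not specify one), and the expression values are invented.

```python
import math


def log_scale(value, shift=1.0):
    """log2(linear-scale measure + shift); the shift keeps zero expression finite."""
    return math.log2(value + shift)


# Hypothetical expression values for one gene in two samples.
low, high = 1000.0, 8000.0

# For relatively large values the shift barely matters:
# an 8x difference in linear scale is close to 3 in log scale.
diff = log_scale(high) - log_scale(low)
print(round(diff, 3))

# Swapping the samples flips the sign but keeps the magnitude.
print(round(log_scale(low) - log_scale(high), 3))

# With shift = 1, zero expression maps to exactly 0 in log scale.
print(log_scale(0.0))
```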
  • 23. RNA-Seq: overview. Preprocessing: • Adapter removal plus additional • Removing PCR duplicates. Mapping: • Mapping on the set of known transcripts • Mapping on the genome (and potential identification of novel transcripts) • Combined strategy. Quantification of expression levels.
  • 24. Comparison: the role of preprocessing. High expression can be affected by pre-processing steps like PCR-clean and “Trimmomatic”
  • 26. BREAK, Q&A. Error correction: CORAL, ECHO, RACER, eMER. Different mappers: HiSat, TopHat, STAR, BWA. Other sections: • Differential expression: CuffDiff, EDGER, DESEQ • Segmentation: BinS
  • 27. Part 2: Machine Learning. Data exploration and classification
  • 30. Unsupervised analysis: PCA. Why use Principal Component Analysis? • Explore data • Visualize. Considerations: • Data filtering • Outliers • Interpretation
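To make the idea behind slide 30 concrete, the sketch below finds the first principal component of a tiny samples-by-genes table using power iteration in plain Python. Power iteration is swapped in here only to keep the example dependency-free; it is not claimed to be what the platform runs, and the six-sample table is invented.

```python
import math


def first_principal_component(rows, iterations=200):
    """rows: samples x features. Returns (unit direction, per-sample scores)."""
    n, p = len(rows), len(rows[0])
    # Center each feature (gene) at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Start from a vector with equal weight on every feature.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        # Multiply by X^T X without forming the p x p covariance matrix.
        scores = [sum(x[j] * v[j] for j in range(p)) for x in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(x[j] * v[j] for j in range(p)) for x in centered]
    return v, scores


# Hypothetical table: 6 samples x 3 genes; genes 1 and 2 vary together,
# gene 3 is almost constant.
table = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [8.0, 16.2, 0.5],
    [9.0, 17.8, 0.4],
    [10.0, 20.1, 0.6],
]

direction, pc1_scores = first_principal_component(table)
print([round(c, 3) for c in direction])
print([round(s, 2) for s in pc1_scores])
```

Plotting the scores of the first two components per sample is what produces the familiar PCA scatter plot; an outlying sample shows up as a point far from every group.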
  • 32. Unsupervised analysis: PCA. [Figure: PCA on 7,000 genes and PCA on PAM50 (35) genes; groups: Normal-like, Basal, Claudin-low, Luminal]
  • 33. Unsupervised analysis: Hierarchical Clustering. Why use clustering? • Identify groups • Associate sample to group. Considerations: • Various methods • Random selection in some methods • Interpretation
  • 35. Unsupervised analysis: hierarchical clustering. [Figure: dendrogram cut into 2 clusters, 4 clusters, and 8 clusters]
  • 36. Unsupervised Analysis Practice (CellLines_ExprData_marked.txt) • Remove sample IDs • Mark Group Names as ID • Run H-clust
  • 38. Supervised Machine Learning. [Figure: training set with labeled examples (Dogs, Cats); test set with unknown labels (?????)]
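Cutting a dendrogram at 2, 4, or 8 clusters, as on slide 35, amounts to stopping an agglomerative merge procedure at different points. The sketch below assumes Euclidean distance and average linkage (the slides do not say which distance or linkage is used) and runs on invented two-dimensional points.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical samples: four tight pairs of points.
samples = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 20), (1, 20), (30, 30), (30, 31)]

four_clusters = agglomerate(samples, 4)
two_clusters = agglomerate(samples, 2)
print(four_clusters)
print(two_clusters)
```

The same merge order underlies both cuts, which is why the 2-cluster solution is always a coarsening of the 4-cluster one.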
  • 40. Support Vector Machine (SVM) with Linear Kernel. [Figure labels: d, d]
  • 41. Support Vector Machine (SVM) with Linear Kernel. [Figure label: ?]
  • 42. Support Vector Machine (SVM) with Linear Kernel. [Figure label: ?]
  • 43. Supervised analysis: considerations • Fitting a classifier on the training set and predicting classes on the test set • Is it possible to tune 7000 coefficients with 52 samples? • Some algorithms do feature selection: swLDA, random forest • Other algorithms won’t work if the number of features >> the number of samples • Curse of dimensionality
  • 44. Step-wise Linear Discriminant Analysis (swLDA) • Extracting 15 highly informative genes from the swLDA classifier • How other supervised learning algorithms can be applied (e.g., SVM) • Feature selection can also improve the quality of unsupervised learning analysis
  • 45. Classification Practice (CellLines_15Genes_market.txt) • Organize the table with 15 genes by sample type • Color expression (green: low; red: high) • Which genes stand out? • Which samples stand out? • Which groups are hard to detect?
  • 47. Hierarchical clustering of the 15-gene table. [Figure: 4 clusters; groups: N-like, Basal, C-low, Luminal]
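The first consideration on slide 43, fitting on a training set and predicting on a held-out test set, can be sketched with a nearest-centroid classifier. This deliberately simple classifier is a stand-in chosen for brevity, not the swLDA, random forest, or SVM methods named in the slides, and the two-gene samples and labels are invented.

```python
def fit_centroids(samples, labels):
    """Training step: compute the mean expression profile of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Prediction step: assign the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical training set: 4 samples x 2 genes with known subtypes.
train_x = [[1.0, 5.0], [1.2, 4.8], [5.0, 1.0], [4.8, 1.1]]
train_y = ["Luminal", "Luminal", "Basal", "Basal"]

# Hypothetical test set: labels are used only to score the predictions.
test_x = [[0.9, 5.2], [5.1, 0.8]]
test_y = ["Luminal", "Basal"]

model = fit_centroids(train_x, train_y)
predictions = [predict(model, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

With thousands of genes and only dozens of samples, a model with one free coefficient per gene can fit the training set perfectly and still fail on the test set, which is why the test labels must stay out of the fitting step.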
  • 49. Part 3: Interpretation. Annotating and Interpreting Gene Expression
  • 50. Gene annotation: ENSG to Gene Symbols plus GO
  • 54. Homework: 1. PCA plot using the top 15 genes from differential expression analysis
  • 55. 2. New datasets. Separation of samples from various sources: TCGA and PDX
  • 56. Part 1: Conventional Machine Learning Approaches for Next Generation Sequencing. Rapid RNA-seq processing for expression quantification, applying logical pipeline construction and pre-processing considerations. In hands-on exercises, participants will explore the expression data using conventional unsupervised machine learning methods and supervised classifiers with and without feature extraction. Using the T-BioInfo platform, participants will learn about the logic and considerations of applying such methods and be prepared for independent downstream analysis and visualization of data using downloaded R scripts produced by the system. The produced/downloaded code will be reviewed and customized in a subsequent session. T-bio.info | edu.t-bio.info (FREE) | server.t-bio.info (14 days DEMO)
  • 58. Required installations: R >= 3.4, R Studio, gplots, ggfortify, ggplot2, ggpubr, e1071, mda, MASS, klaR. Part 2: Combining custom software with R to streamline analysis workflows and visualize 'omics data insights. Differential Gene Expression, Gene Set Enrichment Analysis. R visualization from scratch: utilize the same dataset for basic data exploration and visualization in R. This session will strengthen the participants' ability to transition to script-based workflows in RNA-seq downstream analysis and visualization. Participants will learn about the downstream capabilities of R-based workflows to transform and manipulate tables and visualize findings in a meaningful way.
  • 61. Differential expression analysis. Quantities related to the degree of differential expression: • Difference between mean expression levels: fold change (pay attention to the scale) • Statistical significance: p-value, adjusted p-value (e.g., FDR) • Level of expression (use caution with low-expressed genes in the analysis)
  • 62. Differential expression analysis • Hard to interpret when the number of groups is greater than two, so we can use Claudin-low vs normal-like groups • Differential expression is a natural and easy-to-interpret feature selection procedure • Pathway enrichment analysis can be applied to the resulting table
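Two of the quantities on slide 61 can be sketched directly: a fold change computed on the log2 scale, and FDR-adjusted p-values. The adjustment shown is the Benjamini-Hochberg procedure, chosen here as one common FDR method (the slide does not name a specific one); the shift of 1 and all numbers are invented for illustration.

```python
import math


def log2_fold_change(mean_a, mean_b, shift=1.0):
    """Difference of mean expression levels on the log2 scale."""
    return math.log2(mean_a + shift) - math.log2(mean_b + shift)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical group means for one gene in linear scale.
lfc = log2_fold_change(7.0, 1.0)
print(lfc)

# Hypothetical raw p-values for four genes.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adj_p])
```

A fold change reported on the log2 scale means something different from one reported in linear scale, which is the reason for the slide's reminder to check the scale.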
  • 65. Gene set / pathway enrichment analysis • Methods that use only gene lists (thresholding required): one of the standard tools here is the Database for Annotation, Visualization and Integrated Discovery, DAVID (https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/) • Methods that take into consideration the level of differential expression: GAGE
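List-based enrichment analysis asks whether a pathway's genes appear in a thresholded gene list more often than chance would predict. The sketch below uses a plain one-sided hypergeometric test as a generic illustration of that idea; it is not claimed to be the exact statistic that DAVID or GAGE computes, and all gene counts are invented.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_universe genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes measured, 5 in the pathway,
# 5 pass the differential expression threshold.
p_all_five = enrichment_p_value(20, 5, 5, 5)
p_at_least_zero = enrichment_p_value(20, 5, 5, 0)
print(p_all_five, p_at_least_zero)
```

The result depends on where the threshold is drawn, which is the limitation that methods using the full level of differential expression are meant to avoid.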
  • 66. Gene set / pathway enrichment analysis
  • 67. Gene set / pathway enrichment analysis
  • 68. Gene set / pathway enrichment analysis. [Figure: pathways shown: Regulation of Actin Cytoskeleton; B Cell Receptor Signaling Pathway]