Can deep learning learn chromatin structure from sequence?
Nathalie Vialaneix
nathalie.vialaneix@inrae.fr
http://www.nathalievialaneix.eu
GT Chrocogen
January 20th, 2023
What is this chrocotalk about?
GT Chrocogen
January 20th, 2023 / Nathalie Vialaneix
p. 2
Outline
Hi-C data
Deep NN model
Data and training methodology
Results
p. 3
3D organization of the genome
courtesy of Sylvain Foissac
p. 4
Hi-C protocol
again courtesy of Sylvain Foissac
p. 5
Hi-C matrices
also courtesy of Sylvain Foissac
p. 6
Shameless plug: [Neuvial et al., 2023]
p. 7
Question tackled by the article
Can we predict a Hi-C matrix from the DNA sequence?
p. 8
Understanding binning and resolution
p. 9
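A minimal Python sketch of binning, assuming contacts are given as pairs of positions (in bp) on a single chromosome; the function name and the toy read pairs are hypothetical. Each read end falls into bin `(pos - start) // bin_size`, so a smaller bin size means a higher resolution and a larger matrix.

```python
from typing import Iterable, List, Tuple


def bin_contacts(
    pairs: Iterable[Tuple[int, int]], start: int, end: int, bin_size: int
) -> List[List[int]]:
    """Count contacts between fixed-size bins of the region [start, end).

    pairs: (pos1, pos2) coordinates in bp of the two ends of each read pair,
    both on the same chromosome. Returns a symmetric n_bins x n_bins matrix.
    """
    n_bins = -(-(end - start) // bin_size)  # ceiling division: the last bin may be shorter
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        # ignore read pairs with at least one end outside the region
        if not (start <= pos1 < end and start <= pos2 < end):
            continue
        i = (pos1 - start) // bin_size
        j = (pos2 - start) // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # a contact is unordered: keep the matrix symmetric
    return matrix


if __name__ == "__main__":
    toy_pairs = [(100, 5_100), (4_100, 4_900), (9_000, 500)]  # hypothetical read pairs
    for row in bin_contacts(toy_pairs, 0, 12_000, 4_000):
        print(row)
```

With a 1Mb region and 4kb bins, the same function returns a 250 × 250 matrix.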
Outline
Hi-C data
Deep NN model
Data and training methodology
Results
p. 10
The model
availability: https://github.com/jzhoulab/orca (trained models and training code, all well documented)
p. 11
Two main parts: encoder/decoder
Encoder: from sequence to features
↓
Decoder: from features to Hi-C matrix
p. 12
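The encoder/decoder split can be illustrated with a toy sketch. The "encoder" below maps each bin of sequence to a small feature vector (here just the GC fraction and its complement, a hypothetical stand-in for learned features), and the "decoder" combines every pair of bins into one entry of a 2D map (here a dot product damped by bin distance). In Orca both parts are learned neural networks; only the 1D-to-2D shape of the computation is meant to carry over.

```python
from typing import List


def toy_encoder(seq: str, bin_size: int) -> List[List[float]]:
    """Map a DNA sequence to one feature vector per (complete) bin.

    Hypothetical stand-in for a learned encoder: the features are just the
    GC fraction of the bin and its complement.
    """
    features = []
    for start in range(0, len(seq) - bin_size + 1, bin_size):
        chunk = seq[start:start + bin_size].upper()
        gc = (chunk.count("G") + chunk.count("C")) / bin_size
        features.append([gc, 1.0 - gc])
    return features


def toy_decoder(features: List[List[float]]) -> List[List[float]]:
    """Turn per-bin features (1D) into a symmetric bin x bin map (2D).

    Hypothetical stand-in for a learned decoder: each entry combines the
    features of two bins (dot product), damped by their distance in bins.
    """
    n = len(features)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            out[i][j] = dot / (1 + abs(i - j))
    return out


if __name__ == "__main__":
    feats = toy_encoder("GGGGAAAAGCGC", bin_size=4)
    print(feats)
    print(toy_decoder(feats))
```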
Three (multi-resolution) modules trained sequentially
▶ Orca-1Mb: used to predict one 1Mb region at 4kb resolution (bin size)¹
▶ Orca-32Mb (multi-resolution): trained using the results of Orca-1Mb and used to predict one region at each scale 1Mb, 2Mb, 4Mb, ..., 32Mb, at resolution 4kb, 8kb, 16kb, ..., 128kb respectively¹
▶ Orca-256Mb (multi-resolution): trained using the results of Orca-32Mb and used to predict one region at each scale 32Mb, 64Mb, 128Mb, 256Mb, at resolution 128kb, 256kb, 512kb, 1024kb respectively¹
¹ hence, 250 × 250 Hi-C matrices at every scale (e.g., 1Mb / 4kb = 256Mb / 1024kb = 250 bins)
p. 13
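The footnote's arithmetic can be checked directly: with a fixed 250 × 250 output, the bin size is the window size divided by 250. A small sketch, assuming decimal units (1Mb = 1,000kb); the window lists per module restate the slide, and the constant and function names are hypothetical.

```python
N_BINS = 250  # every prediction is a 250 x 250 matrix (slide footnote)

# window sizes (Mb) predicted by each module, as listed on the slide
MODULES = {
    "Orca-1Mb": [1],
    "Orca-32Mb": [1, 2, 4, 8, 16, 32],
    "Orca-256Mb": [32, 64, 128, 256],
}


def bin_size_kb(window_mb: int, n_bins: int = N_BINS) -> float:
    """Bin size (kb) so that a window of window_mb Mb is split into n_bins bins.

    Assumes decimal units: 1Mb = 1,000kb.
    """
    return window_mb * 1000 / n_bins


if __name__ == "__main__":
    for name, windows in MODULES.items():
        print(name, [f"{w}Mb@{bin_size_kb(w):g}kb" for w in windows])
```

Doubling the window while keeping 250 bins doubles the bin size, which is why each coarser level halves the resolution.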
Other features
▶ each Encoder block has a linear sub-block and a non-linear sub-block, each with two convolutional layers, plus ReLU and max pooling
▶ the Encoder is also trained on an auxiliary task (predicting DNase-seq and ChIP-seq signals)
p. 14
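To make the layer types concrete, here is a plain-Python sketch of one encoder-style block on one-hot encoded DNA: two 1D convolutions, each followed by ReLU, then max pooling. The weights are hypothetical (a hand-made "GC" dinucleotide detector), and the sketch does not reproduce Orca's split into linear and non-linear sub-blocks; a real model would use a deep-learning framework rather than Python loops.

```python
from typing import List

Matrix = List[List[float]]  # channels x positions
BASES = "ACGT"


def one_hot(seq: str) -> Matrix:
    """4 x len(seq) one-hot encoding (rows A, C, G, T); unknown bases such as N are all zeros."""
    return [[1.0 if base == b else 0.0 for base in seq.upper()] for b in BASES]


def conv1d(x: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries).

    x: in_channels x length; kernels: out_channels x in_channels x width.
    """
    width = len(kernels[0][0])
    length = len(x[0]) - width + 1
    out = []
    for kernel, b in zip(kernels, bias):
        row = []
        for t in range(length):
            s = b
            for channel, weights in enumerate(kernel):
                for w in range(width):
                    s += weights[w] * x[channel][t + w]
            row.append(s)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool1d(x: Matrix, size: int) -> Matrix:
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)] for row in x]


def encoder_block(x, kernels1, bias1, kernels2, bias2, pool: int) -> Matrix:
    """Toy block: conv -> ReLU -> conv -> ReLU -> max pooling (hypothetical weights)."""
    h = relu(conv1d(x, kernels1, bias1))
    h = relu(conv1d(h, kernels2, bias2))
    return max_pool1d(h, pool)


if __name__ == "__main__":
    # one output channel that fires on "GC": G at offset 0, C at offset 1 (rows A, C, G, T)
    gc_kernel = [[[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]
    identity = [[[1.0]]]  # width-1 kernel that copies its single input channel
    print(encoder_block(one_hot("AGCTGCAA"), gc_kernel, [-1.0], identity, [0.0], pool=2))
```

Pooling halves the sequence axis at each block, which is how a long input sequence ends up summarized as one feature vector per genomic bin.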
Outline
Hi-C data
Deep NN model
Data and training methodology
Results
p. 15
Data used
▶ two Micro-C datasets, H1-ESC and HFF cells (4D Nucleome portal), normalized by matrix balancing, with a separate model trained on each; plus one Hi-C dataset: HCT116 (cohesin-depleted Hi-C)
▶ sequence: GRCh38/hg38 reference genome
▶ chromatin tracks
p. 16
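The slide does not say which balancing algorithm was used. As an illustration, here is a sketch of a Sinkhorn-Knopp-style iteration for a symmetric contact matrix, in the spirit of iterative correction (ICE): each bin gets a bias factor, and the factors are updated until all non-empty rows have the same sum. The square-root (damped) update is a choice of this sketch, made because the undamped update can oscillate on diagonal-heavy matrices; the function name is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def balance(matrix: Matrix, n_iter: int = 200, tol: float = 1e-12) -> Tuple[Matrix, List[float]]:
    """Balance a symmetric, non-negative contact matrix.

    Returns (balanced, bias) with balanced[i][j] = matrix[i][j] / (bias[i] * bias[j]),
    where all non-empty rows of `balanced` end up with (approximately) the same sum.
    Empty rows (no contacts) stay at zero with bias 1.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # damped (square-root) update: keeps the matrix symmetric and converges smoothly
        factors = [math.sqrt(s / mean) if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


if __name__ == "__main__":
    balanced, bias = balance([[2.0, 1.0], [1.0, 4.0]])
    print([sum(row) for row in balanced], bias)
```

Keeping the bias vector, rather than only the balanced matrix, lets raw counts be recovered and lets the same correction be applied to other data at the same bins.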
Methodology
▶ each model is trained on regions randomly sampled from the training chromosomes
▶ evaluation uses the same datasets, restricted to the held-out (test) chromosomes
p. 17
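A sketch of chromosome-level holdout, assuming chromosome sizes are given as a dict. The held-out set `{"chr8", "chr9"}` and the toy sizes in the usage example are hypothetical, not taken from the article. Drawing a chromosome with probability proportional to its number of valid start positions makes every training window equally likely.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_training_windows(
    chrom_sizes: Dict[str, int], test_chroms: Set[str], window: int, n: int, seed: int = 0
) -> List[Tuple[str, int, int]]:
    """Draw n random windows (chrom, start, end) of `window` bp from training chromosomes only.

    Every valid window on a training chromosome is equally likely; held-out
    chromosomes are never sampled, so they stay unseen until evaluation.
    """
    rng = random.Random(seed)
    train = [c for c, size in chrom_sizes.items() if c not in test_chroms and size >= window]
    if not train:
        raise ValueError("no training chromosome is long enough for this window")
    # weight chromosomes by their number of valid start positions
    weights = [chrom_sizes[c] - window + 1 for c in train]
    windows = []
    for _ in range(n):
        chrom = rng.choices(train, weights=weights)[0]
        start = rng.randrange(chrom_sizes[chrom] - window + 1)
        windows.append((chrom, start, start + window))
    return windows


if __name__ == "__main__":
    sizes = {"chr1": 10_000_000, "chr2": 8_000_000, "chr8": 6_000_000, "chr9": 5_000_000}  # toy sizes
    print(sample_training_windows(sizes, {"chr8", "chr9"}, window=1_000_000, n=3))
```

Splitting by whole chromosomes, rather than by random windows, avoids overlapping training and test windows sharing the same sequence.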
Outline
Hi-C data
Deep NN model
Data and training methodology
Results
p. 18
Overall performance
▶ correlation between predictions and experimental data: 0.73-0.85 intra-chromosome and 0.47-0.74 inter-chromosome
▶ predicts CTCF-mediated interactions well, but also polycomb-mediated and promoter-enhancer interactions
p. 19
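The slide does not specify how the correlation is computed. One common choice, assumed here, is a Pearson correlation over the upper triangle of the two matrices, optionally after dividing each diagonal by its mean (observed/expected) so that the shared decay of contacts with genomic distance does not inflate the score. Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def upper_triangle(m: Matrix, offset: int = 0) -> List[float]:
    """Entries m[i][j] with j >= i + offset (enough to describe a symmetric matrix)."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + offset, n)]


def distance_normalize(m: Matrix) -> Matrix:
    """Observed/expected: divide each entry by the mean of its diagonal (same genomic distance)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diag = [m[i][i + d] for i in range(n - d)]
        mean = sum(diag) / len(diag)
        for i in range(n - d):
            out[i][i + d] = out[i + d][i] = m[i][i + d] / mean if mean > 0 else 0.0
    return out


def matrix_correlation(predicted: Matrix, observed: Matrix, normalize: bool = True) -> float:
    """Correlation between a predicted and an observed contact matrix of the same shape."""
    if normalize:
        predicted, observed = distance_normalize(predicted), distance_normalize(observed)
    return pearson(upper_triangle(predicted), upper_triangle(observed))
```

Without the distance normalization, any model that reproduces the overall decay of contacts with distance would already score highly, which says little about TADs or loops.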
Predicting structural variation effects
▶ transposon-mediated insertions of a 2kb TAD boundary element (2kb ± 5kb transposon)
▶ 40.5Mb inversion mutation (involved in acute myeloid leukemia): the enhancer/promoter (E/P) interaction was demonstrated experimentally
▶ multiple deletion, inversion, and duplication variants (each ∼1Mb) in the same region (involved in limb malformation): consistent with the mechanisms observed in 4C experimental data
▶ regions with multiple variants leading to distinct phenotypes
???
p. 20
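A sequence-based model predicts the effect of a structural variant by being run on the reference sequence and on an edited copy, after which the two predicted matrices are compared. The edits themselves are simple string operations; the sketch below assumes 0-based, half-open coordinates and uses hypothetical function names.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def delete(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] by its reverse complement (an inversion of double-stranded DNA)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: seq[start:end] is repeated right after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


def insert(seq: str, pos: int, fragment: str) -> str:
    """Insert `fragment` (e.g., a boundary element) before position pos."""
    return seq[:pos] + fragment + seq[pos:]


if __name__ == "__main__":
    ref = "AACCGGTT"  # toy reference sequence
    variants = {
        "deletion": delete(ref, 2, 4),
        "inversion": invert(ref, 0, 3),
        "duplication": duplicate(ref, 2, 4),
        "insertion": insert(ref, 4, "TTT"),
    }
    for name, alt in variants.items():
        print(name, alt)
```

Deletions, duplications, and insertions shift downstream coordinates, so the two predictions have to be aligned (for instance by mapping bins back to reference coordinates) before they are compared.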
In silico mutagenesis
p. 21
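In silico mutagenesis scores each possible single-base substitution by the change it causes in a model output. A generic sketch, where `score` is any function from a sequence to a number; the GC-count score in the usage example is a hypothetical stand-in for a model prediction such as the predicted contact strength between two bins.

```python
from typing import Callable, Dict, List


def in_silico_mutagenesis(seq: str, score: Callable[[str], float]) -> List[Dict[str, float]]:
    """Effect of every single-base substitution on a model output.

    effects[i][b] = score(seq with base b at position i) - score(seq);
    the reference base gets 0.0. Needs about 3 * len(seq) extra calls to `score`.
    """
    reference = score(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for base in "ACGT":
            if base == ref_base:
                row[base] = 0.0
            else:
                mutant = seq[:i] + base + seq[i + 1:]
                row[base] = score(mutant) - reference
        effects.append(row)
    return effects


if __name__ == "__main__":
    def gc_count(s: str) -> float:  # hypothetical stand-in for a model prediction
        return float(s.count("G") + s.count("C"))

    for position, row in enumerate(in_silico_mutagenesis("ACGT", gc_count)):
        print(position, row)
```

Because every position costs three model evaluations, this is usually restricted to short regions of interest, such as a candidate boundary element.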
Structural changes on compartments
p. 22
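This slide shows figures only. For reference, A/B compartments are commonly called from a Hi-C matrix as in Lieberman-Aiden et al. (2009): observed/expected normalization, a correlation matrix between bins, then the leading eigenvector, whose sign separates the two compartments. The plain-Python sketch below follows those steps, using power iteration for the eigenvector; function names and the toy matrix are hypothetical, and the sign is arbitrary (real pipelines orient it with GC content or gene density).

```python
import math
from typing import List

Matrix = List[List[float]]


def observed_over_expected(m: Matrix) -> Matrix:
    """Divide each entry by the mean of its diagonal (mean contact count at that distance)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diag = [m[i][i + d] for i in range(n - d)]
        mean = sum(diag) / len(diag)
        for i in range(n - d):
            out[i][i + d] = out[i + d][i] = m[i][i + d] / mean if mean > 0 else 0.0
    return out


def correlation_matrix(m: Matrix) -> Matrix:
    """Pearson correlation between every pair of rows (0.0 when a row is constant)."""
    centered = []
    for row in m:
        mu = sum(row) / len(row)
        centered.append([v - mu for v in row])
    norms = [math.sqrt(sum(v * v for v in row)) for row in centered]
    n = len(m)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            if norms[i] > 0 and norms[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def leading_eigenvector(m: Matrix, n_iter: int = 1000) -> List[float]:
    """Power iteration; assumes the largest eigenvalue is well separated from the others."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary, non-constant starting vector
    for _ in range(n_iter):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def compartment_scores(contacts: Matrix) -> List[float]:
    """The sign of each score assigns a bin to one of two compartments (up to a global flip)."""
    return leading_eigenvector(correlation_matrix(observed_over_expected(contacts)))


if __name__ == "__main__":
    # toy matrix: bins 0-1 and bins 2-3 belong to two different compartments (hypothetical)
    types = "AABB"
    toy = [[(2.0 if types[i] == types[j] else 1.0) / (1 + abs(i - j)) for j in range(4)]
           for i in range(4)]
    print(compartment_scores(toy))
```

Comparing these scores between predictions for a reference and a variant sequence is one way to quantify a change in compartments.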
References
(unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau:
https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae
Neuvial, P., Foissac, S., and Vialaneix, N. (2023).
Comprendre l’organisation spatiale de l’ADN à l’aide de la statistique.
In L’Interdisciplinarité. Voyage au-delà des Disciplines. CNRS.
Forthcoming (book chapter).
p. 23
Other credits for figures
▶ DNA sequence (p 8, 12): NIH. https://www.genome.gov
▶ NN features (p 12): Horikawa & Kamitani, Frontiers in Computational Neuroscience, 2017. https://doi.org/10.3389/fncom.2017.00004
▶ Sequence CNN (p 14): Elfermi Rachid https://medium.com/analytics-vidhya/predicting-genes-with-cnn-bdf278504e79
p. 23

More Related Content

Similar to Can deep learning learn chromatin structure from sequence?

O-BEE-COL
O-BEE-COLO-BEE-COL
O-BEE-COL
Mario Pavone
 
VOLT - ESWC 2016
VOLT - ESWC 2016VOLT - ESWC 2016
VOLT - ESWC 2016
Blake Regalia
 
Introduction of Feature Hashing
Introduction of Feature HashingIntroduction of Feature Hashing
Introduction of Feature Hashing
Wush Wu
 
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...
Daniel H. Stolfi
 
Demystifying Garbage Collection in Java
Demystifying Garbage Collection in JavaDemystifying Garbage Collection in Java
Demystifying Garbage Collection in Java
Igor Braga
 
Climbing Mt. Metagenome
Climbing Mt. MetagenomeClimbing Mt. Metagenome
Climbing Mt. Metagenome
c.titus.brown
 
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...
Koji Yamamoto
 
A Wavelet - Based Object Watermarking System for MPEG4 Video
A Wavelet - Based Object Watermarking System for MPEG4 VideoA Wavelet - Based Object Watermarking System for MPEG4 Video
A Wavelet - Based Object Watermarking System for MPEG4 Video
CSCJournals
 
AGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: LindsayAGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: Lindsay
Genome Reference Consortium
 
Cellular Backhauling over Satellite: The Other Side of the Coin
Cellular Backhauling over Satellite: The Other Side of the CoinCellular Backhauling over Satellite: The Other Side of the Coin
Cellular Backhauling over Satellite: The Other Side of the Coin
Small Cell Forum
 
IRJET- An Overview of Hiding Information in H.264/Avc Compressed Video
IRJET- An Overview of Hiding Information in H.264/Avc Compressed VideoIRJET- An Overview of Hiding Information in H.264/Avc Compressed Video
IRJET- An Overview of Hiding Information in H.264/Avc Compressed Video
IRJET Journal
 
Variation Graphs and Structural Variation
Variation Graphs and Structural VariationVariation Graphs and Structural Variation
Variation Graphs and Structural Variation
Eric Dawson
 
8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf
8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf
8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf
OlabusayoOladiran1
 
An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...
NECST Lab @ Politecnico di Milano
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action Detection
Mengmeng Xu
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain
Kan Yuenyong
 
Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)
Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)
Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)
Andrew Nix
 
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...
madalladam
 
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision..."Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
Edge AI and Vision Alliance
 

Similar to Can deep learning learn chromatin structure from sequence? (20)

O-BEE-COL
O-BEE-COLO-BEE-COL
O-BEE-COL
 
VOLT - ESWC 2016
VOLT - ESWC 2016VOLT - ESWC 2016
VOLT - ESWC 2016
 
Introduction of Feature Hashing
Introduction of Feature HashingIntroduction of Feature Hashing
Introduction of Feature Hashing
 
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA...
 
Demystifying Garbage Collection in Java
Demystifying Garbage Collection in JavaDemystifying Garbage Collection in Java
Demystifying Garbage Collection in Java
 
Climbing Mt. Metagenome
Climbing Mt. MetagenomeClimbing Mt. Metagenome
Climbing Mt. Metagenome
 
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...
Machine Learning and Stochastic Geometry: Statistical Frameworks Against Unce...
 
A Wavelet - Based Object Watermarking System for MPEG4 Video
A Wavelet - Based Object Watermarking System for MPEG4 VideoA Wavelet - Based Object Watermarking System for MPEG4 Video
A Wavelet - Based Object Watermarking System for MPEG4 Video
 
AGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: LindsayAGBT2017 Reference Workshop: Lindsay
AGBT2017 Reference Workshop: Lindsay
 
Cellular Backhauling over Satellite: The Other Side of the Coin
Cellular Backhauling over Satellite: The Other Side of the CoinCellular Backhauling over Satellite: The Other Side of the Coin
Cellular Backhauling over Satellite: The Other Side of the Coin
 
IRJET- An Overview of Hiding Information in H.264/Avc Compressed Video
IRJET- An Overview of Hiding Information in H.264/Avc Compressed VideoIRJET- An Overview of Hiding Information in H.264/Avc Compressed Video
IRJET- An Overview of Hiding Information in H.264/Avc Compressed Video
 
Variation Graphs and Structural Variation
Variation Graphs and Structural VariationVariation Graphs and Structural Variation
Variation Graphs and Structural Variation
 
8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf
8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf
8_Meersman_FTTH-5G-Convergence_withoutPoll.pdf
 
An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...An FPGA-based acceleration methodology and performance model for iterative st...
An FPGA-based acceleration methodology and performance model for iterative st...
 
Presentation-Umar
Presentation-UmarPresentation-Umar
Presentation-Umar
 
G-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action DetectionG-TAD: Sub-Graph Localization for Temporal Action Detection
G-TAD: Sub-Graph Localization for Temporal Action Detection
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain
 
Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)
Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)
Globecom 2015: Citywide MU vs SU MIMO (Siming Zhang)
 
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synth...
 
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision..."Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
"Separable Convolutions for Efficient Implementation of CNNs and Other Vision...
 

More from tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
tuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
tuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
tuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
tuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
tuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
tuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
tuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
tuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
tuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
tuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
tuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
tuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
tuxette
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...
tuxette
 

More from tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...
 

Recently uploaded

一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 

Recently uploaded (20)

一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 

Can deep learning learn chromatin structure from sequence?

  • 1. Can deep learning learn chromatin structure from sequence? Nathalie Vialaneix nathalie.vialaneix@inrae.fr http://www.nathalievialaneix.eu GT Chrocogen January 20th, 2023
  • 2. What is this chrocotalk about? GT Chrocogen January 20th, 2023 / Nathalie Vialaneix p. 2
  • 3. Outline Hi-C data Deep NN model Data and training methodology Results GT Chrocogen January 20th, 2023 / Nathalie Vialaneix p. 3
  • 4. 3D organization of the genome courtesy of Sylvain Foissac GT Chrocogen January 20th, 2023 / Nathalie Vialaneix p. 4
  • 5. Hi-C protocole again courtesy of Sylvain Foissac GT Chrocogen January 20th, 2023 / Nathalie Vialaneix p. 5
  • 6. Hi-C matrices also courtesy of Sylvain Foissac GT Chrocogen January 20th, 2023 / Nathalie Vialaneix p. 6
  • 7. Hi-C matrices also courtesy of Sylvain Foissac GT Chrocogen January 20th, 2023 / Nathalie Vialaneix p. 6
  • 8. Commercial break: [Neuvial et al., 2023] (p. 7)
  • 9. Question tackled by the article: can we predict a Hi-C matrix from the DNA sequence? (p. 8)
  • 10. Understanding binning and resolution (p. 9)
  • 11. Outline: Hi-C data, Deep NN model, Data and training methodology, Results (p. 10)
  • 12. The model. Availability: https://github.com/jzhoulab/orca (trained models and code to train new ones, all well documented) (p. 11)
  • 13. Two main parts, an encoder and a decoder ▶ Encoder: from sequence to features ▶ Decoder: from features to Hi-C matrix (p. 12)
  • 14-16. Three (multi-resolution) modules trained sequentially, all predicting 250 × 250 Hi-C matrices ▶ Orca-1Mb: predicts one 1Mb region at 4kb resolution (bin size) ▶ Orca-32Mb (multi-resolution): trained using the results of Orca-1Mb; predicts one region of size 1Mb, 2Mb, 4Mb, ..., 32Mb, each at the resolution that gives 250 bins (2Mb at 8kb, 4Mb at 16kb, ..., 32Mb at 128kb) ▶ Orca-256Mb (multi-resolution): trained using the results of Orca-32Mb; predicts one region of size 32Mb, 64Mb, 128Mb or 256Mb at resolution 128kb, 256kb, 512kb or 1024kb, respectively (p. 13)
  • 17. Other features ▶ every encoder block has a linear subblock and a non-linear subblock, each made of two convolutional layers, with ReLU activations and max pooling ▶ the encoder is also trained on an auxiliary task (predicting DNase-seq and ChIP-seq signals) (p. 14)
  • 18. Outline: Hi-C data, Deep NN model, Data and training methodology, Results (p. 15)
  • 19. Data used ▶ two Micro-C datasets (a separate model is trained on each): H1-ESC and HFF cells (4D Nucleome portal), normalized by matrix balancing; one Hi-C dataset: HCT116 (cohesin-depleted Hi-C) ▶ sequence: GRCh38/hg38 reference genome ▶ chromatin tracks (p. 16)
  • 20. Methodology ▶ each model is trained on regions randomly sampled from the training chromosomes ▶ evaluation uses the same datasets, restricted to held-out (test) chromosomes (p. 17)
  • 21. Outline: Hi-C data, Deep NN model, Data and training methodology, Results (p. 18)
  • 22-23. Overall performance ▶ correlation between predictions and experimental data of 0.73-0.85 intra-chromosomal and 0.47-0.74 inter-chromosomal ▶ predicts well not only CTCF-mediated interactions but also Polycomb-mediated and promoter-enhancer interactions (p. 19)
  • 24-27. Predicting structural variation effects ▶ transposon-mediated insertions of a 2kb TAD boundary element (2kb ± 5kb transposon) ▶ a 40.5Mb inversion (involved in acute myeloid leukemia); the enhancer-promoter (E/P) interaction was demonstrated experimentally ▶ multiple deletion, inversion and duplication variants in the same region (involved in limb malformations), each ∼1Mb; predictions consistent with the mechanisms observed in 4C experimental data ▶ regions with multiple variants leading to distinct phenotypes ??? (p. 20)
  • 28. In silico mutagenesis (p. 21)
  • 29-31. Structural changes on compartments (p. 22)
  • 32. References ▶ (unofficial) Beamer template made with the help of Thomas Schiex, Matthias Zytnicki and Andreea Dreau: https://forgemia.inra.fr/nathalie.villa-vialaneix/bainrae ▶ Neuvial, P., Foissac, S., and Vialaneix, N. (2023). Comprendre l'organisation spatiale de l'ADN à l'aide de la statistique [Understanding the spatial organization of DNA using statistics]. In L'Interdisciplinarité. Voyage au-delà des Disciplines [Interdisciplinarity: a journey beyond disciplines]. CNRS. Forthcoming (book chapter). (p. 23)
  • 33. Other credits for figures ▶ DNA sequence (p. 8, 12): NIH, https://www.genome.gov ▶ NN features (p. 12): Horikawa & Kamitani, Frontiers in Computational Neuroscience, 2017, https://doi.org/10.3389/fncom.2017.00004 ▶ Sequence CNN (p. 14): Elfermi Rachid, https://medium.com/analytics-vidhya/predicting-genes-with-cnn-bdf278504e79 (p. 23)
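
The "Understanding binning and resolution" slide (p. 9) is a figure only. As a minimal sketch, assuming contacts arrive as (pos1, pos2) read-pair coordinates in bp on one chromosome, binning at a given resolution amounts to integer division of each coordinate by the bin size, then counting pairs per matrix cell. The function name `bin_contacts` is hypothetical; real Hi-C pipelines also filter, deduplicate and normalize reads before this step.

```python
def bin_contacts(contacts, region_start, region_end, bin_size):
    """Count read-pair contacts into a symmetric n x n matrix.

    contacts: iterable of (pos1, pos2) coordinates in bp, both on the same
    chromosome. Pairs with an end outside [region_start, region_end) are skipped.
    """
    # Number of bins; a trailing partial bin is kept (ceiling division).
    n_bins = -(-(region_end - region_start) // bin_size)
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in contacts:
        if not (region_start <= pos1 < region_end and region_start <= pos2 < region_end):
            continue
        i = (pos1 - region_start) // bin_size
        j = (pos2 - region_start) // bin_size
        matrix[i][j] += 1
        if i != j:  # fill both halves so the matrix stays symmetric
            matrix[j][i] += 1
    return matrix


# A 1Mb region at 4kb resolution gives a 250 x 250 matrix.
m = bin_contacts([(0, 5_000), (3_000, 1_000)], 0, 1_000_000, 4_000)
print(len(m), m[0][0], m[0][1], m[1][0])  # 250 1 1 1
```

Halving the bin size doubles the number of bins per side, so the matrix grows quadratically; this is why the model predicts large regions at coarser resolutions.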
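
Slides 14 to 16 (p. 13) pair each region size with a bin size so that every predicted matrix is 250 × 250. The arithmetic can be checked with the short sketch below. It only reproduces that size / 250 rule, assuming 1Mb = 10^6 bp; `resolution_series` is a hypothetical helper, not part of the Orca code base.

```python
MB = 1_000_000  # genomic megabase, in bp
N_BINS = 250    # matrix size stated on the slide (250 x 250)


def resolution_series(sizes_bp, n_bins=N_BINS):
    """Pair each region size with the bin size that splits it into n_bins bins."""
    series = []
    for size in sizes_bp:
        if size % n_bins:
            raise ValueError(f"{size} bp cannot be split into {n_bins} equal bins")
        series.append((size, size // n_bins))
    return series


# Region sizes listed for Orca-32Mb (1Mb ... 32Mb) and Orca-256Mb (32Mb ... 256Mb).
orca_32mb = resolution_series([2**k * MB for k in range(6)])
orca_256mb = resolution_series([2**k * MB for k in range(5, 9)])

for size, bin_size in orca_32mb + orca_256mb:
    print(f"{size // MB:>4} Mb region -> {bin_size // 1000:>5} kb bins")
```

Keeping the matrix size fixed means each module has outputs of the same shape, whatever the scale.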
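
The "Other features" slide (p. 14) describes encoder blocks built from convolutional layers, ReLU and max pooling applied to DNA sequence. The toy sketch below shows that pattern on a one-hot encoded sequence in plain Python. It is a simplification under stated assumptions: the filter weights are hand-picked (a hypothetical "GC" detector, not learned), there is a single output channel, and the split into linear and non-linear subblocks used by Orca is not reproduced.

```python
def one_hot(seq):
    """One-hot encode DNA as L rows of [A, C, G, T]; other letters (e.g. N) become zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def conv1d(x, kernels, biases):
    """'Valid' 1D convolution: x is L x C_in, kernels is C_out x K x C_in."""
    k = len(kernels[0])
    c_in = len(x[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        out.append([
            sum(w[t][c] * window[t][c] for t in range(k) for c in range(c_in)) + b
            for w, b in zip(kernels, biases)
        ])
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis (drops a partial last window)."""
    return [
        [max(x[i + t][c] for t in range(size)) for c in range(len(x[0]))]
        for i in range(0, len(x) - size + 1, size)
    ]


def toy_encoder_block(x, kernels1, biases1, kernels2, biases2, pool=2):
    """Two convolutions with ReLU, then max pooling (downsampling by `pool`)."""
    h = relu(conv1d(x, kernels1, biases1))
    h = relu(conv1d(h, kernels2, biases2))
    return max_pool(h, pool)


# Hypothetical filter that fires on the dinucleotide "GC" (weights chosen by hand).
gc_filter = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
identity = [[1.0]]  # 1-wide, 1-channel second convolution that passes values through
out = toy_encoder_block(one_hot("AGCTGCAA"), [gc_filter], [-1.0], [identity], [0.0])
print(out)  # [[1.0], [0.0], [1.0]]
```

Stacking such blocks shortens the sequence axis at each pooling step, which is how an encoder turns base-pair input into features at the bin scale of the Hi-C matrix.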
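
The "Used data" slide (p. 16) states that the Micro-C matrices were normalized by matrix balancing. One widely used balancing method is iterative correction (ICE), sketched below in plain Python. The slide does not say which balancing variant or implementation was used, so this only illustrates the idea (rescale rows and columns until all row sums are equal), not the preprocessing behind the paper; `ice_balance` is a hypothetical name.

```python
def ice_balance(matrix, max_iter=1000, tol=1e-10):
    """ICE-style iterative correction of a symmetric contact matrix.

    Returns (balanced, bias) with balanced[i][j] == matrix[i][j] / (bias[i] * bias[j])
    and (approximately) equal row sums over the non-empty rows of `balanced`.
    """
    n = len(matrix)
    m = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        row_sums = [sum(row) for row in m]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:  # empty matrix: nothing to balance
            break
        mean = sum(nonzero) / len(nonzero)
        # Per-row correction factor; empty rows (e.g. unmappable bins) stay as they are.
        factors = [s / mean if s > 0 else 1.0 for s in row_sums]
        m = [[m[i][j] / (factors[i] * factors[j]) for j in range(n)] for i in range(n)]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


raw = [[4, 2, 1], [2, 3, 1], [1, 1, 2]]
balanced, bias = ice_balance(raw)
print([round(sum(row), 6) for row in balanced])
```

Balancing removes per-bin biases (mappability, GC content, restriction site density) under the assumption that every bin should have the same total visibility.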
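
The "Methodology" slide (p. 17) trains on regions randomly sampled from training chromosomes and evaluates on held-out chromosomes. A minimal sketch of such a chromosome-level split follows; the chromosome sizes, the held-out set and the function name are hypothetical toy values, not the split used for Orca.

```python
import random


def sample_training_regions(chrom_sizes, holdout, region_size, n_samples, seed=0):
    """Sample (chrom, start, end) windows uniformly from training chromosomes only.

    Chromosomes listed in `holdout` (and those shorter than region_size) are never
    sampled, so they remain available for evaluation.
    """
    rng = random.Random(seed)
    train = {c: size for c, size in chrom_sizes.items()
             if c not in holdout and size >= region_size}
    if not train:
        raise ValueError("no training chromosome can hold a region of this size")
    chroms = sorted(train)
    # Weight by the number of valid start positions: every window is equally likely.
    weights = [train[c] - region_size + 1 for c in chroms]
    regions = []
    for _ in range(n_samples):
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randrange(train[chrom] - region_size + 1)
        regions.append((chrom, start, start + region_size))
    return regions


# Hypothetical toy genome and split (not the split used in the paper).
toy_sizes = {"chr1": 5_000_000, "chr2": 3_000_000, "chr3": 2_000_000}
train_regions = sample_training_regions(toy_sizes, holdout={"chr3"},
                                        region_size=1_000_000, n_samples=5)
print(train_regions)
```

Splitting by whole chromosomes rather than by region keeps every test sequence unseen during training, so overlapping windows cannot leak information between the two sets.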
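
The "Overall performance" slide (p. 19) reports correlations between predicted and experimental matrices. The sketch below computes a plain Pearson correlation over the upper triangle of two contact matrices, skipping the diagonal. This protocol is an assumption: the slide does not state which transformation (e.g. log scale or distance normalization) or which aggregation over regions was used, and those choices change the values. All function names are hypothetical.

```python
import math


def upper_triangle(matrix, min_offset=1):
    """Flatten the entries above the diagonal (offset >= min_offset) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + min_offset, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def matrix_correlation(predicted, observed, min_offset=1):
    """Correlate predicted and observed contact matrices over their upper triangles."""
    return pearson(upper_triangle(predicted, min_offset),
                   upper_triangle(observed, min_offset))


observed = [[5.0, 3.0, 1.0], [3.0, 5.0, 2.0], [1.0, 2.0, 5.0]]
predicted = [[4.0, 2.5, 1.5], [2.5, 4.0, 2.0], [1.5, 2.0, 4.0]]
print(round(matrix_correlation(predicted, observed), 3))
```

Only one triangle is used because Hi-C matrices are symmetric: counting both halves would duplicate every value.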
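
The "Predicting structural variation effects" slides (p. 20) rely on the fact that a sequence-based model can be run on a mutated sequence as easily as on the reference. A minimal sketch of how such variant sequences could be built from a reference string is given below; the helper names are hypothetical, and a real analysis must also map coordinates between the reference and the variant genome when comparing predicted matrices.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def apply_deletion(seq, start, end):
    """Remove seq[start:end]."""
    return seq[:start] + seq[end:]


def apply_inversion(seq, start, end):
    """Replace seq[start:end] by its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def apply_duplication(seq, start, end):
    """Tandem duplication: insert a second copy of seq[start:end] right after it."""
    return seq[:end] + seq[start:end] + seq[end:]


def apply_insertion(seq, pos, inserted):
    """Insert a foreign sequence (e.g. a transposon carrying a boundary element) at pos."""
    return seq[:pos] + inserted + seq[pos:]


ref = "AACCGGTT"
print(apply_deletion(ref, 2, 4))       # AAGGTT
print(apply_inversion(ref, 0, 3))      # GTTCGGTT
print(apply_duplication(ref, 2, 4))    # AACCCCGGTT
print(apply_insertion(ref, 4, "TTT"))  # AACCTTTGGTT
```

The variant effect is then read off by comparing the model's prediction on the edited sequence with its prediction on the reference.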
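
For the "In silico mutagenesis" slide (p. 21), the general technique is to mutate the input one position at a time and record how the model output changes. The sketch below does this for all single-base substitutions. `toy_score` is a hypothetical stand-in for a model prediction (it just counts "GC" dinucleotides), since calling Orca itself is outside the scope of this example.

```python
def in_silico_mutagenesis(seq, score, alphabet="ACGT"):
    """Effect of every single-base substitution: score(mutant) - score(reference)."""
    reference = score(seq)
    effects = {}
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt == base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, alt)] = score(mutant) - reference
    return effects


def toy_score(seq):
    """Hypothetical stand-in for a model output: number of 'GC' dinucleotides."""
    return seq.count("GC")


effects = in_silico_mutagenesis("AGCA", toy_score)
print(effects[(1, "A")], effects[(2, "G")], len(effects))  # -1 -1 12
```

With a real model, `score` would summarize the predicted matrix (e.g. the contact strength at a given pair of bins), and positions with large effects point to sequence elements driving the predicted structure.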