Learning Biologically Relevant Features Using Convolutional Neural Networks for DNA Sequence Analysis
22/01/2018
Invited Research Talk @ Bayer, Ghent, Belgium
Jasper Zuallaert, Wesley De Neve
¹ IDLab, ELIS, Ghent University, Ghent, Belgium
² Center for Biotech Data Science, Ghent University Global Campus (GUGC), Songdo, Korea
Introduction
Convolutional neural networks for DNA analysis
Visualization of biologically relevant features
Conclusions & future work
Introduction
Automatic genome annotation
* Which parts of the genome correspond to which functionalities?
* Which anomalies in the genome correspond to diseases?
* Can we manipulate the genome to avoid or cure diseases?
→ The first step in mapping functionality to the genome is to structure it
[Figure: open questions about genome structure: primary structure, secondary structure, tertiary structure, binding sites, exons, introns, genes]
Expert knowledge on translation initiation & splice sites
[Figure: gene structure (exon, intron, exon) annotated with the translation initiation site, donor splice site, branch point, polypyrimidine tract, acceptor splice site and stop codon (TAA, TAG, TGA). Sequence fragments shown: G C C G C C C C A T G G … A G G T A G T … and C T A … N A G G, with alternative nucleotides (A C A / G A G) at some positions and a run of C/T for the polypyrimidine tract. Length annotations: 10s to 10 000s, < 20, 10s to 100s, ~ 20.]
Dataset composition
Datasets with true and pseudo splice / translation initiation sites
Fixed-length sequences (~200 to 400 nucleotides), each containing an annotated site
…AGCGGCATCCAGGTAAGTTCTTCAACCTGTAAGGGAGGCTTCAGTTAAAGCCATCCGA…
…AGCGGCATCCAGGTAAATGTCTTCAA…
…CATCCAGATGAAGTTCTTCAACCTAT…
…TGTCTTCAACCTGTAAGGGAGGCTTC…
…AGGGAGGCTTCAGTTAAAGCCATCCG…
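The dataset construction above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes candidate sites are every occurrence of a short motif (for example GT for donor candidates), and the function names, window sizes and annotated position are hypothetical.

```python
def extract_windows(sequence, motif, upstream, downstream):
    """Return (position, window) pairs for every motif occurrence that has
    enough flanking context on both sides (assumed candidate definition)."""
    windows = []
    start = sequence.find(motif)
    while start != -1:
        left = start - upstream
        right = start + len(motif) + downstream
        if left >= 0 and right <= len(sequence):
            windows.append((start, sequence[left:right]))
        start = sequence.find(motif, start + 1)
    return windows


def label_windows(windows, true_positions):
    """Label a window 1 (true site) if its position is annotated, else 0 (pseudo site)."""
    return [(window, 1 if position in true_positions else 0)
            for position, window in windows]


# Sequence taken from the slide; the annotated position below is hypothetical.
genome = "AGCGGCATCCAGGTAAGTTCTTCAACCTGTAAGGGAGGCTTCAGTTAAAGCCATCCGA"
candidates = extract_windows(genome, "GT", upstream=5, downstream=5)
dataset = label_windows(candidates, true_positions={12})
```

Every window has the same length (upstream + motif + downstream), which is what allows the sequences to be fed to a fixed-size network input.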
Convolutional neural networks for DNA analysis
The success of Deep Learning
[Figure: an image shown as a grid of pixel intensity values]
Introduction of Deep Learning
ImageNet classification competition: 1.2 million images, 1000 classes
Neural networks
Input → Output
Self-learning, black-box systems
Feedback → update parameters
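The "feedback → update parameters" loop can be illustrated with a single gradient-descent step. This is a minimal sketch under stated assumptions: one logistic unit trained with cross-entropy loss, not the networks used in the talk. The function names and the learning rate are hypothetical.

```python
import math


def predict(weights, inputs):
    """Forward pass of one logistic unit: sigmoid of the weighted sum."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, inputs, target, learning_rate=0.1):
    """One feedback step: move each weight against the cross-entropy gradient,
    which for a logistic unit is (prediction - target) * input."""
    error = predict(weights, inputs) - target
    return [w - learning_rate * error * x for w, x in zip(weights, inputs)]


weights = [0.0, 0.0]
new_weights = update(weights, inputs=[1.0, 2.0], target=1.0)
```

After the step, the prediction for the same input moves towards the target; repeating this over many examples is what "self-learning" refers to.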
Convolutional Neural Networks for images
Input image → Lines and shapes → Structures → Concepts
Convolutional Neural Networks for DNA sequences
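One-hot encoded input sequence (columns: A C G T)
A 1 0 0 0
G 0 0 1 0
T 0 0 0 1
T 0 0 0 1
C 0 1 0 0
A 1 0 0 0
G 0 0 1 0
G 0 0 1 0
T 0 0 0 1
A 1 0 0 0
G 0 0 1 0
C 0 1 0 0
C 0 1 0 0
T 0 0 0 1
Example pattern weights (columns: A C G T)
C 0 2 0 0
A 1 0 0 0
G 0 0 2 0
G 0 0 3 0
T 0 0 0 1
Example pattern weights (columns: A C G T)
A/G 1 0 2 0
T/C 0 2 0 1
T/C 0 1 0 1
T/C 0 1 0 1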
Pattern detection → Combination of patterns from previous layers
Output: true splice site or false splice site
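The pattern detection step can be sketched in plain Python. The sketch assumes the column order A, C, G, T and reuses the CAGGT weights and the input sequence shown on the slide. The function names are hypothetical, and a real convolutional layer would add a bias, a non-linearity and many learned patterns.

```python
ALPHABET = "ACGT"  # column order of the one-hot encoding


def one_hot(sequence):
    """Encode a DNA string as one 4-element row per nucleotide.
    Any other symbol (e.g. N) becomes an all-zero row."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in sequence]


def scan(encoded, pattern):
    """Slide the pattern over the encoded sequence; return one score per start position."""
    width = len(pattern)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0
        for offset in range(width):
            row = encoded[start + offset]
            weights = pattern[offset]
            score += sum(x * w for x, w in zip(row, weights))
        scores.append(score)
    return scores


# Weights for the pattern C A G G T, as listed on the slide.
CAGGT_PATTERN = [
    [0, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 1],
]

scores = scan(one_hot("AGTTCAGGTAGCCT"), CAGGT_PATTERN)
best = scores.index(max(scores))  # the CAGGT occurrence at index 4 scores highest
```

Deeper layers apply the same sliding operation to the score tracks of earlier layers, which is how combinations of patterns are detected.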
Results on splice site prediction
Beats the state of the art on various datasets with different sizes, class imbalances and sequence lengths
Dataset sizes: 200 to 15 000 positives, 1000 to 75 000 negatives
[Figure: bar charts for donors and acceptors comparing approaches A, B and C; annotation: 240 x]
A Degroeve et al., 2005 (SVM)
B Lee et al., 2015 (DBN)
C Our approach (CNN)
Visualization of biologically relevant features
Visualization of neural networks
Goal → which parts of the input impact the prediction, and why?
Step 1: Forward propagation
Calculate predictions (example outputs: 0.74 and 0.26)
Step 2: Backpropagation
Calculate contribution scores per input
Example input (pixel values):
120 206 55 75 85
128 155 23 178 164
250 216 223 217 64
23 54 54 237 253
16 24 101 132 177
Example contribution scores per input:
0.05 0.01 0.06 0.07 -0.05
0.12 0.68 0.98 0.84 0.06
0.23 0.55 0.84 -0.06 -0.12
0.08 0.21 -0.06 -0.22 -0.23
0.02 0.06 -0.26 -0.83 -0.55
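The two steps can be sketched for a toy model. The sketch swaps in a simpler technique than the one used in the talk: plain gradient × input on a single logistic unit, rather than DeepLIFT on a convolutional network. All names and values are hypothetical.

```python
import math


def forward(weights, bias, inputs):
    """Step 1: forward propagation, returning the predicted probability."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def input_gradients(weights, bias, inputs):
    """Step 2: backpropagation to the inputs.
    For a logistic unit, d(prediction)/d(input_k) = p * (1 - p) * weight_k."""
    p = forward(weights, bias, inputs)
    return [p * (1.0 - p) * w for w in weights]


def contribution_scores(weights, bias, inputs):
    """Gradient × input: one signed contribution score per input value."""
    gradients = input_gradients(weights, bias, inputs)
    return [g * x for g, x in zip(gradients, inputs)]


weights, bias = [0.5, -1.0, 2.0], -1.0
inputs = [1.0, 0.0, 1.0]
prediction = forward(weights, bias, inputs)
contributions = contribution_scores(weights, bias, inputs)
```

Inputs that are zero receive a zero score, and the sign of each score shows whether that input pushed the prediction up or down.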
Visualization of neural networks
Images: saliency map for prediction: cockatoo
Source: Zintgraf et al., 2017, Visualizing Deep Neural Network Decisions: Prediction Difference Analysis
Genomic data: (part of) saliency map for TIS (translation initiation site) prediction ??
Making sense of DNA saliency maps
1. Calculate* contribution scores per nucleotide (cs_i), for each sequence
2. Normalize scores
3. Evaluate, e.g., by averaging over multiple sequences
$$ wcs_{ij} = 100 \cdot m \cdot \frac{cs_{ij}}{\sum_{p=1}^{m} \sum_{q=1}^{n} cs_{pq}} $$
* Using DeepLIFT (Shrikumar et al., 2017, Learning Important Features Through Propagating Activation Differences)
This gives a more interpretable meaning to the contribution scores, and it puts scores for different datasets on the same scale
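The normalization and averaging steps can be sketched as follows. The sketch assumes that i runs over the m sequences and j over the n positions, which the slide does not state explicitly. The function names and the toy scores are hypothetical.

```python
def weighted_contribution_scores(cs):
    """Apply wcs_ij = 100 * m * cs_ij / (sum of all cs_pq),
    where cs is a list of m rows (sequences) of n scores (positions)."""
    m = len(cs)
    total = sum(sum(row) for row in cs)
    if total == 0:
        raise ValueError("contribution scores sum to zero; cannot normalize")
    return [[100.0 * m * value / total for value in row] for row in cs]


def average_per_position(wcs):
    """Average the normalized scores over all sequences, position by position."""
    m = len(wcs)
    n = len(wcs[0])
    return [sum(row[j] for row in wcs) / m for j in range(n)]


# Toy example: 2 sequences, 2 positions.
cs = [[1.0, 3.0], [2.0, 2.0]]
wcs = weighted_contribution_scores(cs)
profile = average_per_position(wcs)
```

Under this reading, the averaged scores over all positions sum to 100, so each value can be read as a percentage of the total contribution.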
Visualization example: acceptor sites
[Figure: visualization with annotated motifs CTNA, AG and CAGGTAAG]
Branch point detection
AG exclusion zone
- Polypyrimidine tract (Cs and Ts)
- Acceptor motif CAGG(T)
Presence of a donor pattern:
- Expected towards the end of the sequence
→ most exons are <200 nucleotides long
Visualization example: donor sites
[Figure: visualization with annotated motif CAGGTAAG]
Donor motif CAGGTAAGT
Presence of a donor pattern:
- Not expected elsewhere in the sequence, as this would imply an unlikely short intron + exon
Visualization example: translation initiation sites
Translation initiation site motif (GCCACCATGGCG)
Presence of a donor motif:
- Expected after the site, as the end of the first exon
- Not expected in front of the site
Presence of a stop codon (TGA, TAA, TAG):
- Not expected at any third position (because translation proceeds in codons)
- At other positions, no influence
[Figure: visualization with annotated motifs TAA and CAGGTAAG]
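The "any third position" rule corresponds to stop codons that are in frame with the start codon. A minimal sketch of that check, with a hypothetical function name and toy sequences built from the motif above:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stop_positions(sequence, atg_index):
    """Return the start indices of stop codons that lie in the reading frame
    set by the ATG at atg_index (steps of 3 after the start codon)."""
    positions = []
    for i in range(atg_index + 3, len(sequence) - 2, 3):
        if sequence[i:i + 3] in STOP_CODONS:
            positions.append(i)
    return positions


# TAA in frame with the ATG at index 6: reported.
in_frame = in_frame_stop_positions("GCCACCATGGCGTAAGT", atg_index=6)
# TAA shifted by one nucleotide: out of frame, so not reported.
out_of_frame = in_frame_stop_positions("GCCACCATGGTAAGCG", atg_index=6)
```

An in-frame stop codon shortly after a candidate ATG would end translation almost immediately, which matches the rule above that only stop codons at third positions count against a site.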
Conclusions & future work
Conclusions
Pattern detection with Convolutional Neural Networks = very effective for splice site / TIS prediction
- End-to-end prediction system
- No manual feature engineering needed
→ Without any prior knowledge, biologically relevant features are learnt
Publications:
DLB2H 2017 - Interpretable Convolutional Neural Networks for Effective Translation Initiation Site Prediction
Under revision:
Bioinformatics - Interpretable Convolutional Neural Networks for Improved Splice Site Prediction
Future work
Improving predictions using extra forms of data
→ Spatial properties, physicochemical properties, …
→ Visualization of which inputs the network uses
Automating the workflow for determining biologically relevant features
→ automatically group similar patterns with similar scores
+ look further into the network internals
→ Verification on known problems (see previous slides) → detection of previously unknown patterns?
→ Verification on an artificially created dataset, seeing if all induced features can be found
Thank you for your attention!
23

More Related Content

What's hot

Neural network-based techniques for the damage identification of bridges: a r...
Neural network-based techniques for the damage identification of bridges: a r...Neural network-based techniques for the damage identification of bridges: a r...
Neural network-based techniques for the damage identification of bridges: a r...StroNGER2012
 
Unsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overviewUnsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overviewRidge-i, Inc.
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
184816386 x mining
184816386 x mining184816386 x mining
184816386 x mining496573
 
NEURAL NETWORKS
NEURAL NETWORKSNEURAL NETWORKS
NEURAL NETWORKSESCOM
 
Improved steganographic security by
Improved steganographic security byImproved steganographic security by
Improved steganographic security byIJNSA Journal
 
CARLsim 3: Concepts, Tools, and Applications
CARLsim 3: Concepts, Tools, and ApplicationsCARLsim 3: Concepts, Tools, and Applications
CARLsim 3: Concepts, Tools, and ApplicationsMichael Beyeler
 
20141003.journal club
20141003.journal club20141003.journal club
20141003.journal clubHayaru SHOUNO
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionIJAEMSJORNAL
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSREHMAT ULLAH
 
Neural Networks for Pattern Recognition
Neural Networks for Pattern RecognitionNeural Networks for Pattern Recognition
Neural Networks for Pattern RecognitionVipra Singh
 
Deep Learning personalised, closed-loop Brain-Computer Interfaces for mu...
Deep  Learning  personalised, closed-loop  Brain-Computer  Interfaces  for mu...Deep  Learning  personalised, closed-loop  Brain-Computer  Interfaces  for mu...
Deep Learning personalised, closed-loop Brain-Computer Interfaces for mu...Willy Marroquin (WillyDevNET)
 
ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...
ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...
ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...Prerana Mukherjee
 

What's hot (17)

Neural network-based techniques for the damage identification of bridges: a r...
Neural network-based techniques for the damage identification of bridges: a r...Neural network-based techniques for the damage identification of bridges: a r...
Neural network-based techniques for the damage identification of bridges: a r...
 
Unsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overviewUnsupervised Video Anomaly Detection: A brief overview
Unsupervised Video Anomaly Detection: A brief overview
 
Basics of Neural Networks
Basics of Neural NetworksBasics of Neural Networks
Basics of Neural Networks
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
184816386 x mining
184816386 x mining184816386 x mining
184816386 x mining
 
Mechanical
MechanicalMechanical
Mechanical
 
NEURAL NETWORKS
NEURAL NETWORKSNEURAL NETWORKS
NEURAL NETWORKS
 
Improved steganographic security by
Improved steganographic security byImproved steganographic security by
Improved steganographic security by
 
CARLsim 3: Concepts, Tools, and Applications
CARLsim 3: Concepts, Tools, and ApplicationsCARLsim 3: Concepts, Tools, and Applications
CARLsim 3: Concepts, Tools, and Applications
 
False colouring
False colouringFalse colouring
False colouring
 
20141003.journal club
20141003.journal club20141003.journal club
20141003.journal club
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
 
Exploring EEG for object detection and retrieval
Exploring EEG  for object detection and retrievalExploring EEG  for object detection and retrieval
Exploring EEG for object detection and retrieval
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
 
Neural Networks for Pattern Recognition
Neural Networks for Pattern RecognitionNeural Networks for Pattern Recognition
Neural Networks for Pattern Recognition
 
Deep Learning personalised, closed-loop Brain-Computer Interfaces for mu...
Deep  Learning  personalised, closed-loop  Brain-Computer  Interfaces  for mu...Deep  Learning  personalised, closed-loop  Brain-Computer  Interfaces  for mu...
Deep Learning personalised, closed-loop Brain-Computer Interfaces for mu...
 
ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...
ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...
ADAPTIVE CRYPTO-STEGANOSYSTEM FOR VIDEOS BASED ON INFORMATION CONTENT AND VIS...
 

Similar to Learning biologically relevant features using convolutional neural networks for dna sequence analysis

Learning Biologically Relevant Features Using Convolutional Neural Networks f...
Learning Biologically Relevant Features Using Convolutional Neural Networks f...Learning Biologically Relevant Features Using Convolutional Neural Networks f...
Learning Biologically Relevant Features Using Convolutional Neural Networks f...Wesley De Neve
 
Towards reading genomic data using deep learning-driven NLP techniques
Towards reading genomic data using deep learning-driven NLP techniquesTowards reading genomic data using deep learning-driven NLP techniques
Towards reading genomic data using deep learning-driven NLP techniquesWesley De Neve
 
DReAMS: High Performance Reconfigurable Computing at NECSTLab
DReAMS: High Performance Reconfigurable Computing at NECSTLabDReAMS: High Performance Reconfigurable Computing at NECSTLab
DReAMS: High Performance Reconfigurable Computing at NECSTLabNECST Lab @ Politecnico di Milano
 
A new multiple classifiers soft decisions fusion approach for exons predictio...
A new multiple classifiers soft decisions fusion approach for exons predictio...A new multiple classifiers soft decisions fusion approach for exons predictio...
A new multiple classifiers soft decisions fusion approach for exons predictio...Ismail M. El-Badawy
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferenceKaushalya Madhawa
 
1-bit semantic segmentation
1-bit semantic segmentation1-bit semantic segmentation
1-bit semantic segmentationJeonghoonKim30
 
“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...
“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...
“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...Edge AI and Vision Alliance
 
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptxssuser67281d
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
Howard University: Center for Computational Biology and Bioinformatics
Howard University: Center for Computational Biology and BioinformaticsHoward University: Center for Computational Biology and Bioinformatics
Howard University: Center for Computational Biology and Bioinformaticskarl.barnes
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequencesClaudio Gallicchio
 
Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...ijsrd.com
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsMohid Nabil
 
Towards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsTowards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsParidha Saxena
 
A Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningA Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningIRJET Journal
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
IRJET- Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...
IRJET-  	  Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...IRJET-  	  Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...
IRJET- Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...IRJET Journal
 
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) ijceronline
 
【Machine Lewarning】 Paper Presentation
【Machine Lewarning】 Paper Presentation【Machine Lewarning】 Paper Presentation
【Machine Lewarning】 Paper PresentationShun YU Ko
 

Similar to Learning biologically relevant features using convolutional neural networks for dna sequence analysis (20)

Learning Biologically Relevant Features Using Convolutional Neural Networks f...
Learning Biologically Relevant Features Using Convolutional Neural Networks f...Learning Biologically Relevant Features Using Convolutional Neural Networks f...
Learning Biologically Relevant Features Using Convolutional Neural Networks f...
 
Towards reading genomic data using deep learning-driven NLP techniques
Towards reading genomic data using deep learning-driven NLP techniquesTowards reading genomic data using deep learning-driven NLP techniques
Towards reading genomic data using deep learning-driven NLP techniques
 
DReAMS: High Performance Reconfigurable Computing at NECSTLab
DReAMS: High Performance Reconfigurable Computing at NECSTLabDReAMS: High Performance Reconfigurable Computing at NECSTLab
DReAMS: High Performance Reconfigurable Computing at NECSTLab
 
A new multiple classifiers soft decisions fusion approach for exons predictio...
A new multiple classifiers soft decisions fusion approach for exons predictio...A new multiple classifiers soft decisions fusion approach for exons predictio...
A new multiple classifiers soft decisions fusion approach for exons predictio...
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
 
1-bit semantic segmentation
1-bit semantic segmentation1-bit semantic segmentation
1-bit semantic segmentation
 
“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...
“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...
“New Methods for Implementation of 2-D Convolution for Convolutional Neural N...
 
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
2. NEURAL NETWORKS USING GENETIC ALGORITHMS.pptx
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
1.pptx
1.pptx1.pptx
1.pptx
 
Howard University: Center for Computational Biology and Bioinformatics
Howard University: Center for Computational Biology and BioinformaticsHoward University: Center for Computational Biology and Bioinformatics
Howard University: Center for Computational Biology and Bioinformatics
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
 
Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...Implementation of Feed Forward Neural Network for Classification by Education...
Implementation of Feed Forward Neural Network for Classification by Education...
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
 
Towards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsTowards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprograms
 
A Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningA Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep Learning
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
IRJET- Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...
IRJET-  	  Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...IRJET-  	  Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...
IRJET- Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall...
 
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER)
 
【Machine Lewarning】 Paper Presentation
【Machine Lewarning】 Paper Presentation【Machine Lewarning】 Paper Presentation
【Machine Lewarning】 Paper Presentation
 

More from Wesley De Neve

Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...
Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...
Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...Wesley De Neve
 
Investigating the biological relevance in trained embedding representations o...
Investigating the biological relevance in trained embedding representations o...Investigating the biological relevance in trained embedding representations o...
Investigating the biological relevance in trained embedding representations o...Wesley De Neve
 
Impact of adversarial examples on deep learning models for biomedical image s...
Impact of adversarial examples on deep learning models for biomedical image s...Impact of adversarial examples on deep learning models for biomedical image s...
Impact of adversarial examples on deep learning models for biomedical image s...Wesley De Neve
 
The 5th Aslla Symposium
The 5th Aslla SymposiumThe 5th Aslla Symposium
The 5th Aslla SymposiumWesley De Neve
 
Ghent University Global Campus 101
Ghent University Global Campus 101Ghent University Global Campus 101
Ghent University Global Campus 101Wesley De Neve
 
Booklet for the First GUGC Research Symposium
Booklet for the First GUGC Research SymposiumBooklet for the First GUGC Research Symposium
Booklet for the First GUGC Research SymposiumWesley De Neve
 
Center for Biotech Data Science at Ghent University Global Campus
Center for Biotech Data Science at Ghent University Global CampusCenter for Biotech Data Science at Ghent University Global Campus
Center for Biotech Data Science at Ghent University Global CampusWesley De Neve
 
Center for Biotech Data Science at Ghent University Global Campus
Center for Biotech Data Science at Ghent University Global CampusCenter for Biotech Data Science at Ghent University Global Campus
Center for Biotech Data Science at Ghent University Global CampusWesley De Neve
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Wesley De Neve
 
GUGC Info Session - Informatics and Bioinformatics
GUGC Info Session - Informatics and BioinformaticsGUGC Info Session - Informatics and Bioinformatics
GUGC Info Session - Informatics and BioinformaticsWesley De Neve
 
Ghent University Global Campus - Sungkyunkwan University: Workshop on Researc...
Ghent University Global Campus - Sungkyunkwan University: Workshop on Researc...Ghent University Global Campus - Sungkyunkwan University: Workshop on Researc...
Ghent University Global Campus - Sungkyunkwan University: Workshop on Researc...Wesley De Neve
 
Ghent University and GUGC-K: Overview of Teaching and Research Activities
Ghent University and GUGC-K: Overview of Teaching and Research ActivitiesGhent University and GUGC-K: Overview of Teaching and Research Activities
Ghent University and GUGC-K: Overview of Teaching and Research ActivitiesWesley De Neve
 
Biotech Data Science @ GUGC in Korea: Deep Learning for Prediction of Drug-Ta...
Biotech Data Science @ GUGC in Korea: Deep Learning for Prediction of Drug-Ta...Biotech Data Science @ GUGC in Korea: Deep Learning for Prediction of Drug-Ta...
Biotech Data Science @ GUGC in Korea: Deep Learning for Prediction of Drug-Ta...Wesley De Neve
 
Exploring Deep Machine Learning for Automatic Right Whale Recognition and No...
 Exploring Deep Machine Learning for Automatic Right Whale Recognition and No... Exploring Deep Machine Learning for Automatic Right Whale Recognition and No...
Exploring Deep Machine Learning for Automatic Right Whale Recognition and No...Wesley De Neve
 
Deep Machine Learning for Automating Biotech Tasks Through Self-Learning Expe...
Deep Machine Learning for Automating Biotech Tasks Through Self-Learning Expe...Deep Machine Learning for Automating Biotech Tasks Through Self-Learning Expe...
Deep Machine Learning for Automating Biotech Tasks Through Self-Learning Expe...Wesley De Neve
 
Towards using multimedia technology for biological data processing
Towards using multimedia technology for biological data processingTowards using multimedia technology for biological data processing
Towards using multimedia technology for biological data processingWesley De Neve
 
Multimedia Lab @ Ghent University - iMinds - Organizational Overview & Outlin...
Multimedia Lab @ Ghent University - iMinds - Organizational Overview & Outlin...Multimedia Lab @ Ghent University - iMinds - Organizational Overview & Outlin...
Multimedia Lab @ Ghent University - iMinds - Organizational Overview & Outlin...Wesley De Neve
 
Towards Twitter hashtag recommendation using distributed word representations...
Towards Twitter hashtag recommendation using distributed word representations...Towards Twitter hashtag recommendation using distributed word representations...
Towards Twitter hashtag recommendation using distributed word representations...Wesley De Neve
 
Orientation day at the Ghent University Global Campus in Korea: Introduction
Orientation day at the Ghent University Global Campus in Korea: IntroductionOrientation day at the Ghent University Global Campus in Korea: Introduction
Orientation day at the Ghent University Global Campus in Korea: IntroductionWesley De Neve
 
Background Information & Suggestions for Joint Research Topics IVY Lab & MMLab
Background Information & Suggestions for Joint Research Topics IVY Lab & MMLabBackground Information & Suggestions for Joint Research Topics IVY Lab & MMLab
Background Information & Suggestions for Joint Research Topics IVY Lab & MMLabWesley De Neve
 

More from Wesley De Neve (20)

Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...
Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...
Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional ne...
 
Investigating the biological relevance in trained embedding representations o...
Investigating the biological relevance in trained embedding representations o...Investigating the biological relevance in trained embedding representations o...
Investigating the biological relevance in trained embedding representations o...
 
Impact of adversarial examples on deep learning models for biomedical image s...
Impact of adversarial examples on deep learning models for biomedical image s...Impact of adversarial examples on deep learning models for biomedical image s...
Impact of adversarial examples on deep learning models for biomedical image s...
 
The 5th Aslla Symposium
The 5th Aslla SymposiumThe 5th Aslla Symposium
The 5th Aslla Symposium
 
Ghent University Global Campus 101
Ghent University Global Campus 101Ghent University Global Campus 101
Ghent University Global Campus 101
 
Booklet for the First GUGC Research Symposium
Booklet for the First GUGC Research SymposiumBooklet for the First GUGC Research Symposium
Booklet for the First GUGC Research Symposium
 
Center for Biotech Data Science at Ghent University Global Campus
Center for Biotech Data Science at Ghent University Global CampusCenter for Biotech Data Science at Ghent University Global Campus
Learning Biologically Relevant Features Using Convolutional Neural Networks for DNA Sequence Analysis

  • 1. Learning Biologically Relevant Features Using Convolutional Neural Networks for DNA Sequence Analysis 22/01/2018 Invited Research Talk @ Bayer, Ghent, Belgium Jasper Zuallaert, Wesley De Neve ¹ IDLab, ELIS, Ghent University, Ghent, Belgium ² Center for Biotech Data Science, Ghent University Global Campus (GUGC), Songdo, Korea
  • 2. Introduction; Convolutional neural networks for DNA analysis; Visualization of biologically relevant features; Conclusions & future work
  • 3. Introduction; Convolutional neural networks for DNA analysis; Visualization of biologically relevant features; Conclusions & future work
  • 4. Automatic genome annotation
      * Which parts of the genome correspond to which functionalities?
      * Which anomalies in the genome correspond to diseases?
      * Can we manipulate the genome to avoid or cure diseases?
      → The first step in mapping functionality to the genome is to structure it: Primary structure? Secondary structure? Tertiary structure? Genes? Exons? Introns? Binding sites?
  • 5. Expert knowledge on translation initiation & splice sites
      [Figure: schematic of exons and introns annotated with the translation initiation site, donor splice site, branch point, polypyrimidine tract, acceptor splice site, and stop codon (TAA, TAG, TGA), together with consensus nucleotides and typical region lengths (10s to 10 000s; < 20; 10s to 100s; ~ 20).]
  • 6. Dataset composition: datasets with true and pseudo splice / translation initiation sites
      Fixed-length windows (~ 200 - 400) around each annotated site, taken from a longer sequence:
      …AGCGGCATCCAGGTAAGTTCTTCAACCTGTAAGGGAGGCTTCAGTTAAAGCCATCCGA…
      Example windows:
      …AGCGGCATCCAGGTAAATGTCTTCAA…
      …CATCCAGATGAAGTTCTTCAACCTAT…
      …TGTCTTCAACCTGTAAGGGAGGCTTC…
      …AGGGAGGCTTCAGTTAAAGCCATCCG…
  • 7. Introduction; Convolutional neural networks for DNA analysis; Visualization of biologically relevant features; Conclusions & future work
  • 8. The success of Deep Learning. ImageNet classification competition: 1.2 million images, 1000 classes. [Figure: an image shown as a grid of pixel values; chart label "Introduction of Deep Learning".]
  • 9. Neural networks: Input, Output. Self-learning, black-box systems. Feedback → update parameters
  • 10. Convolutional Neural Networks for images: Input image → Lines and shapes → Structures → Concepts
  • 11. Convolutional Neural Networks for DNA sequences
      One-hot encoded input (columns A, C, G, T): A = 1 0 0 0, C = 0 1 0 0, G = 0 0 1 0, T = 0 0 0 1; example input sequence: AGTTCAGGTAGCCT
      Pattern detection (values as shown on the slide): C 0 2 0 0; A 1 0 0 0; G 0 0 2 0; G 0 0 3 0; T 0 0 0 1; A/G 1 0 2 0; T/C 0 2 0 1; T/C 0 1 0 1; T/C 0 1 0 1
      Combination of patterns from previous layers → True splice site / False splice site
  • 12. Results on splice site prediction: beats state-of-the-art on various datasets with different sizes, class imbalance and sequence lengths
      Compared approaches: A = Degroeve et al, 2005 (SVM); B = Lee et al, 2015 (DBN); C = Our approach (CNN)
      Dataset sizes: Positives 200 to 15 000; Negatives 1000 to 75 000
      [Figure: results for donors and acceptors, approaches A, B, C; label "240 x".]
  • 13. Introduction; Convolutional neural networks for DNA analysis; Visualization of biologically relevant features; Conclusions & future work
  • 14. Visualization of neural networks
      Goal → which parts of the input impact the prediction, and why?
      Step 1: Forward propagation. Calculate predictions (example outputs: 0.74, 0.26)
      Step 2: Backpropagation. Calculate contribution scores per input
      [Figure: a grid of input pixel values next to the corresponding grid of contribution scores.]
  • 15. Visualization of neural networks: Images vs. Genomic data (??)
      Saliency map for prediction: cockatoo. Source: Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, Zintgraf et al, 2017
      (Part of) saliency map for TIS prediction
  • 16. Making sense of DNA saliency maps
      1. Calculate* contribution scores per nucleotide ($cs_i$), for each sequence
      2. Normalize scores
      3. Evaluation, e.g., by averaging over multiple sequences
      $wcs_{ij} = \frac{100 \cdot m \cdot cs_{ij}}{\sum_{p=1}^{m} \sum_{q=1}^{n} cs_{pq}}$
      This gives a more interpretable meaning to the contribution scores, and it normalizes scores for different datasets on the same scale
      * Using DeepLIFT: Learning Important Features Through Propagating Activation Differences, Shrikumar et al., 2017
  • 17. Visualization example: acceptor sites
      - Branch point detection
      - AG exclusion zone
      - Polypyrimidine tract (Cs and Ts)
      - Acceptor motif CAGG(T)
      - Presence of a donor pattern: expected towards the end of the sequence → most exons are < 200 nucleotides long
      [Figure labels: CTNA, AG, CAGGTAAG]
  • 18. Visualization example: donor sites
      - Donor motif CAGGTAAGT
      - Presence of a donor pattern: not expected in the sequence, as this would imply an unlikely short intron + exon
      [Figure label: CAGGTAAG]
  • 19. Visualization example: translation initiation sites
      - Translation initiation site motif (GCCACCATGGCG)
      - Presence of a donor motif: expected after the site, as the end of the first exon; not expected in front of the site
      - Presence of a stop codon (TGA, TAA, TAG): not expected at any third position (because of translation in codons); at other positions, no influence
      [Figure labels: TAA, CAGGTAAG]
  • 20. Introduction; Convolutional neural networks for DNA analysis; Visualization of biologically relevant features; Conclusions & future work
  • 21. Conclusions
      Pattern detection with Convolutional Neural Networks = very effective for splice site / TIS prediction
      - End-to-end prediction system
      - No manual feature engineering needed
      → Without any prior knowledge, biologically relevant features are learnt
      Publications:
      - DLB2H 2017: Interpretable Convolutional Neural Networks for Effective Translation Initiation Site Prediction
      - Under revision, Bioinformatics: Interpretable Convolutional Neural Networks for Improved Splice Site Prediction
  • 22. Future work
      Improving predictions using extra forms of data
      → Spatial properties, physicochemical properties, …
      → Visualization of which inputs the network uses
      Automating the workflow for determining biologically relevant features
      → Automatically group similar patterns with similar scores + look further into the network internals
      → Verification on known problems (see previous slides) → detection of previously unknown patterns?
      → Verification on an artificially created dataset, seeing if all induced features can be found
  • 23. Thank you for your attention!
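
The score normalization on slide 16 (each contribution score multiplied by 100 and by the number of sequences m, then divided by the sum of all contribution scores) can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes that `cs` is a table with one row per sequence (m rows) and one column per nucleotide position (n columns), which the slide does not state explicitly, and it uses the signed sum in the denominator exactly as the slide's formula reads. The function names and the example scores are hypothetical.

```python
def weighted_contribution_scores(cs):
    """Scale raw contribution scores as on slide 16.

    cs: list of m rows (one per sequence), each holding n scores
        (one per nucleotide position). Layout is an assumption.
    """
    m = len(cs)
    total = sum(sum(row) for row in cs)
    if total == 0:
        raise ValueError("contribution scores sum to zero; cannot normalize")
    return [[100.0 * m * value / total for value in row] for row in cs]


def mean_per_position(wcs):
    """Average the weighted scores over all sequences, position by position."""
    m = len(wcs)
    return [sum(column) / m for column in zip(*wcs)]


# Hypothetical raw scores for two sequences of three positions each.
raw_scores = [[1.0, 3.0, 1.0],
              [0.5, 4.0, 0.5]]
wcs = weighted_contribution_scores(raw_scores)
averaged = mean_per_position(wcs)
print(wcs)
print(averaged)
```

With this scaling, the per-position averages over all sequences add up to 100, so an averaged score can be read as a percentage of the total contribution, which is one way to understand the slide's remark about putting different datasets on the same scale.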

Editor's Notes

  1. Automatic Genome Annotation. Goal: finding links between the genome and functionality, for example to fight diseases. This is an incredibly complex matter: the genome is not just a linear DNA sequence, but a 3D structure with many internal and external dependencies. To find these dependencies, we first need to find some structure in the genome.
  2. Regular approaches in the field require manual extraction of features based on human experience. In the image you can see the typical composition of translation initiation and splice sites.
  3. The datasets we work with are built by extracting all possible splice sites (all canonical splice sites, i.e. with GT in the middle), each accompanied by a label indicating whether or not it is indeed a true splice site.
  4. Results on splice site prediction. The right graph summarizes the results of 24 x 10-fold cross-validation tests, on datasets with varying sizes (see table).
  5. In contrast to perturbation-based approaches, backpropagation-based approaches involve only one forward propagation and one backpropagation per input. A variety of approaches are used, but they all work according to the principle of gradient calculation. We make use of DeepLIFT (Shrikumar et al, 2017), which is a backpropagation-based approach. Using these approaches, a saliency map is produced for each input.
  6. When looking at saliency maps for images, we immediately understand what our network is looking at. However, for DNA sequences, this is not the case. It is very hard to make sense of saliencies of different nucleotides.
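
The dataset construction described in note 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the flank of 5 nucleotides is chosen only for readability (the slides mention fixed lengths of roughly 200 to 400), the one-hot column order A, C, G, T is read off the example on slide 11, and the set of true site positions is invented for the example. The input is the visible part of the sequence shown on slide 6.

```python
ALPHABET = "ACGT"  # column order of the one-hot example on slide 11


def candidate_windows(sequence, motif="GT", flank=5):
    """Return (position, window) for every motif occurrence that has
    `flank` nucleotides available on both sides."""
    windows = []
    start = sequence.find(motif)
    while start != -1:
        left = start - flank
        right = start + len(motif) + flank
        if left >= 0 and right <= len(sequence):
            windows.append((start, sequence[left:right]))
        start = sequence.find(motif, start + 1)
    return windows


def one_hot(window):
    """Encode each nucleotide as a 4-element indicator row (A, C, G, T)."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in window]


def build_dataset(sequence, true_sites, motif="GT", flank=5):
    """Pair each encoded candidate window with a true/pseudo label."""
    return [(one_hot(window), position in true_sites)
            for position, window in candidate_windows(sequence, motif, flank)]


example = "AGCGGCATCCAGGTAAGTTCTTCAACCTGTAAGGGAGGCTTCAGTTAAAGCCATCCGA"
hypothetical_true_sites = {12}  # invented annotation, for illustration only
dataset = build_dataset(example, hypothetical_true_sites)
for encoded, is_true in dataset:
    print(len(encoded), is_true)
```

Every occurrence of the motif becomes a candidate, so pseudo sites typically outnumber true sites, which is consistent with the class imbalance mentioned on the results slide.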