DNABind: A hybrid algorithm for structure-based prediction of
DNA-binding residues by combining machine learning- and
template-based approaches. Proteins. 2013 Jun 5.

20131019
Young Biophysicists, Kansai Branch: Journal Club
Topics
Prediction of protein-DNA binding residues
Statistics of network
Machine learning
Result: DNABind, a hybrid method of machine learning and template-based
approaches showed excellent performance on predicting DNA-binding residues.
[Figure: predictions by the template-based method, machine learning, and DNABind for EcoRV (1RVE:A) and CprK (3E6C:C). Legend: query protein, template protein, TP, FN.]
True positive residues.
DNABind improves classification.
Aim

Protein-DNA interactions are important for cell biology.
Determining them experimentally is time-consuming and costly.

Computational approaches are desirable.
Computational approaches
Data bank (PDB)
Characteristics of binding residues
Solvent-exposed
Higher electrostatic potential
More conserved
Hotspots as clusters of conserved residues

Structural properties (DNA-binding residue vs surface)
Packing density
Surface curvature
B-factor
Residue fluctuation
Hydrogen bond donor
http://www.rcsb.org/pdb/home/home.do
Computational algorithms
Feature-based
Extract effective features

Template-based
Align template and retrieve the best match

Template!!
Features used in machine learning
Structure-based
PSSM (position specific scoring matrix)
Evolutionary conservation
Solvent accessibility
Local geometry (depth and protrusion index)
Topological features
degree, closeness, betweenness, clustering coefficient

Relative position (distance to centroid)
Statistical potential (Boltzmann distribution)

Sequence-based (more difficult than structure)
Amino acid identity
Residue physicochemical properties
polarity, secondary structure, molecular volume, codon diversity, electrostatic charge

Predicted structure (no 3D structure needed!)
Features used in machine learning
Structure-based
PSSM
Relative solvent accessibility
Depth and protrusion index
Topological features
Distance to centroid
Statistical potentials

Sequence-based
PSSM
Predicted structures
Amino acid indices
Statistical potentials

Construct machine learning (SVM)
Template-based approach
Used in image recognition, etc.
Example: recognizing faces in a camera image.
[Figure: a face template is matched against the image. Labels: Template, Match]
Template-based prediction
Template-based
Structural alignment and statistical potential
Binding-residue prediction is performed only if the
target protein is judged to be a DNA-binding protein.

312 templates were selected.
Network

Degree is a commonly used measure to reflect the local
connectivity of a node.
Closeness is a global centrality metric used to determine
how critical a residue is in a residue interaction network.
Betweenness of residue i is defined to be the sum of the
fraction of shortest paths between all pairs of residues
that pass through residue i.
Motif, hub, and community
are also important…

Clustering coefficient (transitivity) quantifies how close
a residue's neighbors are to being a clique, i.e. the probability
that two vertices adjacent to a given vertex are themselves connected.
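
The four topological features defined above can be sketched on a toy graph. This is only an illustration in plain Python, assuming an undirected, connected network; the five-node graph and all function names are made up for the example and are not taken from the paper.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy residue interaction network: nodes are residues,
# edges join residues that are in contact.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def bfs(g, source):
    """Return (distance, number of shortest paths) from source to every node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nb in g[cur]:
            if nb not in dist:               # first visit fixes the distance
                dist[nb] = dist[cur] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[cur] + 1:    # cur precedes nb on a shortest path
                paths[nb] += paths[cur]
    return dist, paths

def degree(g, v):
    """Number of direct neighbors (local connectivity)."""
    return len(g[v])

def closeness(g, v):
    """(n - 1) divided by the summed shortest-path distance to all other nodes."""
    dist, _ = bfs(g, v)
    return (len(g) - 1) / sum(dist.values())

def betweenness(g, v):
    """Sum over node pairs of the fraction of shortest paths passing through v."""
    dist_v, paths_v = bfs(g, v)
    total = 0.0
    for s, t in combinations([n for n in g if n != v], 2):
        dist_s, paths_s = bfs(g, s)
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += paths_s[v] * paths_v[t] / paths_s[t]
    return total

def clustering_coefficient(g, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    k = len(g[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(g[v], 2) if b in g[a])
    return 2 * links / (k * (k - 1))

for name, fn in [("degree", degree), ("closeness", closeness),
                 ("betweenness", betweenness),
                 ("clustering", clustering_coefficient)]:
    print(name, fn(graph, "C"))
```

Node C is the hub of the toy graph: it has the highest degree and every path from A or B to D or E runs through it, which is what the betweenness value reflects.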
Network example: human protein interactome
Scale-free
Small-world
Cluster
Power law (Pareto distribution)

Bioinformatics. 2012 Jan 1;28(1):84-90.
Machine learning
Example: spam
4601 samples, 57 parameters.
Classification: spam or non-spam
Machine learning
Support vector machine (SVM)
Decision tree
RandomForest
Logistic regression
LASSO (Elastic net and Ridge)
Neural networks (Deep learning)
Evolutionary algorithm
Gaussian process
k nearest neighbor
Clustering
Bayesian networks
Association rule learning
Inductive logic programming (ILP)
Support vector machine (SVM)
Finds a hyperplane that separates the groups.
Kernel method: turns a non-linear problem into a linear one.
Easy to run.
Computationally expensive.
Tuning is very difficult.
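
A small numerical check of the kernel idea, as a sketch only: for 2-D inputs, the degree-2 polynomial kernel gives the same value as an ordinary dot product after an explicit mapping into a 3-D feature space. The kernel choice, the sample points, and the function names are assumptions made for illustration; the slides do not say which kernel was used.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit map phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))

x, y = (1.0, 2.0), (3.0, -1.0)
k_direct = poly_kernel(x, y)                     # computed in the input space
k_mapped = dot(feature_map(x), feature_map(y))   # computed in the feature space
print(k_direct, k_mapped)
```

Both values agree up to rounding, so a linear separator in the mapped space can be trained using only kernel evaluations in the original space.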
Decision tree
Builds a tree of branching decisions.
Easy to understand graphically.
Performance is relatively poor.
RandomForest
Make many decision trees.
Much more accurate.
Somewhat time-consuming.
Logistic regression
Widely used by medical researchers.
Easy to use, but (to tell the truth) tuning is very difficult.
LASSO, Elastic net, and Ridge regression
Least Absolute Shrinkage and Selection Operator

LASSO
Elastic Net
Ridge
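
For reference, the three methods differ only in the penalty added to the least-squares loss. The form below is one common parameterization (the one used by the glmnet package, for example); the symbols $\lambda$ (overall penalty strength) and $\alpha$ (mixing weight) are not taken from the slides.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left[ \alpha\,\lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right]
% alpha = 1      : LASSO (L1 penalty, can set coefficients exactly to zero)
% alpha = 0      : Ridge (L2 penalty, shrinks coefficients without zeroing them)
% 0 < alpha < 1  : Elastic net (mixture of the two)
```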
Neural networks
Artificial model of the mammalian brain (perceptron).
Multiple hidden layers.
Deep learning is a hot topic!!
(hard to understand…)

http://opencv.jp/opencv-1.0.0/document/opencvref_ml_nn.html
n-fold cross validation
To evaluate how the results of a statistical analysis will
generalize to an independent data set.
[Figure: data split into folds. Labels: Train data, Test 1]
Leave-one-out CV
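
The splitting step of n-fold cross validation can be sketched as follows. This is a minimal plain-Python illustration with a hypothetical helper name; it splits indices in order without shuffling, which a real analysis would normally add.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 5-fold split of 10 samples: each fold holds out 2 samples for testing.
splits = list(k_fold_indices(10, 5))

# Leave-one-out CV is the special case n_folds == n_samples.
loo = list(k_fold_indices(4, 4))

for train, test in splits:
    print("train:", train, "test:", test)
```

The model is fitted on each `train` list and scored on the matching `test` list; the scores are then averaged over the folds.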
Performance

            SVM    Tree   RandomForest  LASSO  Elastic net  Ridge  Logistic  nnet
Recall      0.917  0.872  0.927         0.894  0.892        0.852  0.893     0.930
Precision   0.948  0.914  0.954         0.932  0.926        0.926  0.930     0.935
F           0.932  0.893  0.940         0.913  0.911        0.887  0.911     0.932
MCC         0.890  0.826  0.902         0.858  0.856        0.821  0.856     0.888
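
For readers unfamiliar with these scores, the sketch below shows how recall, precision, the F-measure, and the Matthews correlation coefficient (MCC) are computed from a confusion matrix. The counts are invented for illustration and are not the data behind the table; the function name is hypothetical.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, F-measure and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn)        # fraction of real positives that were found
    precision = tp / (tp + fp)     # fraction of predicted positives that are real
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"recall": recall, "precision": precision,
            "f": f_measure, "mcc": mcc}

# Hypothetical counts: 90 true positives, 10 false positives,
# 10 false negatives, 90 true negatives.
m = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(m)
```

The F-measure is the harmonic mean of precision and recall, while MCC also takes the true negatives into account, which makes it more informative when the classes are unbalanced.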
Combine two approaches
Statistical features of structure
A: Binding residues are highly solvent
accessible.
B, C: Binding residues have low depth and
high protrusion.
D-G: Network features differ little.
H: Binding residues are closer to the
centroid.
Performance

A higher TM-score is required for good prediction.

TM-score measures the similarity between two protein structures.
A score below 0.2 indicates a random relation; a score above 0.5 indicates highly related structures.
Proteins. 2004 Dec 1;57(4):702-10.
Nucleic Acids Res. 2005 Apr 22;33(7):2302-9.
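
A minimal sketch of the TM-score formula for one fixed superposition, assuming the distances between aligned residue pairs are already known. The real TM-score is the maximum of this quantity over all superpositions, which this sketch does not search for; the function name and inputs are hypothetical.

```python
def tm_score_fixed(distances, target_length):
    """TM-score of one given superposition.

    distances: distance (in angstroms) between each aligned residue pair.
    target_length: number of residues in the target protein; it must be
    large enough for the scale d0 to be positive (about 19 or more).
    """
    # Length-dependent distance scale d0 = 1.24 * cbrt(L - 15) - 1.8.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Hypothetical cases for a 100-residue target.
identical = tm_score_fixed([0.0] * 100, 100)   # every residue superposed exactly
half = tm_score_fixed([0.0] * 50, 100)         # only half the residues aligned
loose = tm_score_fixed([4.0] * 100, 100)       # all aligned, but 4 angstroms apart
print(identical, half, loose)
```

Because the sum is divided by the target length rather than the number of aligned pairs, unaligned residues lower the score, and the length-dependent scale keeps scores comparable between proteins of different sizes.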
Performance
Comparison among ML, TL, and DNABind.

Comparison between DNABind and other software.
Result: DNABind, a hybrid method of machine learning and template-based
approaches showed excellent performance on predicting DNA-binding residues.
[Figure: predictions by the template-based method, machine learning, and DNABind for EcoRV (1RVE:A) and CprK (3E6C:C). Legend: query protein, template protein, TP, FN.]
True positive residues.
DNABind improves classification.

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 

20131019 Biophysics Young Researchers Journal Club

  • 1. DNABind: A hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches. Proteins. 2013 Jun 5. 20131019 Biophysics Young Researchers, Kansai Branch, Journal Club
  • 2. Topics: prediction of protein-DNA binding residues; network statistics; machine learning.
  • 4. Result: DNABind, a hybrid of machine learning and template-based approaches, showed excellent performance in predicting DNA-binding residues. DNABind improves classification. (Figure: predictions for EcoRV (1RVE:A) and CprK (3E6C:C) by the template-based method, machine learning, and DNABind, showing true positive residues. Legend: query protein, template protein, TP, FN.)
  • 5. Aim. Protein-DNA interactions are important for cell biology. Determining them experimentally is time-consuming and costly. Computational approaches are desirable.
  • 6. Computational approaches. Data bank (PDB). Characteristics of binding residues: solvent-exposed, higher electrostatic potential, more conserved, hotspots as clusters of conserved residues. Structural properties (DNA-binding residues vs. surface): packing density, surface curvature, B-factor, residue fluctuation, hydrogen bond donors. http://www.rcsb.org/pdb/home/home.do
  • 7. Computational algorithms. Feature-based: extract effective features. Template-based: align templates and retrieve the best match.
  • 10. Features used in machine learning. Structure-based: PSSM (position-specific scoring matrix), evolutionary conservation, solvent accessibility, local geometry (depth and protrusion index), topological features (degree, closeness, betweenness, clustering coefficient), relative position (distance to centroid), statistical potential (Boltzmann distribution). Sequence-based (more difficult than structure-based): amino acid identity, residue physicochemical properties (polarity, secondary structure, molecular volume, codon diversity, electrostatic charge), predicted structure (no 3D structure needed!).
  • 11. Features used in machine learning. Structure-based: PSSM, relative solvent accessibility, depth and protrusion index, topological features, distance to centroid, statistical potentials. Sequence-based: PSSM, predicted structures, amino acid indices, statistical potentials. Construct the machine learning model (SVM).
  • 12-13. Template-based approach. Also used in image recognition, e.g. recognition of faces in a camera image. (Figure labels: Template!!, Match!!)
  • 14. Template-based prediction. Uses structural alignment and a statistical potential. Binding-residue prediction is carried out only if the target protein is judged to be a DNA-binding protein. 312 templates were selected.
  • 15. Network. Degree is a commonly used measure of the local connectivity of a node. Closeness is a global centrality metric used to determine how critical a residue is in a residue interaction network. Betweenness of residue i is the sum, over all pairs of residues, of the fraction of shortest paths between the pair that pass through residue i. The clustering coefficient (transitivity) of a node quantifies how close its neighbors are to being a clique, i.e. the probability that two adjacent vertices of a vertex are themselves connected. Motifs, hubs, and communities are also important.
  • 16. Network sample: human protein interactome. Scale-free, small-world, clustered. Power law (Pareto distribution). Bioinformatics. 2012 Jan 1;28(1):84-90.
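
The four topological features defined on slide 15 can be sketched in plain Python. This is a minimal illustration, not the DNABind implementation: the toy graph `contacts`, its residue labels, and all function names are hypothetical, and how residue contacts are defined (for example by a distance cutoff) is left open. Betweenness is computed exactly as worded on the slide, without normalization.

```python
from collections import deque

# Hypothetical toy residue interaction network (undirected adjacency sets).
contacts = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def bfs_paths(graph, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def degree(graph, node):
    """Number of direct neighbors (local connectivity)."""
    return len(graph[node])

def closeness(graph, node):
    """Reciprocal of the mean shortest-path distance to all reachable nodes."""
    dist, _ = bfs_paths(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def clustering(graph, node):
    """Fraction of neighbor pairs that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def betweenness(graph, node):
    """Sum over node pairs of the fraction of shortest paths passing through node."""
    info = {s: bfs_paths(graph, s) for s in graph}
    nodes = sorted(graph)
    total = 0.0
    for idx, s in enumerate(nodes):
        for t in nodes[idx + 1:]:
            if node in (s, t):
                continue
            dist_s, sigma_s = info[s]
            dist_t, sigma_t = info[t]
            if t not in dist_s or node not in dist_s:
                continue
            # node lies on a shortest s-t path iff the distances add up.
            if dist_s[node] + dist_t[node] == dist_s[t]:
                total += sigma_s[node] * sigma_t[node] / sigma_s[t]
    return total

for residue in sorted(contacts):
    print(residue, degree(contacts, residue), closeness(contacts, residue),
          clustering(contacts, residue), betweenness(contacts, residue))
```

The brute-force betweenness is fine for a toy graph; for real proteins a dedicated graph library would be the more practical choice.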
  • 17. Machine learning. Example: spam. 4601 samples, 57 parameters. Classification: spam or non-spam.
  • 18. Machine learning methods: support vector machine (SVM), decision tree, RandomForest, logistic regression, LASSO (Elastic net and Ridge), neural networks (deep learning), evolutionary algorithms, Gaussian processes, k-nearest neighbors, clustering, Bayesian networks, association rule learning, inductive logic programming (ILP).
  • 19. Support vector machine (SVM). Builds a hyperplane that separates the groups. Kernel method: turns a non-linear problem into a linear one. Easy to run, but computationally expensive, and tuning is very difficult.
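
The separating-hyperplane idea can be illustrated with a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is a rough sketch under stated assumptions, not the classifier used in DNABind: the toy data, the function names, and the values of `lr`, `lam`, and `epochs` are all hypothetical, and no kernel is used, so it only handles linearly separable groups.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit w, b so that sign(w.x + b) separates labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move toward the sample.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only apply the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data (two features per sample).
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

The tuning difficulty mentioned on the slide shows up here as the choice of `lam` (and, for kernel SVMs, the kernel parameters), which is normally made by cross validation.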
  • 20. Decision tree. Builds a tree of branching decision rules. Easy to understand graphically. Performance is not very good.
  • 21. RandomForest. Builds many decision trees. Much more accurate. Somewhat time-consuming.
  • 22. Logistic regression. Used by many medical researchers. Easy to use but, to tell the truth, tuning is very difficult.
  • 23. LASSO, Elastic net, and Ridge regression. LASSO: Least Absolute Shrinkage and Selection Operator. (Figure panels: LASSO, Elastic Net, Ridge.)
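
For reference, the three methods differ only in the penalty added to the least-squares objective. The block below uses standard textbook notation that does not appear on the slides: response vector $y$, design matrix $X$, coefficients $\beta$, and penalty weights $\lambda, \lambda_1, \lambda_2 \ge 0$.

```latex
\begin{aligned}
\text{Ridge:}       \quad & \hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\text{LASSO:}       \quad & \hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{Elastic net:} \quad & \hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2
\end{aligned}
```

The $\ell_1$ term can drive coefficients exactly to zero, which is the "selection" in the LASSO name; the $\ell_2$ term only shrinks them.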
  • 24. Neural networks. Artificial model of the mammalian brain (perceptron). Multiple hidden layers. Deep learning is a hot topic!! (hard to understand…) http://opencv.jp/opencv-1.0.0/document/opencvref_ml_nn.html
  • 25. n-fold cross validation To evaluate how the results of a statistical analysis will generalize to an independent data set.
  • 31. n-fold cross validation. To evaluate how the results of a statistical analysis will generalize to an independent data set. (Figure labels: train data, test 1.) Leave-one-out CV.
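
The train/test splitting behind n-fold cross validation can be sketched as follows. This is a minimal illustration with a hypothetical helper name, `k_fold_indices`; it splits the samples in their given order without shuffling, which a real evaluation would usually add. Setting the number of folds equal to the number of samples gives leave-one-out CV.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 5-fold split of 10 samples: each sample is used for testing exactly once.
for train, test in k_fold_indices(10, 5):
    print("train:", train, "test:", test)

# Leave-one-out: as many folds as samples.
for train, test in k_fold_indices(4, 4):
    print("train:", train, "test:", test)
```

A model is trained on each `train` set and scored on the matching `test` set, and the scores are averaged over the folds.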
  • 34. Statistical features of structure. A: Binding residues are highly solvent accessible. B, C: Binding residues have low depth and high protrusion. D-G: Little difference in the network features. H: Binding residues are closer to the centroid.
  • 36. Performance. A higher TM-score is required for good prediction. TM-score is a measure of similarity between two protein structures with different tertiary structures. A score < 0.2 indicates a random relation and a score > 0.5 a highly related pair. Proteins. 2004 Dec 1;57(4):702-10. Nucleic Acids Res. 2005 Apr 22;33(7):2302-9.
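
As a rough sketch of how a TM-score is evaluated, the code below applies the commonly quoted formula, a sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs divided by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. It assumes the two structures are already aligned and superimposed; the actual TM-score is the maximum of this quantity over superpositions, which is not searched here. The function names, the floor of 0.5 on d0 for short proteins, and the example distances are assumptions for illustration.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms."""
    if target_length <= 21:
        # Assumption: floor d0 for short chains, where the formula breaks down.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(aligned_distances, target_length):
    """Score one fixed superposition from distances (angstroms) of aligned pairs."""
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length

# Hypothetical example: 100-residue target, 80 aligned pairs at 2 angstroms.
print(tm_d0(100))
print(tm_score([2.0] * 80, 100))
```

Because the sum is divided by the target length rather than the number of aligned pairs, unaligned residues lower the score, so a short, tight partial match does not score as highly related.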
  • 37. Performance Comparison among ML, TL, and DNABind. Comparison between DNABind and other software.
  • 38. Result: DNABind, a hybrid of machine learning and template-based approaches, showed excellent performance in predicting DNA-binding residues. DNABind improves classification. (Figure: predictions for EcoRV (1RVE:A) and CprK (3E6C:C) by the template-based method, machine learning, and DNABind, showing true positive residues. Legend: query protein, template protein, TP, FN.)