SlideShare a Scribd company logo
Associate professor Juho Rousu
Kernel Methods, Pattern
Analysis and
Computational
Metabolomics (KEPACO)
KEPACO in a Nutshell
We develop kernel-based machine learning methods for predicting of
structured, non-tabular data arising in biomedicine and digital health,
especially applications on small molecules such as drugs and metabolites.
Examples of research
questions:
• How to identify a
metabolite from its
mass spectrum?
• How to find
multivariate
associations
between SNPs and
metabolites?
• How to classify a
enzyme according to
its metabolomic
role?
Flowchart for a typical study
Contact: Juho Rousu
1. Structured Big Data
1 1 11 11 10 0 0 0 0 0 0 111 1 1 1 10 0 0 0 0 0 0
0.6 0.11 0.30.06 0.020.1 0.020 0 0 0 0 0 0 0.20.30.1 0.2 0.11 0.4 0.010 0 0 0 0 0 0Nodes Intensity
(NI)
NB
NI
Nodes Binary (NB)
0 10 20 30 40 50
01020304050
Number of active cell lines
Numberofactivecelllines
0.6
0.7
0.8
0.9
1.0
2. Efficient Kernel (non-linear
similarity metric) computation
3. Definition as a Regularized learning
problem (max-margin and/or sparsity)
4. Efficient training
using convex
optimization
Saddle point
Conditional
gradient
5 10 15 20
Metlin − PubChem
TOP k
percentageofcorrectlyidentified2Dstructures 0%
20%
40%
60%
80%
100%
5 10 15 20
our method
CFM−ID
MIDAS
MetFrag
Random
5. Prediction and interpretation
KEPACO Mission: Predicting
Structured data
Sequences Hierarchies Networks
Many data have logical structure, how can we make use of that in
making models faster and more accurate?
In particular, we are interested of predicting structured output
Example: Predicting Network
Response
Problem:
• To predict the spread of
activation in a complex network,
given an action/stimulus
Example:
• Input: post in a social network
• Output: subnetwork of friends
who share the post
a
i b
h
g
f
e
c
d
Al ice
David
Bob
Jacob
Ben
Dutch get the
championship!
Dutch get the
championship!
Dutch get the
championship!Dutch get the
championship! Soccer is not
my cup of tea.
Su, Hongyu, Aristides Gionis, and Juho Rousu. "Structured prediction of
network response." Proceedings of the 31st International Conference on
Machine Learning (ICML-14). 2014.
Example: Metabolite identification
through machine learning
• From a set of MS/MS spectra (x = structured input), learn a
model to predict molecular fingerprints (y = multilabel output)
• With predicted fingerprints retrieve candidate molecules from a
large molecular database
• Our CSI:Fingerid (http://www.csi-fingerid.org) tool is the most
accurate metabolite identification tool to date
Dührkop, K., Shen, H., Meusel, M., Rousu, J., & Böcker, S. (2015). Searching
molecular structure databases with tandem mass spectra using CSI: FingerID.
Proceedings of the National Academy of Sciences, 112(41), 12580-12585.
Huibin Shen
May 15, 2014
8/27
New fingerprint prediction framework
5 10 15 20
percentageofcorrectlyidentifiedstructures
0%
20%
40%
60%
80%
100%
5 10 15 20
top k
our method
FingerID
CFM−ID
MAGMa
Midas
MetFrag
Random
Example: Drug Bioactivity
Classification
• Data: x = Molecule, y = labeling of a graph of cell lines
• Compatibility score F(x,y): how well subset of cell lines go together
with the drug
• Co-occurring features: molecular fingerprints vs. subsets of cell lines
• Prediction: finding the best scoring labeling of the graph
MCF7
MDA_MB_231
HS578T
BT_549
T47D
SF_268
SF_295
SF_539
SNB_19
SNB_75
U251
COLO205
HCC_2998
HCT_116
HCT_15
HT29
KM12
SW_620
CCRF_CEM
HL_60
K_562
MOLT_4
RPMI_8226
SR
LOXIMVI
MALME_3M
M14
SK_MEL_2
SK_MEL_28
SK_MEL_5
UACC_257
UACC_62
MDA_MB_435
MDA_N
A549
EKVX
HOP_62
HOP_92
NCI_H226
NCI_H23
NCI_H322M
NCI_H460
NCI_H522
IGROV1
OVCAR_3
OVCAR_4
OVCAR_5
OVCAR_8
SK_OV_3
NCI_ADR_RES
PC_3DU_145
786_0
A498
ACHN
CAKI_1
RXF_393
TK_10
UO_31
Su, Hongyu, and Juho Rousu. "Multilabel classification through random
graph ensembles." Machine Learning 99.2 (2013): 231-256.
Example: Multivariate association
meta-analysis
• Problem: find associations
btw. groups of SNPs
(=genotype) and groups of
metabolites (=phenotype)
• Meta-analysis setting
• Several studies analyzed
together
• No original data available, only
summary statistics of indivudual
studies
• Our metaCCA tool can recover
multivariate associations
nearly as accurately as if
original data was available
2.10.2015
Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T,
Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S. metaCCA:
Summary statistics-based multivariate meta-analysis of genome-wide
association studies using canonical correlation analysis. bioRxiv. 2015 Jan
1:022665.
Example: Biological network
reconstruction
Input: genomic measurements Output: biological network
Pitkänen E, Jouhten P, Hou J, Syed MF, Blomberg P, Kludas J, Oja M, Holm L,
Penttilä M, Rousu J, Arvas M. Comparative genome-scale reconstruction of gapless
metabolic networks for present and ancestral species. PLoS Comput Biol. 2014 Feb
1;10(2):1003465.
Example: Synthetic biology
• Goal: energy and carbon efficient
industrial processes through
synthetic biology
• Consortium: VTT, Aalto, U. Turku
Our focus
• In silico design, modelling and
optimization of synthetic biosystems
• Systems Modelling + Machine
Learning
LiF
Fermentation
experiments
Genome-wide
measurements
In silico
experimental
design
Strain/Process
modification
Tekes Strategic Opening “LiF – Living Factories”, 2014-
Kernel Methods, Pattern Analysis &
Computational Metabolomics (KEPACO)
KEPACO research group
Dr. Sandor Szedmak
Dr. Sahely Bhadra (w. S.Kaski)
Dr. Markus Heinonen (w. H. Lähdesmäki)
Dr. Celine Brouard
Dr. Elena Czeizler
Dr. Hongyu Su
MSc Huibin Shen
MSc Anna Cichonska (HIIT&FIMM)
MSc Viivi Uurtio
+ students
Funding
• Academy of Finland
• EU FP7
• Tekes
KEPACO group summer 2015

More Related Content

What's hot

NetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno SchwikowskiNetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno Schwikowski
Alexander Pico
 

What's hot (20)

Basics of Data Analysis in Bioinformatics
Basics of Data Analysis in BioinformaticsBasics of Data Analysis in Bioinformatics
Basics of Data Analysis in Bioinformatics
 
Bioinformatics on internet
Bioinformatics on internetBioinformatics on internet
Bioinformatics on internet
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3
 
Technology R&D Theme 3: Multi-scale Network Representations
Technology R&D Theme 3: Multi-scale Network RepresentationsTechnology R&D Theme 3: Multi-scale Network Representations
Technology R&D Theme 3: Multi-scale Network Representations
 
Next Generation Sequence with Pathway Studio
Next Generation Sequence with Pathway StudioNext Generation Sequence with Pathway Studio
Next Generation Sequence with Pathway Studio
 
Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012
 
NetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno SchwikowskiNetBioSIG2013-KEYNOTE Benno Schwikowski
NetBioSIG2013-KEYNOTE Benno Schwikowski
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
NETTAB 2012
NETTAB 2012NETTAB 2012
NETTAB 2012
 
Pathway studio into webinar 052715v1
Pathway studio into webinar 052715v1Pathway studio into webinar 052715v1
Pathway studio into webinar 052715v1
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
 
Bioinformatics ppt
Bioinformatics pptBioinformatics ppt
Bioinformatics ppt
 
Bioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future PerspectivesBioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future Perspectives
 
T-bioinfo overview
T-bioinfo overviewT-bioinfo overview
T-bioinfo overview
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019
 
Systems Immunology -- 2014
Systems Immunology -- 2014Systems Immunology -- 2014
Systems Immunology -- 2014
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 
Use of data
Use of dataUse of data
Use of data
 
B.3.5
B.3.5B.3.5
B.3.5
 

Viewers also liked

Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational DataGraph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational Data
Benjamin Bengfort
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 

Viewers also liked (8)

Graph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational DataGraph Based Machine Learning on Relational Data
Graph Based Machine Learning on Relational Data
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
 
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 8.4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
Document Classification with Neo4j
Document Classification with Neo4jDocument Classification with Neo4j
Document Classification with Neo4j
 
Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media Analytics
 
Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification
 
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning MeetupKnowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
Knowledge Graphs for a Connected World - AI, Deep & Machine Learning Meetup
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
 

Similar to Kernel-based machine learning methods

Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei Lin
Chien-Wei Lin
 
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression DatabaseКолкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
bigdatabm
 
Session ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmcSession ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmc
USD Bioinformatics
 
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
Human Variome Project
 
Systems Biology Approaches to Cancer
Systems Biology Approaches to CancerSystems Biology Approaches to Cancer
Systems Biology Approaches to Cancer
Raunak Shrestha
 

Similar to Kernel-based machine learning methods (20)

Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei Lin
 
Structural bioinformatics and ELIXIR UK by Christine Orengo
Structural bioinformatics and ELIXIR UK by Christine OrengoStructural bioinformatics and ELIXIR UK by Christine Orengo
Structural bioinformatics and ELIXIR UK by Christine Orengo
 
eScience-School-Oct2012-Campinas-Brazil
eScience-School-Oct2012-Campinas-BrazileScience-School-Oct2012-Campinas-Brazil
eScience-School-Oct2012-Campinas-Brazil
 
Small Molecules in Big Data fTALES Ghent
Small Molecules in Big Data fTALES GhentSmall Molecules in Big Data fTALES Ghent
Small Molecules in Big Data fTALES Ghent
 
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression DatabaseКолкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
Колкер Е. An introduction to MOPED: Multi-Omics Profiling Expression Database
 
Semantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical InformaticsSemantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical Informatics
 
LarsJuhlJensen2020
LarsJuhlJensen2020LarsJuhlJensen2020
LarsJuhlJensen2020
 
Session ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmcSession ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmc
 
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
 
Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...
 
Systems Biology Approaches to Cancer
Systems Biology Approaches to CancerSystems Biology Approaches to Cancer
Systems Biology Approaches to Cancer
 
ICCSS2015 talk: Null model for meme popularity
ICCSS2015 talk: Null model for meme popularityICCSS2015 talk: Null model for meme popularity
ICCSS2015 talk: Null model for meme popularity
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Collaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of LifeCollaboratively Creating the Knowledge Graph of Life
Collaboratively Creating the Knowledge Graph of Life
 
Network Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systemsNetwork Biology: A paradigm for modeling biological complex systems
Network Biology: A paradigm for modeling biological complex systems
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Sabina Leonelli
Sabina LeonelliSabina Leonelli
Sabina Leonelli
 
Amia tb-review-12
Amia tb-review-12Amia tb-review-12
Amia tb-review-12
 
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 

More from Department of Computer Science, Aalto University

More from Department of Computer Science, Aalto University (14)

Data strategy aija leiponen_01112016
Data strategy aija leiponen_01112016Data strategy aija leiponen_01112016
Data strategy aija leiponen_01112016
 
Tiedon jakaminen: Case Mobility as a Service MaaS
Tiedon jakaminen: Case Mobility as a Service MaaSTiedon jakaminen: Case Mobility as a Service MaaS
Tiedon jakaminen: Case Mobility as a Service MaaS
 
MaaS Global to revolutionize the global transportation market with Whim
MaaS Global to revolutionize the global transportation market with WhimMaaS Global to revolutionize the global transportation market with Whim
MaaS Global to revolutionize the global transportation market with Whim
 
KIRA-digi
KIRA-digiKIRA-digi
KIRA-digi
 
Jakamo - Supply chain collaboration platform
Jakamo - Supply chain collaboration platformJakamo - Supply chain collaboration platform
Jakamo - Supply chain collaboration platform
 
Fingrid ja yhteiskäyttöinen tieto
Fingrid ja yhteiskäyttöinen tieto Fingrid ja yhteiskäyttöinen tieto
Fingrid ja yhteiskäyttöinen tieto
 
Data mining group
Data mining groupData mining group
Data mining group
 
Digital Data-Driven Healthcare and Wellbeing
Digital Data-Driven  Healthcare and WellbeingDigital Data-Driven  Healthcare and Wellbeing
Digital Data-Driven Healthcare and Wellbeing
 
Secure Systems
Secure SystemsSecure Systems
Secure Systems
 
Complex Networks
Complex NetworksComplex Networks
Complex Networks
 
Probabilistic Machine Learning
Probabilistic Machine LearningProbabilistic Machine Learning
Probabilistic Machine Learning
 
Computational Logic
Computational LogicComputational Logic
Computational Logic
 
Distributed Systems, Mobile Computing and Security
Distributed Systems, Mobile Computing and SecurityDistributed Systems, Mobile Computing and Security
Distributed Systems, Mobile Computing and Security
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine Learning
 

Recently uploaded

ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Sérgio Sacani
 

Recently uploaded (20)

Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Transport in plants G1.pptx Cambridge IGCSE
Transport in plants G1.pptx Cambridge IGCSETransport in plants G1.pptx Cambridge IGCSE
Transport in plants G1.pptx Cambridge IGCSE
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
INSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversityINSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere University
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
SAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesSAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniques
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
 
Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 

Kernel-based machine learning methods

  • 1. Associate professor Juho Rousu Kernel Methods, Pattern Analysis and Computational Metabolomics (KEPACO)
  • 2. KEPACO in a Nutshell We develop kernel-based machine learning methods for predicting of structured, non-tabular data arising in biomedicine and digital health, especially applications on small molecules such as drugs and metabolites. Examples of research questions: • How to identify a metabolite from its mass spectrum? • How to find multivariate associations between SNPs and metabolites? • How to classify a enzyme according to its metabolomic role? Flowchart for a typical study Contact: Juho Rousu 1. Structured Big Data 1 1 11 11 10 0 0 0 0 0 0 111 1 1 1 10 0 0 0 0 0 0 0.6 0.11 0.30.06 0.020.1 0.020 0 0 0 0 0 0 0.20.30.1 0.2 0.11 0.4 0.010 0 0 0 0 0 0Nodes Intensity (NI) NB NI Nodes Binary (NB) 0 10 20 30 40 50 01020304050 Number of active cell lines Numberofactivecelllines 0.6 0.7 0.8 0.9 1.0 2. Efficient Kernel (non-linear similarity metric) computation 3. Definition as a Regularized learning problem (max-margin and/or sparsity) 4. Efficient training using convex optimization Saddle point Conditional gradient 5 10 15 20 Metlin − PubChem TOP k percentageofcorrectlyidentified2Dstructures 0% 20% 40% 60% 80% 100% 5 10 15 20 our method CFM−ID MIDAS MetFrag Random 5. Prediction and interpretation
  • 3. KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how can we make use of that in making models faster and more accurate? In particular, we are interested of predicting structured output
  • 4. Example: Predicting Network Response Problem: • To predict the spread of activation in a complex network, given an action/stimulus Example: • Input: post in a social network • Output: subnetwork of friends who share the post a i b h g f e c d Al ice David Bob Jacob Ben Dutch get the championship! Dutch get the championship! Dutch get the championship!Dutch get the championship! Soccer is not my cup of tea. Su, Hongyu, Aristides Gionis, and Juho Rousu. "Structured prediction of network response." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.
  • 5. Example: Metabolite identification through machine learning • From a set of MS/MS spectra (x = structured input), learn a model to predict molecular fingerprints (y = multilabel output) • With predicted fingerprints retrieve candidate molecules from a large molecular database • Our CSI:Fingerid (http://www.csi-fingerid.org) tool is the most accurate metabolite identification tool to date Dührkop, K., Shen, H., Meusel, M., Rousu, J., & Böcker, S. (2015). Searching molecular structure databases with tandem mass spectra using CSI: FingerID. Proceedings of the National Academy of Sciences, 112(41), 12580-12585. Huibin Shen May 15, 2014 8/27 New fingerprint prediction framework 5 10 15 20 percentageofcorrectlyidentifiedstructures 0% 20% 40% 60% 80% 100% 5 10 15 20 top k our method FingerID CFM−ID MAGMa Midas MetFrag Random
  • 6. Example: Drug Bioactivity Classification • Data: x = Molecule, y = labeling of a graph of cell lines • Compatibility score F(x,y): how well subset of cell lines go together with the drug • Co-occurring features: molecular fingerprints vs. subsets of cell lines • Prediction: finding the best scoring labeling of the graph MCF7 MDA_MB_231 HS578T BT_549 T47D SF_268 SF_295 SF_539 SNB_19 SNB_75 U251 COLO205 HCC_2998 HCT_116 HCT_15 HT29 KM12 SW_620 CCRF_CEM HL_60 K_562 MOLT_4 RPMI_8226 SR LOXIMVI MALME_3M M14 SK_MEL_2 SK_MEL_28 SK_MEL_5 UACC_257 UACC_62 MDA_MB_435 MDA_N A549 EKVX HOP_62 HOP_92 NCI_H226 NCI_H23 NCI_H322M NCI_H460 NCI_H522 IGROV1 OVCAR_3 OVCAR_4 OVCAR_5 OVCAR_8 SK_OV_3 NCI_ADR_RES PC_3DU_145 786_0 A498 ACHN CAKI_1 RXF_393 TK_10 UO_31 Su, Hongyu, and Juho Rousu. "Multilabel classification through random graph ensembles." Machine Learning 99.2 (2013): 231-256.
  • 7. Example: Multivariate association meta-analysis • Problem: find associations btw. groups of SNPs (=genotype) and groups of metabolites (=phenotype) • Meta-analysis setting • Several studies analyzed together • No original data available, only summary statistics of indivudual studies • Our metaCCA tool can recover multivariate associations nearly as accurately as if original data was available 2.10.2015 Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S. metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. bioRxiv. 2015 Jan 1:022665.
  • 8. Example: Biological network reconstruction Input: genomic measurements Output: biological network Pitkänen E, Jouhten P, Hou J, Syed MF, Blomberg P, Kludas J, Oja M, Holm L, Penttilä M, Rousu J, Arvas M. Comparative genome-scale reconstruction of gapless metabolic networks for present and ancestral species. PLoS Comput Biol. 2014 Feb 1;10(2):1003465.
  • 9. Example: Synthetic biology • Goal: energy and carbon efficient industrial processes through synthetic biology • Consortium: VTT, Aalto, U. Turku Our focus • In silico design, modelling and optimization of synthetic biosystems • Systems Modelling + Machine Learning LiF Fermentation experiments Genome-wide measurements In silico experimental design Strain/Process modification Tekes Strategic Opening “LiF – Living Factories”, 2014-
  • 10. Kernel Methods, Pattern Analysis & Computational Metabolomics (KEPACO) KEPACO research group Dr. Sandor Szedmak Dr. Sahely Bhadra (w. S.Kaski) Dr. Markus Heinonen (w. H. Lähdesmäki) Dr. Celine Brouard Dr. Elena Czeizler Dr. Hongyu Su MSc Huibin Shen MSc Anna Cichonska (HIIT&FIMM) MSc Viivi Uurtio + students Funding • Academy of Finland • EU FP7 • Tekes KEPACO group summer 2015