COMPUTATIONAL CHALLENGES IN
PRECISION MEDICINE AND GENOMICS
GARY BADER
WWW.BADERLAB.ORG
GOOGLE WATERLOO, JUNE 9, 2014
PRECISION MEDICINE
•  TRADITIONAL MEDICINE, WITH MORE DATA
•  DIAGNOSIS: ASSIGNING PATIENTS TO GROUPS
–  BIOLOGY, DISEASE PROGRESSION, TREATMENT RESPONSE
•  PERSONALIZED, BUT NOT EVERYONE HAS A
DIFFERENT DISEASE
NATURE MEDICINE 19, 249 (2013) DOI:10.1038/NM0313-249
NATIONAL COMPREHENSIVE CANCER NETWORK (NCCN)
Breast Cancer
•  Noninvasive: Lobular Carcinoma In Situ, Ductal Carcinoma In Situ
•  Invasive: Lobular Carcinoma, Ductal Carcinoma, Inflammatory
IMPROVING PRECISION WITH GENOMICS
•  BRCA1/BRCA2 MUTATIONS PREDICT RISK
•  COMMERCIAL PROGNOSTIC TESTS BASED ON GENE
SIGNATURES
HTTP://THEBIGCANDME.BLOGSPOT.CA/
GENOMICS
•  NEW TECHNOLOGY FOR READING/WRITING DNA
•  MEASURE OUR GENETIC CODE AND SYSTEM STATE
•  LOTS OF VARIABLES
– WHOLE GENOME, TRANSCRIPT AND PROTEIN
EXPRESSION, SPLICING, CHROMATIN STRUCTURE,
MOLECULAR INTERACTION, TRANSCRIPTION FACTOR,
METHYLATION, METABOLITE, PATIENT PHENOTYPE
HTTP://WWW.LHSC.ON.CA/
FIGURE LABELS: SOURCE CODE ON DISK, LOAD TO ACTIVE MEMORY, COMPILER, RUNNING SOFTWARE, ACTIVE MEMORY
4 LETTER CODE (DNA/RNA BASES): GATGGGATTGGGGTTTTCCCCTCCCAT…
20 LETTER CODE (AMINO ACIDS): MEEPQSDPSVEPPLSQETFSDLWKLLPEN…
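The step from the 4-letter code to the 20-letter code is the genetic code: bases are read three at a time (codons), and each codon maps to one amino acid or a stop signal. A minimal Python sketch using the standard genetic code, assuming an in-frame coding sequence made only of A/C/G/T (or U), with translation stopping at the first stop codon. The names `CODON_TABLE` and `translate` are illustrative, not from any particular library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases ordered T, C, A, G, first position varying slowest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence (DNA or RNA) into a protein string.

    Reads non-overlapping 3-base codons and stops at the first stop codon;
    1-2 trailing bases that do not form a full codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGAGGAGCCGCAGTCAGATCCTAGC"))  # MEEPQSDPS
```

Nine codons (27 bases) yield nine amino acids, which is why the protein string on the slide is about a third the length of the DNA that encodes it.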
A PROTEIN IS A MOLECULAR MACHINE
DNA SEQUENCING
•  RECENT MASSIVE
BREAKTHROUGH
•  CURRENT TECH:
– ~10 HUMAN GENOMES,
1TB DATA/6 DAY RUN
ILLUMINA, GEORGE CHURCH
FEB. 1, 2013:
DR. LEE HOOD RECEIVES HIS NATIONAL MEDAL OF SCIENCE FROM PRESIDENT OBAMA AT WHITE HOUSE CEREMONY
MORE BREAKTHROUGHS COMING
WWW.NANOPORETECH.COM
20-NODE INSTALLATION = COMPLETE HUMAN GENOME IN 15 MINUTES
MINION = USB CONNECTION, MINIMAL SAMPLE PREPARATION, $1000 DEVICE + CONSUMABLES
WHERE DOES THE DATA COME FROM?
MOLECULAR BIOLOGY LABS AROUND THE WORLD: BARODA, INDIA; TORONTO, CANADA; VERMONT, USA; CAMBRIDGE, UK
BGI, >160 MACHINES: THE FACTORY
COMPUTING NEEDS: 1 HUMAN GENOME
•  ~125-BASE READS × MILLIONS
•  >30X COVERAGE
•  ALIGNMENT TO REFERENCE GENOME
•  COMPUTE VARIANTS (MUTATIONS)
•  ANNOTATE VARIANTS
•  COMPUTE TIME: UP TO 2 DAYS/GENOME
– OPTIMIZED: 4 HOURS ON 128 GB RAM, 2 CPUS, SSD, 3.1 GHZ
•  MEDICALLY IMPORTANT TO BE FAST
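The read count behind "millions of reads at >30X" follows from the Lander-Waterman relation: mean coverage C = N × L / G for N reads of length L on a genome of size G. A small Python sketch, assuming a haploid human genome size of about 3.1 billion bases (an approximate figure, not from the slide) and ideal, uniformly random read placement, under which the expected fraction of bases with zero coverage is e^(-C).

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate haploid human genome size


def reads_needed(coverage: float, read_length: int, genome_size: int = HUMAN_GENOME_BP) -> int:
    """Number of reads N such that N * read_length / genome_size reaches `coverage`."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming Poisson-distributed
    read start positions (Lander-Waterman); real data is less uniform than this."""
    return math.exp(-coverage)


n = reads_needed(30, 125)
print(f"{n:,} reads")                           # 744,000,000 reads
print(f"{n * 125 / 1e9:.0f} Gbases sequenced")  # 93 Gbases sequenced
```

Every one of those ~744 million reads has to be aligned and then examined for variants, which is where the hours-to-days compute time comes from.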
THE POWER OF GENOMICS IN MEDICINE
•  7000 RARE MONOGENIC DISEASES
– 50% HAVE A KNOWN GENE RESPONSIBLE
– QUADRUPLED RATE OF IDENTIFICATION SINCE 2012
•  BRAIN DOPAMINE-SEROTONIN VESICULAR
TRANSPORT DISEASE AND ITS TREATMENT
– TWO YEARS FROM DISEASE DEFINITION TO GENE
IDENTIFICATION TO TREATMENT
NAT REV GENET. 2013 OCT;14(10):681-91 N ENGL J MED. 2013 FEB 7;368(6):543-50
NON-INVASIVE PRENATAL TEST
HTTP://WWW.PANORAMATEST.COM/
CANCER GENOMICS
•  GERM LINE VS. SOMATIC MUTATIONS
•  AIM: IDENTIFY FREQUENT MUTATIONS IN CANCER
•  >11,000 TUMOUR GENOMES, 9M MUTATIONS
HUMAN COLORECTAL CARCINOMA
HTTPS://DCC.ICGC.ORG/	
  
COMPUTING CHALLENGES
•  EXPONENTIAL DATA GROWTH (FASTER THAN MOORE’S LAW)
– BILLIONS OF GENOMES
– SIZE: >100 GB RAW PER HUMAN GENOME, 4 GB PROCESSED, MBS FOR JUST THE MUTATIONS
•  HETEROGENEOUS, NOISY, COMPLEX DATA
– NEEDS BOTH DATA SCIENTISTS AND DOMAIN EXPERTS
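The drop from gigabytes per genome to megabytes of mutations comes from storing only the positions where a genome differs from the reference. A toy Python sketch of that idea, assuming two already-aligned sequences of equal length and single-base substitutions only (real variant formats such as VCF also handle insertions, deletions and genotypes). The function names are hypothetical.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each mismatch.

    Assumes equal-length, pre-aligned sequences (substitutions only).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def apply_variants(reference: str, variants: list[tuple[int, str, str]]) -> str:
    """Rebuild the sample sequence from the reference plus its variant list."""
    bases = list(reference)
    for pos, ref_base, alt_base in variants:
        if bases[pos] != ref_base:
            raise ValueError(f"variant at {pos} does not match the reference")
        bases[pos] = alt_base
    return "".join(bases)


ref = "ACGTACGTAC"
sample = "ACGTTCGTAA"
variants = diff_against_reference(ref, sample)
print(variants)  # [(4, 'A', 'T'), (9, 'C', 'A')]
```

Since a person's genome differs from the reference at only a few million of its ~3 billion positions, the variant list is orders of magnitude smaller than the sequence it encodes, and the sample can still be rebuilt from reference plus variants.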
COMPUTING WILL TRANSFORM MEDICINE
GOAL: IMPROVE PATIENT OUTCOMES
COMPUTATIONAL BIOLOGY
•  RESEARCH: USING COMPUTERS TO ANSWER
BIOLOGICAL/BIOMEDICAL QUESTIONS
•  EXPLORE, INTERPRET AND DISCOVER: SEARCH
•  SPEED AND ACCURACY: ALGORITHMS
•  PREDICTING FUNCTIONAL MUTATIONS, PATIENT
CLASSIFICATION: MACHINE LEARNING
•  PRIVACY: DIFFERENTIAL PRIVACY, ENCRYPTION
•  USABLE APPLICATIONS: SOFTWARE ENGINEERING
MedSavant
search engine for
genetic variants
WWW.MEDSAVANT.COM
Developers: Marc Fiume, James Vlasblom, Ron Ammar, Orion Buske, Eric Smith, Andrew Brook, Misko Dzamba,
Khushi Chachcha, Sergiu Dumitriu
Scientific Advisors: Christian Marshall, Kym Boycott, Marta Girdea, Peter Ray, Gary Bader, Michael Brudno
FIGURE: GENOMIC READS → GENOMIC VARIANTS (MARC FIUME, MIKE BRUDNO)
IDENTIFY DISEASE SUBTYPE
GLIOBLASTOMA MULTIFORME (N=215)
•  DATA FUSION (NON-LINEAR, MESSAGE PASSING), UNSUPERVISED CLUSTERING
FIGURE PANELS: SURVIVAL, CLUSTERING, SPEED
GOLDENBERG, BRUDNO, NATURE METHODS, 2014
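The fusion step can be illustrated with a compact, simplified version of similarity network fusion (SNF), which we assume is the approach behind the Goldenberg/Brudno Nature Methods 2014 reference. The idea: build one patient-similarity network per data type, sparsify each with a k-nearest-neighbour kernel, then iteratively update each network using the others until they agree. This NumPy sketch is not the published implementation. The kernel width `sigma`, neighbourhood size `k`, iteration count, and the toy two-subtype data are all assumptions.

```python
import numpy as np


def affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-kernel similarity between patients (rows of X) for one data type."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma**2))


def full_kernel(W: np.ndarray) -> np.ndarray:
    """Row-normalized similarity: half the weight on self, half spread over others."""
    P = W.copy()
    np.fill_diagonal(P, 0.0)
    P = P / (2 * P.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def local_kernel(W: np.ndarray, k: int) -> np.ndarray:
    """Keep only each patient's k most similar other patients, row-normalized."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nn = np.argsort(W[i])[-k:]  # indices of the k largest similarities
        S[i, nn] = W[i, nn] / W[i, nn].sum()
    return S


def fuse(Ws: list[np.ndarray], k: int = 3, iterations: int = 20) -> np.ndarray:
    """Iteratively pass each network's similarities through the other networks."""
    P = [full_kernel(W) for W in Ws]
    S = [local_kernel(W, k) for W in Ws]
    m = len(Ws)
    for _ in range(iterations):
        updated = []
        for v in range(m):
            others = sum(P[u] for u in range(m) if u != v) / (m - 1)
            Pv = S[v] @ others @ S[v].T
            updated.append((Pv + Pv.T) / 2)  # keep each network symmetric
        P = updated  # all views are updated simultaneously
    return sum(P) / m


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 6)  # toy data: 12 patients in 2 hidden subtypes
expression = rng.normal(scale=0.5, size=(12, 5)) + 5.0 * labels[:, None]
methylation = rng.normal(scale=0.5, size=(12, 3)) + 4.0 * labels[:, None]
fused = fuse([affinity(expression), affinity(methylation)])
same = labels[:, None] == labels[None, :]
print(fused[same].mean(), fused[~same].mean())  # within- vs between-subtype similarity
```

The fused patient-by-patient matrix would then be passed to an unsupervised clustering method, such as spectral clustering, to define subtypes. Because fusion works on patient similarities rather than raw features, data types with different numbers and kinds of measurements can be combined.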
PREDICT TREATMENT RESPONSE
•  SUPERVISED MACHINE LEARNING E.G. RHEUMATOID
ARTHRITIS METHOTREXATE RESPONSE
FIGURE: PERSONAL MEDICAL NETWORK. NODES: PATIENTS, EACH A RESPONDER OR NON-RESPONDER TO TREATMENT, PLUS A NEW PATIENT (PREDICTED NON-RESPONDER). EDGES: WEAKLY OR HIGHLY SIMILAR PATIENTS (SIMILAR E.G. BY SNP, SMOKING STATUS).
SHIRLEY HUI, RUTH ISSERLIN, HUSSAM KACA, TABITHA KUNG, KATHY SIMINOVITCH
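The personal medical network idea, predicting a new patient's response from the most similar known patients, can be sketched as a similarity-weighted nearest-neighbour vote. Everything here is an illustrative assumption rather than the method on the slide: the patient features (SNP genotypes coded 0/1/2 plus smoking status), the simple matching similarity, `k`, and the toy cohort.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    features: tuple                   # hypothetical: SNP genotypes (0/1/2), then smoking status (0/1)
    responder: Optional[bool] = None  # None for a new patient with unknown response


def similarity(a: Patient, b: Patient) -> float:
    """Fraction of features on which two patients agree (0 = dissimilar, 1 = identical)."""
    matches = sum(x == y for x, y in zip(a.features, b.features))
    return matches / len(a.features)


def predict_response(new: Patient, cohort: list[Patient], k: int = 3) -> bool:
    """Similarity-weighted vote among the k known patients most similar to `new`."""
    neighbours = sorted(cohort, key=lambda p: similarity(new, p), reverse=True)[:k]
    score = sum(similarity(new, p) * (1 if p.responder else -1) for p in neighbours)
    return score > 0  # ties (score == 0) are called non-responder in this sketch


cohort = [
    Patient((2, 1, 0, 1), responder=False),
    Patient((2, 1, 0, 1), responder=False),
    Patient((2, 0, 0, 1), responder=False),
    Patient((0, 0, 2, 0), responder=True),
    Patient((0, 1, 2, 0), responder=True),
]
new_patient = Patient((2, 1, 0, 0))
print("responder" if predict_response(new_patient, cohort) else "non-responder")  # non-responder
```

Weighting each vote by similarity lets highly similar patients count more than weakly similar ones, mirroring the two edge strengths in the figure.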
  
EXPLAINING GENOMICS DATA
•  SNAPSHOTS OF SYSTEM STATE
–  E.G. CANCER VS. NORMAL
•  EXPLAIN WHY STATES DIFFER
–  E.G. REGULATOR PERTURBATION
– CAUSAL MODELING
– PRIOR KNOWLEDGE ABOUT
MECHANISM: PATHWAYS
WITT H ET AL. CANCER CELL. 2011 AUG 16;20(2):143-57
FIGURE: GENOME++ ENCODES CELL MECHANISM; CELL MECHANISM EXPLAINS MOLECULAR, PHYSIOLOGICAL PHENOTYPE; ENVIRONMENT MODULATES
THE HUMAN BODY
•  A WETWARE COMPUTING SYSTEM
A PROTEIN IS A MOLECULAR MACHINE
1 INTERACTION (EDGE)
HO ET AL. NATURE 415(6868), 2002
LOGIC CIRCUIT (PATHWAY)
HTTP://DISCOVER.NCI.NIH.GOV/KOHNK/INTERACTION_MAPS.HTML
THE CELL
ALAIN VIEL, HARVARD UNIVERSITY, 2007
HTTP://WWW.ENDOSZKOP.COM/
~40 TRILLION CELLS, +TRILLIONS OF MICROBES (PARALLEL PROCESSING)
BIANCONI ET AL. ANN HUM BIOL. 2013 NOV-DEC;40(6):471
FIGURE: ENRICHMENT MAP OF GENE-SETS ENRICHED IN CNV DELETIONS AND IN KNOWN DISEASE GENES (ID = INTELLECTUAL DISABILITY, ASD = AUTISM, BOTH)
NODE TYPE (GENE-SET): ENRICHED IN DELETIONS (COLOUR = FDR, 0% TO 12.5%); KNOWN DISEASE GENES; ENRICHED ONLY IN DISEASE GENES
EDGE TYPE (GENE-SET OVERLAP): FROM DISEASE GENES TO ENRICHED GENE-SETS; BETWEEN GENE-SETS ENRICHED IN DELETIONS; BETWEEN SETS ENRICHED IN DELETIONS AND IN DISEASE GENES, OR BETWEEN DISEASE SETS ONLY
CLUSTER LABELS INCLUDE: MICROTUBULE CYTOSKELETON; CELL PROJECTION & CELL MOTILITY; CELL PROLIFERATION; GLYCOSYLATION; ADHESION; REGULATION OF GTPASE; KINASE ACTIVITY/REGULATION; CNS DEVELOPMENT; GTPASE/RAS SIGNALING; TYROSINE KINASE; VASCULATURE, PALATE AND HEART DEVELOPMENT; ORGAN MORPHOGENESIS; BEHAVIOR; CENTROSOME; NUCLEOLUS; CELL CYCLE; REGULATION OF HORMONE LEVELS; AMINO ACID DERIVATIVE / AMINE METABOLISM; SYNAPTIC VESICLE MATURATION; REELIN PATHWAY; LIS1 IN NEURONAL MIGRATION AND DEVELOPMENT; CKIT PATHWAY; MTOR PATHWAY; ZN FINGER DOMAIN; CARBOXYL ESTERASE DOMAIN; SMC FLEXIBLE HINGE DOMAIN; UREA AND AMINE GROUP METABOLISM; MHC-I
INSET: ZOOM OF CNS-DEVELOPMENT
Pinto et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010 Jun 9.
FIGURE: ZOOM OF CNS-DEVELOPMENT: NEURON MIGRATION; CELL MOTILITY (STRICTER CLUSTER); CELL MORPHOGENESIS; CELL PROJECTION ORGANIZATION; CNS DEVELOPMENT; BRAIN DEVELOPMENT; NEURITE DEVELOPMENT; CNS NEURON DIFFERENTIATION; AXONOGENESIS; PROJECTION NEURON AXONOGENESIS; CEREBRAL CORTEX CELL MIGRATION; REELIN PATHWAY
FIGURE: CNV-AFFECTED GENES IN PATHWAY GSi FOR PATIENT #1, #2, #3, #i; COUNT = 1, 1, 1, 0
•  IF A PATIENT HAS AT LEAST ONE CNV AFFECTING AT LEAST ONE GENE IN A PATHWAY GSi, THAT PATHWAY HAS A PERTURBATION POTENTIAL IN THAT PATIENT
•  WE RECORD THE PRESENCE (1) OR ABSENCE (0) OF THIS PERTURBATION POTENTIAL FOR EACH PATIENT
       Patient #1  Patient #2  Patient #3  …  Patient #i  …  Patient #n
GS1    1           1           1           …  0           …  0
GS2    0           0           1           …  1           …  0
GS3    0           0           0           …  0           …  0
DANIELE MERICO
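The counting rule (a gene-set gets a 1 for a patient if at least one of that patient's CNVs hits at least one gene in the set) is a set-intersection test. A minimal Python sketch, assuming CNV calls have already been mapped to gene symbols per patient. The gene-set contents, gene names and patients below are made up for illustration.

```python
def perturbation_matrix(
    gene_sets: dict[str, set[str]],
    cnv_genes_by_patient: dict[str, set[str]],
) -> dict[str, dict[str, int]]:
    """Return {gene_set: {patient: 1 or 0}}: 1 if any CNV-affected gene is in the set."""
    return {
        gs_name: {
            patient: int(bool(genes & cnv_genes))  # non-empty overlap -> perturbation potential
            for patient, cnv_genes in cnv_genes_by_patient.items()
        }
        for gs_name, genes in gene_sets.items()
    }


gene_sets = {  # hypothetical gene-sets
    "GS1": {"GENE_A", "GENE_B"},
    "GS2": {"GENE_C"},
    "GS3": {"GENE_D", "GENE_E"},
}
cnv_genes_by_patient = {  # hypothetical CNV-affected genes per patient
    "Patient #1": {"GENE_A"},
    "Patient #2": {"GENE_B", "GENE_X"},
    "Patient #3": {"GENE_A", "GENE_C"},
    "Patient #4": {"GENE_Y"},
}
matrix = perturbation_matrix(gene_sets, cnv_genes_by_patient)
for gs, row in matrix.items():
    print(gs, list(row.values()))
# GS1 [1, 1, 1, 0]
# GS2 [0, 0, 1, 0]
# GS3 [0, 0, 0, 0]
```

Recording presence or absence, rather than the number of CNV hits, keeps each patient's contribution to a gene-set at 0 or 1, which is what the case/control association test below counts.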
  
PATHWAY ASSOCIATION TEST
DESCRIPTION:
• THE SIGNIFICANCE OF EACH GENE-SET IS THEN ASSESSED USING FISHER’S EXACT TEST FOR ASSOCIATION
• A SIGNIFICANT GENE-SET HAS PERTURBATION POTENTIAL MORE FREQUENTLY IN CASES THAN IN CONTROLS
• THE FDR IS ESTIMATED BY SHUFFLING THE PATIENT COLUMNS (I.E. THE CASE/CONTROL LABELS) OF THE GENE-SET × PATIENT TABLE
              Case         Control
GSi           13           1
Not in GSi    1146 - 13    889 - 1
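A stdlib-only Python sketch of the two steps described above. The first is a one-sided Fisher's exact test computed directly from the hypergeometric distribution and applied to the GSi table. The second is a permutation FDR estimate obtained by shuffling case/control labels over the patient columns. The function names, the one-sided alternative, the number of permutations, and the FDR estimator (mean permuted discoveries divided by observed discoveries) are assumptions, not necessarily the exact procedure used in the study. In practice `scipy.stats.fisher_exact(table, alternative="greater")` gives the same one-sided p-value.

```python
import random
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in cases.

    2x2 table:        case  control
        in gene-set     a      b
        not in set      c      d
    Returns P(X >= a) under the hypergeometric null with all margins fixed.
    """
    n = a + b + c + d
    hits, cases = a + b, a + c
    tail = sum(
        comb(cases, x) * comb(n - cases, hits - x)
        for x in range(a, min(hits, cases) + 1)
    )
    return tail / comb(n, hits)


def row_pvalue(row: list[int], is_case: list[bool]) -> float:
    """Fisher p-value for one gene-set row of the 0/1 gene-set x patient table."""
    a = sum(v for v, case in zip(row, is_case) if case)
    b = sum(v for v, case in zip(row, is_case) if not case)
    n_case = sum(is_case)
    return fisher_greater(a, b, n_case - a, len(is_case) - n_case - b)


def permutation_fdr(table: list[list[int]], is_case: list[bool], threshold: float,
                    n_perm: int = 100, seed: int = 0) -> float:
    """Estimate FDR at `threshold` by shuffling case/control labels across patients:
    (mean # permuted p <= threshold) / (# observed p <= threshold), capped at 1."""
    observed = sum(row_pvalue(row, is_case) <= threshold for row in table)
    if observed == 0:
        return 0.0  # nothing called significant, so no false discoveries
    rng = random.Random(seed)
    labels = list(is_case)
    null_hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        null_hits += sum(row_pvalue(row, labels) <= threshold for row in table)
    return min(1.0, null_hits / n_perm / observed)


# The GSi table from the slide: 13 of 1146 cases vs 1 of 889 controls.
p = fisher_greater(13, 1, 1146 - 13, 889 - 1)
print(f"{p:.1e}")
```

Shuffling labels, rather than resampling genes, keeps the real overlap structure between gene-sets intact, so the null distribution reflects the same correlated tests that were run on the observed data.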
  
BENEFITS OF SYSTEMS THINKING VS. PARTS THINKING
•  IMPROVES STATISTICAL POWER
– FEWER TESTS
•  MORE REPRODUCIBLE
– E.G. GENE EXPRESSION SIGNATURES
•  EASIER TO INTERPRET
– FAMILIAR CONCEPTS E.G. CELL CYCLE
•  IDENTIFIES MECHANISM
– CAN EXPLAIN CAUSE
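A worked example of the "fewer tests" point under Bonferroni correction. It assumes roughly 20,000 genes tested individually versus roughly 1,000 gene-sets (illustrative round numbers, not figures from the talk) at a family-wise error rate of 0.05:

```latex
\alpha_{\text{gene}} = \frac{0.05}{20\,000} = 2.5 \times 10^{-6},
\qquad
\alpha_{\text{gene-set}} = \frac{0.05}{1\,000} = 5 \times 10^{-5}
```

Each gene-set test therefore faces a per-test threshold 20 times less stringent, and it pools signal from many genes. Together, these two effects produce the gain in statistical power.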
DATABASES, EXPERIMENTS, PREDICTIONS, LITERATURE, EXPERTS
FIGURE: GENOME++ ENCODES CELL MECHANISM; CELL MECHANISM EXPLAINS MOLECULAR, PHYSIOLOGICAL PHENOTYPE; ENVIRONMENT MODULATES
HTTP://PATHWAYCOMMONS.ORG
THE FACTOID PROJECT
MAX FRANZ, IGOR RODCHENKOV, OZGUN BABUR, EMEK DEMIR, CHRIS SANDER
HELPING AUTHORS
DIGITIZE THEIR PUBLISHED
KNOWLEDGE
HTTP://FACTOID.BADERLAB.ORG/
NETWORK VISUALIZATION AND ANALYSIS
UCSD, ISB, AGILENT, MSKCC, PASTEUR, UCSF
HTTP://CYTOSCAPE.ORG
PATHWAY COMPARISON
LITERATURE MINING
GENE ONTOLOGY ANALYSIS
ACTIVE MODULES
COMPLEX DETECTION
NETWORK MOTIF SEARCH
CYTOSCAPE.JS: HTML5 – TOUCH
CYTOSCAPE.GITHUB.COM/CYTOSCAPE.JS/ MAX FRANZ
GENE FUNCTION PREDICTION
HTTP://WWW.GENEMANIA.ORG
QUAID MORRIS (DONNELLY)
RASHAD BADRAWI, OVI COMES, SYLVA DONALDSON,
MAX FRANZ, CHRISTIAN LOPES, FARZANA KAZI,
JASON MONTOJO, HAROLD RODRIGUEZ, KHALID ZUBERI
•  GUILT-BY-ASSOCIATION PRINCIPLE
•  BIOLOGICAL NETWORKS ARE COMBINED
INTELLIGENTLY TO OPTIMIZE PREDICTION ACCURACY
•  ALGORITHM IS FASTER AND MORE ACCURATE THAN ITS PEERS
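Guilt by association can be sketched as label propagation over a weighted sum of networks: query genes get a label of 1, and scores spread to strongly connected neighbours. This follows the general form of the GeneMANIA objective as we understand it, but it is simplified. Network weights are fixed by hand rather than learned from the query genes, the Laplacian is unnormalized, unlabelled genes start at 0, and the two toy networks are hypothetical.

```python
import numpy as np


def network(n: int, edges: list[tuple[int, int, float]]) -> np.ndarray:
    """Build a symmetric weighted adjacency matrix from (gene_i, gene_j, weight) edges."""
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    return W


def combine_networks(networks: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted sum of gene-gene association networks (weights fixed here, not learned)."""
    return sum(w * W for w, W in zip(weights, networks))


def propagate(W: np.ndarray, seeds: list[int]) -> np.ndarray:
    """Guilt-by-association scores: solve (I + L) f = y, with L the graph Laplacian.

    This minimizes sum_i (f_i - y_i)^2 + sum_{i<j} W_ij (f_i - f_j)^2, so genes
    strongly linked to the query (seed) genes receive high scores.
    """
    n = W.shape[0]
    y = np.zeros(n)
    y[seeds] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(n) + L, y)


coexpression = network(5, [(0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0)])  # toy network
interaction = network(5, [(2, 3, 0.2), (0, 1, 0.5)])                # toy network
W = combine_networks([coexpression, interaction], weights=[0.7, 0.3])
scores = propagate(W, seeds=[0, 1])
print(np.round(scores, 3))
```

Gene 2, tied tightly to both query genes, scores far above genes 3 and 4, which reach the query only through a single weak link. Combining networks first, then propagating once, is what lets evidence from different data types reinforce each other.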
SOCIAL CHALLENGES
•  BIOETHICS AND DATA SHARING
•  ENGAGING RESEARCHERS
– CROWDSOURCING: TCGA PAN CANCER, DREAM
•  ENCOURAGING RESEARCHERS TO EXPLORE
UNCHARTED TERRITORY
•  NEED FOR QUANTITATIVE THINKING IN BIOLOGY
–  NEW PH.D. PROGRAM IN THE MOLECULAR GENETICS
DEPARTMENT AT THE UNIVERSITY OF TORONTO
NATURE. 2011 FEB 10;470(7333):163-5
WWW.NATURE.COM/TCGA/
EPENDYMOMA
•  3RD MOST COMMON BRAIN TUMOUR IN CHILDREN
•  INCURABLE IN UP TO 45% OF PATIENTS
STEVE MACK, MICHAEL TAYLOR, RUTH ISSERLIN - CANCER CELL. 2011 AUG 16;20(2):143-57
FIGURE PANELS: GENE EXPRESSION, PATIENT AGE, OVERALL SURVIVAL
EPENDYMOMA GENOMIC ANALYSIS
•  EPENDYMOMA BRAIN CANCER: THE MOST COMMON AND MORBID LOCATION IN CHILDHOOD IS THE POSTERIOR FOSSA (PF = BRAINSTEM + CEREBELLUM)
•  TWO SUBTYPES BY GENE EXPRESSION: PFA (YOUNG PATIENTS, DISMAL PROGNOSIS) AND PFB (OLDER PATIENTS, EXCELLENT PROGNOSIS)
•  WHOLE GENOME SEQUENCING (47 SAMPLES) SHOWED ALMOST NO MUTATIONS; HOWEVER, DNA METHYLATION ARRAYS SHOWED CLEAR CLUSTERING INTO PFA AND PFB (79 SAMPLES)
•  PFA IS MORE TRANSCRIPTIONALLY SILENCED BY CPG METHYLATION
STEVE MACK, MICHAEL TAYLOR, SCOTT ZUYDERDUYN. NATURE, FEB. 2014
INHIBITING POLYCOMB REPRESSIVE COMPLEX 2 (PRC2) WITH DZNEP AND GSK343 KILLED PFA CELLS
NO KNOWN TREATMENT, SO NOW MOVING TO A CLINICAL TRIAL; COMPASSIONATE USE IN ONE PATIENT
TREATMENT OF METASTATIC PF EPENDYMOMA WITH VIDAZA
9-YEAR-OLD WITH PF EPENDYMOMA METASTATIC TO LUNG, TREATED WITH AZACITIDINE (VIDAZA)
FIGURE PANELS: 2 MONTHS, 3 MONTHS; 3 CYCLES VIDAZA
MICHAEL TAYLOR
ACKNOWLEDGEMENTS
BADER LAB
DOMAIN INTERACTION TEAM
SHOBHIT JAIN
BRIAN LAW
JÜRI REIMAND
MOHAMED HELMY
ANDREA UETRECHT
MARINA OLHOVSKY
CANCER GENOMICS
FLORENCE CAVALLI
DAVID SHIH
ASHA ROSTAMIANFAR
PRECISION MEDICINE
RON AMMAR
SHIRLEY HUI
FUNDING
HTTP://BADERLAB.ORG
PATHWAY AND NETWORK
ANALYSIS
RUTH ISSERLIN
IGOR RODCHENKOV
SCOTT ZUYDERDUYN
RUTH WONG
VERONIQUE VOISIN
SHAHEENA BASHIR
KHALID ZUBERI
CHRISTIAN LOPES
JASON MONTOJO
MAX FRANZ
HAROLD RODRIGUEZ
