useR2011 - Huber

3,859 views

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,859
On SlideShare
0
From Embeds
0
Number of Embeds
2,421
Actions
Shares
0
Downloads
18
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

useR2011 - Huber

  1. 1. Genomes and phenotypes Wolfgang Huber EMBLGenome Biology Unit (Heidelberg) & EBI (Cambridge)
  2. 2. What makes us different?Genome-wide genotyping ofindividuals for O(106) commonvariants, by microarray, is acommodity.Genome sequencing alsodetects rare or privatevariants, and structuralvariants.Why is that useful?
  3. 3. Warfarin First use: rat and mice killerAnticoagulant. Prevents embolism andthrombosisDose requirement ~ clinical & demographicvariables;VKORC1 (action)CYP2C9 (metabolism)
  4. 4. HerceptinMonoclonal antibody that interferes withthe ERBB2 receptor.
  5. 5. Tyrosin Kinase Inhibitors •Erlotinib (Tarceva) •Imatinib (Glivec) •Gefitinib (Iressa) •Dasatinib (Sprycel) •Sunitinib (Sutent) •Nilotinib (Tasigna) •Lapatinib (Tyverb) •Sorafenib (Nexavr) •Temsirolimus (Torisel)NSCLC: resistance to TKI therapy heterogeneity and mutationalredundancy of the diseaseIdentify each patient’s specific ‘driver mutations’E.g. Activation of EGFR by exon 19 deletion or exon 21 mutationerlotinib and gefitinib.... etc.
  6. 6. Genome-wide association studies ← thousands of people → ← millions of genomic loci → ~Y X
  7. 7. Genome wide association studies:Identifiability - additive model with nointeractionsFinding important variables (loci):impressivePrediction performance, effect sizes: poor
  8. 8. Genome wide association studies:Identifiability - additive model with nointeractionsFinding important variables (loci):impressivePrediction performance, effect sizes: poor•Have we missed important variables?(rare polymorphisms, structural variants)
  9. 9. Genome wide association studies:Identifiability - additive model with nointeractionsFinding important variables (loci):impressivePrediction performance, effect sizes: poor•Have we missed important variables?(rare polymorphisms, structural variants)•Are we overlooking variables with rare,strong effect (sufficient but not necessary)?
  10. 10. Genome wide association studies:Identifiability - additive model with nointeractionsFinding important variables (loci):impressivePrediction performance, effect sizes: poor•Have we missed important variables?(rare polymorphisms, structural variants)•Are we overlooking variables with rare,strong effect (sufficient but not necessary)?•Interactions (epistasis)
  11. 11. Genome wide association studies:Identifiability - additive model with nointeractionsFinding important variables (loci):impressivePrediction performance, effect sizes: poor•Have we missed important variables?(rare polymorphisms, structural variants)•Are we overlooking variables with rare,strong effect (sufficient but not necessary)?•Interactions (epistasis) Association based approaches do not have enough power - we need perturbation experiments on model systems
  12. 12. Forward genetics wt white curly bt
  13. 13. goal might be to provide a broad overview of the types Fortunately, there is a rich RNAi: targeted depletion of a specific of genes involved in a less well-understood process, as was done in a C. elegans genome-wide screen for genes in C. elegans and Drosophila whole-animal RNAi screening gene’s products (mRNA) involved in endocytosis10. screens might already have bee it can still be useful to repeat th because a different range of h Drosophila Humans ticular, lethal genes or those w missed in classical genetic sc T7 Long dsRNA siRNA Long screen involves identifying w >150 bp phenotype rather than recover dsRNA >100 bp 21 bp Genome-wide easy to spot using this approac Bathing Transfection “libraries” The range of whole-animal vast. These can be simple visu cal defects, changes in the exp synthetic phenotypes, sensitivity small molecules, or any other ducible output. Biological proc impossible to access in cell cul tion or organ formation and bDicer, Dicer, using whole-animal RNAi sc RISC loadingRISC loading RISC loading occur at the level of single ce RNAi screening. Cell culture-based screens high-throughput screening a Specificity able for dissecting basic cellul to whole-animal assays, cel Efficiency comparatively reductionist and taken to choose the appropria Reproducibility simplest cell-based assay is a h assay, in which the phenotype
  14. 14. What do human cells do when you knock down each gene in turn?with F. Fuchs, C. Budjan, Michael Boutros (DKFZ)Genomewide RNAi library (Dharmacon, 22k siRNA-pools)HeLa cells, incubated 48h, then fixed and stainedMicroscopy readout: DNA (DAPI), tubulin (Alexa), actin (TRITC) CD3EAP Molecular Systems Biology, 2010
  15. 15. RNAi perturbation phenotypes are observed by automated microscopywt- wt- wt-CEP164 CD3EAP BTDBD322839 wells, 4 images per welleach with DNA, tubulin, actin (1344 x 1024 pixel at 3 x 12 bit)
  16. 16. SegmentationNuclei are easy (e.g. locally adaptive threshold)But cells touch.How do you draw reasonable boundaries between cells?
  17. 17. Voronoi segmentation
  18. 18. Voronoi segmentation
  19. 19. Voronoi segmentation
  20. 20. Voronoi segmentation But we only used the nuclei. The boundaries are artificially straight. How can we better use the information in the actin and tubulin channels?
  21. 21. The same in white: Riemann metric on the 2 2 2 2 topographic surface dr = dx + dy + g dz (manifold) g dzGenetic Interactions dij = ω µi µj (m, m , w) = argmin ˆ ˆ ˆ |log ddijk − w − ijk log dijk with w + mi + ˆ ˆ mj ˆ EBImage::propagate Y = Y0 + bi xi + bij xi xj + bijk
  22. 22. The same in white: Riemann metric on the 2 2 2 2 topographic surface dr = dx + dy + g dz (manifold) g dzGenetic Interactions dij = ω µi µj (m, m , w) = argmin ˆ ˆ ˆ |log ddijk − w − ijk log dijk with w + mi + ˆ ˆ mj ˆ EBImage::propagate Y = Y0 + bi xi + bij xi xj + bijk
  23. 23. The same in white: Riemann metric on the 2 2 2 2 topographic surface dr = dx + dy + g dz (manifold) g dzGenetic Interactions dij = ω µi µj (m, m , w) = argmin ˆ ˆ ˆ |log ddijk − w − ijk log dijk with w + mi + ˆ ˆ mj ˆ EBImage::propagate Y = Y0 + bi xi + bij xi xj + bijk
  24. 24. Converting images into quantitative features cell size 289 cell intensity 34.33118 eccentricity 0.472934 nucleus size 2857.356 DNA content 485.2710 actin content 0.828876 tubulin content 0.098647 actin F11 0.049594 actin F12 0.081746 actin F21 0.158817 actin F22 0.179339 tubulin F11 0.009249 tubulin F12 0.219697 … … 178 features per cellEBImage::computeFeatures
  25. 25. Cells are classified into predefined classes Actin Fiber n n n af n n n n n n Big Cell n n n p n n n n n n n n n Debris n n n n n n bc n n af n n n c n n n n Lamellipodia n af c bc n n n n n nn n n n n n Metaphase bc n n n n n n n n n n n n n n n m c n Normal178 features per cellRadial-kernel SVMManually annotated training set of ~3000 cells ProtrusionsAccuracy: ~ 90 %
  26. 26. The image is now represented by a 13-dim vector: “phenotypic profile” n 289 ext 34.33118 ecc 0.472934 Next 2857.356 Nint 485.2710 AtoTint 0.828876 NtoATsz 0.098647 AF % 0.049594 BC % 0.081746 C% 0.158817 M% 0.179339 LA % 0.009249 P% 0.219697
  27. 27. How do you measure distance and similarityin a 13-dimensional phenotypic profile space?
  28. 28. Similarity depends on the choice and weighting of descriptors
  29. 29. stance metric learningmetric learning Distance n ext ecc 289 34.33118 0.472934 Next 2857.356 Nint a2i 485.2710 0.828876 d(x, y) = |fk (xk ) − fk (yk )| Next2 0.098647 x= AF % 0.049594 k BC % 0.081746 C% 0.158817 1 M% 0.179339 fk (x) = LA % 0.009249 1 + exp (−ηk (x − αk )) P% 0.219697netic Interactions that are somehow ‘related’: EMBL STRING Training set: pairs of genes Get (η, α) by minimizing average distance between training set genes, keeping average distance of ω µigenes fixed. dij = all µj y+ 1 − !− !+ + (m, m , w) = argmin ˆ ˆ ˆ |log ddijk − w − mi − n mj | 1 200 y− n ijk 150 ! ! Frequency log dijk with w + mi + mj 100 ! ˆ ˆ ! ˆ 50 Y = Y0 + bi xi + bij xi xj + bijk xi xj xk 0 100 200 300 400 500 n
  30. 30. Phenotype landscape: by graphlayout or MDS
  31. 31. SummaryAutomated phenotyping of cells upon genetic perturbations bymicroscopy and image analysisSegmentation, feature extraction, classification, distance metriclearning, multi-dimensional scaling, clustering.“Phenotypic map” is useful to biologistsMethod is also being applied to drugs
  32. 32. Collaboration with Michael Boutros, German Cancer ResearchCentre (Heidelberg)Fuchs, Pau et al., Molecular Systems Biology (2010)All data and software available athttp://www.cellmorph.orgpackages EBImage and imageHTS Gregoire Pau
  33. 33. Focus on the analysis of genomic dataBased on R and CRANSix-monthly release cycle, in sync with RReleases:• 1.0 in March 2003 (15 packages), …,• 2.8 in April 2011 (466 software packages)‫‏‬
  34. 34. What’s the added value?Complex data containers (S4 classes) for new experimental technologies (microarrays, sequencing) shared between packages - even from different authors.metadata packages: gene annotation, pathways, genomesexperiment data packages: landmark datasetsstronger emphasis on vignette-style documentationstricter submission review (much more could be done)more package interdependence → releasestraining coursesmailing list is amenable to software and domain (bio) questionsPush new technologies: S4, vignettes, string handling, computations with ranges, out-of-RAM objects
  35. 35. Interactive ReportsDistinguish•interactive exploration by data analyst•reports (presentation graphics)
  36. 36. Interactive ReportsDistinguish•interactive exploration by data analyst•reports (presentation graphics)Everybody has a PDF reader.
  37. 37. Interactive ReportsDistinguish•interactive exploration by data analyst•reports (presentation graphics)Everybody has a PDF reader.Everbody has a web browser.
  38. 38. Interactive ReportsDistinguish•interactive exploration by data analyst•reports (presentation graphics)Everybody has a PDF reader.Everbody has a web browser.Web browsers are turning into an operating system.
  39. 39. PDF viewer
  40. 40. HTML PDF viewer
  41. 41. arrayQualityMetricsReports on Quality of Microarray Datasetseffort to collect all extant, useful quality metrics for microarraysfunding by EU FP7 and by Genentechused by public databases (EBI::ArrayExpress) to annotate their dataofferings for users Example report
  42. 42. arrayQualityMetricsReports on Quality of Microarray Datasets •mouseover → tooltip (rendered as aneffort to collect all extant, useful quality metrics for microarraysfunding by EU FP7 andtable next to the plot) HTML by Genentechused by public databases (EBI::ArrayExpress) to annotate their data •click → select highlightofferings for users (propagated to several plots, tables) •expand, collapse sections •use HTML elements (checkboxes) to control plots Example report
  43. 43. Comments and outlookSVG is part of HTML 5:•linked plots and brushing•HTML widgets as controllers (checkboxes, wheels)SVG/HTML post-processing via the XML packageCallback processing currently in JavaScript.Use R? On server: googleVis talk by Markus Gesmann, Diego deCastillo; locally: browser pluginDuncan Temple Lang’s SVGAnnotation package: works for any Rgraphic (incl. base), but depends on undocumented / changeablebehavior of cairo.Paul Murrell’s gridSVG package: cleaner and more durableapproach, based on grid graphics.
  44. 44. Generalisation?arrayQualityMetrics is for microarraysSoftware sees:•a set of items (arrays)•a set of modules that compute the sections of the report (PCA,boxplots, scatterplots)This could be generalised to reports on very different types of subjectmatter - I will be happy to discuss this.
  45. 45. What makes us different? From Genome Wide Association Studies, ~400 variants that contribute to common traits and diseases are known Individual and the cumulative effects are disappointingly small2
  46. 46. What makes us different? From Genome Wide Association Studies, ~400 variants that contribute to common traits and diseases are known Individual and the cumulative effects are disappointingly Epistasis, interactions small ...2
  47. 47. Take a step back...Genetic interactions• only pairwise• for a simple phenotype• in a simple model system
  48. 48. Simplest “model system”: pairwise gene knock- down interactions and a scalar phenotyped ΔA A ΔB B 0 Fluc 80% 62.5% 100%Phenotype 12.5% 25% 50% 75% 100%Interactions –2 0 +1 Aggravating Interaction Alleviating (negative) score ( A,B) (positive)
  49. 49. A combinatorial RNAi screen 93 Dm kinases and phosphatases Each targeted by two independent dsRNA designs Validation of knock-down by qPCR 96 plates (~37.000 wells) 4.600 distinct gene pairs with Bernd Fischer (EMBL) and M. Boutros, Thomas Sandmann, Thomas Horn (DKFZ ) Nature Methods 4/20113 01/23/11
  50. 50. Image analysis and feature extraction (version of 2010) number of cells DAPI intensity for each cell DAPI area for each cell 04/19/11
  51. 51. Modelling Genetic InteractionsFor many phenotypes, the perturbation effects combine multiplicatively for non-interacting genes i, j: dij = ω µi µj... i.e. additive on a logarithmic scale log dijk = w + mi + m j +gij + εijk Nij ∼ NB(λij , α(λij )) baseline main effect measurement error of dsRNA j interactiont log µ measurement (nr cells, growth rate, …) ij = s + βi0 + x βi main effect j of dsRNA i €
  52. 52. Thus we get a matrix of interaction parameters: profile clustering reflects functional modules (3 5=! !) #;% #:3 ,$+1 7-#1 278/98 5#6 (+# 34 2)- 0+!1 0.5 ,-./0 *+ () % #$ #%$$(!)*% gij ! -0.5 @) #! $-!) )-? # !-
  53. 53. Classification of genes by function through their interaction profiles cross-validated performance functional prediction applied to on training set new genes circle sizes ~ cross-validated posterior probabilities of the classifier26 01/23/11
  54. 54. Classification of genes by function through their interactionMe Your Friends and profiles Show cross-validated performance functional prediction applied to on training set Ill Tell You Who You Are new genes circle sizes ~ cross-validated posterior probabilities of the classifier26 01/23/11
  55. 55. Genetic interactions in 3 dimensions Different phenotypes produce different sets of interactions For each set, significant overlap with known genetic interactions and with human interologs14 01/23/11
  56. 56. Interaction matricesnumber of cells intensity area nrCells intensity area Correlation matrices nrCells intensity area
  57. 57. Interaction matrices number of cells intensity area nrCells intensity area Correlation matrices nrCells intensity areaRas/MAPK inhibitors JNK Ras/MAPK
  58. 58. dict the phenotype(s) for an -unseen genetic underlying Network learning identify the statederlying regulatory molecular modules network from the data area area nrCells number of cells observed phenotypes (p) phenotype level (observed) activity (a) of core hidden: modules (e.g. complexes, activation level expression/ ʻpath-waysʼ) network structure (hidden) activation level prior knowledge on network structure binary genetic observed gene knock down perturbation (g) (observed) 0/1 0/1 0/111
  59. 59. Ongoing: a much bigger matrix• Larger matrix, again Dmel2 cells• ~1500 chromatin-related genes x 100 query genes• full microscopic readout (4x and 20x), 3 channels: DAPI ✴ ✴ phospho-His3 (mitosis marker) ✴ aTubulin (for spindle phenotypes)• 1600 384-well plates, ~ 300.000 measurements ctrl dsRNA Rho1 dsRNA Dynein light chain dsRNA31 01/23/11
  60. 60. Outlook: genetic interactions from model system experiments as regularisation/priors for the identification of genetic interactions in observational studies X Y← genomic loci → Genetic Interactions dij = ω µi µj ˆ ˆ ˆ L (m, m , w) = argmin sparsitydijk − w − mi − mj |1 |log d constraints 1 ijk on b log dijk with w + mi + mj ˆ ˆ ˆ GWAS dataset Y = Y0 + bi xi + bij xi xj + bijk xi xj xk ← individuals →
  61. 61. SummaryQuantitative, combinatorial RNAi works in metazoan cells.Technological tour de force; data exploration, QA/QC,normalisation and transformation....Individual genetic interactions vs interaction profiles.Data are high-dimensional and complicated:• dose effects,• different / multivariate phenotypes• relative timingreveal non-redundant interactions.All data code available from Bernd Fischer, Thomas Horn, Thomas Sandmann, Michael Boutros Nature Methods 2011(4)
  62. 62. Simon Anders Joseph Barry Bernd Fischer Ishaan Gupta Felix Klein Gregoire Pau Aleksandra Pekowska Paul-Theodor Pyl Alejandro Reyes Collaborators Lars Steinmetz Michael Boutros (DKFZ)Robert Gentleman (Genentech) Jan Korbel Michael Knop (Uni HD) Jan Ellenberg Kathryn Lilley (Cambridge) Anne-Claude Gavin Alvis Brazma (EBI) Paul Bertone (EBI) Ewan Birney (EBI)
  63. 63. Ras85D and drk: concentration dependence strength, presence and direction of an interaction can depend on reagent concentration (cf. drug-drug interactions)16 01/23/11
  64. 64. Sign inversion for different phenotypes15 01/23/11
  65. 65. Hidden Markov Model on class labels: parameters summarise the data Learn HMM on class labels Mad2 Bub1control
  66. 66. Interaction scores Correlations
  67. 67. Screen Plot of Interaction Score (#cells) within screen replicates (cor=0.968) independent daRNA designs (cor=0.902) between screen replicates (cor=0.948) 01/23/11
  68. 68. Screen Plot of Read-out (Number of Cells) within screen replicates (cor=0.97) independent dsRNA designs (cor=0.90) between screen replicates (cor=0.95) 01/23/11

×