Using network information with gene
expression data

Jean Yee Hwa Yang
School of Mathematics and Statistics

Large-scale molecular interaction networks are dynamic in nature, and changes in these networks, rather than changes in individual genes/proteins, are often drivers of complex diseases such as cancer. In this talk, I use data from stage III melanoma patients provided by Prof. Mann's lab, comprising clinical, mRNA and miRNA data, to discuss how network information can be utilised in gene expression analysis to aid biological interpretation. I will also present an R software package, Variability Analysis in Networks (VAN), that enables an integrative analysis of protein-protein or microRNA-gene networks and expression data to identify hubs (i.e. highly connected proteins/microRNAs in a network) that are dysregulated in terms of expression correlation with their interaction partners.


  1. Using network information with gene expression data. Jean Yee Hwa Yang, School of Mathematics and Statistics
  2. Central dogma of molecular biology
       Microarrays are used to detect the extent to which genes are being expressed.
       Image source: Central dogma of molecular biology, Wikipedia; http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology
  3. Technologies ~ measuring expression
       [Diagram: a data matrix of N samples by variables (5000-30000 genes, or 2000 miRNAs). mRNA and microRNA are measured by microarray (expression data) or next-gen sequencing (count data).]
  4. Motivation: Melanoma prognosis
       ›  Melanomas are common in a large demographic of the population, especially in Caucasians living in sunny climates. Of those that metastasise (Stage III), about 40% go on to live cancer free, but another 40% succumb to the disease in less than 1 year.
       ›  Samples were obtained from Professor Graham Mann's group from the Westmead Institute for Cancer Research and Melanoma Institute Australia.
       ›  Aim: To predict survival prognosis for Stage III melanoma patients. Currently, we have gene expression data for 79 Stage III individuals. In addition, we have clinical data consisting of patient stage at diagnosis and survival status, as well as histological, pathological and mutation information.
  5. Research aims
       ›  New prognostic markers
          -  To determine whether there are significant biomarker and pathway differences between melanomas of good and bad prognosis after resection of nodal metastatic disease.
       ›  New therapeutic targets
          -  To identify and validate the principal regulatory pathway abnormalities that characterise metastatic (stage III and IV) melanomas;
          -  To investigate novel genomic drivers of melanoma tumour progression and outcome.
       Provided by Sara-Jane Schramm (Usyd)
  6. Survival outcome
       [Histogram: survival time of stage III melanoma patients. x-axis: survival time (years), 0 to 12; y-axis: frequency, 0 to 20.]
       Two survival groups:
       ›  Bad prognosis: survival < 1 year and died due to melanoma
       ›  Good prognosis: survival ≥ 4 years with no sign of relapse
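The two group definitions can be written as a small labelling rule. This is only a sketch: the function name, the argument names and the three toy patients are hypothetical, and "PP"/"GP" are used as shorthand for the bad (poor) and good prognosis groups.

```python
from typing import Optional


def prognosis_group(survival_years: float,
                    died_of_melanoma: bool,
                    relapsed: bool) -> Optional[str]:
    """Label a patient as 'PP', 'GP', or None if in neither group."""
    # Bad prognosis: survived < 1 year and died due to melanoma.
    if survival_years < 1 and died_of_melanoma:
        return "PP"
    # Good prognosis: survived >= 4 years with no sign of relapse.
    if survival_years >= 4 and not relapsed:
        return "GP"
    # Everyone else falls outside the two groups.
    return None


# Hypothetical records: (id, survival in years, died of melanoma, relapsed)
patients = [
    ("p1", 0.6, True, True),
    ("p2", 5.2, False, False),
    ("p3", 2.5, False, True),
]
labels = {pid: prognosis_group(t, died, rel) for pid, t, died, rel in patients}
print(labels)
```

Patients who match neither rule get `None`, so they can be dropped before a two-class comparison.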
  7. Gene expression (microarray data)
       No correlation with BRAF mutation.
       [Figure: expression values for GP and PP samples. Pink: no BRAF mutation; gray: BRAF mutation.]
  8. Gene expression: DE analysis
       Three main types of questions
       1.  Differential expression (DE) analysis: finding DE genes between two classes (e.g. good prognosis vs poor prognosis).
       2.  Cluster analysis: finding common patterns between samples / genes.
       3.  Classification & prediction: predicting an outcome based on a set of explanatory variables (features) and a model (classifier).
       Image reproduced from JOURNAL OF INVESTIGATIVE DERMATOLOGY|Vol 133|2013
  9. Gene expression: cluster analysis
       Three main types of questions
       1.  Differential expression (DE) analysis: finding DE genes between two classes (e.g. good prognosis vs poor prognosis).
       2.  Cluster analysis: finding common patterns between samples / genes.
       3.  Classification & prediction: predicting an outcome based on a set of explanatory variables (features) and a model (classifier).
  10. Gene expression: classification
       Three main types of questions
       1.  Differential expression (DE) analysis: finding DE genes between two classes (e.g. good prognosis vs poor prognosis).
       2.  Cluster analysis: finding common patterns between samples / genes.
       3.  Classification & prediction: predicting an outcome based on a set of explanatory variables (features) and a model (classifier).
       [Figures: error rate against number of features (genes); an example decision tree with yes/no splits on Gene 1 (Mi1 < -0.67) and Gene 2 (Mi2 > 0.18) and leaves B-ALL, AML and T-ALL.]
  11. Most of the approaches to date can be considered as “single gene” analysis.
  12. Three different levels of DE analysis
       Let's think of performing DE analysis at 3 different levels:
       1.  Single gene level: this is gene-by-gene analysis (individual node).
       2.  Gene set level: the features are subsets of genes (sets of nodes), e.g. gene set test.
       3.  Network level: examine a subset of genes (nodes in the network) together with information on relationships between the genes (the edges in the network).
  13. Networks
       A network is made up of nodes and edges.
       [Figure: example network with nodes RHOG, LCK, THY1, ALK, RAC1, DVL2, GRAP2, KLK3, SHB, ELAVL4, VAV3, SWAP70, GRB2, MAP4K1, RHOA, LCP2, CDC42, VAV2, VAV1, EGFR, SYK, CD6, YWHAQ, ZAP70, NCK1, CD38, CD19, LYN, ABL1.]
  14. Network discovery vs predefined networks
       We have used two different methods for defining networks:
       ›  Network discovery: use microarray information to find genes with highly correlated gene expression probes, and define edges accordingly (e.g. WGCNA).
       ›  Predefined networks: use predefined gene interaction databases such as MetaCore or iRefWeb.
          -  E.g. protein-protein interaction networks: a node represents a protein-coding gene, and an edge between two nodes represents an interaction between the proteins coded for by the genes.
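As a rough illustration of the network discovery idea, the sketch below joins two genes by an edge when the absolute Pearson correlation of their expression profiles reaches a cut-off. This hard threshold is a deliberate simplification and is not the WGCNA procedure; the function names, the 0.8 cut-off and the toy expression values are all assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def discover_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy expression profiles: gene -> values across five samples (made up).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = discover_edges(expr, threshold=0.8)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

With these toy values only the geneA-geneB pair passes the cut-off; geneC is weakly and negatively correlated with both.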
  15. Protein-protein interaction data
       ›  Human Protein Reference Database
          -  Keshava Prasad et al. 2009
       ›  iRefWeb
          -  Turner et al. 2010
       ›  BioGRID
          -  Chatr-aryamontri et al. 2013
       ›  MetaCore
          -  From GeneGo Inc.
       Hairball image generated using Cytoscape (Smoot et al. 2011).
       Thanks to Simone Li and Drs Igy Pang and David Fung at the Systems Biology Initiative, the University of New South Wales.
  16. Metacore network dataset
       ›  Split the network into subnetworks, containing a central hub gene (a gene with 5 interactors) and its immediate interactors.
       ›  For example, one network dataset from Metacore database consists of 1273 hub subnetworks with a total of 3607 genes in common with the microarray dataset.
       [Figure: VAV3 hub subnetwork, with interactors RAC1, GRB2, KLK3, RHOA, RHOG, LCK, EGFR, SYK, LCP2, CDC42 and ALK.]
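The split into hub subnetworks can be sketched from a plain edge list. The function below is hypothetical and is not MetaCore or VAN code. It assumes that "hub" means a gene with at least `min_interactors` neighbours (the slide's "5 interactors" is read as a lower bound here) and that genes missing from the microarray are dropped first. The VAV3 interactor names come from the slide's figure; the extra GRB2-EGFR edge is added only for illustration.

```python
from collections import defaultdict


def hub_subnetworks(edges, measured_genes, min_interactors=5):
    """Map each hub gene to the sorted list of its immediate interactors.

    edges: iterable of (gene_a, gene_b) pairs from a predefined network.
    measured_genes: genes present on the expression platform.
    min_interactors: assumed hub cut-off (at least this many neighbours).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        # Keep only edges whose two ends were both measured.
        if a != b and a in measured_genes and b in measured_genes:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {hub: sorted(nbrs)
            for hub, nbrs in neighbours.items()
            if len(nbrs) >= min_interactors}


vav3_partners = ["RAC1", "GRB2", "KLK3", "RHOA", "RHOG", "LCK",
                 "EGFR", "SYK", "LCP2", "CDC42", "ALK"]
edges = [("VAV3", g) for g in vav3_partners] + [("GRB2", "EGFR")]
measured = set(vav3_partners) | {"VAV3"}

subnets = hub_subnetworks(edges, measured, min_interactors=5)
print(subnets)
```

Here VAV3 is the only gene with enough neighbours to count as a hub, so a single subnetwork with 11 interactors is returned.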
  17. Taylor et al., Nature Biotech, 2009. [Figure reproduced from NATURE BIOTECH.|Vol 27|2009]
  18. Taylor et al., Nature Biotech, 2009. [Figure reproduced from NATURE BIOTECH.|Vol 27|2009]
  19. Finding hubs of interest
       For a given sub-network (predefined hub) i:
       [Diagram: a hub gene connected to an interactor gene i.]
  20. Finding hubs of interest
       For a given sub-network (predefined hub) i:
       [Diagram: a hub gene connected to an interactor gene i.]
       ›  For each edge, k, the correlation difference between the two classes (GP and PP) was calculated:
          $\Delta_{PP,GP,k} = \mathrm{PPcor}_k - \mathrm{GPcor}_k$
  21. Finding hubs of interest
       For a given sub-network (predefined hub) i:
       ›  For each sub-network i, calculate the average absolute difference in hub-interactor correlation:
          $\mathrm{AveHubDiff}_i = \frac{\sum_{k=1}^{n_i} |\Delta_{PP,GP,k}|}{n_i - 1}$, with $\Delta_{PP,GP,k} = \mathrm{PPcor}_k - \mathrm{GPcor}_k$,
          where $n_i$ is the number of interactors of the central hub gene in the network i.
       ›  Rank the hub subnetworks based on their AveHubDiff values, or use a permutation test to determine the statistical significance of each hub.
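A minimal sketch of the AveHubDiff statistic and a permutation test follows. It is not the VAN implementation. It assumes the correlations are Pearson correlations between the hub and each interactor within a class, and that the permutation test shuffles the PP/GP sample labels. All function names and the toy data are hypothetical, and at least two interactors are needed so that the $n_i - 1$ denominator is non-zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ave_hub_diff(hub, interactors, pp, gp):
    """Sum of |PPcor_k - GPcor_k| over hub-interactor edges, divided by n_i - 1.

    pp, gp: dicts mapping gene -> list of expression values in that class.
    """
    total = 0.0
    for k in interactors:
        delta_k = pearson(pp[hub], pp[k]) - pearson(gp[hub], gp[k])
        total += abs(delta_k)
    return total / (len(interactors) - 1)


def permutation_p_value(hub, interactors, pp, gp, n_perm=500, seed=0):
    """Shuffle class labels across samples and compare with the observed statistic."""
    rng = random.Random(seed)
    genes = [hub] + list(interactors)
    observed = ave_hub_diff(hub, interactors, pp, gp)
    n_pp = len(pp[hub])
    pooled = {g: list(pp[g]) + list(gp[g]) for g in genes}
    idx = list(range(len(pooled[hub])))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # the same sample permutation is applied to every gene
        perm_pp = {g: [pooled[g][j] for j in idx[:n_pp]] for g in genes}
        perm_gp = {g: [pooled[g][j] for j in idx[n_pp:]] for g in genes}
        if ave_hub_diff(hub, interactors, perm_pp, perm_gp) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: the hub is positively correlated with both interactors in PP
# and negatively correlated with both in GP.
pp = {"HUB": [1, 2, 3, 4], "I1": [1, 2, 3, 4], "I2": [2, 4, 6, 8]}
gp = {"HUB": [1, 2, 3, 4], "I1": [4, 3, 2, 1], "I2": [8, 6, 4, 2]}
observed, p_value = permutation_p_value("HUB", ["I1", "I2"], pp, gp)
print(observed, p_value)
```

Hubs can then be ranked by `observed`, or filtered by `p_value`, matching the two options listed on the slide.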
  22. Applying to Melanoma gene expression data
       Results: gene co-expression networks are significantly disturbed among patients with good and poor clinical outcomes.
       ›  A: Patients surviving >4yr post resection of metastatic disease
       ›  B: Patients surviving <1yr post resection of metastatic disease
       ›  C & D: Enlarged view (HDAC)
       PIG. CELL & MEL. RES.|In press|2013. Provided by Sara-Jane Schramm (Usyd)
  23. Software: VAN
       VAN: Identifying biologically perturbed networks using differential variability analysis
       [Workflow diagram labels: transcriptomics data; network data; cancer gene census data; data analysis.]
  24. Hub and interactors
       [Figure labels: ANSR, DM]
  25. Software: VAN
       VAN: Identifying biologically perturbed networks using differential variability analysis
       [Workflow diagram labels: transcriptome data; network data; cancer gene data; data analysis.]
  26. Moving to classification
       Three main categories of question
       1.  Differential expression (DE) analysis: finding DE genes between two classes (e.g. good prognosis vs poor prognosis).
       2.  Cluster analysis: finding common patterns between genes.
       3.  Classification & prediction: predicting an outcome based on a set of explanatory variables (features) and a model (classifier).
       How can this concept be extended from DE analysis to classification and prediction?
  27. DE analysis to classification
       Constructing features for the network approach in two main ways:
       1.  Gene-based features: rank genes using network information (e.g. NetRank), or construct weights for genes using network information (e.g. weighted lasso).
       2.  Network-based features: define some network measure which can be used to quantify network perturbation between the two classes; rank the networks accordingly (e.g. Rapaport et al., Taylor et al., BSS/WSS).
       ›  Note that it is surprisingly difficult to come up with a network measure which can be translated from a DE framework into a classification framework.
  28. Taylor et al.: feature
       ›  Instead of using the top ranked networks as the classification features, Taylor et al. use the edges in the top ranked networks.
       ›  Each edge k in the selected networks is assigned the feature value.
       [Diagram: hub H with interactors I1 to I5.]
  29. Looking at one hub
       ›  Some individual networks are capable of separating the classes reasonably well, by considering the difference between hub and interactor expression (the LDA method).
       [Scatter plot: CEBPB (49 interactors). x-axis: expression for the CEBPB gene (-2 to 2); y-axis: median absolute expression for the interactors (0.4 to 1.4); points labelled GP and PP.]
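One way to read the two plot axes is as a pair of per-sample features for a hub subnetwork: the hub's own expression, and the median of the absolute expression values of its interactors. The sketch below computes that pair. The feature definition is an interpretation of the axis labels rather than the authors' code, and the function name and toy values are hypothetical.

```python
from statistics import median


def hub_features(hub, interactors, expr, sample_index):
    """Return (hub expression, median |interactor expression|) for one sample.

    expr: dict mapping gene -> list of expression values, one per sample.
    """
    hub_value = expr[hub][sample_index]
    med_abs = median(abs(expr[g][sample_index]) for g in interactors)
    return hub_value, med_abs


# Toy data for two samples; the interactor names are placeholders.
expr = {
    "CEBPB": [1.5, -0.8],
    "I1": [0.4, -1.2],
    "I2": [-0.9, 1.0],
    "I3": [0.6, 1.4],
}
features = [hub_features("CEBPB", ["I1", "I2", "I3"], expr, s) for s in range(2)]
print(features)
```

Each sample then becomes a point in a two-dimensional feature space, which could be passed to a two-class classifier such as linear discriminant analysis.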
  30. Classification procedure
  31. Other network based approaches
       Winter et al., PLoS Computational Biology, 2012
  32. [Figure: cross-validation error rate (classification error, axis from 0.2 to 0.6) by method. Methods shown: Weighted lasso (all), Weighted lasso (hub), BSS/WSS, Inner product, Gene set, Rapaport, Taylor, Average expression, Unweighted lasso, Mod-t, Random forest. Group labels: network-based features, gene-based features, single-gene.]
  33. ›  Error rates for Taylor's method are only slightly better than for the classical single-gene moderated-t method.
       ›  However, the two methods are capturing different information: they are correctly classifying different subsets of patients.
  34. Summary and discussion
       ›  VAN (R package) enables the testing of modules for dysregulation based on two or more conditions; it is also suitable for the examination of changes across developmental timelines.
       ›  The majority of network methods based on the discovery network do not perform as well as methods based on the predefined network.
       ›  Combining Taylor's method and the single-gene method could yield a more accurate classifier.
       ›  Using the LDA method, some hub subnetworks independently act as accurate prognostic predictors.
       ›  The best performing network feature selection methods only select small hub subnetworks.
  35. Acknowledgements ›  School of Mathematics and Statistics (Usyd) ›  Graham Mann (Usyd) -  Gulietta Pupo & Varsha Tembe -  Samuel Mueller ›  Sara-Jane Schramm -  Vivek Jayaswal ›  John Thompson -  Kaushala Jayawardana -  Rebecca Barter ›  Richard Scolyer (RPA) ›  Marc Wilkins (UNSW) -  Shila Ghanazfar -  Simone Li -  Anna Campain -  Chi Nam Ignatius Pang -  David Fung -  Apurv Goel -  Natalie Twine