Gene Expression Data Analysis


  1. Analysis of Gene Expression Data
     Jhoirene B. Clemente, Algorithms and Complexity Lab, University of the Philippines Diliman
  2. Overview
     ● Definitions
     ● Clustering of Gene Expression Data
     ● Visualizations of Gene Expression Data
  3. Definitions
     Gene: the basic unit of heredity in a living organism. It is normally a stretch of DNA that codes for a type of protein or for an RNA chain that has a function in the organism.
     Gene Expression Data: the expression levels of genes in an individual, measured with a microarray.
  6. Definitions
     Gene Expression Data: a table with the columns Gene and Gene Expression and one row per gene (a, b, c, ..., n).
  7. Definitions
     Gene Expression Data for 1 sample: the columns Gene and Gene Expression, with one row for each of the n genes (a, b, c, ..., n).
  8. Definitions
     (n x m) Data Matrix: one row for each of the n genes (a, b, c, ..., n) and one column for each of the m samples (Sample 1, ..., Sample m).
  10. Clustering
      Clustering is the unsupervised classification of patterns (observations, data sets, and feature vectors) into groups called clusters, such that objects in the same cluster are similar to each other while objects in different clusters are as dissimilar as possible.
  12. Cluster Analysis
      ● Preprocessing (Filtering, Normalization)
      ● Clustering
      ● Analysis
  13. Clustering
      ● Partitional: K-means Algorithm, X-means Algorithm
      ● Hierarchical
  14. Clustering
      Given the (n x m) data matrix, we can
      ● Cluster the set of genes
      ● Cluster the set of samples
      ● Cluster the set of genes and samples simultaneously
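The first two options differ only in which axis of the data matrix supplies the objects to cluster. A minimal sketch, assuming the matrix is stored as a list of rows with one row per gene; the values and the `transpose` helper are made up for illustration:

```python
# Toy (n x m) data matrix: n = 3 genes (rows), m = 4 samples (columns).
# The values are illustrative only, not taken from the yeast data set.
data = [
    [0.1, 0.5, 0.9, 0.4],    # gene a
    [0.2, 0.4, 0.8, 0.5],    # gene b
    [-0.7, -0.2, 0.0, 0.3],  # gene c
]

def transpose(matrix):
    """Swap rows and columns, turning an (n x m) matrix into (m x n)."""
    return [list(column) for column in zip(*matrix)]

# Clustering genes: each object is a row (one gene's profile across samples).
gene_vectors = data

# Clustering samples: each object is a column (one sample's profile across genes).
sample_vectors = transpose(data)

print(len(gene_vectors), len(gene_vectors[0]))      # 3 objects of dimension 4
print(len(sample_vectors), len(sample_vectors[0]))  # 4 objects of dimension 3
```

Clustering genes and samples simultaneously needs a dedicated method and is not covered by this sketch.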
  15. Data Set
      The data set is time-series gene expression data from a synchronized population of yeast.
  17. Preprocessing
      Filtering
      ● Removed genes not involved in cell cycle regulation
      ● Removed genes belonging to more than one group
      Normalization
      ● All gene expression values range from -1.0 to 1.0.
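The slides do not say how the values were brought into the range -1.0 to 1.0. One simple possibility, shown here purely as an assumption, is to divide each gene's values by the largest absolute value in its row; `normalize_row` is a hypothetical helper name:

```python
def normalize_row(values):
    """Scale one gene's expression values so they fall in [-1.0, 1.0].

    Assumption: every value is divided by the largest absolute value in the
    row. The slides do not state which normalization was actually used.
    """
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values)  # an all-zero row is already within range
    return [v / largest for v in values]

raw = [2.0, -4.0, 1.0, 0.0]  # made-up expression values for one gene
normalized = normalize_row(raw)
print(normalized)  # [0.5, -1.0, 0.25, 0.0]
```

Scaling each row separately keeps the shape of a gene's profile over time while removing differences in overall magnitude between genes.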
  18. Data Set
      Data matrix (384 genes and 17 samples) with 5 classifications. Groupings are based on cell cycle phase activation.
  19. Data Set
      Group 1: Resting Phase
  20. Data Set
      Group 2: First Growth Phase
  21. Data Set
      Group 3: Synthesis Phase
  22. Data Set
      Group 4: Second Growth Phase
  23. Data Set
      Group 5: Cell Division
  24. Clustering of genes
      K-means Algorithm
      Given n data points in R^d:
      1. Assign k initial centers of the k clusters
      2. Assign each data point to the nearest cluster center (Euclidean distance, Manhattan distance, etc.)
      3. Adjust the k centers
      4. Repeat steps 2 and 3 until convergence
      k = 5 since we want to approximate the 5
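The four steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used for the slides: it uses Euclidean distance, takes the first k points as initial centers, recomputes each center as the mean of its members, and treats "convergence" as no assignment changing. The function names and the toy points are made up.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def k_means(points, k, max_iterations=100):
    # Step 1: assign k initial centers (assumption: the first k points).
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Step 4: stop once no assignment changes between iterations.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_point(members)
    return labels, centers

points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
labels, centers = k_means(points, k=2)
print(labels)   # [0, 1, 0, 1]
print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

For the yeast data, `points` would hold one 17-dimensional row per gene and `k` would be 5.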
  26. Clustering of genes
      Initialization
      1. Choose the first k centers that will maximize the distance between the clusters
      2. Sort the distances between all the data points and then choose the k initial points at constant intervals from the sorted list
      3. Use the first k points in the data set as the first k centers
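Two of these strategies can be sketched briefly. Strategy 3 is direct. For strategy 1, the sketch substitutes a common greedy heuristic (farthest-first traversal), which spreads the centers apart but is only an assumed stand-in for whatever the slides intend. Strategy 2 is left out because the slides do not specify which distances are sorted. All function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def first_k_centers(points, k):
    """Strategy 3: use the first k points of the data set as centers."""
    return [list(p) for p in points[:k]]

def farthest_first_centers(points, k):
    """Greedy stand-in for strategy 1 (assumption, not the slides' method).

    Start from the first point, then repeatedly add the point whose distance
    to its nearest already-chosen center is largest.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(euclidean(p, c) for c in centers),
        )
        centers.append(list(farthest))
    return centers

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [5.0, 5.0]]
print(first_k_centers(points, 2))         # [[0.0, 0.0], [0.0, 1.0]]
print(farthest_first_centers(points, 2))  # [[0.0, 0.0], [10.0, 10.0]]
```

The toy output shows why the choice matters: the first two points sit next to each other, while the greedy pick starts the two centers far apart.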
  27. Clustering of genes
      Using k-means clustering, with k = 5
  28. Clustering of genes
      ● Clustering may suggest possible roles for genes with unknown functions
      ● Clustering the samples or experiments may shed light on new subtypes of diseases
      ● Identify which type of treatment is suited for a specific type of cancer
      ● Building genetic networks
  29. Visualization
      ● Vector Fusion
      ● Non-metric Multidimensional Scaling (nMDS)
      ● Principal Components Analysis (PCA)
  30. Vector fusion
      Visualization technique that uses the single point broken line parallel algorithm
  31. nMDS visualization
      Input: dissimilarity matrix $\Delta = [\delta_{ij}]$ (actual distances)
      ● In nMDS, only the rank order of the entries is assumed to contain the significant information.
      ● Thus, the purpose of the non-metric MDS algorithm is to find a configuration of points whose distances reflect the rank order of the data as closely as possible.
      ● The transformation uses a nonparametric monotone function f (monotone regression): $\hat{d}_{ij} = f(\delta_{ij})$ (pseudo-distances)
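The monotone regression step can be sketched on its own. Given the input dissimilarities and the distances of a candidate configuration, the pseudo-distances are the least-squares fit to those distances that never decreases as the dissimilarity rank increases. The sketch below uses the pool-adjacent-violators algorithm and Kruskal's stress-1 as the goodness-of-fit measure. It is only this one step, not a full nMDS solver, it does not treat tied dissimilarities specially, and all names and numbers are made up.

```python
def monotone_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean exceeds the mean after it.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def pseudo_distances(dissimilarities, distances):
    """Fit pseudo-distances that follow the rank order of the dissimilarities."""
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    fitted_sorted = monotone_regression([distances[i] for i in order])
    result = [0.0] * len(distances)
    for rank, i in enumerate(order):
        result[i] = fitted_sorted[rank]
    return result

def stress(distances, fitted):
    """Kruskal's stress-1: 0 means the rank order is reproduced perfectly."""
    numerator = sum((d - f) ** 2 for d, f in zip(distances, fitted))
    denominator = sum(d ** 2 for d in distances)
    return (numerator / denominator) ** 0.5

# One entry per pair of points, e.g. pairs (1,2), (1,3), (2,3).
dissimilarities = [1.0, 2.0, 3.0]  # input data; only their rank order is used
distances = [1.0, 3.0, 2.0]        # distances in a candidate configuration
d_hat = pseudo_distances(dissimilarities, distances)
print(d_hat)  # [1.0, 2.5, 2.5]
print(stress(distances, d_hat))
```

A full nMDS procedure would alternate this fit with moving the points to reduce the stress.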
  32. PCA
  33. Vector fusion visualization
  34-40. nMDS visualization
  41. References
      ● Clemente J., Salido J.A. (2010). "Non-Metric Multidimensional Scaling and Vector Fusion Visualization of Cell Cycle Independent Gene Expressions for Gene Function Analysis". Proceedings of the National Conference on Information Technology for Education (NCITE) 2010; Philippine IT Journal, Feb 2011 issue.
      ● Clemente J. (2010). "Cluster Analysis for Identifying Genes Highly Correlated with a Phenotype". Undergraduate thesis, Department of Computer Science, University of the Philippines Diliman.
  42. Thank you for listening
