Gene expression profiling i

Unsupervised learning algorithms of DNA microarray analysis

  1. Gene Expression Studies. Gene Expression Profiling. Microarray Data Analysis: Unsupervised Learning Algorithms. S. Prasanth Kumar, Bioinformatician, Dept. of Bioinformatics, Applied Botany Centre (ABC), Gujarat University, Ahmedabad, INDIA. SlideShare: prasanthperceptron
  2. DNA Microarray Analysis. Gene sequences (partial cDNAs or oligonucleotides, e.g. ….ATAGCGGATC….) are placed on the array (glass slide) in one of two ways: (1) PCR amplification followed by robotic spotting, or (2) direct chemical synthesis on the slide by robotics. The array represents a grid of thousands of different gene sequences bound to closely spaced regions on the surface of the glass slide.
  3. DNA Microarray Analysis. Preparation of the microarray slide: heating or chemically denaturing the DNA attaches the gene sequences to the glass slide. RNA extraction: RNA is taken from a normal cell culture and from a tumor cell culture.
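  4. DNA Microarray Analysis. After RNA extraction, cDNA is synthesized with RTase: one sample receives dNTP + Cy3-labeled dNTP and the other receives dNTP + Cy5-labeled dNTP, giving Cy3-labeled and Cy5-labeled cDNA. RNase H is then used to digest the RNA.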
  5. DNA Microarray Analysis. Equal amounts of the Cy3-labeled and Cy5-labeled cDNA are combined and hybridized to the array (glass slide). Expanded view: a bound sequence ….ATAGCGGATC… pairs with its labeled complement …..TATCGCCTAG….
  6. DNA Microarray Analysis. Digital imaging (fluorescence microscopy): the result is a grid of fluorescent spots that represent hybridization of complementary sequences to the array, indicating that a particular gene was expressed in this cell type. A bright green spot indicates greater expression in cancer cells; a bright red spot indicates greater expression in normal cells; a bright yellow spot indicates expression in both cell types.
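The colour reading on the imaging slide can be sketched as a comparison of the two channel intensities. This is only an illustration: the function name `classify_spot`, the `fold` cutoff of 2.0 and the sample intensities are assumptions, not values from the slides. It relies on the usual display convention that Cy3 is shown green and Cy5 red, and it assumes both channels give a measurable signal.

```python
import math

def classify_spot(cy3, cy5, fold=2.0):
    """Label a spot from its green (Cy3) and red (Cy5) intensities.

    Assumption: `fold` is a hypothetical cutoff, not taken from the slides.
    """
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy3 / cy5)  # > 0 means the green channel is brighter
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "green"   # greater expression in cancer cells (per the slide)
    if log_ratio <= -cutoff:
        return "red"     # greater expression in normal cells (per the slide)
    return "yellow"      # comparable expression in both cell types

# Hypothetical intensities for three spots.
print(classify_spot(1200.0, 300.0))
print(classify_spot(250.0, 1000.0))
print(classify_spot(800.0, 750.0))
```

Working on a log scale treats a two-fold increase and a two-fold decrease symmetrically, which is why the cutoff is applied to the log ratio rather than to the raw difference.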
  7. Differential Gene Expression
  8. Expression array analysis. The data form a matrix with genes as rows and conditions as columns. A row is a gene expression profile; a column is a profile for a condition across all genes.
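The row and column view can be made concrete with a small matrix. The gene names, condition names and values below are hypothetical and chosen only for illustration.

```python
# Hypothetical toy matrix: rows are genes, columns are conditions.
genes = ["gene_a", "gene_b", "gene_c"]
conditions = ["cond_1", "cond_2", "cond_3", "cond_4"]
expression = [
    [8.9, 9.1, 2.0, 2.2],
    [8.6, 9.0, 1.8, 2.5],
    [1.0, 1.2, 7.5, 7.9],
]

def gene_profile(matrix, row):
    """Return one row: a gene's expression across all conditions."""
    return list(matrix[row])

def condition_profile(matrix, col):
    """Return one column: every gene's expression under one condition."""
    return [r[col] for r in matrix]

print(genes[0], gene_profile(expression, 0))
print(conditions[2], condition_profile(expression, 2))
```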
  9. Unsupervised grouping: clustering. Clustering methods help to simplify data sets by grouping profiles. Theoretically, if genes have similar expression over a large number of conditions, they may be regulated by similar mechanisms and may have similar functions. Clustering algorithms group similar profiles together based on a distance metric: a formula for calculating the similarity between two profiles.
  10. Unsupervised grouping: clustering. Correlation coefficient: many clustering algorithms are based on the statistical correlation coefficient D(x, y), which ranges from 1 to -1. x and y are vectors containing the expression values for two different genes, e.g. x = 8.9, y = 8.6. D(x, y) = 1 means genes x and y are co-expressed.
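The slide's formula did not survive extraction. A minimal sketch of the standard Pearson correlation coefficient, which is assumed here to be the statistic the slide refers to, is given below; the function name `pearson` and the sample profiles are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)

# Hypothetical profiles: the second rises with the first, the third falls.
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]))
```

A value near 1 corresponds to the co-expressed case on the slide, and a value near -1 to profiles that move in opposite directions.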
  11. Unsupervised grouping: clustering. Euclidean distance: the square root of the sum of the squared differences, where x and y are vectors containing the expression values for two different genes. The clustering methods most commonly applied to gene expression data are hierarchical clustering, self-organizing maps, and k-means clustering.
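The verbal definition of Euclidean distance translates directly into code. The function name `euclidean` and the sample vectors are illustrative assumptions.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared differences between two profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical two-condition profiles.
print(euclidean([0.0, 0.0], [3.0, 4.0]))
```

Unlike the correlation coefficient, this distance depends on the magnitude of the expression values, so two genes with the same trend but different overall levels can still be far apart.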
  12. Sensitivity of Clustering. Example: a study of the response of cancer profiles to a pharmacological agent. A feature set including the entire genomic expression profile might not be appropriate, because the response might depend only on a handful of target and transport genes, and inclusion of thousands of other genes might make similarities or differences difficult to extract from the noise of the irrelevant genes.
  13. K-means clustering requires a parameter k, the number of expected clusters. The choice of k can dramatically affect the final clustering results, and unfortunately it is often difficult to know a priori what an appropriate choice is. (Figure: gene expression profile of, say, the 34th lymphoma.)
  14. K-means clustering: k cluster centers c1, c2, ………, ck are chosen (randomly selected expression profiles taken from the data set). Distance calculation: e.g. gene 1 of cluster 1 compared with gene 1 of cluster 2 → distance value, e.g. 25.
  15. K-means clustering: a distance is calculated for each gene x_j and each center c_i. 1st iteration: genes are clustered based on their distance values, e.g. Cluster 1 (distance 25): Gene 1, 45, 89, ….; Cluster 32 (distance 86): Gene 5, 65, 349, ….; ……. But some genes fall outside every cluster: Gene 34 has a distance value of 29 and is not clustered.
  16. K-means clustering: un-clustered genes are assigned to a nearby cluster, e.g. Cluster 1 (distance 20-30): Gene 1, 29, 45, 89, …., where 29 marks the previously un-clustered gene. Iterate until no un-clustered genes are found, giving Cluster 1, Cluster 2, ………………., Cluster n.
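The k-means slides can be compared with a minimal sketch of the standard Lloyd-style iteration. Note that this is the textbook variant, which assigns every profile to its nearest center and then recomputes each center as the mean of its members, rather than the threshold-style walkthrough on the slides. The function names, the `seed` parameter and the toy profiles are assumptions made for illustration.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_profile(profiles):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def kmeans(profiles, k, max_iter=100, seed=0):
    """Return (labels, centers) after Lloyd-style iterations."""
    rng = random.Random(seed)
    # Initial centers are randomly selected profiles from the data set.
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest center.
        new_labels = [
            min(range(k), key=lambda i: euclidean(p, centers[i]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = mean_profile(members)
    return labels, centers

# Hypothetical profiles forming two well-separated groups.
profiles = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
            [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]
labels, centers = kmeans(profiles, k=2)
print(labels)
print(centers)
```

Because the starting centers are random, different seeds can give different results on less clearly separated data, which is one reason the choice of k and repeated runs matter in practice.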
  17. Self-organizing maps: a topology between clusters is predefined, e.g. a 2 x 3 grid. (Diagram labels: Gene 1,2,3 with Lymphoma 25 & 26; …….. ; Gene 1,2,3 with Lymphoma 25, 26 & 46; Cluster 1, Cluster 456; Cluster 1 …… Cluster 25; close clusters.)
  18. Self-organizing maps: after the algorithm is run, expression profiles are organized into clusters. The profiles are most similar within the cluster. Clusters that are near to each other in the predefined topology are relatively similar to each other.
  19. Self-organizing maps: self-organizing map of yeast gene expression data. 2467 genes were clustered over 79 conditions. Clusters were arranged in a 5 x 5 grid.
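A self-organizing map with a predefined grid topology can be sketched as follows. This is a simplified illustration and not the procedure used for the yeast data: the linear decay schedules, the Gaussian neighbourhood, the default parameters and the toy profiles are all assumptions.

```python
import math
import random

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def best_matching_unit(weights, profile):
    """Index of the grid node whose weight vector is nearest to `profile`."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], profile))

def train_som(profiles, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Return (coords, weights) for a rows x cols map (hypothetical defaults)."""
    rng = random.Random(seed)
    # Grid coordinates define the predefined topology between clusters.
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Each node starts as a copy of a randomly chosen profile.
    weights = [list(rng.choice(profiles)) for _ in coords]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighbourhood radius shrinks
        order = list(profiles)
        rng.shuffle(order)
        for p in order:
            bmu = best_matching_unit(weights, p)
            for i, node in enumerate(coords):
                # Nodes close to the winner on the grid are pulled more strongly.
                h = math.exp(-sq_dist(node, coords[bmu]) / (2.0 * sigma ** 2))
                w = weights[i]
                for d in range(len(w)):
                    w[d] += lr * h * (p[d] - w[d])
    return coords, weights

# Hypothetical profiles; each is mapped to a node of a 2 x 3 grid.
profiles = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
            [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]
coords, weights = train_som(profiles, rows=2, cols=3)
for p in profiles:
    print(p, "->", coords[best_matching_unit(weights, p)])
```

The neighbourhood term is what distinguishes this from k-means: updating grid neighbours together is what makes nearby nodes end up holding relatively similar profiles.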
  20. Hierarchical clustering: a distance matrix between all of the genes is computed, and the two nearest genes are merged.
  21. Hierarchical clustering: again, the nearest two entities in the redefined distance matrix are identified and merged. Repeating this step creates a dendrogram.
  22. Hierarchical clustering
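The merge-and-repeat procedure can be sketched as a small agglomerative routine. The slides do not say how the distance to a merged entity is redefined, so average linkage is assumed here; the function names, gene names and values are illustrative. For brevity the sketch recomputes linkage distances from the raw profiles in every round instead of updating a stored distance matrix.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(profiles, names):
    """Build a dendrogram as nested tuples (left, right, merge_distance)."""
    # Each entry is (subtree, indices of the member profiles).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def linkage(a, b):
        # Average linkage (an assumption): mean distance over all member pairs.
        total = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        # Find the nearest two entities among the current clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = ((clusters[i][0], clusters[j][0], d),
                  clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so delete the later entry first
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical genes: g1 and g2 are close, g3 and g4 are close.
names = ["g1", "g2", "g3", "g4"]
profiles = [[1.0, 1.0], [1.1, 1.0], [8.0, 8.0], [8.2, 8.1]]
print(agglomerate(profiles, names))
```

The nested tuples record which entities were merged and at what distance, which is the information a dendrogram plot displays.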
  23. Thank You For Your Attention !!!