Gene Expression Data Analysis
Published in: Education

Transcript

  • 1. Analysis of Gene Expression Data
      Jhoirene B. Clemente, Algorithms and Complexity Lab, University of the Philippines Diliman
  • 2. Overview
      ● Definitions
      ● Clustering of Gene Expression Data
      ● Visualizations of Gene Expression Data
  • 3. Definitions
      Gene: The basic unit of heredity in a living organism. It is normally a stretch of DNA that codes for a type of protein or for an RNA chain that has a function in the organism.
      Gene Expression Data: The expression levels of genes in an individual, measured with a microarray.
  • 4. Definitions
  • 5. Definitions
  • 6. Definitions: Gene Expression Data
      A table with one row per gene (a, b, c, ..., n) and a "Gene Expression" column.
  • 7. Definitions: Gene Expression Data
      One sample gives one column of expression values, one value for each of the n genes (a, b, c, ..., n).
  • 8. Definitions: (n x m) Data Matrix
      Rows are the n genes (a, b, c, ..., n); columns are the m samples (Sample 1, ..., Sample m).
  • 10. Clustering
      Clustering is the unsupervised classification of patterns (observations, data sets, and feature vectors) into groups called clusters, such that objects in the same cluster are similar to each other while objects in different clusters are as dissimilar as possible.
  • 12. Cluster Analysis
      Preprocessing
        ● Filtering
        ● Normalization
      Clustering
      Analysis
  • 13. Clustering
      Partitional
        ● K-means Algorithm
        ● X-means Algorithm
      Hierarchical
  • 14. Clustering
      Given the (n x m) data matrix, we can
        ● Cluster the set of genes
        ● Cluster the set of samples
        ● Cluster the set of genes and samples simultaneously
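The first two options on slide 14 differ only in which axis of the data matrix is treated as the set of points. A minimal sketch, assuming a hypothetical toy matrix (the values and the helper name `transpose` are illustrative, not from the slides): clustering genes uses the rows as points, clustering samples uses the columns. Simultaneous clustering of genes and samples is not shown here.

```python
# Hypothetical toy data: 3 genes (rows) x 4 samples (columns).
# The numbers are made up for illustration only.
matrix = [
    [0.1, 0.4, -0.2, 0.9],   # gene a
    [0.0, 0.5, -0.1, 0.8],   # gene b
    [-0.7, 0.2, 0.6, -0.3],  # gene c
]


def transpose(rows):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*rows)]


# Clustering genes: each gene is a point with one coordinate per sample.
gene_vectors = matrix

# Clustering samples: each sample is a point with one coordinate per gene.
sample_vectors = transpose(matrix)

print(len(gene_vectors), "gene vectors of length", len(gene_vectors[0]))
print(len(sample_vectors), "sample vectors of length", len(sample_vectors[0]))
```

Either list of vectors can then be passed to the same clustering routine.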
  • 15. Data Set
      The data set is time-series gene expression data from a synchronized population of yeast.
  • 17. Preprocessing
      Filtering
        ● Removed genes not involved in cell cycle regulation
        ● Removed genes belonging to more than one group
      Normalization
        ● All gene expression values range from -1.0 to 1.0.
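The slide states only the resulting range, not the normalization method. One way to obtain values in [-1.0, 1.0] is a per-gene min-max rescaling, sketched below; this choice and the function name `scale_to_unit_range` are assumptions for illustration, not the method used in the talk.

```python
def scale_to_unit_range(values):
    """Linearly rescale one gene's expression values to [-1.0, 1.0].

    Assumption: min-max scaling per gene. The slides do not say which
    normalization was actually applied.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant profile has no spread; map it to the middle of the range.
        return [0.0 for _ in values]
    span = highest - lowest
    return [2.0 * (v - lowest) / span - 1.0 for v in values]


# Hypothetical expression profile of one gene across three samples.
profile = [2.0, 4.0, 6.0]
print(scale_to_unit_range(profile))
```

After this rescaling, the smallest value of each profile maps to -1.0 and the largest to 1.0.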
  • 18. Data Set
      Data matrix (384 genes and 17 samples) with 5 classifications. Groupings are based on cell cycle phase activation.
  • 19. Data Set: Group 1: Resting Phase
  • 20. Data Set: Group 2: First Growth Phase
  • 21. Data Set: Group 3: Synthesis Phase
  • 22. Data Set: Group 4: Second Growth Phase
  • 23. Data Set: Group 5: Cell Division
  • 25. Clustering of genes: K-means Algorithm
      Given n data points in $\mathbb{R}^d$:
      1. Assign k initial centers for the k clusters.
      2. Assign each data point to the nearest cluster (Euclidean distance, Manhattan distance, etc.).
      3. Adjust the k centers.
      4. Repeat steps 2 and 3 until convergence.
      k = 5 since we want to approximate the 5 ...
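The four steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the talk: the function names, the `max_iter` cap, the choice of Euclidean distance, the use of the cluster mean as the adjusted center, and the toy points are all assumptions.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    count = len(points)
    return [sum(coords) / count for coords in zip(*points)]


def kmeans(points, initial_centers, max_iter=100):
    """Run k-means from the given initial centers (step 1 is the caller's job)."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return labels, centers


# Hypothetical toy data with two obvious groups; the first k points
# serve as the initial centers.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = kmeans(points, points[:2])
print(labels, centers)
```

For the yeast data described on slide 18, `points` would hold the 384 gene vectors of length 17 and five initial centers would be supplied.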
  • 26. Clustering of genes: Initialization
      1. Choose the first k centers that will maximize the distance between the clusters.
      2. Sort the distances between all the data points and then choose the k initial points at constant intervals from the sorted list.
      3. Use the first k points in the data set as the first k centers.
  • 27. Clustering of genes: Using k-means clustering, with k = 5
  • 28. Clustering of genes
      ● Clustering may suggest possible roles for genes with unknown functions.
      ● Clustering the samples or experiments may shed light on new subtypes of diseases.
      ● It may help identify which type of treatment is suited to a specific type of cancer.
      ● It can support building genetic networks.
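The three initialization strategies on slide 26 are stated only briefly, so the sketch below fills in details that are assumptions: strategy 1 is approximated with a greedy farthest-point heuristic, and strategy 2 is read as sorting points by their distance to the overall mean of the data. All function names and the toy points are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_point_centers(points, k):
    """Strategy 1 (assumed heuristic): greedily pick well-separated centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def evenly_spaced_centers(points, k):
    """Strategy 2 (assumed reading): sort by distance to the data mean,
    then take k points at constant intervals from the sorted list."""
    count = len(points)
    centroid = [sum(coords) / count for coords in zip(*points)]
    ordered = sorted(points, key=lambda p: euclidean(p, centroid))
    step = count // k
    return [list(ordered[i * step]) for i in range(k)]


def first_k_centers(points, k):
    """Strategy 3: use the first k points of the data set."""
    return [list(p) for p in points[:k]]


# Hypothetical one-dimensional layout embedded in the plane.
toy = [[0, 0], [1, 0], [5, 0], [9, 0]]
print(farthest_point_centers(toy, 3))
print(evenly_spaced_centers(toy, 2))
print(first_k_centers(toy, 2))
```

Each function assumes k is no larger than the number of points.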
  • 29. Visualization
      Vector Fusion
      Non-metric Multidimensional Scaling (nMDS)
      Principal Components Analysis (PCA)
  • 30. Vector fusion
      Visualization technique that uses the single point broken line parallel algorithm
  • 31. nMDS visualization
      Input: dissimilarity matrix $\Delta = |\delta_{ij}|$; $d_{ij}$ is the actual distance.
      ● In nMDS, only the rank order of entries is assumed to contain the significant information.
      ● Thus, the purpose of the non-metric MDS algorithm is to find a configuration of points whose distances reflect as closely as possible the rank order of the data.
      ● The transformation uses a non-parametric function f (monotone regression): $\hat{d}_{ij} = f(d_{ij})$, the pseudo-distance.
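The monotone regression step named on slide 31 can be illustrated with the pool-adjacent-violators procedure, a standard way to compute a least-squares non-decreasing fit. This is a sketch under stated assumptions, not the talk's implementation: the function names are hypothetical, pairs are given as flat lists, and tied dissimilarities keep their input order. It covers only the pseudo-distance computation, not the full nMDS iteration that moves the points.

```python
def pool_adjacent_violators(values):
    """Least-squares non-decreasing fit to a sequence of numbers."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def pseudo_distances(dissimilarities, distances):
    """Pseudo-distances that respect the rank order of the dissimilarities.

    dissimilarities[i] and distances[i] refer to the same pair of objects.
    """
    order = sorted(range(len(dissimilarities)),
                   key=lambda i: dissimilarities[i])
    fitted = pool_adjacent_violators([distances[i] for i in order])
    result = [0.0] * len(distances)
    for rank, i in enumerate(order):
        result[i] = fitted[rank]
    return result


# Hypothetical values for three pairs of objects.
delta = [0.2, 0.9, 0.5]   # input dissimilarities
d = [1.0, 2.0, 3.0]       # current distances in the configuration
print(pseudo_distances(delta, d))
```

Pairs whose current distances violate the rank order of the dissimilarities are pooled to a common average, so the result is non-decreasing in the dissimilarity order.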
  • 32. PCA
  • 33. Vector fusion visualization
  • 34-40. nMDS visualization
  • 41. References
      Clemente J., Salido J.A. (2010). "Non-Metric Multidimensional Scaling and Vector Fusion Visualization of Cell Cycle Independent Gene Expressions for Gene Function Analysis". Published in the conference proceedings of the National Conference on Information Technology for Education (NCITE) 2010 and the Philippine IT Journal, Feb 2011 issue.
      Clemente J. (2010). "Cluster Analysis for Identifying Genes Highly Correlated with a Phenotype". Undergraduate thesis, Department of Computer Science, University of the Philippines Diliman.
  • 42. Thank you for Listening