- 1. Analysis of Gene Expression Data. Jhoirene B. Clemente, Algorithms and Complexity Lab, University of the Philippines Diliman
- 2. Overview
  ● Definitions
  ● Clustering of Gene Expression Data
  ● Visualizations of Gene Expression Data
- 3. Definitions
  ● Gene: the basic unit of heredity in a living organism. It is normally a stretch of DNA that codes for a type of protein or for an RNA chain that has a function in the organism.
  ● Gene expression data: the expression levels of the genes in an individual, measured with a microarray.
- 4. Definitions
- 5. Definitions
- 6. Definitions: Gene Expression Data. [Figure: two-column table with columns "Gene" and "Gene Expression"; rows a, b, c, ..., n]
- 7. Definitions: Gene Expression Data for 1 sample. [Figure: the same two-column table, labeled as one sample, with rows a, b, c, ..., n]
- 8. Definitions: (n x m) Data Matrix. [Figure: table with a "Gene" column followed by one column per sample, "Sample 1" through "Sample m"; rows a, b, c, ..., n]
- 10. Clustering: Clustering is the unsupervised classification of patterns (observations, data sets, and feature vectors) into groups called clusters, such that objects in the same cluster are similar to each other while objects in different clusters are as dissimilar as possible.
- 12. Cluster Analysis
  1. Preprocessing
     ● Filtering
     ● Normalization
  2. Clustering
  3. Analysis
- 13. Clustering
  Partitional
  ● K-means Algorithm
  ● X-means Algorithm
  Hierarchical
- 14. Clustering: Given the (n x m) data matrix, we can
  ● Cluster the set of genes
  ● Cluster the set of samples
  ● Cluster the set of genes and samples simultaneously.
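The first two options on slide 14 differ only in what counts as a data point. A minimal sketch, assuming genes are stored as rows and samples as columns (the gene names and values below are made up for illustration):

```python
# Toy (n x m) expression matrix: n = 3 genes (rows), m = 4 samples (columns).
# All names and values here are illustrative, not taken from the slides.
genes = ["a", "b", "c"]
matrix = [
    [0.9, 0.1, -0.4, -0.8],   # gene a across the 4 samples
    [0.8, 0.2, -0.5, -0.7],   # gene b
    [-0.6, -0.1, 0.3, 0.9],   # gene c
]


def transpose(rows):
    """Swap rows and columns, turning an (n x m) matrix into (m x n)."""
    return [list(col) for col in zip(*rows)]


# Clustering genes: each data point is a row (one gene, m values).
gene_points = matrix

# Clustering samples: each data point is a column (one sample, n values).
sample_points = transpose(matrix)
```

Clustering genes and samples simultaneously needs a different kind of algorithm and is not covered by this sketch.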
- 15. Data Set: The data set is time-series gene expression data from a synchronized population of yeast.
- 17. Preprocessing
  Filtering
  ● Removed genes not involved in cell cycle regulation
  ● Removed genes belonging to more than one group
  Normalization
  ● All gene expression values range from -1.0 to 1.0.
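The slides do not say how the filtering and normalization were implemented. The sketch below is one possible reading: it assumes each gene carries a list of group annotations (a hypothetical `gene_groups` mapping) and that each gene's profile is rescaled linearly so its values fall in [-1.0, 1.0].

```python
def keep_single_group_genes(gene_groups):
    """Keep only genes annotated with exactly one group.

    gene_groups is a hypothetical mapping: gene name -> list of group labels.
    """
    return {g: labels[0] for g, labels in gene_groups.items() if len(labels) == 1}


def scale_to_unit_range(values):
    """Linearly rescale numbers so the minimum maps to -1.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile has no shape to preserve; map it to zeros.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Illustrative inputs only.
annotations = {"a": [1], "b": [2, 3], "c": [5], "d": []}
kept = keep_single_group_genes(annotations)

profile = [2.0, 4.0, 6.0, 10.0]
scaled = scale_to_unit_range(profile)
```

Scaling per gene keeps the shape of each time course while removing differences in absolute expression level; scaling over the whole matrix would be an equally valid assumption.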
- 18. Data Set: Data matrix (384 genes and 17 samples) with 5 classifications. Groupings are based on cell cycle phase activation.
- 19. Data Set, Group 1: Resting Phase
- 20. Data Set, Group 2: First Growth Phase
- 21. Data Set, Group 3: Synthesis Phase
- 22. Data Set, Group 4: Second Growth Phase
- 23. Data Set, Group 5: Cell Division
- 24. Clustering of genes: K-means Algorithm. Given n data points in R^d:
  1. Assign k initial centers of the k clusters
  2. Assign all the data points to the nearest cluster (Euclidean distance, Manhattan distance, etc.)
  3. Adjust the k centers
  4. Repeat steps 2 and 3 until convergence
- 25. Clustering of genes: K-means Algorithm. Given n data points in R^d:
  1. Assign k initial centers of the k clusters
  2. Assign all the data points to the nearest cluster (Euclidean distance, Manhattan distance, etc.)
  3. Adjust the k centers
  4. Repeat steps 2 and 3 until convergence
  k = 5 since we want to approximate the 5
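The four steps of the k-means slide can be sketched in plain Python as follows. This is an illustrative version, not the code used for the presentation: the function names, the `max_iter` cap, and the rule for empty clusters (keep the old center) are assumptions.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points, centers, distance=euclidean, max_iter=100):
    """Run k-means from the given initial centers (step 1)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = mean_point(members)
    return labels, centers


# Toy data: two well-separated groups, initialized with the first two points.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(points, centers=points[:2])
```

For the yeast data, `points` would hold the 384 gene profiles (17 values each) and `centers` would hold 5 initial centers. Passing `distance=manhattan` switches the assignment step to the other distance named on the slide, although the mean is only the natural center update for Euclidean distance.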
- 26. Clustering of genes: Initialization
  1. Choose the first k centers that will maximize the distance between the clusters
  2. Sort the distances between all the data points and then choose the k initial points at constant intervals from the sorted list
  3. Use the first k points in the data set as the first k centers
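The three initialization options are stated only briefly, so the sketch below fills in details that are assumptions rather than the author's method. Option 1 is approximated with a greedy farthest-first heuristic, option 2 is read as "sort points by distance to the overall centroid and pick every (n // k)-th one", and option 3 is taken literally. All function names are hypothetical, and each function assumes 1 <= k <= number of points.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_first(points, k):
    """Option 1 (greedy approximation): spread the centers far apart.

    Starts from the first point, then repeatedly adds the point whose
    distance to its closest already-chosen center is largest.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def evenly_spaced(points, k):
    """Option 2 (one interpretation): sort by distance to the centroid,
    then take k points at constant intervals from the sorted list."""
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    ordered = sorted(points, key=lambda p: euclidean(p, centroid))
    step = len(ordered) // k
    return [list(ordered[i * step]) for i in range(k)]


def first_k(points, k):
    """Option 3: use the first k points of the data set."""
    return [list(p) for p in points[:k]]


# Toy one-dimensional data embedded in 2-D, for illustration only.
points = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [9.0, 0.0], [10.0, 0.0]]
```

Because k-means only converges to a local optimum, the choice among these options can change the final clusters, which is presumably why the slide lists several.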
- 27. Clustering of genes: Using k-means clustering, with k = 5
- 28. Clustering of genes
  ● Clustering may suggest possible roles for genes with unknown functions.
  ● Clustering the samples or experiments may shed light on new subtypes of diseases.
  ● Clustering can help identify which type of treatment is suited for a specific type of cancer.
  ● Clustering supports building genetic networks.
- 29. Visualization
  ● Vector Fusion
  ● Non-metric Multidimensional Scaling (nMDS)
  ● Principal Components Analysis (PCA)
- 30. Vector fusion: Visualization technique that uses the single point broken line parallel algorithm
- 31. nMDS visualization
  Input: dissimilarity matrix $\Delta = |\delta_{ij}|$ (actual distance)
  ● In nMDS, only the rank order of entries is assumed to contain the significant information.
  ● Thus, the purpose of the non-metric MDS algorithm is to find a configuration of points whose distances reflect as closely as possible the rank order of the data.
  ● The transformation uses a non-parametric function f (monotone regression): $\hat{d}_{ij} = f(\delta_{ij})$ (pseudo-distance)
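The monotone regression step named on the nMDS slide is commonly computed with the pool-adjacent-violators method. The sketch below uses that method as an assumption; the slides do not say which procedure was used, and the function names are hypothetical. Given the input dissimilarities and the distances in the current point configuration (both listed per pair of points), it returns pseudo-distances that follow the rank order of the dissimilarities while staying as close as possible, in the least-squares sense, to the configuration distances.

```python
def monotone_regression(values):
    """Least-squares non-decreasing fit via pool adjacent violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every member of a block gets the block mean
    return fitted


def pseudo_distances(dissimilarities, distances):
    """Fit pseudo-distances that are monotone in the rank order of the input.

    Both arguments are parallel lists with one entry per pair of points.
    Ties in the dissimilarities are left in input order (no special handling).
    """
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    fitted = monotone_regression([distances[i] for i in order])
    result = [0.0] * len(distances)
    for rank, i in enumerate(order):
        result[i] = fitted[rank]
    return result


# Illustrative values: the second and third pairs violate the rank order.
dissimilarities = [1.0, 2.0, 3.0, 4.0]
config_distances = [1.0, 3.0, 2.0, 4.0]
fitted = pseudo_distances(dissimilarities, config_distances)
```

In a full nMDS loop, this fit alternates with moving the points so that their distances approach the pseudo-distances; only the regression step is shown here.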
- 32. PCA
- 33. Vector fusion visualization
- 34. nMDS visualization
- 35. nMDS visualization
- 36. nMDS visualization
- 37. nMDS visualization
- 38. nMDS visualization
- 39. nMDS visualization
- 40. nMDS visualization
- 41. References
  ● Clemente J., Salido J.A. (2010). "Non-Metric Multidimensional Scaling and Vector Fusion Visualization of Cell Cycle Independent Gene Expressions for Gene Function Analysis". Published in the conference proceedings of the National Conference on Information Technology for Education (NCITE) 2010 and the Philippine IT Journal, Feb 2011 issue.
  ● Clemente J. (2010). "Cluster Analysis for Identifying Genes Highly Correlated with a Phenotype". Undergraduate thesis, Department of Computer Science, University of the Philippines Diliman.
- 42. Thank you for Listening
