Graph-Based Approaches to Gene Expression Clustering
GENE EXPRESSION CLUSTERING: GRAPH-BASED APPROACHES
A presentation by Govind M (M120432CS)
MTech Computer Science and Engineering
National Institute of Technology Calicut
email@example.com

Outline
• Clustering and Graph Theory
• Using Graphs in Clustering
• Simple Graph Partitioning
• Spectral Graph Partitioning
• Conclusion

Clustering
• The process of grouping a set of data objects by similarity.
• Objects in the same cluster are similar to each other; objects in different clusters are dissimilar.
• Widely used in data mining, market analysis, etc.
• Used to make sense of bioinformatics data.
• Two major purposes in bioinformatics:
  • Find properties of genes (relationships among genes, deducing the functions of genes, etc.)
  • Predict more relevant factors (e.g., clustering cancerous and non-cancerous genes, finding the effect of a medication)

Graphs
• A data structure
• Used in multiple domains
• Key terms
  • Edge
  • Vertex
  • Weighted graph

Clustering Using Graphs
Involves 3 steps:
1. Preprocessing
  ◦ Convert the data set into a graph
  ◦ Represent the graph with an adjacency matrix and a degree matrix
  ◦ The similarity between two nodes can be taken as the weight of the edge between them
2. Partitioning
  ◦ Partition the graph
3. Clustering
  ◦ Repeat the partitioning until the required number of clusters is obtained
  ◦ Alternatively, run extra iterations and then merge clusters back together

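The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: the slides do not name a similarity measure, so Pearson correlation, the `threshold` parameter, the function name `build_similarity_graph`, and the toy expression values are all assumptions.

```python
import numpy as np

def build_similarity_graph(expr, threshold=0.0):
    """Return (A, D): weighted adjacency matrix and degree matrix.

    expr: 2-D array with one row per gene and one column per sample
          (an assumed layout).
    Edge weight = Pearson correlation between two expression profiles
    (an assumed similarity measure); weights at or below `threshold`
    are dropped, i.e. no edge is created.
    """
    expr = np.asarray(expr, dtype=float)
    sim = np.corrcoef(expr)                  # gene-by-gene similarity
    A = np.where(sim > threshold, sim, 0.0)  # keep only similar pairs
    np.fill_diagonal(A, 0.0)                 # no self-loops
    D = np.diag(A.sum(axis=1))               # degree = sum of incident weights
    return A, D

# Toy data: genes 0 and 1 rise together, genes 2 and 3 fall together.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.2, 7.9],
    [4.0, 3.0, 2.0, 1.0],
    [8.1, 6.0, 3.9, 2.0],
])
A, D = build_similarity_graph(expr)
print(np.round(A, 3))
```

With this toy input, only the pairs (0, 1) and (2, 3) end up connected, because the cross-pair correlations are negative and fall below the threshold.
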
Simple Graph Partitioning
• Weight of an edge = similarity between the nodes it connects
• Find the minimum cut
• The lower the edge weight, the more likely its endpoints belong to different clusters

Simple Graph Partitioning: The Algorithm
Input: Graph G<V,E>, number of clusters k
Output: Clusters of G
Repeat k-1 times
    Low_Val = infinity
    For each edge e of the graph
        Calculate Cut_Cost, the cost of a cut at e
        if Cut_Cost < Low_Val
            Low_Val = Cut_Cost
            Cut_Edge = e
    Cut the graph at Cut_Edge

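One possible reading of this pseudocode, sketched in Python. The slides do not define the cost of a cut at an edge, so this sketch assumes it is simply the edge weight, and it keeps removing the cheapest edge until a new connected component appears. The function names `components` and `simple_partition` and the example graph are hypothetical.

```python
import numpy as np

def components(A):
    """Label the connected components of the graph with adjacency matrix A."""
    n = len(A)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first traversal
            u = stack.pop()
            for v in range(n):
                if A[u, v] > 0 and labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

def simple_partition(A, k):
    """Remove the lowest-weight edges until the graph has k components.

    Assumption: the cost of cutting at an edge equals its weight.
    """
    A = np.array(A, dtype=float)          # work on a copy
    labels = components(A)
    while max(labels) + 1 < k:
        weights = np.where(A > 0, A, np.inf)
        if not np.isfinite(weights).any():
            break                         # no edges left to cut
        i, j = np.unravel_index(np.argmin(weights), weights.shape)
        A[i, j] = A[j, i] = 0.0           # cut the least similar pair
        labels = components(A)
    return labels

# Two tight pairs (0-1 and 2-3) joined by one weak edge (1-2).
W = np.array([
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
])
print(simple_partition(W, 2))  # [0, 0, 1, 1]
```

Because only the weakest edge is considered at each step, nothing in this sketch looks at how similar the vertices inside each resulting cluster are.
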
Simple Graph Partitioning (contd.)
• Advantages
  • Simple to implement
  • Uses the concept of minimum cut
• Disadvantage
  • Does not take intra-cluster similarity into account

Spectral Graph Partitioning
• Widely used
• Uses the eigenvectors of the Laplacian matrix
• Recursive algorithm
• Produces good-quality partitions
• Computationally better than Simple Graph Partitioning

Some More Graph Theory…
• Spectrum: the eigenvalues of the graph, arranged in order of magnitude, together with their eigenvectors
• Eigenvalues of a graph
  • Calculated as the eigenvalues of the Laplacian matrix of the graph
  • The eigenvectors are obtained correspondingly
• Fiedler's theorem
  • Relates the eigenvectors to properties of the graph
  • Principal eigenvector, k-th principal eigenvector
  • Principal eigenvector (of the adjacency matrix): centrality of vertices
• Second eigenvector (for the second-smallest Laplacian eigenvalue, the algebraic connectivity)
  • Called the Fiedler vector
  • A vector of positive and negative values
  • The partition is decided by the sign of each value

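For reference, the standard definitions behind these terms, assuming an undirected graph on n vertices with non-negative edge weights. The symbols A, D, L, w and λ are notation chosen here and do not come from the slides.

```latex
\begin{aligned}
A_{ij} &= w_{ij}, \qquad D_{ii} = \sum_{j} w_{ij}, \qquad L = D - A,\\
x^{\top} L x &= \sum_{\{i,j\} \in E} w_{ij}\,(x_i - x_j)^2 \;\ge\; 0,\\
0 &= \lambda_1 \le \lambda_2 \le \dots \le \lambda_n, \qquad L\,\mathbf{1} = 0 .
\end{aligned}
```

The second-smallest eigenvalue λ₂ is the algebraic connectivity. It is positive exactly when the graph is connected, and an eigenvector belonging to it is a Fiedler vector.
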
Spectral Graph Partitioning: The Algorithm
Input: Graph G<V,E>
Output: Graphs G1<V1,E1>, G2<V2,E2>
Create the Laplacian matrix L of the graph G
Calculate the Fiedler vector F of L
for each vertex vi in G
    if F[i] > 0
        V1.append(vi)
    else
        V2.append(vi)

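A minimal NumPy sketch of this bipartitioning step, assuming the graph is given as a symmetric weighted adjacency matrix. The function name `fiedler_bipartition` and the example graph (two triangles joined by one weak edge) are illustrative assumptions.

```python
import numpy as np

def fiedler_bipartition(A):
    """Split the vertices by the sign of the Fiedler vector.

    A: symmetric adjacency matrix with non-negative weights.
    Returns (V1, V2, lambda2), where lambda2 is the algebraic connectivity.
    """
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))            # degree matrix
    L = D - A                             # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # eigenvector of 2nd-smallest eigenvalue
    V1 = [i for i, x in enumerate(fiedler) if x > 0]
    V2 = [i for i, x in enumerate(fiedler) if x <= 0]
    return V1, V2, eigvals[1]

# Two triangles {0, 1, 2} and {3, 4, 5} joined by a weak edge 2-3.
G = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1),
                (2, 3, 0.1)]:
    G[i, j] = G[j, i] = w

V1, V2, lambda2 = fiedler_bipartition(G)
print(sorted([V1, V2]))  # [[0, 1, 2], [3, 4, 5]]
```

The overall sign of an eigenvector is arbitrary, so which triangle lands in V1 may differ between runs or libraries; only the grouping is meaningful.
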
Spectral Graph Partitioning: Example
• Second eigenvector of the whole graph = <0.415, 0.309, 0.069, −0.221, 0.221, −0.794>
• Second eigenvector of the subgraph on vertices 1, 2, 3, 5 = <0.415, 0.309, −0.190, 0.169>

Spectral Graph Partitioning: Bipartitioning Method (contd.)
• Recursive algorithm
• Better than Simple Graph Partitioning, but still not optimal
• Requires bipartitioning multiple times
• Can be improved by multipartitioning
• Multipartitioning uses more eigenvectors

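Repeated bipartitioning could be sketched as below. The slides do not say which cluster to split next, so this sketch always splits the largest remaining cluster; that policy, the function names `fiedler_split` and `recursive_bipartition`, and the example graph are assumptions.

```python
import numpy as np

def fiedler_split(A, nodes):
    """Split `nodes` by the sign of the Fiedler vector of their induced subgraph."""
    sub = A[np.ix_(nodes, nodes)]
    L = np.diag(sub.sum(axis=1)) - sub    # Laplacian of the subgraph
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    f = vecs[:, 1]
    left = [n for n, x in zip(nodes, f) if x > 0]
    right = [n for n, x in zip(nodes, f) if x <= 0]
    return left, right

def recursive_bipartition(A, k):
    """Bipartition repeatedly until k clusters exist (or no split is possible)."""
    A = np.asarray(A, dtype=float)
    clusters = [list(range(len(A)))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = max(splittable, key=len)  # assumed policy: split the largest
        left, right = fiedler_split(A, target)
        if not left or not right:
            break                          # degenerate split, stop here
        clusters.remove(target)
        clusters += [left, right]
    return sorted(clusters)

# A triangle {0, 1, 2}, weakly attached to a chain 3-4-5-6 whose middle edge is weaker.
H = np.zeros((7, 7))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (2, 3, 0.01),
                (3, 4, 1), (4, 5, 0.5), (5, 6, 1)]:
    H[i, j] = H[j, i] = w

print(recursive_bipartition(H, 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```

Each split here uses only one eigenvector of one subgraph, which is why a multipartitioning scheme that uses several eigenvectors at once can do better.
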
Conclusion
• Graph-based clustering builds on simple concepts of graph theory
• Spectral methods give good-quality results
• Can give better performance than traditional clustering
• Comes with a preprocessing overhead
