Sanjaka, Malinda, M.S., Department of Computer Science, College of Science and Mathematics, North Dakota State University, April 2013. Protein Functional Site Prediction Using the Shortest-Path Graph Kernel Method. Major Professor: Dr. Changhui Yan.
Over the past decade, Structural Genomics projects have accumulated structural data for over 75,000 proteins, but the functions of most of them remain unknown or uncertain because of the limitations of laboratory approaches for discovering protein function. Computational methods play a key role in narrowing this gap. Graphs are often used to describe and analyze the geometry and physicochemical composition of biomolecular structures such as chemical compounds and protein active sites (phosphorylation and enzyme catalytic sites). A key problem in graph-based structure analysis is to define a measure of similarity that enables a meaningful comparison of such structures. In this regard, kernel functions have attracted a lot of attention, especially since they allow for the application of a rich repertoire of methods from the field of kernel-based machine learning. In this study, we developed a graph representation of the protein surface based on how amino acid residues contact each other. We then implemented a shortest-path graph kernel function to calculate similarities between the graphs, and three variants of the nearest-neighbor method that use the similarity measure given by the shortest-path graph kernel to predict functional sites on proteins. The prediction methods were evaluated on two datasets using the leave-one-out approach. The best method achieved an accuracy as high as 78%. We also sorted all examples in order of decreasing prediction score. The results revealed that the positive examples (functional sites) were associated with high prediction scores and that the functional sites were enriched in the top 10 percent of the ranking. This project showed that the proposed method was able to capture the similarity between protein functional sites and could provide a useful tool for functional site prediction.
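To make the pipeline described above concrete, the following is a minimal Python sketch of a shortest-path graph kernel combined with a one-nearest-neighbor prediction. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not specify the graph construction, the edge weights, the kernel on path lengths, or the three nearest-neighbor variants. Here we assume that nodes carry residue-type labels, that edges carry unit contact weights, that two shortest paths match only when their endpoint labels and lengths are identical (a delta kernel), and that the kernel is cosine-normalized. All function names (`shortest_path_features`, `sp_kernel`, `normalized_kernel`, `predict_1nn`) are hypothetical.

```python
import math
from collections import Counter


def shortest_path_features(labels, edges):
    """Count shortest paths by (endpoint label, endpoint label, length).

    labels: list of node labels (assumed here: residue types).
    edges:  list of (u, v, weight) for an undirected graph.
    """
    n = len(labels)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    features = Counter()
    for u in range(n):
        for v in range(u + 1, n):
            if math.isfinite(dist[u][v]):
                a, b = sorted((labels[u], labels[v]))
                features[(a, b, dist[u][v])] += 1
    return features


def sp_kernel(g1, g2):
    """Number of shortest-path pairs with equal endpoint labels and length."""
    f1 = shortest_path_features(*g1)
    f2 = shortest_path_features(*g2)
    return sum(count * f2[key] for key, count in f1.items())


def normalized_kernel(g1, g2):
    """Cosine-normalized kernel value in [0, 1] (an assumed normalization)."""
    denom = math.sqrt(sp_kernel(g1, g1) * sp_kernel(g2, g2))
    return sp_kernel(g1, g2) / denom if denom > 0 else 0.0


def predict_1nn(query, training):
    """Return (label, similarity) of the most similar training graph.

    training: list of (graph, label) pairs, where label 1 marks a
    functional site and 0 a non-functional one (assumed encoding).
    """
    scored = [(normalized_kernel(query, g), label) for g, label in training]
    best_score, best_label = max(scored, key=lambda item: item[0])
    return best_label, best_score


# Toy graphs: (labels, edges) with unit edge weights.
g_chain = (["A", "B", "C"], [(0, 1, 1), (1, 2, 1)])
g_pair = (["A", "B"], [(0, 1, 1)])
g_triangle = (["A", "B", "C"], [(0, 1, 1), (1, 2, 1), (0, 2, 1)])

label, score = predict_1nn(g_triangle, [(g_chain, 1), (g_pair, 0)])
```

In this toy example the triangle graph shares two labeled shortest paths with the chain graph and one with the pair graph, so the chain graph is its nearest neighbor. Exact equality of path lengths is only reliable for integer weights; with real-valued contact distances, a tolerance or a smoother kernel on lengths would be needed.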