Motivation
• Importance of understanding the role of genetics in diseases
• Difficulty of identifying disease-related genes manually
• Usefulness of predicting good candidate genes
Hypothesis
The most central genes in the interaction network for a disease are likely to be related to that disease.
Experiment Steps
1. Collect data.
2. Build an interaction network.
3. Evaluate the centrality of genes.
4. Verify the centrality results.
1. Collect data
1. Collect known disease-related genes (seed genes).
  • OMIM
2. Normalize gene names.
  • HGNC database
3. Collect potential interaction sentences.
  • PMC Open Access corpus
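Gene-name normalization (step 2) can be sketched as a lookup from aliases to official symbols. This minimal sketch assumes an alias table has already been extracted from the HGNC database; the five entries and the names `ALIAS_TO_SYMBOL`, `normalize_gene`, and `normalize_seeds` are hypothetical and serve only as illustration.

```python
# Hypothetical alias table, assumed to be extracted from HGNC beforehand.
# Keys are lower-cased aliases; values are official HGNC symbols.
ALIAS_TO_SYMBOL = {
    "p53": "TP53",
    "tp53": "TP53",
    "psa": "KLK3",
    "klk3": "KLK3",
    "ar": "AR",
}


def normalize_gene(name):
    """Return the official symbol for a gene name, or None if it is unknown."""
    return ALIAS_TO_SYMBOL.get(name.strip().lower())


def normalize_seeds(raw_names):
    """Normalize seed gene names, dropping unknown names and duplicates."""
    seeds = []
    for name in raw_names:
        symbol = normalize_gene(name)
        if symbol is not None and symbol not in seeds:
            seeds.append(symbol)
    return seeds
```

For example, `normalize_seeds(["p53", "PSA", "TP53", "XYZ1"])` collapses the two TP53 spellings into one symbol and drops the unknown name.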
2. Build an interaction network
1. Make a list of interaction words.
2. Select sentences that contain a seed gene, at least one other gene, and an interaction word.
3. Classify the selected sentences using dependency parsing and an SVM.
4. Link the genes in each sentence classified as describing an interaction.
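Sentence selection and linking (sub-steps 2 and 4) might look like the sketch below. The interaction-word list, the regex tokenization, and the rule of linking each seed gene to every other gene in an accepted sentence are all assumptions. `classify` is a hypothetical placeholder for the dependency-parsing and SVM classifier of sub-step 3, which is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical interaction-word list; the list actually used is not given.
INTERACTION_WORDS = {"binds", "activates", "inhibits", "interacts", "phosphorylates"}


def genes_in(sentence, gene_symbols):
    """Return the known gene symbols that appear as whole tokens in a sentence."""
    tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
    return {g for g in gene_symbols if g in tokens}


def select_candidates(sentences, seeds, gene_symbols):
    """Keep sentences with a seed gene, another gene, and an interaction word.

    `seeds` and `gene_symbols` are sets of normalized gene symbols.
    """
    selected = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        genes = genes_in(sentence, gene_symbols)
        if genes & seeds and len(genes) >= 2 and words & INTERACTION_WORDS:
            selected.append((sentence, genes))
    return selected


def build_network(selected, seeds, classify):
    """Return undirected edges (frozensets) mapped to their number of supporting sentences.

    `classify` stands in for the dependency-parsing + SVM classifier and
    should return True when a sentence describes an interaction.
    """
    edges = Counter()
    for sentence, genes in selected:
        if not classify(sentence):
            continue
        # Collect pairs in a set first so a sentence with two seed genes
        # does not count the same edge twice.
        pairs = {frozenset((seed, other))
                 for seed in genes & seeds
                 for other in genes - {seed}}
        for pair in pairs:
            edges[pair] += 1
    return edges
```

Counting supporting sentences per edge is an optional extra; the counts could later serve as edge weights, or be ignored if the network is treated as unweighted.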
3. Evaluate the centrality of genes
• Degree centrality
  • Number of neighbors
• Eigenvector centrality
  • Proportional to the sum of the neighbors' centralities
• Closeness centrality
  • Reciprocal of the sum of the distances from the node to all other nodes
• Betweenness centrality
  • Sum, over all other pairs of nodes, of the fraction of their shortest paths that pass through the node
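The four measures can be computed on an unweighted, undirected gene network stored as an adjacency dictionary. This self-contained sketch assumes a connected graph; it uses breadth-first search for distances, Brandes' algorithm for betweenness, and power iteration for eigenvector centrality. Libraries such as NetworkX provide tested, normalized versions of these measures.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbors of each node."""
    return {v: len(adj[v]) for v in adj}


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Reciprocal of the sum of distances to all other (reachable) nodes."""
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = 1.0 / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def eigenvector_centrality(adj, iterations=100):
    """Power iteration, scaled so the largest score is 1.

    Multiplying by (A + I) instead of A keeps the same principal eigenvector
    but avoids oscillation on bipartite graphs such as stars.
    """
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = max(new.values())
        x = {v: val / norm for v, val in new.items()}
    return x
```

On a star graph the hub scores highest on all four measures, which matches the intuition behind the hypothesis that central genes are the most interesting candidates.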
4. Verify the centrality results
• Collect confirmed data for evaluation.
  • Prostate Gene Database (PGDB), PubMed, KEGG
• Set a baseline.
  • Number of co-occurrences with seed genes
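The co-occurrence baseline and the comparison against confirmed genes could be sketched as follows. The evaluation metric is not specified above, so precision at k is an assumption, as is excluding seed genes from the ranked candidates; the function names are hypothetical.

```python
from collections import Counter


def cooccurrence_baseline(sentence_genes, seeds):
    """Rank non-seed genes by how many sentences mention them together with a seed gene.

    `sentence_genes` is an iterable of sets of gene symbols, one set per sentence.
    """
    counts = Counter()
    for genes in sentence_genes:
        if genes & seeds:
            for g in genes - seeds:
                counts[g] += 1
    return [g for g, _ in counts.most_common()]


def rank_by_score(scores, seeds):
    """Rank non-seed genes by a centrality score, highest first."""
    return sorted((g for g in scores if g not in seeds),
                  key=lambda g: scores[g], reverse=True)


def precision_at_k(ranking, confirmed, k):
    """Fraction of the top-k ranked genes that appear in the confirmed set."""
    top = ranking[:k]
    return sum(g in confirmed for g in top) / len(top) if top else 0.0
```

Comparing `precision_at_k` for a centrality ranking against the same value for the baseline ranking, at the same k, shows whether centrality adds anything beyond simple co-occurrence.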