6. Web Structure Mining
1. PageRank
i. PageRank Algorithm
ii. Standing of a Node
2. Traversing and Intrinsic Links
3. Reference Nodes and Index Nodes
i. Index nodes
ii. Reference Nodes
4. Clustering and Determining Similar Pages
i. Bibliographic Coupling
Bibliographic coupling occurs when two works reference a common third work in their bibliographies.
ii. Co-citation
Co-citation is defined as the frequency with which two documents are cited together by other documents [1]. If at least one other document cites two documents in common, these documents are said to be co-cited.
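As a rough illustration of the two measures defined above, the following Python sketch computes a bibliographic coupling strength and co-citation counts on a small citation graph. The graph, the document labels, and the function names `bibliographic_coupling` and `cocitation_counts` are all hypothetical and chosen only for this example. Counting shared references as the coupling strength is one common convention, assumed here rather than taken from the text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy citation graph: each key cites the documents in its set.
citations = {
    "A": {"C", "D"},
    "B": {"C", "E"},
    "F": {"A", "B"},
    "G": {"A", "B", "C"},
}


def bibliographic_coupling(citations, x, y):
    """Number of works that appear in the bibliographies of both x and y."""
    return len(citations.get(x, set()) & citations.get(y, set()))


def cocitation_counts(citations):
    """Map each unordered pair of documents to the number of documents citing both."""
    counts = Counter()
    for cited in citations.values():
        # Every pair cited by the same document is co-cited once by that document.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


coupling_ab = bibliographic_coupling(citations, "A", "B")
cocited = cocitation_counts(citations)
```

In this toy graph, A and B are bibliographically coupled because both reference C, and they are also co-cited because F and G each cite both of them. Coupling looks at a pair's outgoing references, while co-citation looks at its incoming citations, so the two measures can disagree for the same pair of pages.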
11. Unstructured Text
● Features
○ Word Occurrences
○ Stop Words
○ Latent Semantic Indexing
○ Stemming
○ N-Grams
○ POS (Part-of-Speech)
○ Positional Collocations
○ Higher Order Features