This document summarizes an analysis of the arXiv dataset using social network analysis techniques. Key findings include:
1) The arXiv citation network has over 1 million nodes (papers) and 7 million edges (citations), with a low average degree and a low clustering coefficient.
2) PageRank scores show that papers take an average of 17 years to reach high scores, although papers in some categories, such as cs.SI (Social and Information Networks), get there faster.
3) Using LDA topic modeling on paper titles, the title-similarity network groups the papers into 182 communities.
4) Clustering unsupervised GraphSAGE embeddings yields 10 groups of topically related papers, with the groups centered around high-PageRank papers.
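The first finding rests on two standard statistics. Treating each citation as a directed edge, the average out-degree is simply E/N, so roughly 7 million edges over roughly 1 million nodes gives on the order of 7 citations per paper (twice that if the graph is treated as undirected). The sketch below computes both statistics in plain Python on a hypothetical four-paper toy graph. It treats citations as undirected when computing the local clustering coefficient; that is an assumption, since the summary does not say how edge direction was handled.

```python
from collections import defaultdict
from itertools import combinations


def average_degree(edges, directed=True):
    """Mean degree: E/N for directed out-degree, 2E/N for undirected degree."""
    nodes = {n for edge in edges for n in edge}
    factor = 1 if directed else 2
    return factor * len(edges) / len(nodes)


def average_clustering(edges):
    """Average local clustering coefficient, treating citations as undirected (assumption)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-citations
            nbrs[u].add(v)
            nbrs[v].add(u)
    total = 0.0
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue  # coefficient is defined as 0 for degree < 2
        # Count links among this node's neighbors.
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        total += 2 * links / (k * (k - 1))
    return total / len(nbrs)


# Hypothetical toy citation graph: (citing, cited).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(average_degree(edges))                  # -> 1.0
print(average_degree(edges, directed=False))  # -> 2.0
print(round(average_clustering(edges), 4))    # -> 0.5833
```

Nodes with fewer than two neighbors contribute 0 but still count in the average, which is the usual convention and one reason sparse citation graphs show low clustering.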
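PageRank, used in the second finding, can be computed by power iteration. This is a generic sketch, not the analysis's actual pipeline: the damping factor 0.85, the tolerance, and the uniform redistribution of rank from papers that cite nothing (dangling nodes) are conventional choices assumed here. An edge (u, v) means paper u cites paper v, so rank flows toward cited papers.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration; edge (u, v) means paper u cites paper v."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Papers that cite nothing spread their rank uniformly over all papers.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        converged = sum(abs(new[x] - rank[x]) for x in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical toy graph: B, C and D all cite A; D also cites B.
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
scores = pagerank(citations)
print(max(scores, key=scores.get))  # -> A
```

The scores always sum to 1, so a paper's score is a share of the whole network; tracking how long papers take to reach a high share would require recomputing PageRank on citation snapshots over time.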
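The summary does not describe how the title-similarity network in the third finding was built or which community-detection algorithm produced the 182 communities. One plausible construction is sketched below under stated assumptions. Each title is represented by its LDA topic mixture; the hypothetical `topic_mix` values stand in for the output of a fitted LDA model, which is not shown. Titles whose mixtures have cosine similarity at or above a hypothetical `threshold` are linked. Connected components then serve as a simple stand-in for a proper community-detection method such as modularity optimization.

```python
import math
from itertools import combinations


def cosine(p, q):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def title_communities(topic_mix, threshold=0.8):
    """Link titles with similar topic mixtures, then return connected groups."""
    adj = {pid: set() for pid in topic_mix}
    for a, b in combinations(topic_mix, 2):
        if cosine(topic_mix[a], topic_mix[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    # Connected components via iterative depth-first search.
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(comp))
    return groups


# Hypothetical per-title topic mixtures over 3 topics (as an LDA model might output).
topic_mix = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.8, 0.2, 0.0],
    "p3": [0.1, 0.0, 0.9],
    "p4": [0.0, 0.1, 0.9],
}
print(title_communities(topic_mix))  # -> [['p1', 'p2'], ['p3', 'p4']]
```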
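For the fourth finding, the core of a GraphSAGE layer with the mean aggregator is: concatenate a node's own features with the mean of its neighbors' features, apply a weight matrix and a ReLU, then L2-normalize. The sketch below runs one such layer with a fixed, hypothetical weight matrix; the unsupervised training objective (random-walk co-occurrence with negative sampling) is omitted. It then clusters the embeddings with k-means. That is an assumed choice, since the summary does not name the clustering algorithm. The toy features, neighbor lists, and starting centroids are all hypothetical.

```python
import math


def sage_layer(features, neighbors, weight):
    """One GraphSAGE-mean layer: normalize(ReLU(W @ [x_v || mean of neighbor x_u]))."""
    out = {}
    for v, x in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            agg = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(len(x))]
        else:
            agg = [0.0] * len(x)
        concat = x + agg
        h = [max(0.0, sum(w * c for w, c in zip(row, concat))) for row in weight]
        norm = math.sqrt(sum(a * a for a in h)) or 1.0
        out[v] = [a / norm for a in h]
    return out


def kmeans(points, centroids, iters=10):
    """Lloyd's k-means from given starting centroids; returns node -> cluster index."""
    centroids = [list(c) for c in centroids]
    labels = {}
    for _ in range(iters):
        # Assign each embedding to its nearest centroid (squared Euclidean distance).
        labels = {
            node: min(range(len(centroids)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])))
            for node, vec in points.items()
        }
        # Move each centroid to the mean of its assigned embeddings.
        for k in range(len(centroids)):
            members = [points[n] for n, lab in labels.items() if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy graph: {a, b} and {c, d} form two linked pairs.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
neighbors = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
weight = [[1, 0, 1, 0], [0, 1, 0, 1]]  # hypothetical, untrained 2x4 weight matrix
embeddings = sage_layer(features, neighbors, weight)
labels = kmeans(embeddings, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(labels)  # -> {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

In a real run the weights would be learned, several layers would typically be stacked, and k would be set to 10 to match the reported number of groups.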