This document summarizes a project that analyzed GitHub user connection data to identify influential users and communities. The project processed over 1 TB of GitHub event data from the preceding 6 months, covering more than 2 million users and 16 million events, to construct a user collaboration graph. The graph shows that each user collaborates with 6 others on average, while some users are connected to more than 1,700 others. The main challenges were the unstructured nature of the event data and tuning Spark jobs to process the large data volume within memory constraints.
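As a rough illustration of how a collaboration graph and its degree statistics could be derived from event data, here is a minimal single-machine sketch in plain Python. It is not the project's Spark pipeline. It assumes, purely for illustration, that each event reduces to a `(user, repo)` pair and that two users count as collaborators when they have events on the same repository; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_graph(events):
    """Build an undirected graph from (user, repo) pairs.

    Assumption: two users collaborate if they both have at least one
    event on the same repository. Returns a dict: user -> set of users.
    """
    users_by_repo = defaultdict(set)
    for user, repo in events:
        users_by_repo[repo].add(user)

    graph = defaultdict(set)
    for users in users_by_repo.values():
        for user in users:
            graph[user]  # touch the key so users with no collaborators still appear
        for a, b in combinations(sorted(users), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_stats(graph):
    """Return (average degree, maximum degree) over all users in the graph."""
    degrees = [len(neighbors) for neighbors in graph.values()]
    if not degrees:
        return 0.0, 0
    return sum(degrees) / len(degrees), max(degrees)


# Hypothetical sample events, not taken from the project's data.
sample_events = [
    ("alice", "repo-a"), ("bob", "repo-a"), ("carol", "repo-a"),
    ("alice", "repo-b"), ("dave", "repo-b"),
    ("erin", "repo-c"),
]

graph = build_collaboration_graph(sample_events)
avg_degree, max_degree = degree_stats(graph)
print(f"average collaborators per user: {avg_degree:.1f}")
print(f"most connected user has {max_degree} collaborators")
```

Pairing every user within a repository grows quadratically with the number of contributors, so at the scale described here the same idea would likely need a distributed join and some handling for very large repositories; that part is outside this sketch.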