Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
CommonCrawl
Building an open Web-Scale crawl using Hadoop.
Ahad Rana
Architect / Engineer at CommonCrawl
ahad@commoncrawl....
Who is CommonCrawl ?
• A 501(c)3 non-profit “dedicated to building, maintaining and
making widely available a comprehensiv...
Motivations Behind CommonCrawl
• Internet is a massively disruptive force.
• Exponential advances in computing capacity, s...
1 of 17 Ad