Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop by Ahad Rana from CommonCrawl
Ahad Rana, an engineer at CommonCrawl, will go over CommonCrawl’s extensive use of Hadoop to fulfill their mission of building an open and accessible web-scale crawl. He will discuss their Hadoop data processing pipeline, including their PageRank implementation; describe techniques they use to optimize Hadoop; discuss the design of their URL Metadata service; and conclude with details on how you can leverage the crawl (using Hadoop) today.