Building a Scalable Web Crawler with Hadoop
by Hadoop User Group
- 12,347 views
Building a Scalable Web Crawler with Hadoop by Ahad Rana from CommonCrawl
Ahad Rana, an engineer at CommonCrawl, will go over CommonCrawl’s extensive use of Hadoop to fulfill their mission of building an open and accessible web-scale crawl. He will discuss their Hadoop data processing pipeline, including their PageRank implementation; describe techniques they use to optimize Hadoop; discuss the design of their URL Metadata service; and conclude with details on how you can leverage the crawl (using Hadoop) today.
Upload Details
Uploaded via SlideShare as Microsoft PowerPoint
Usage Rights
© All Rights Reserved
Statistics
- Likes: 23
- Downloads: 260
- Comments: 0
- Embed Views
- Views on SlideShare: 11,068
- Total Views: 12,347