Building a Scalable Web Crawler with Hadoop by Ahad Rana from CommonCrawl
Ahad Rana, engineer at CommonCrawl, will go over CommonCrawl’s extensive use of Hadoop to fulfill their mission of building an open and accessible web-scale crawl. He will discuss their Hadoop data processing pipeline, including their PageRank implementation; describe the techniques they use to optimize Hadoop; explain the design of their URL Metadata service; and conclude with details on how you can leverage the crawl (using Hadoop) today.