Building a Scalable Web Crawler with Hadoop

by Ahad Rana, CommonCrawl
Ahad Rana, an engineer at CommonCrawl, will go over CommonCrawl’s extensive use of Hadoop to fulfill their mission of building an open and accessible web-scale crawl. He will discuss their Hadoop data processing pipeline, including their PageRank implementation; describe techniques they use to optimize Hadoop; discuss the design of their URL Metadata service; and conclude with details on how you can leverage the crawl (using Hadoop) today.
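The abstract mentions a PageRank implementation on Hadoop but gives no details of it. The following is therefore only a generic sketch of one textbook PageRank iteration expressed as a map phase and a reduce phase, in plain Python with the Hadoop shuffle simulated by a dictionary. It is not CommonCrawl's code and uses no Hadoop API; the damping factor of 0.85 and the helper names `map_phase`, `reduce_phase`, and `pagerank_iteration` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Conventional damping factor; an assumption, not a value from the talk.
DAMPING = 0.85


def map_phase(node, rank, out_links):
    """Mapper: re-emit the adjacency list and split this node's rank among its out-links."""
    # Pass the graph structure through so the reducer can rebuild it.
    yield node, ("links", out_links)
    if out_links:
        share = rank / len(out_links)
        for target in out_links:
            yield target, ("rank", share)


def reduce_phase(node, values, num_nodes):
    """Reducer: sum incoming rank contributions and apply the damping formula."""
    out_links = []
    incoming = 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1 - DAMPING) / num_nodes + DAMPING * incoming
    return node, new_rank, out_links


def pagerank_iteration(graph):
    """graph maps node -> (rank, out_links); returns the graph after one iteration."""
    # Simulated shuffle: group every mapper output value by its key.
    grouped = defaultdict(list)
    for node, (rank, out_links) in graph.items():
        for key, value in map_phase(node, rank, out_links):
            grouped[key].append(value)
    num_nodes = len(graph)
    result = {}
    for node, values in grouped.items():
        node, new_rank, out_links = reduce_phase(node, values, num_nodes)
        result[node] = (new_rank, out_links)
    return result
```

For example, starting from `{"a": (1/3, ["b", "c"]), "b": (1/3, ["c"]), "c": (1/3, ["a"])}`, one call moves rank toward `c`, which has two in-links, while the total stays 1. This simplified version assumes every link target is a node in the graph and does not redistribute rank from pages without out-links, so such "dangling" pages would leak rank mass; a production job would need to handle that case and would run many iterations as a chain of Hadoop jobs.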

Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Presentation Transcript