Building a Scalable Web Crawler with Hadoop

Uploaded on Oct 27, 2010

Building a Scalable Web Crawler with Hadoop by Ahad Rana from CommonCrawl
Ahad Rana, an engineer at CommonCrawl, will go over CommonCrawl's extensive use of Hadoop to fulfill its mission of building an open and accessible web-scale crawl. He will discuss the Hadoop data processing pipeline, including the PageRank implementation, describe techniques used to optimize Hadoop, discuss the design of the URL Metadata service, and conclude with details on how you can leverage the crawl (using Hadoop) today.

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Building a Scalable Web Crawler with Hadoop: Presentation Transcript