Hands-on Hadoop: An intro for Web developers

by Erik Eldridge, engineer/evangelist at Yahoo!, on Sep 30, 2009

  • 11,442 views


Hadoop is a powerful tool for performing computation on large amounts of data using multiple computers. Getting started with Hadoop, however, is easy. The simplest introduction uses a virtual machine (VM) available free from the Yahoo! Developer Network. In "Hands-on Hadoop: An intro for Web developers", we explore Hadoop using this VM. Data for the examples comes from Apache log files, generated by a simple process described in the presentation. Server log analysis is a natural use case for Hadoop and should, we hope, convey Hadoop's usefulness to most Web developers. The Yahoo! VM is available here: http://developer.yahoo.com/hadoop/tutorial/module3.html#vm-setup
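The presentation runs its examples on Hadoop inside the Yahoo! VM. As a rough, Hadoop-free illustration of the map, shuffle, and reduce flow applied to server logs, the sketch below counts requests per HTTP status code in a single Python process. It assumes the logs use the Apache Common Log Format; the regex, the choice of status code as the key, and the function names (`map_line`, `reduce_key`, `run_local_job`) are illustrative assumptions, not code from the slides.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumption: lines follow the Apache Common Log Format, e.g.
# 127.0.0.1 - - [30/Sep/2009:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def map_line(line):
    """Map phase (hypothetical): emit (status_code, 1) for each well-formed line."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group("status"), 1


def reduce_key(key, values):
    """Reduce phase (hypothetical): sum the counts emitted for one key."""
    return key, sum(values)


def run_local_job(lines):
    """Run map, shuffle (sort and group by key), and reduce in one process."""
    mapped = [pair for line in lines for pair in map_line(line)]
    mapped.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle/sort step
    return dict(
        reduce_key(key, (count for _, count in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )


if __name__ == "__main__":
    sample = [  # hypothetical log lines
        '127.0.0.1 - - [30/Sep/2009:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 512',
        '10.0.0.2 - - [30/Sep/2009:10:00:01 -0700] "GET /missing HTTP/1.1" 404 0',
        '127.0.0.1 - - [30/Sep/2009:10:00:02 -0700] "GET /about.html HTTP/1.1" 200 256',
        'not a log line',
    ]
    print(run_local_job(sample))  # {'200': 2, '404': 1}
```

On a real cluster, Hadoop would run many copies of the map step on different splits of the input and perform the sort and grouping across machines; the local `sort` plus `groupby` only mimics that contract.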


Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

CC Attribution License


Comments

  • jeovana2: I would really like to see the slides in Portuguese; it looks interesting. I don't understand any English. 3 years ago
  • reavels (Simon Reavely, Software Architect at NAVTEQ): Nice use of a VM image to get people started. 3 years ago
  • erikeldridge (Erik Eldridge, engineer/evangelist at Yahoo!): Abstract from Paul Tarjan's (http://paulisageek.com) talk at the University of Waterloo: "Hadoop: Map and Reduce in the real world. At Yahoo!, we have TONS of data to crawl through: log files, Flickr photos, Delicious bookmarks, Yahoo! Answers, advertising bids, etc. Doing that on one machine would take forever, and writing a distributed program yourself means doing a TON of housekeeping (shuttling files, splitting a job, redoing a job when a machine dies, etc.). Enter Hadoop. It does all the crappy work of the distributed system for you. What you'll learn: how to set up a Hadoop image on your machine to learn, how to program in Map-Reduce, and some things I use Hadoop for at Yahoo!." Materials: http://www.slideshare.net/erikeldridge/hands-on-hadoop-intro-for-web-developers-2094304 , http://blog.paulisageek.com/2009/09/hadoop-hacking-on-yahoo-ad-data.html 4 years ago
  • erikeldridge (Erik Eldridge, engineer/evangelist at Yahoo!): Note: slide 20 references an incorrect data set. The file /data/ydata/ydata-ysm-keyphrase-bid-imp-click-v1_0 was only available to users of the CMU Hack U temporary cluster. The input file should be input/access.log, which is loaded into HDFS in slide 15. 4 years ago
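The comments mention learning "how to program in Map-Reduce" and point the slide 20 job at input/access.log in HDFS. One common way to write such a job in a scripting language is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout, and the reducer sees its input sorted by key. The sketch below follows that contract to count hits per requested path; the script name `job.py`, its `map`/`reduce` argument, and the `simulate` helper are hypothetical, and the parsing assumes Common Log Format request lines.

```python
import io
import sys


def mapper(stdin, stdout):
    """Emit 'path<TAB>1' per request; assumes a quoted "METHOD PATH PROTOCOL" field."""
    for line in stdin:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # skip lines without a quoted request field
        request = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
        if len(request) >= 2:
            stdout.write(f"{request[1]}\t1\n")


def reducer(stdin, stdout):
    """Sum counts per key; relies on the input already being sorted by key."""
    current_key, total = None, 0
    for line in stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                stdout.write(f"{current_key}\t{total}\n")
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        stdout.write(f"{current_key}\t{total}\n")


def simulate(log_text):
    """Hypothetical local stand-in for a streaming job: map, sort, then reduce."""
    mapped = io.StringIO()
    mapper(io.StringIO(log_text), mapped)
    # Sorting whole lines keeps identical keys adjacent, as Hadoop's sort would.
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


if __name__ == "__main__":
    # Hypothetical dispatch: `python job.py map` or `python job.py reduce`.
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if role == "map" else reducer)(sys.stdin, sys.stdout)
```

Locally, `cat access.log | python job.py map | sort | python job.py reduce` approximates the job. On the VM, the streaming jar's `-input`, `-output`, `-mapper`, and `-reducer` options would wire the same scripts to input/access.log; the jar's location depends on the Hadoop version installed.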
