Hands-on Hadoop: An intro for Web developers

Hadoop is a powerful tool for performing computation on large amounts of data using multiple computers. Getting started with it is nevertheless easy: the simplest introduction uses a virtual machine (VM) available for free from the Yahoo! Developer Network. In "Hands-on Hadoop: An intro for Web developers", we explore Hadoop using this VM. Data for the examples comes from Apache log files, generated by a simple process described in the presentation. Server log analysis is a natural use case for Hadoop, and one that should convey its utility to most Web developers. The Yahoo! VM is available here: http://developer.yahoo.com/hadoop/tutorial/module3.html#vm-setup
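
To make the server log use case concrete, here is a minimal sketch of the map and reduce steps for counting hits per URL path. It is not taken from the presentation: it runs locally in plain Python without Hadoop, it assumes the log lines follow the Apache Common Log Format, and the names map_line, reduce_pairs, count_hits and SAMPLE are hypothetical. In a real Hadoop Streaming job the two steps would normally be separate scripts that read from standard input and write to standard output.

```python
import re
from itertools import groupby

# Assumption: each line follows the Apache Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2009:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE) (\S+)[^"]*" (\d{3})')


def map_line(line):
    """Map step: emit a (path, 1) pair for each log line that parses."""
    match = REQUEST_RE.search(line)
    if match:
        yield match.group(1), 1


def reduce_pairs(pairs):
    """Reduce step: sum the counts for each key.

    Sorting first stands in for the shuffle/sort phase that Hadoop
    performs between the map and reduce steps.
    """
    for key, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        yield key, sum(count for _, count in group)


def count_hits(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_pairs(pairs))


# Hypothetical sample data, made up for illustration only.
SAMPLE = [
    '127.0.0.1 - - [10/Oct/2009:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2009:13:55:40 -0700] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.7 - - [10/Oct/2009:13:56:02 -0700] "GET /index.html HTTP/1.1" 304 0',
    'this line is malformed and is skipped by the map step',
]

if __name__ == "__main__":
    for path, hits in sorted(count_hits(SAMPLE).items()):
        print(f"{path}\t{hits}")
```

Keeping the map step free of shared state is what lets Hadoop run it on many machines at once; only the reduce step needs to see all the pairs for a given key together.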

Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

CC Attribution License

6 Embeds 405

http://understeer.hatenablog.com 253
http://hadoopbigdata.wordpress.com 88
http://www.slideshare.net 59
http://webcache.googleusercontent.com 3
http://localhost 1
https://twimg0-a.akamaihd.net 1

Statistics

Likes: 12
Downloads: 336
Comments: 4
Embed Views: 405
Views on SlideShare: 9,668
Total Views: 10,073

  • jeovana2 I would really like to see the slides in Portuguese; they look interesting. I don't understand any English. 2 years ago
  • reavels Simon Reavely, Software Architect at NAVTEQ Nice use of VM image to get people started 3 years ago
  • erikeldridge Erik Eldridge, engineer/evangelist at Yahoo! Abstract from Paul Tarjan's (http://paulisageek.com) talk at the University of Waterloo: 'Hadoop: Map and Reduce in the real world. At Yahoo, we have TONS of data to crawl through: log files, Flickr photos, Delicious bookmarks, Yahoo! Answers, advertising bids, etc. Doing that on one machine would take forever, and writing a distributed program would require a TON of housekeeping (like shuttling files, splitting a job, redoing a job when a machine dies, etc). Enter Hadoop. It does all the crappy work of the distributed system for you. What you'll learn: how to set up a Hadoop image on your machine to learn, how to program in Map-Reduce, and some things I use Hadoop for at Yahoo!.' Materials: http://www.slideshare.net/erikeldridge/hands-on-hadoop-intro-for-web-developers-2094304 , http://blog.paulisageek.com/2009/09/hadoop-hacking-on-yahoo-ad-data.html 3 years ago
  • erikeldridge Erik Eldridge, engineer/evangelist at Yahoo! Note: slide 20 references an incorrect data set. The file /data/ydata/ydata-ysm-keyphrase-bid-imp-click-v1_0 was only available to users of the temporary CMU Hack U cluster. The input file should be input/access.log, which is loaded into HDFS in slide 15. 3 years ago

Hands-on Hadoop: An intro for Web developers Presentation Transcript