Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Published in: Technology
  • So, without further ado, let's get this show on the road and run a job concurrently on a few virtual machines.
  • Transcript

    • 1.  
    • 2. QUICK AND DIRTY PARALLEL PROCESSING ON THE CLOUD Daniel Sikar
    • 3. EC2 S3
    • 4.  
    • 5. Tools: AWS command line tools
    • 6. Elastic MapReduce Ruby library
    • 7. Hadoop
    • 8. s3cmd
    • 9. Hadoop: MapReduce, Job Tracker, HDFS (distributed file system)
    • 10. Hadoop MapReduce usage: data crunching in general (clicks, statistics, etc.)
    • 11. Hadoop Project Mgmt Committee
    • 12. MapReduce ?
    • 13. MapReduce Key Pairs <key,value>
    • 14. MapReduce
    • 15. HTTP Logs
        Log file A:
          (...) FreeTouchScreenNokia5230 (...)
          (...) GetRidofAllSpeedCameras (...)
          (...) USManWinsLottery (...)
          (...) BNPToLaunchElectionManifesto (...)
        Log file B:
          (...) FreeTouchScreenNokia5230 (...)
          (...) BodyLanguageTellsAll (...)
    • 16. MapReduce <FreeTouchScreenNokia5230, 1> + <FreeTouchScreenNokia5230, 1> = <FreeTouchScreenNokia5230, 2>
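The key-pair addition on the slide above can be sketched in a few lines of Python. This is only an in-memory illustration, not Hadoop itself: the two lists stand in for the log files on the previous slide, reduced to the page names shown there, and the function names `map_phase` and `reduce_phase` are made up for this sketch.

```python
from collections import defaultdict

# Stand-ins for log files A and B, reduced to the entries shown on the slide.
LOG_A = [
    "FreeTouchScreenNokia5230",
    "GetRidofAllSpeedCameras",
    "USManWinsLottery",
    "BNPToLaunchElectionManifesto",
]
LOG_B = ["FreeTouchScreenNokia5230", "BodyLanguageTellsAll"]


def map_phase(entries):
    """Emit a <key, 1> pair for every log entry."""
    return [(entry, 1) for entry in entries]


def reduce_phase(pairs):
    """Add up the values of all pairs that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Each log is mapped independently; the pairs are then reduced together.
pairs = map_phase(LOG_A) + map_phase(LOG_B)
counts = reduce_phase(pairs)
print(counts["FreeTouchScreenNokia5230"])
```

In a real cluster the map calls would run on different machines and the framework would route all pairs with the same key to the same reducer.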
    • 17. Hadoop Streaming: running MapReduce jobs with executable files and scripts $ <list> | mapper | reducer
    • 18. Hadoop Streaming: running MapReduce jobs with executable files and scripts $ <list> | mapper | reducer
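The `<list> | mapper | reducer` pipeline can be tried locally before paying for a cluster. The sketch below is a minimal Python stand-in, assuming the usual Hadoop Streaming conventions: records are text lines, key and value are separated by a tab, and mapper output is sorted by key before it reaches the reducer. The function names are hypothetical, and the talk's own mapper and reducer were Perl scripts that are not shown in the slides.

```python
import itertools


def mapper(lines):
    """Emit one "key<TAB>1" record per non-empty input line."""
    for line in lines:
        key = line.strip()
        if key:
            yield f"{key}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share a key."""
    parsed = (record.rsplit("\t", 1) for record in sorted_records)
    for key, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_pipeline(lines):
    """Local stand-in for: <list> | mapper | sort | reducer."""
    return list(reducer(sorted(mapper(lines))))


result = run_pipeline(["b", "a", "b", "", "c"])
print(result)
```

The explicit `sorted` call plays the part of the shuffle-and-sort step that Hadoop performs between the two phases; without it the reducer's grouping of consecutive keys would not work.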
    • 19. Real life example of Hadoop Streaming usage
    • 20. Wikipedia Page Access Logs
    • 21. Wine Grape Varieties
    • 22. Wikipedia WGV Page Access Stats
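A mapper for this kind of job might look like the sketch below. It is a guess at the general shape, not the talk's script: it assumes each pagecounts line has four whitespace-separated fields (project, page title, request count, bytes), keeps only English-language pages, and filters against a small hypothetical dictionary of grape variety titles.

```python
# Hypothetical sample of page titles; the real dictionary is not shown in the slides.
GRAPE_VARIETIES = {"Merlot", "Pinot_noir", "Chardonnay"}


def map_pagecounts(lines, dictionary=GRAPE_VARIETIES):
    """Emit "title<TAB>requests" for dictionary pages of the "en" project."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed lines
        project, title, requests, _size = fields
        # Restricting to "en" is an assumption made for this sketch.
        if project == "en" and title in dictionary and requests.isdigit():
            yield f"{title}\t{requests}"


# Made-up sample lines in the assumed format.
sample = [
    "en Merlot 12 204800",
    "en Main_Page 5000 9000000",
    "de Merlot 3 51200",
    "en Pinot_noir 7 102400",
    "malformed line",
]
mapped = list(map_pagecounts(sample))
print(mapped)
```

A reducer would then sum the request counts per title across all hourly log files.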
    • 23. Business Decisions
    • 24. Launching a virtual Hadoop Cluster
        $ elastic-mapreduce --create --name "Wiki log crunch" --alive --num-instances 20 --instance-type c1.medium
        Created job flow <job flow id>
        $ ec2din (...)
    • 25.  
    • 26. Hadoop: Standalone Operation
    • 27. Pseudo-Distributed Operation
    • 28. Fully-Distributed Operation
    • 29. NameNode
    • 30. JobTracker
    • 31. DataNode + TaskTracker
    • 32. Hadoop: Standalone Operation
    • 33. Pseudo-Distributed Operation
    • 34. Fully-Distributed Operation
    • 35. NameNode
    • 36. JobTracker
    • 37. DataNode + TaskTracker
    • 38. Add a step
        $ elastic-mapreduce --jobflow <jfid> --stream --step-name "Wiki log crunch" \
            --input s3n://dsikar-wikilogs-2009/dec/ \
            --output s3n://dsikar-wikilogs-output/21 \
            --mapper s3n://dsikar-wiki-scripts/wikidictionarymap.pl \
            --reducer s3n://dsikar-wiki-scripts/wikireduce.pl
        http://<instance public dns>:9100
    • 39. s3cmd
        # make bucket
        $ s3cmd mb s3://dsikar-wikilogs
        # put log files
        $ s3cmd put pagecounts-200912*.gz s3://dsikar-wikilogs/dec
        $ s3cmd put pagecounts-201004*.gz s3://dsikar-wikilogs/apr
        # list log files
        $ s3cmd ls s3://dsikar-wikilogs/
        # put scripts
        $ s3cmd put *.pl s3://dsikar-wiki-scripts/
        # delete log files
        $ s3cmd del --recursive --force s3://dsikar-wikilogs/
        # remove bucket
        $ s3cmd rb s3://dsikar-wikilogs/
    • 40. Elastic MapReduce --create --list --jobflow --describe --stream --terminate
    • 41. Output files part-00000 part-00001 part-00002 (...)
    • 42. Further aggregation
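The slides do not say what the further aggregation involved. One plausible version, sketched below in Python, merges the `part-*` files from one or more runs after they have been downloaded, sums the counts per key, and ranks the result. The tab-separated "key, count" layout, the directory names `dec` and `apr`, and the sample values are assumptions made for the demonstration.

```python
import glob
import os
import tempfile
from collections import Counter


def aggregate_parts(directories):
    """Sum "key<TAB>count" records over every part-* file in the given directories."""
    totals = Counter()
    for directory in directories:
        for path in sorted(glob.glob(os.path.join(directory, "part-*"))):
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    key, _, count = line.rstrip("\n").rpartition("\t")
                    if key and count.isdigit():
                        totals[key] += int(count)
    return totals


# Demonstration with made-up output from two hypothetical runs.
runs = {
    "dec": {"part-00000": "Chardonnay\t4\n", "part-00001": "Merlot\t12\n"},
    "apr": {"part-00000": "Merlot\t3\n"},
}
with tempfile.TemporaryDirectory() as tmp:
    for run, parts in runs.items():
        os.makedirs(os.path.join(tmp, run))
        for name, body in parts.items():
            with open(os.path.join(tmp, run, name), "w", encoding="utf-8") as handle:
                handle.write(body)
    totals = aggregate_parts([os.path.join(tmp, run) for run in runs])

ranking = totals.most_common()
print(ranking)
```

Within a single job each key lands in exactly one part file, so the summing only matters when the outputs of several runs are combined.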
    • 43. Conclusion Hadoop MapReduce provides out-of-the-box ready-to-go distributed computing.
    • 44. That's all folks and thanks for attending: QUICK AND DIRTY PARALLEL PROCESSING ON THE CLOUD Daniel Sikar
    • 45.  
