Large-scale data processing with Hadoop in the Cloud

A lightning talk I gave at CloudCamp San Diego on using Hadoop in the cloud to process more than 100,000,000 records.

Published in: Technology, Business

Transcript

    • 1. Large-scale data processing with Hadoop in the Cloud, by Patrick Salami, Senior Software Engineer
    • 2. 100,000,000+ records
    • 3. < 1h processing time
    • 4. Hadoop: distributed computing framework; runs on commodity hardware; splits up large data sets; easy to deploy in AWS with Elastic MapReduce
    • 5. [Diagram of the processing pipeline. Labels: Java Code, Raw Product, Normalize Data, Filter, Categorize, S3, Shipping, Coupons, Hadoop Cluster, Group, Post-Process, Solr, S3, WWW, Index]
    • 6. Hourly cost (Source: http://aws.amazon.com/ec2/#pricing)
        Compute time: $0.17/h per node, $5.44/h for 32 nodes
        EMR: $0.03/h per node, $0.96/h for 32 nodes
        Total: $0.20/h per node, $6.40/h for 32 nodes
    • 7. twitter.com/psalami, linkedin.com/in/psalami
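Slides 4 and 5 describe Hadoop splitting a large data set and running Java code through Normalize, Filter, Categorize, Group, and Post-Process steps. The following is a minimal single-machine sketch of that MapReduce-style flow, using only the Java standard library (Java 16 or later, for records) and no Hadoop APIs; a thread pool stands in for cluster nodes. The `id,category,price` input format, the `Product` record, the filter rules, and the "sort by price" post-processing are all hypothetical, since the talk does not show the actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-machine sketch of a MapReduce-style product pipeline.
// A thread pool stands in for Hadoop cluster nodes; no Hadoop APIs are used.
final class PipelineSketch {

    // Hypothetical record type: the talk does not show the real schema.
    record Product(String id, String category, double price) {}

    // "Map" work for one input split: filter, normalize, and categorize rows.
    static List<Map.Entry<String, Product>> mapSplit(List<String> lines) {
        List<Map.Entry<String, Product>> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 3) continue;                 // Filter: malformed row
            double price;
            try {
                price = Double.parseDouble(fields[2].trim());
            } catch (NumberFormatException e) {
                continue;                                     // Filter: unparsable price
            }
            if (price <= 0) continue;                         // Filter: invalid price
            // Normalize: trim whitespace and lower-case the category name
            String category = fields[1].trim().toLowerCase(Locale.ROOT);
            Product p = new Product(fields[0].trim(), category, price);
            out.add(Map.entry(category, p));                  // Categorize: key by category
        }
        return out;
    }

    // Splits the input, maps the splits in parallel, then groups and post-processes.
    static Map<String, List<Product>> run(List<String> input, int splitSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<Map.Entry<String, Product>>>> futures = new ArrayList<>();
            // Split: fixed-size chunks of the input, one task per chunk
            for (int i = 0; i < input.size(); i += splitSize) {
                List<String> split = input.subList(i, Math.min(i + splitSize, input.size()));
                futures.add(pool.submit(() -> mapSplit(split)));
            }
            // Group: collect every mapped record under its category key
            Map<String, List<Product>> grouped = new TreeMap<>();
            for (Future<List<Map.Entry<String, Product>>> f : futures) {
                for (Map.Entry<String, Product> e : f.get()) {
                    grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
                }
            }
            // Post-process (hypothetical): sort each category by ascending price
            grouped.values().forEach(list -> list.sort(Comparator.comparingDouble(Product::price)));
            return grouped;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "p1, Shoes , 49.99",
                "p2,shoes,19.99",
                "p3,Books,12.50",
                "bad row",
                "p4,books,-1");
        run(input, 2, 2).forEach((category, products) ->
                System.out.println(category + " -> " + products));
    }
}
```

On a real cluster, Hadoop itself handles splitting the input, shuffling map output to reducers by key, and distributing work across nodes; the sketch only mirrors the shape of those steps so the stages on slide 5 can be read as code.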

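The cost figures on slide 6 are per-node hourly rates multiplied by 32 nodes. The sketch below reproduces that arithmetic and, combining it with slides 2 and 3, estimates the minimum throughput needed to finish 100,000,000 records within an hour. It assumes the 32-node cluster is the one that ran the job and that work is spread evenly across nodes; the rates are the ones quoted on the slide and may not match current AWS pricing. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Reproduces the slide's hourly cost table and a rough throughput estimate.
// Rates are those quoted on the slide; current AWS pricing may differ.
final class EmrCostSketch {

    // Hourly cost for the whole cluster: per-node hourly rate x node count.
    static BigDecimal clusterHourlyCost(BigDecimal perNodePerHour, int nodes) {
        return perNodePerHour.multiply(BigDecimal.valueOf(nodes));
    }

    // Minimum records/second each worker must sustain to finish within the deadline
    // (ceiling division; assumes the work is split evenly across workers).
    static long minRecordsPerSecond(long records, long seconds, int workers) {
        long denominator = seconds * workers;
        return (records + denominator - 1) / denominator;
    }

    public static void main(String[] args) {
        int nodes = 32;
        BigDecimal compute = new BigDecimal("0.17"); // per node-hour, from the slide
        BigDecimal emr = new BigDecimal("0.03");     // per node-hour, from the slide
        BigDecimal total = compute.add(emr);

        System.out.println("Compute: $" + clusterHourlyCost(compute, nodes) + "/h");
        System.out.println("EMR:     $" + clusterHourlyCost(emr, nodes) + "/h");
        System.out.println("Total:   $" + clusterHourlyCost(total, nodes) + "/h");

        // Assumption: this 32-node cluster processed 100,000,000 records in under 1 h
        long records = 100_000_000L;
        long seconds = 3_600L;
        System.out.println("Cluster-wide: >= " + minRecordsPerSecond(records, seconds, 1) + " records/s");
        System.out.println("Per node:     >= " + minRecordsPerSecond(records, seconds, nodes) + " records/s");
    }
}
```

With the slide's rates this prints $5.44/h, $0.96/h, and $6.40/h, matching the slide, and a floor of 27778 records/s for the whole cluster, or 869 records/s per node. `BigDecimal` is used instead of `double` so the dollar amounts come out exact.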