Large-scale data processing with Hadoop in the Cloud

A lightning talk I gave at CloudCamp San Diego on how Hadoop can be used in the cloud to process more than 100,000,000 records.

  1. Large-scale data processing with Hadoop in the Cloud, by Patrick Salami, Senior Software Engineer
  2. 100,000,000+ records
  3. < 1h processing time
  4. Hadoop
       • Distributed computing framework
       • Runs on commodity hardware
       • Splits up large data sets
       • Easy to deploy in AWS with Elastic MapReduce (EMR)
  5. [Diagram of the processing pipeline; labels: Java Code, Raw Product, Normalize Data, Filter, Categorize, S3, Shipping, Coupons, Hadoop Cluster, Group, Post-Process, Solr, S3, WWW, Index]
  6. Cost per hour
                       Per node   x32 nodes
       Compute time    $0.17/h    $5.44/h
       EMR             $0.03/h    $0.96/h
       Total           $0.20/h    $6.40/h
       Source: http://aws.amazon.com/ec2/#pricing
  7. twitter.com/psalami, linkedin.com/in/psalami
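
Slides 4 and 5 describe Hadoop splitting a large data set and running Java code that normalizes, filters, categorizes, groups, and post-processes records. The talk does not show that code, so what follows is only a hypothetical, dependency-free Java sketch of the same shape. It assumes a made-up `productId,category,price` line format, uses the category field as the "categorize" key, and counts records per category as a stand-in post-processing step. The class and method names are illustrative; a real job would use Hadoop's `Mapper` and `Reducer` classes and read its input from S3 instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PipelineSketch {

    // Hypothetical input record: "productId,category,price" (format assumed, not from the talk).
    record Product(String id, String category, double price) {}

    // Normalize: trim fields, lower-case the category, parse the price; empty if malformed.
    static Optional<Product> normalize(String line) {
        String[] fields = line.split(",");
        if (fields.length != 3) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Product(
                    fields[0].trim(),
                    fields[1].trim().toLowerCase(Locale.ROOT),
                    Double.parseDouble(fields[2].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Filter: keep only records with an ID and a positive price (hypothetical rule).
    static boolean isValid(Product p) {
        return !p.id().isEmpty() && p.price() > 0;
    }

    // Categorize: derive the grouping key; here simply the normalized category field.
    static String categorize(Product p) {
        return p.category();
    }

    // Split the input into at most n chunks, like input splits handed to map tasks (n >= 1).
    static List<List<String>> split(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        int size = Math.max(1, (lines.size() + n - 1) / n);
        for (int i = 0; i < lines.size(); i += size) {
            splits.add(lines.subList(i, Math.min(i + size, lines.size())));
        }
        return splits;
    }

    // Map each split independently, then group by key and count (the post-processing step).
    public static Map<String, Long> run(List<String> lines, int numSplits) {
        return split(lines, numSplits).parallelStream()
                .flatMap(chunk -> chunk.stream()
                        .map(PipelineSketch::normalize)
                        .flatMap(Optional::stream)
                        .filter(PipelineSketch::isValid))
                .collect(Collectors.groupingBy(
                        PipelineSketch::categorize, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "p1, Shoes ,49.99",
                "p2,shoes,19.50",
                "p3,Books,12.00",
                "bad line",
                "p4,books,-1");
        System.out.println(run(input, 2));
    }
}
```

Running `main` prints `{books=1, shoes=2}`: the malformed line and the negative-price record are dropped before grouping. Each split is processed without looking at the others, which is the property that lets Hadoop spread the work across nodes; the final `groupingBy` plays the role of the shuffle and reduce.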

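The figures on slides 2, 3, and 6 can be checked with a little arithmetic. The sketch below is a hypothetical helper, not code from the talk: it hard-codes the per-node prices from slide 6 (historical AWS prices, stored in cents to avoid floating-point rounding) and computes the hourly cost of a 32-node cluster and the average per-node throughput needed to process 100,000,000 records in one hour.

```java
import java.util.Locale;

public class ClusterCost {
    // Per-node hourly prices from slide 6, in US cents (historical AWS pricing, not current).
    static final long COMPUTE_CENTS_PER_NODE_HOUR = 17;
    static final long EMR_CENTS_PER_NODE_HOUR = 3;

    // Total hourly cost of the cluster in cents: (compute + EMR) per node, times node count.
    public static long clusterCentsPerHour(int nodes) {
        return nodes * (COMPUTE_CENTS_PER_NODE_HOUR + EMR_CENTS_PER_NODE_HOUR);
    }

    // Average records per second each node must handle to finish `records` within `hours`.
    public static double recordsPerSecondPerNode(long records, double hours, int nodes) {
        return records / (hours * 3600.0) / nodes;
    }

    // Format a cent amount as dollars, e.g. 640 -> "$6.40".
    public static String dollars(long cents) {
        return String.format(Locale.ROOT, "$%d.%02d", cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        int nodes = 32;
        System.out.println("Cluster cost per hour: " + dollars(clusterCentsPerHour(nodes)));
        System.out.printf(Locale.ROOT, "Required per-node throughput: %.0f records/s%n",
                recordsPerSecondPerNode(100_000_000L, 1.0, nodes));
    }
}
```

This prints `Cluster cost per hour: $6.40` and `Required per-node throughput: 868 records/s` (about 27,800 records/s across the whole cluster). Because slide 3 gives the processing time as under an hour, the actual average throughput would have been at least this high.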