
Hadoop and DynamoDB

Analytics with Hadoop and DynamoDB in the AWS Cloud.


  1. Hadoop and DynamoDB. Dr. Matt Wood, Technology Evangelist
  2. Hello.
  3. Thank you.
  4. Consumer business, Seller business
  5. Decades of experience: operations, management and scale
  6. Programmatic access
  7. Unexpected innovation
  8. Blinding flash of the obvious
  9. 5 years young
  10. Compute, Storage, Databases, Services & Support
  11. Analytics
  12. Available, low cost, flexible
  13. Data
  14. Data is valuable
  15. Data is plentiful
  16. Data is complex
  17. Data is in flux
  18. Data is fast moving
  19. Capturing and managing data is challenging
  20. Scalable storage can help
  21. The database is a bottleneck
  22. Degraded performance
  23. A very common problem
  24. DynamoDB
  25. NoSQL database service
  26. Unlimited storage, consistent performance
  27. Provisioned throughput
  28. Scale up and down
  29. Scale without downtime
  30. Read unit
  31. Write unit
  32. Provision read and write performance
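Slides 30-32 introduce DynamoDB's read and write capacity units. As a rough sketch of the arithmetic, assuming the commonly documented unit sizes (one read unit covers a strongly consistent read of an item up to 4 KB per second, one write unit covers a write of an item up to 1 KB per second, and eventually consistent reads cost half as much); the function names are illustrative, not part of any API:

```python
import math

READ_UNIT_KB = 4   # assumed: one read unit covers items up to 4 KB
WRITE_UNIT_KB = 1  # assumed: one write unit covers items up to 1 KB

def read_capacity_units(item_kb, reads_per_sec, eventually_consistent=False):
    """Read units needed to sustain reads_per_sec reads of item_kb-sized items."""
    units = math.ceil(item_kb / READ_UNIT_KB) * reads_per_sec
    # Eventually consistent reads are priced at half the strongly consistent rate.
    return math.ceil(units / 2) if eventually_consistent else units

def write_capacity_units(item_kb, writes_per_sec):
    """Write units needed to sustain writes_per_sec writes of item_kb-sized items."""
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_sec
```

For example, 100 strongly consistent reads per second of 3 KB items needs 100 read units; switching to eventual consistency halves that.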
  33. Highly available
  34. Consistent reads
  35. Eventual consistency
  36. Preset alarms
  37. NoSQL data model
  38. A single item:
      "ImageID" = "1", "Date" = "20100915", "Title" = "flower", "Tags" = "flower", "jasmine", "white"
  39. Three items:
      "ImageID" = "1": "Date" = "20100915", "Title" = "flower", "Tags" = "flower", "jasmine", "white"
      "ImageID" = "2": "Date" = "20100916", "Title" = "car", "Tags" = "car", "italian"
      "ImageID" = "3": "Date" = "20100917", "Title" = "coffee", "Tags" = "coffee", "drink", "delicious"
  40. Primary key: "ImageID"
  41. Composite keys: "ImageID" + "Date"
  42. Range queries
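The composite-key model on the slides above (hash key "ImageID", range key "Date") can be mimicked in memory to show what a range query does; the items and helper below are illustrative only, not the DynamoDB API:

```python
# Hypothetical items sharing one hash key ("ImageID") with distinct range
# keys ("Date"); a range query fixes the hash key and selects an ordered
# slice of the range key.
items = [
    {"ImageID": "1", "Date": "20100915", "Title": "flower"},
    {"ImageID": "1", "Date": "20100916", "Title": "car"},
    {"ImageID": "1", "Date": "20100917", "Title": "coffee"},
]

def range_query(table, image_id, date_from, date_to):
    """Return items with the given hash key and date_from <= Date < date_to."""
    return [item for item in table
            if item["ImageID"] == image_id and date_from <= item["Date"] < date_to]
```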
  43. Compute
  44. Elastic MapReduce
  45. Managed Hadoop
  46. Without the 'muck'
  47. Diagram: input data in S3
  48. Diagram: code submitted to Elastic MapReduce
  49. Diagram: Elastic MapReduce starts a name node
  50. Diagram: the name node manages an elastic cluster
  51. Diagram: cluster nodes share data over HDFS
  52. Diagram: queries and BI run against the cluster via JDBC, Pig and Hive
  53. Diagram: output written to S3 and SimpleDB
  54. Diagram: end to end, input data from S3, output to S3 and SimpleDB
  55. It's all just Hadoop
  56. Hive, Pig, Cascading, Streaming
  57. API driven
  58. Data movement
  59. Import/Export
  60. Multipart upload
  61. Scale control
  62. Resize running job flows
  63. Time remaining: 14 hours
  64. Time remaining: 7 hours
  65. Time remaining: 3 hours
  66. Balance cost and performance
  67. Resize based on usage patterns
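The time-remaining slides assume job-flow work scales roughly linearly with cluster size, so doubling the nodes halves the wall-clock time while total node-hours (and hence cost) stay about constant. A toy model of that assumption:

```python
def hours_remaining(node_hours_of_work, node_count):
    """Idealized linear scaling: remaining work spread evenly over the nodes."""
    return node_hours_of_work / node_count

# 14 node-hours of work left: one node finishes in 14 hours;
# resizing the running job flow to two nodes finishes in 7.
```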
  68. Steady state, batch processing burst, back to steady state
  69. Integrated with DynamoDB
  70. Analytics
  71. Integrate
  72. Backup and restore
  73. HiveQL
  74. Live data in DynamoDB:
      CREATE EXTERNAL TABLE orders_ddb_2012_01 (
        order_id string, customer_id string, order_date bigint, total double
      )
      STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
      TBLPROPERTIES (
        "dynamodb.table.name" = "Orders-2012-01",
        "dynamodb.column.mapping" = "order_id:OrderID,customer_id:CustomerID,order_date:OrderDate,total:Total"
      );
  75. Query DynamoDB:
      SELECT customer_id, sum(total) spend, count(*) order_count
      FROM orders_ddb_2012_01
      WHERE order_date >= unix_timestamp('2012-01-01', 'yyyy-MM-dd')
        AND order_date < unix_timestamp('2012-01-08', 'yyyy-MM-dd')
      GROUP BY customer_id
      ORDER BY spend DESC
      LIMIT 5;
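For readers without a cluster handy, the aggregation that query expresses (total spend and order count per customer, biggest spenders first) can be sketched in plain Python; the rows below are made-up sample data following the table's columns:

```python
from collections import defaultdict

# Hypothetical rows shaped like the orders table.
orders = [
    {"customer_id": "c-1", "total": 20.0},
    {"customer_id": "c-1", "total": 5.0},
    {"customer_id": "c-2", "total": 12.5},
]

def top_spenders(rows, limit=5):
    """Per-customer spend and order count, highest spend first."""
    spend = defaultdict(float)
    count = defaultdict(int)
    for row in rows:
        spend[row["customer_id"]] += row["total"]
        count[row["customer_id"]] += 1
    ranked = sorted(spend, key=spend.get, reverse=True)
    return [(c, spend[c], count[c]) for c in ranked[:limit]]
```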
  76. Archived data in S3:
      CREATE EXTERNAL TABLE orders_s3_export (
        order_id string, customer_id string, order_date int, total double
      )
      PARTITIONED BY (year string, month string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION 's3://elastic-mapreduce/samples/ddb-orders';
  77. Query S3:
      SELECT year, month, customer_id, sum(total) spend, count(*) order_count
      FROM orders_s3_export
      WHERE customer_id = 'c-2cC5fF1bB'
        AND month >= 6
        AND year = 2011
      GROUP BY customer_id, year, month
      ORDER BY month DESC;
  78. Export to S3:
      CREATE EXTERNAL TABLE orders_s3_new_export (
        order_id string, customer_id string, order_date int, total double
      )
      PARTITIONED BY (year string, month string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 's3://';
      INSERT OVERWRITE TABLE orders_s3_new_export
      PARTITION (year='2012', month='01')
      SELECT * FROM orders_ddb_2012_01;
  79. Perfect match
  80. Thank you!
  81. Questions + comments: matthew@amazon.com, @mza on Twitter
