Hadoop and DynamoDB

Analytics with Hadoop and DynamoDB in the AWS Cloud.


    1. Hadoop and DynamoDB. Dr. Matt Wood, Technology Evangelist
    2. Hello.
    3. Thank you.
    4. Consumer business. Seller business.
    5. Decades of experience: operations, management and scale
    6. Programmatic access
    7. Unexpected innovation
    8. Blinding flash of the obvious
    9. 5 years young
    10. Compute, Storage, Databases, Services & Support
    11. Analytics
    12. Available. Low cost. Flexible.
    13. Data
    14. Data is valuable
    15. Data is plentiful
    16. Data is complex
    17. Data is in flux
    18. Data is fast moving
    19. Capturing and managing data is challenging
    20. Scalable storage can help
    21. Database is a bottleneck
    22. Degraded performance
    23. A very common problem
    24. DynamoDB
    25. NoSQL database service
    26. Unlimited storage. Consistent performance.
    27. Provisioned throughput
    28. Scale up and down
    29. Scale without downtime
    30. Read unit
    31. Write unit
    32. Provision read and write performance (see the first sketch after this transcript)
    33. Highly available
    34. Consistent reads
    35. Eventual consistency
    36. Preset alarms
    37. NoSQL data model
    38. An item: "ImageID" = "1", "Date" = "20100915", "Title" = "flower", "Tags" = "flower", "jasmine", "white"
    39. Three items: "ImageID" = "1" ("Date" = "20100915", "Title" = "flower", "Tags" = "flower", "jasmine", "white"); "ImageID" = "2" ("Date" = "20100916", "Title" = "car", "Tags" = "car", "italian"); "ImageID" = "3" ("Date" = "20100917", "Title" = "coffee", "Tags" = "coffee", "drink", "delicious")
    40. Primary key: "ImageID" = "1"
    41. Composite keys: "ImageID" = "1" + "Date" = "20100915"
    42. Range queries (sketched below)
    43. Compute
    44. Elastic MapReduce
    45. Managed Hadoop
    46. Without the 'muck'
    47. Input data in S3
    48. Code submitted to Elastic MapReduce
    49. Elastic MapReduce starts a name node
    50. ...and an elastic cluster
    51. ...with HDFS
    52. Queries and BI via JDBC, Pig and Hive
    53. Output to S3 and SimpleDB
    54. End to end: input data from S3, output to S3 and SimpleDB
    55. It's all just Hadoop
    56. Hive, Pig, Cascading, Streaming
    57. API driven (see the job-flow sketch after this transcript)
    58. Data movement
    59. Import/Export
    60. Multipart upload
    61. Scale control
    62. Resize running job flows (sketched below)
    63. Time remaining: 14 hours
    64. Time remaining: 7 hours
    65. Time remaining: 3 hours
    66. Balance cost and performance
    67. Resize based on usage patterns
    68. Steady state → batch processing → steady state
    69. Integrated with DynamoDB
    70. Analytics
    71. Integrate
    72. Backup and restore
    73. HiveQL (see the final sketch after this transcript for running these scripts as a step)
    74. Live data in DynamoDB:
        CREATE EXTERNAL TABLE orders_ddb_2012_01 (
          order_id string, customer_id string, order_date bigint, total double )
        STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
        TBLPROPERTIES (
          "dynamodb.table.name" = "Orders-2012-01",
          "dynamodb.column.mapping" =
            "order_id:OrderID,customer_id:Customer ID,order_date:OrderDate,total:Total");
    75. Query DynamoDB:
        SELECT customer_id, sum(total) spend, count(*) order_count
        FROM orders_ddb_2012_01
        WHERE order_date >= unix_timestamp('2012-01-01', 'yyyy-MM-dd')
          AND order_date < unix_timestamp('2012-01-08', 'yyyy-MM-dd')
        GROUP BY customer_id
        ORDER BY spend DESC
        LIMIT 5;
    76. Archived data in S3:
        CREATE EXTERNAL TABLE orders_s3_export (
          order_id string, customer_id string, order_date int, total double )
        PARTITIONED BY (year string, month string)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        LOCATION 's3://elastic-mapreduce/samples/ddb-orders';
    77. Query S3:
        SELECT year, month, customer_id, sum(total) spend, count(*) order_count
        FROM orders_s3_export
        WHERE customer_id = 'c-2cC5fF1bB'
          AND month >= 6
          AND year = 2011
        GROUP BY customer_id, year, month
        ORDER BY month DESC;
    78. Export to S3:
        CREATE EXTERNAL TABLE orders_s3_new_export (
          order_id string, customer_id string, order_date int, total double )
        PARTITIONED BY (year string, month string)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 's3://';
        INSERT OVERWRITE TABLE orders_s3_new_export
        PARTITION (year='2012', month='01')
        SELECT * FROM orders_ddb_2012_01;
    79. Perfect match
    80. Thank you!
    81. Questions + comments: matthew@amazon.com, @mza on Twitter
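
Slides 27-32 describe DynamoDB's provisioned-throughput model, and slides 40-41 its composite primary keys. A minimal sketch of both ideas using the modern boto3 SDK, which postdates this talk; the table name, key schema and capacity figures are illustrative assumptions drawn from the deck's image example, not a definitive implementation:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Composite key from the slides' image example: "ImageID" is the hash
    # key, "Date" the range key. Read and write performance is provisioned
    # up front in read and write units.
    dynamodb.create_table(
        TableName="Images",  # hypothetical name for the slides' example table
        AttributeDefinitions=[
            {"AttributeName": "ImageID", "AttributeType": "S"},
            {"AttributeName": "Date", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "ImageID", "KeyType": "HASH"},
            {"AttributeName": "Date", "KeyType": "RANGE"},
        ],
        ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    )

    # Scale up (or down) without downtime: the table stays online while
    # capacity is adjusted, e.g. ahead of a batch-processing window.
    dynamodb.update_table(
        TableName="Images",
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
    )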
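
Slide 42 names range queries without showing one. A sketch against the table above; the date bounds are assumed for illustration:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("Images")

    # Range query on the composite key: items for one hash key whose
    # range key ("Date") falls inside a window.
    response = table.query(
        KeyConditionExpression=Key("ImageID").eq("1")
        & Key("Date").between("20100901", "20100930")
    )
    for item in response["Items"]:
        print(item["Title"], item["Tags"])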
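
Slides 47-57 sketch the Elastic MapReduce flow (input data and code in S3, a managed name node, an elastic cluster with HDFS, queries via Hive) and stress that it is API driven. A hedged sketch of launching such a job flow with boto3; the release label, instance types and counts, and IAM role names are assumptions that postdate the talk:

    import boto3

    emr = boto3.client("emr")

    # Launch a managed Hadoop cluster with Hive installed. Input data and
    # code live in S3; Elastic MapReduce provisions the name node and the
    # elastic cluster.
    response = emr.run_job_flow(
        Name="analytics-job-flow",
        ReleaseLabel="emr-6.15.0",
        Applications=[{"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 4},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,  # keep it alive for later steps
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print(response["JobFlowId"])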
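
Slides 62-65 show a running job flow being resized mid-job (14 hours remaining, then 7, then 3, as nodes are added). A sketch of that resize; the cluster id and target node count are hypothetical:

    import boto3

    emr = boto3.client("emr")
    cluster_id = "j-XXXXXXXXXXXXX"  # hypothetical running cluster

    # Find the core instance group and grow it; the job flow keeps
    # running while nodes are added.
    groups = emr.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
    core = next(g for g in groups if g["InstanceGroupType"] == "CORE")

    emr.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[{"InstanceGroupId": core["Id"], "InstanceCount": 10}],
    )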
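
Finally, the HiveQL of slides 74-78 can be saved to S3 and submitted to a running cluster as a step, which is how the DynamoDB integration meets the API-driven theme. The script path and cluster id are hypothetical; the hive-script runner arguments follow the documented EMR pattern:

    import boto3

    emr = boto3.client("emr")

    # Submit the transcript's Hive script (uploaded to S3) as a step.
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical cluster id
        Steps=[{
            "Name": "dynamodb-orders-analytics",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script",
                         "--args", "-f", "s3://my-bucket/orders.q"],
            },
        }],
    )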
