(BDT313) Amazon DynamoDB For Big Data


NoSQL is an important part of many big data strategies. Attend this session to learn how Amazon DynamoDB helps you create fast ingest and response data sets. We demonstrate how to use DynamoDB for batch-based query processing and ETL operations (using a SQL-like language) through integration with Amazon EMR and Hive. Then, we show you how to reduce costs and achieve scalability by connecting data to Amazon ElastiCache for handling massive read volumes. We’ll also discuss how to add indexes on DynamoDB data for free-text searching by integrating with Elasticsearch using AWS Lambda and DynamoDB Streams. Finally, you’ll find out how you can take your high-velocity, high-volume data (such as IoT data) in DynamoDB and connect it to a data warehouse (Amazon Redshift) to enable BI analysis.

Published in: Technology

  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Nate Slater, AWS Solutions Architect. October 2015. BDT313: Amazon DynamoDB for Big Data. A Hands-on Look at Using Amazon DynamoDB for Big Data Workloads.
  2. 2. What to Expect from the Session • A focus on the “how,” not the “what”: • We look at fully functional implementations of several big data architectures. • We learn how AWS services abstract much of the complexity of big data without sacrificing power and scale. • We demonstrate how combinations of services from the AWS data ecosystem can be used to create feature-rich systems for analyzing data.
  3. 3. What Is “Big Data”? • Like many technology catchphrases, “big data” is defined in many different ways. • Most definitions mention two primary characteristics: • Size • Velocity
  4. 4. Characteristics of Big Data • The quantity of data is increasing at a rapid rate. • Raw data from a variety of sources is increasingly being used to answer key business questions: • Log files • How are your applications being used and who is using them? • Application performance monitoring • To what extent are poorly performing apps affecting my business? • Application metrics • How will users respond to this new feature? • Security • Who has access to my infrastructure, what do they have access to, and how are they accessing it? Is this a threat?
  5. 5. Characteristics of Big Data • The growth in data volume means the flow of data is moving at an ever faster rate: • MB/s is normal. • GB/s is increasingly common. • The number of connected users is growing at an amazing rate: • Estimates of 75 billion connected devices by 2020. • 10^5 or 10^6 transactions per second are not uncommon in big data applications.
  6. 6. The “Sweet Spot” of Big Data [Figure labels: Size, Structure, Velocity, DynamoDB]
  7. 7. Transactional Data Processing DynamoDB is well-suited for transactional processing: • High concurrency • Strong consistency • Atomic updates of single items • Conditional updates for de-dupe and optimistic concurrency • Supports both key/value and JSON document schemas • Capable of handling large table sizes with low-latency data access
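The two uses of conditional updates named on this slide, de-dupe and optimistic concurrency, can be sketched as request parameters for DynamoDB's PutItem and UpdateItem operations. This is a minimal sketch, not the presenter's code: the table name, the `title` and `version` attributes, and both helper functions are hypothetical, and the functions only build parameter dictionaries rather than calling AWS.

```python
def dedupe_put_params(table_name, item, key_attr):
    """Build PutItem parameters that reject the write if the key already exists."""
    return {
        "TableName": table_name,
        "Item": item,
        # The condition fails when an item with this key is already stored,
        # so a repeated insert of the same record is rejected.
        "ConditionExpression": "attribute_not_exists(#k)",
        "ExpressionAttributeNames": {"#k": key_attr},
    }


def versioned_update_params(table_name, key, new_title, expected_version):
    """Build UpdateItem parameters guarded by a version check (optimistic concurrency)."""
    return {
        "TableName": table_name,
        "Key": key,
        # "title" and "version" are hypothetical attribute names.
        "UpdateExpression": "SET #t = :title, #v = :next",
        # The update applies only if nobody changed the version in the meantime.
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#t": "title", "#v": "version"},
        "ExpressionAttributeValues": {
            ":title": {"S": new_title},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }
```

With boto3's low-level DynamoDB client, these dictionaries could be passed as `client.put_item(**params)` and `client.update_item(**params)`; a failed condition is reported as a `ConditionalCheckFailedException` error, which the caller can treat as "duplicate" or "retry with a fresh read".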
  8. 8. Demo 1: Store and Index Metadata for Objects Stored in Amazon S3
  9. 9. Demo 1: Use Case We have a large number of digital audio files stored in Amazon S3 and we want to make them searchable: • Use DynamoDB as the primary data store for the metadata. • Index and query the metadata using Elasticsearch.
  10. 10. Demo 1: Steps to Implement 1. Create a Lambda function that reads the metadata from the ID3 tag and inserts it into a DynamoDB table. 2. Enable S3 notifications on the S3 bucket storing the audio files. 3. Enable streams on the DynamoDB table. 4. Create a second Lambda function that takes the metadata in DynamoDB and indexes it using Elasticsearch. 5. Enable the stream as the event source for the second Lambda function.
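Step 4 above can be sketched as the core of the second Lambda function: turning a DynamoDB Streams event into the newline-delimited body that Elasticsearch's bulk API accepts. This is an illustrative sketch under assumptions, not the demo's code: the index name `songs` and key attribute `song_id` are hypothetical, the stream is assumed to include new item images, and sending the body to an Elasticsearch endpoint is left out.

```python
import json


def plain_value(attr):
    """Convert one DynamoDB typed attribute (e.g. {"S": "x"}) to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [plain_value(v) for v in value]
    if type_tag == "M":
        return {k: plain_value(v) for k, v in value.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def build_bulk_body(event, index_name="songs", key_attr="song_id"):
    """Translate stream records into bulk API action lines (NDJSON)."""
    lines = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        doc_id = str(plain_value(change["Keys"][key_attr]))
        if record["eventName"] == "REMOVE":
            # Deleted items are removed from the index as well.
            lines.append(json.dumps({"delete": {"_index": index_name, "_id": doc_id}}))
        else:
            # INSERT and MODIFY both (re)index the full new image.
            doc = {k: plain_value(v) for k, v in change["NewImage"].items()}
            lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
            lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""
```

Using the item key as the document `_id` keeps indexing idempotent: if Lambda retries a batch of stream records, the same documents are overwritten rather than duplicated.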
  11. 11. Demo 1: Key Takeaways 1. DynamoDB + Elasticsearch = Durable, scalable, highly available database with rich query capabilities. 2. Use Lambda functions to respond to events in both DynamoDB streams and Amazon S3 without having to manage any underlying compute infrastructure.
  12. 12. Demo 2: Execute Queries Against Multiple Data Sources Using DynamoDB and Hive
  13. 13. Demo 2: Use Case We want to enrich our audio file metadata stored in DynamoDB with additional data from the Million Song Dataset: • The Million Song Dataset is stored in text files. • ID3 tag metadata is stored in DynamoDB. • Use Amazon EMR with Hive to join the two datasets together in a query.
  14. 14. Demo 2: Steps to Implement 1. Spin up an Amazon EMR cluster with Hive. 2. Create an external Hive table using the DynamoDBStorageHandler. 3. Create an external Hive table using the Amazon S3 location of the text files containing the Million Song project metadata. 4. Create and run a Hive query that joins the two external tables together and writes the joined results out to Amazon S3. 5. Load the results from Amazon S3 into DynamoDB.
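Steps 2 and 4 above can be sketched by rendering the HiveQL statements from Python. The storage handler class and the `dynamodb.table.name` and `dynamodb.column.mapping` table properties are the ones used by the EMR DynamoDB connector; every table, column, and S3 path name below is hypothetical, and the statements are only built as strings, not executed against a cluster.

```python
def dynamodb_table_ddl(hive_table, dynamodb_table, columns):
    """Render DDL for an external Hive table backed by a DynamoDB table.

    columns: list of (hive_column, hive_type, dynamodb_attribute) tuples.
    """
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    mapping = ",".join(f"{name}:{attr}" for name, _, attr in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'\n"
        f'TBLPROPERTIES ("dynamodb.table.name" = "{dynamodb_table}",\n'
        f'               "dynamodb.column.mapping" = "{mapping}");'
    )


def join_to_s3_query(output_path, ddb_hive_table, text_hive_table, join_col):
    """Render a query that joins both external tables and writes the result to S3."""
    return (
        f"INSERT OVERWRITE DIRECTORY '{output_path}'\n"
        "SELECT d.*, m.*\n"
        f"FROM {ddb_hive_table} d\n"
        f"JOIN {text_hive_table} m ON (d.{join_col} = m.{join_col});"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(dynamodb_table_ddl(
        "ddb_songs", "AudioMetadata",
        [("title", "string", "Title"), ("artist", "string", "Artist")],
    ))
    print(join_to_s3_query("s3://example-bucket/joined/", "ddb_songs", "msd_songs", "title"))
```

The column mapping is what ties each Hive column to a DynamoDB attribute, so the Hive column names can follow Hive conventions even when the attribute names in the table do not.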
  15. 15. Demo 2: Key Takeaways 1. Use Amazon EMR to quickly provision a Hadoop cluster with Hive and to tear it down when done. 2. Using Hive with DynamoDB lets you query items in DynamoDB tables and join them with data from a variety of sources.
  16. 16. Demo 3: Store and Analyze Sensor Data with DynamoDB and Amazon Redshift
  17. 17. Demo 3: Use Case A large number of sensors are taking readings at regular intervals. You need to aggregate the data from each reading into a data warehouse for analysis: • Use Amazon Kinesis to ingest the raw sensor data. • Store the sensor readings in DynamoDB for fast access and real-time dashboards. • Store raw sensor readings in Amazon S3 for durability and backup. • Load the data from Amazon S3 into Amazon Redshift using AWS Lambda.
  18. 18. Demo 3: Steps to Implement 1. Create two Lambda functions to read data from the Amazon Kinesis stream. 2. Enable the Amazon Kinesis stream as an event source for each Lambda function. 3. Write data into DynamoDB in one of the Lambda functions. 4. Write data into Amazon S3 in the other Lambda function. 5. Use the aws-lambda-redshift-loader to load the data in Amazon S3 into Amazon Redshift in batches.
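Steps 1, 3, and 4 above can be sketched as the shared decoding logic of the two Lambda functions. Kinesis delivers each record's payload to Lambda base64-encoded under `kinesis.data`; everything else here is an assumption for illustration: the payload is taken to be a JSON object with hypothetical `sensor_id`, `ts`, and `value` fields, and the actual writes to DynamoDB and Amazon S3 are omitted.

```python
import base64
import json


def decode_readings(event):
    """Decode the base64 payload of each Kinesis record into a dict."""
    readings = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        readings.append(json.loads(payload))
    return readings


def to_dynamodb_item(reading):
    """Shape one reading as a DynamoDB item in typed attribute format."""
    return {
        "sensor_id": {"S": reading["sensor_id"]},
        "ts": {"N": str(reading["ts"])},
        "value": {"N": str(reading["value"])},
    }


def to_s3_object(readings):
    """Serialize a batch as newline-delimited JSON for storage in Amazon S3."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in readings)
```

One function would pass each `to_dynamodb_item` result to a PutItem call, and the other would upload the `to_s3_object` output as a single object per batch. Newline-delimited JSON is chosen here because Redshift's COPY command can read JSON files, which fits the batch load in step 5.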
  19. 19. Demo 3: Key Takeaways 1. Amazon Kinesis + Lambda + DynamoDB = Scalable, durable, highly available solution for sensor data ingestion with very low operational overhead. 2. DynamoDB is well-suited for near-real-time queries of recent sensor data readings. 3. Amazon Redshift is well-suited for deeper analysis of sensor data readings spanning longer time horizons and very large numbers of records. 4. Using Lambda to load data into Amazon Redshift provides a way to perform ETL at frequent intervals.
  20. 20. Summary • The versatility of DynamoDB makes it a cornerstone component of many data architectures. • “Big data” solutions usually involve a number of different tools for storage, processing, and analysis. • The AWS ecosystem offers a rich and powerful set of services that make it possible to build scalable and durable “big data” architectures with ease.
  21. 21. Remember to complete your evaluations!
  22. 22. Thank you!