AWS Webinar - Dynamo DB + Redshift 13_09_19

Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud, and introduce a reference architecture that enables customers to meet the demands for low latency and high volume encountered in the Digital Advertising industry. Using existing SQL-based tools and business intelligence systems, you will learn how to gain deeper insight from your data at lower cost. The design principles presented here will be useful to every environment where managing data at scale is a challenge.

Published in: Technology, Business

Transcript of "AWS Webinar - Dynamo DB + Redshift 13_09_19"

  1. 1. Designing for Scale Three steps to optimal data performance using DynamoDB and Redshift David Pearson Business Development
  2. 2. AWS Database Services: Scalable High Performance Application Storage in the Cloud. Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache. (Diagram labels: AWS Global Infrastructure; Compute; Storage; Database; Networking; Application Services; Deployment & Administration.)
  3. 3. provision manage scale EFFORT differentiated?
  4. 4. Introduction to AWS Big Data Services: Amazon S3 (Object Storage), Elastic MapReduce (Batch Processing), DynamoDB (Real-Time Transactions), Redshift (Online Analysis and Reporting)
  5. 5. Amazon DynamoDB
  6. 6. Amazon DynamoDB: NoSQL Database. Predictable performance. Seamless & massive scalability. Fully managed; zero admin.
  7. 7. Amazon’s Path to DynamoDB RDBMS DynamoDB
  8. 8. Amazon DynamoDB DEVS OPS USERS
  9. 9. Fast Application Development Time to Build New Applications • Flexible data models • Simple API • High-scale queries • Laptop development Amazon DynamoDB DEVS OPS USERS
  10. 10. Amazon DynamoDB DEVS OPS USERS Admin-Free (at any scale)
  11. 11. Provisioned Throughput: a request-based capacity provisioning model. Throughput is declared and updated via the API or the console: CreateTable (foo, reads/sec = 100, writes/sec = 150); UpdateTable (foo, reads/sec = 10000, writes/sec = 4500). DynamoDB handles the rest: capacity is reserved and available when needed; scaling up triggers repartitioning and reallocation; no impact to performance or availability.
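
The CreateTable and UpdateTable calls on this slide can be sketched as request parameters. This is a minimal sketch, not the presenter's code: the key attribute name `id` and its string type are assumptions, and treating the slide's reads/sec and writes/sec directly as capacity units is a simplification (the real mapping depends on item size). The parameter names follow the DynamoDB CreateTable and UpdateTable API.

```python
def create_table_request(table, hash_key, reads_per_sec, writes_per_sec):
    """Build the parameter dict for a DynamoDB CreateTable call."""
    return {
        "TableName": table,
        # Assumption: a single string-typed hash key.
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


def update_table_request(table, reads_per_sec, writes_per_sec):
    """Build the parameter dict for a DynamoDB UpdateTable call."""
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads_per_sec,
            "WriteCapacityUnits": writes_per_sec,
        },
    }


# The two calls from the slide, with the hypothetical key name "id".
create_params = create_table_request("foo", "id", 100, 150)
update_params = update_table_request("foo", 10000, 4500)
```

With the boto3 library installed and credentials configured, these dicts could be passed as `boto3.client("dynamodb").create_table(**create_params)` and `update_table(**update_params)`.
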
  12. 12. Amazon DynamoDB DEVS OPS USERS Durable Low Latency
  13. 13. WRITES: replicated continuously to 3 AZs; persisted to disk (custom SSD). READS: strongly or eventually consistent; no latency trade-off.
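
The choice between strongly and eventually consistent reads is made per request. A minimal sketch of a GetItem request, assuming a hypothetical table `user-profile` with a string key `userid`; `ConsistentRead` is the GetItem API parameter that selects the read mode.

```python
def get_item_request(table, key_name, key_value, strongly_consistent=False):
    """Build GetItem parameters; the read is eventually consistent by default."""
    return {
        "TableName": table,
        # Assumption: the key attribute is a string ("S").
        "Key": {key_name: {"S": key_value}},
        "ConsistentRead": strongly_consistent,
    }


eventual = get_item_request("user-profile", "userid", "u-123")
strong = get_item_request("user-profile", "userid", "u-123",
                          strongly_consistent=True)
```
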
  14. 14. Latest News… DynamoDB Local • Disconnected development • Full API support • Download from http://aws.amazon.com/dynamodb/resources/#testing
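
For disconnected development, a client can be pointed at DynamoDB Local instead of the AWS endpoint. A sketch under stated assumptions: DynamoDB Local is already running on its default port 8000, the third-party boto3 library is installed, and the region and credential values are dummy placeholders.

```python
LOCAL_ENDPOINT = "http://localhost:8000"  # DynamoDB Local's default port


def local_client_kwargs(endpoint=LOCAL_ENDPOINT):
    """Keyword arguments for a client that talks to DynamoDB Local."""
    return {
        "endpoint_url": endpoint,
        # Placeholder values: the client still requires a region and
        # credentials to be set, even for a local endpoint.
        "region_name": "us-east-1",
        "aws_access_key_id": "local",
        "aws_secret_access_key": "local",
    }


def make_local_client():
    """Create a DynamoDB client bound to the local endpoint."""
    import boto3  # third-party; imported lazily so the sketch loads without it
    return boto3.client("dynamodb", **local_client_kwargs())
```
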
  15. 15. “Compared to similar products, DynamoDB provides an amazing feature set, including super low latencies, (literally) push-button scaling, automatic data persistence, and seamless integration with Redshift and other AWS services.” Peter Bogunovich, RightAction Inc
  16. 16. AD SERVING
  17. 17. Diagram: visitor, ad request, ad url, Ad Servers (EC2), Profiles Database (DynamoDB). 1. Visitor loads a web page. 2. Web page issues a request to ad servers on EC2. 3. Query to DynamoDB returns the ad to display. 4. Link is returned to visitor. Tables: cookie (hash=userid, range=timestamp); user-profile (hash=userid).
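
The two table layouts on this slide translate into DynamoDB key schemas, and the hash-plus-range layout of `cookie` supports a per-user query ordered by time. A minimal sketch: the attribute types (string `userid`, numeric `timestamp`) and the query itself are assumptions, not taken from the slides.

```python
def key_schema(hash_attr, range_attr=None):
    """Build a DynamoDB KeySchema list from a hash key and optional range key."""
    schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    if range_attr is not None:
        schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
    return schema


COOKIE_KEYS = key_schema("userid", "timestamp")  # cookie table
USER_PROFILE_KEYS = key_schema("userid")         # user-profile table


def recent_cookies_query(userid, since_ts):
    """Query parameters: one user's cookie items since a timestamp, newest first."""
    return {
        "TableName": "cookie",
        # "#ts" aliases the attribute name so it cannot clash with a reserved word.
        "KeyConditionExpression": "userid = :u AND #ts >= :since",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":u": {"S": userid},
            ":since": {"N": str(since_ts)},
        },
        "ScanIndexForward": False,  # descending range-key order
    }


query_params = recent_cookies_query("u-123", 1379548800)
```
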
  18. 18. Real-time bidding. Diagram: real-time bidding platform sends a bid request to the Bidder (Ad Servers on EC2), which reads Ads and Profiles from DynamoDB and returns a bid response. Time budget shown as 20 ms, 20 ms, 20 ms and 40 ms segments, labeled: request network transit; response network transit; decision on best ad and bid price based on optimization that needs multiple data look-ups; queues and contingency time buffer …
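
The four segments on this slide add up to a 100 ms round trip, and the decision step has to finish its data look-ups inside its own share. A minimal sketch of a deadline-bounded decision step: the function name, the no-bid fallback, and the injectable clock are illustrative assumptions, and the slide transcript does not say which segment gets which duration.

```python
import time

SEGMENTS_MS = [20, 20, 20, 40]      # segment durations from the slide
TOTAL_BUDGET_MS = sum(SEGMENTS_MS)  # 100 ms end to end


def decide_bid(lookups, budget_ms, clock=time.monotonic):
    """Run look-ups in order; return their results, or None (no bid)
    if the time budget runs out before or during the look-ups."""
    deadline = clock() + budget_ms / 1000.0
    results = []
    for lookup in lookups:
        if clock() >= deadline:
            return None  # out of time before this look-up could start
        results.append(lookup())
    # Discard the results if the last look-up overran the deadline.
    return results if clock() <= deadline else None
```

Passing the clock as a parameter keeps the function testable without real delays. In a bidder, each entry in `lookups` would wrap a call such as a DynamoDB read.
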
  19. 19. EC2 Profiles Database ad request ad url visitor Ad Servers DynamoDB 1. Ad files are downloaded from CloudFront 2. Impressions captured in logs to S3 CloudFront advertisement impression logs Static Repository Files Amazon S3
  20. 20. CloudFront advertisement impression logs Static Repository Files Amazon S3 Profiles Database EC2 (MAZ) ad request ad url Ad Servers DynamoDB Elastic Load Balancing visitor Click-through Servers click through log files click through requests Elastic Load Balancing
  21. 21. Amazon Redshift
  22. 22. Relational data warehouse Massively parallel Petabyte scale Fully managed; zero admin Amazon Redshift
  23. 23. Redshift dramatically reduces I/O: • Direct-attached storage • Large data block sizes • Columnar storage • Data compression • Zone maps. Example table (Id, Age, State): (123, 20, CA), (345, 25, WA), (678, 40, FL), shown as row storage vs. column storage.
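
The slide's three-row table can illustrate why columnar storage and zone maps reduce I/O. This is a toy model in plain Python, not Redshift's actual storage format: the block size of 2 and the helper names are assumptions chosen for illustration.

```python
# Row storage: each record is stored together.
rows = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Column storage: each column is stored together, so a query over
# "age" reads only the ages list and never touches ids or states.
ids, ages, states = (list(col) for col in zip(*rows))
average_age = sum(ages) / len(ages)


def zone_map(values, block_size):
    """Record (min, max) for each fixed-size block of a column."""
    blocks = [values[i:i + block_size]
              for i in range(0, len(values), block_size)]
    return [(min(b), max(b)) for b in blocks]


def blocks_to_scan(zmap, lo, hi):
    """Indexes of blocks whose [min, max] range overlaps [lo, hi]."""
    return [i for i, (mn, mx) in enumerate(zmap) if mx >= lo and mn <= hi]


age_zones = zone_map(ages, block_size=2)   # blocks [20, 25] and [40]
needed = blocks_to_scan(age_zones, 30, 50)  # WHERE age BETWEEN 30 AND 50
```

Here the filter on ages 30 to 50 needs only the second block; the first block is skipped using its recorded minimum and maximum alone.
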
  24. 24. Redshift parallelizes and distributes everything: • Load • Query • Resize • Backup • Restore. Diagram labels: SQL Clients / BI Tools; Client VPC; Leader Node; three Compute Nodes (16TB each); 10 GigE (HPC); Ingestion, Backup, Restore; Amazon S3.
  25. 25. Start small and grow big. Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores, 10 GigE; Cluster 2-32 Nodes (4 TB – 64 TB). Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE; Cluster 2-100 Nodes (32 TB – 1.6 PB). Note: nodes not to scale.
  26. 26. Monitor query performance
  27. 27. View explain plans
  28. 28. Redshift works with existing BI tools JDBC/ODBC Amazon Redshift More coming soon…
  29. 29. Redshift is Priced to Analyze All Your Data $0.85 per hour for on-demand (2TB) $999 per TB per year (3-yr reservation)
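
The two prices on this slide use different units, so it helps to put them on the same per-TB, per-year basis. A worked sketch, assuming the "(2TB)" refers to the storage of the node billed at $0.85 per hour, a 365-day year, and a node that runs continuously.

```python
HOURS_PER_YEAR = 24 * 365        # 8760
ON_DEMAND_PER_HOUR = 0.85        # USD per hour, from the slide
ON_DEMAND_NODE_TB = 2            # assumed: storage of the on-demand node
RESERVED_PER_TB_YEAR = 999       # USD per TB per year, 3-yr reservation

# On-demand cost normalized to one TB for one year.
on_demand_per_tb_year = (
    ON_DEMAND_PER_HOUR * HOURS_PER_YEAR / ON_DEMAND_NODE_TB
)

# How many times more expensive on-demand is than the reserved price.
on_demand_vs_reserved = on_demand_per_tb_year / RESERVED_PER_TB_YEAR

print(f"on-demand: ${on_demand_per_tb_year:,.0f} per TB per year")
print(f"ratio to reserved: {on_demand_vs_reserved:.2f}x")
```

Under these assumptions, on-demand works out to about $3,723 per TB per year, roughly 3.7 times the reserved figure.
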
  30. 30. “Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” – Niek Sanders, VP Engineering Realized a 20x – 40x reduction in query times “Redshift is the real deal”
  31. 31. Analysis
  32. 32. Diagram labels: CloudFront; advertisement; impression logs; Static Repository Files; Amazon S3; Profiles Database; EC2 (MAZ); ad request; ad url; Ad Servers; DynamoDB; Elastic Load Balancing; visitor; Amazon Redshift; bid history; user history; ETL; Click-through Servers; click through log files; click through requests; Elastic Load Balancing; Amazon EMR; updated profiles; impressions; new requests; user history
  33. 33. Optimizing with Redshift. Bid Optimization: drive qualified users to advertisers’ sites • Ad server logs • 3rd party data • Bid history • User history. Cost Optimization: optimize return on advertising expenditure • Impressions • 3rd party data • User history • Enrichment.
  34. 34. Three steps to optimal data performance. 1. Describe the full lifecycle of data: identify data consumption patterns, expected data volumes and SLAs (latency, availability, durability) at each point on the timeline. 2. Leverage specialized options: DynamoDB – real-time transaction processing; Redshift – online reporting and analysis; EMR – enrichment; S3 – data staging.
  35. 35. Three steps to optimal data performance. 3. Optimize access patterns: design database schemas for maximum efficiency. DynamoDB » minimize payloads » separate hot data from cold. Redshift » good distribution and sort key selection (test as needed) » efficient ingestion (from DynamoDB and S3).
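
The Redshift points on this slide (distribution key, sort key, ingestion from DynamoDB and S3) can be sketched as the SQL they lead to. A minimal sketch that only builds statement strings: the table and column names, the bucket path, and the IAM role ARN are hypothetical placeholders, and plain string formatting like this is for illustration only, not for untrusted input.

```python
def create_table_sql(table, columns, distkey, sortkey):
    """CREATE TABLE with an explicit distribution key and sort key."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (f"CREATE TABLE {table} ({cols}) "
            f"DISTKEY ({distkey}) SORTKEY ({sortkey});")


def copy_from_dynamodb_sql(table, dynamo_table, iam_role, read_ratio):
    """COPY from a DynamoDB table; READRATIO caps the percentage of the
    table's provisioned read throughput that the load may consume."""
    return (f"COPY {table} FROM 'dynamodb://{dynamo_table}' "
            f"IAM_ROLE '{iam_role}' READRATIO {read_ratio};")


def copy_from_s3_sql(table, s3_prefix, iam_role):
    """COPY gzip-compressed, pipe-delimited files under an S3 prefix."""
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{iam_role}' GZIP DELIMITER '|';")


ROLE = "arn:aws:iam::123456789012:role/example-redshift-load"  # placeholder

ddl = create_table_sql(
    "impressions",
    [("userid", "VARCHAR(64)"), ("ts", "TIMESTAMP"), ("ad_id", "VARCHAR(64)")],
    distkey="userid",
    sortkey="ts",
)
load_profiles = copy_from_dynamodb_sql("user_profile", "user-profile", ROLE, 50)
load_logs = copy_from_s3_sql("impressions", "s3://example-bucket/impressions/", ROLE)
```

Distributing on `userid` keeps one user's rows on the same node for joins, and sorting on `ts` lets time-range filters skip blocks. Whether those are the right keys depends on the actual queries, which is why the slide says to test as needed.
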
  36. 36. Resources. DynamoDB • Best Practices, How-Tos, and Tools: http://aws.amazon.com/dynamodb/resources/ • Download DynamoDB Local: http://aws.amazon.com/dynamodb/resources/#testing. Redshift • Best practices for loading data: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html • Best practices for designing tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
  37. 37. Questions