AWS Webinar - Dynamo DB + Redshift 13_09_19
 

Learn how Digital Advertising customers are leveraging the integration between Amazon DynamoDB and Amazon Redshift to manage their high-scale data, from creation to analysis. In this session, we will describe the three essential ingredients of efficient data flow in the cloud and introduce a reference architecture that enables customers to meet the demands for low latency and high volume encountered in the Digital Advertising industry. You will learn how to use existing SQL-based tools and business intelligence systems to gain deeper insight from your data at lower cost. The design principles presented here will be useful in every environment where managing data at scale is a challenge.

    AWS Webinar - Dynamo DB + Redshift 13_09_19 Presentation Transcript

    • Designing for Scale: Three steps to optimal data performance using DynamoDB and Redshift. David Pearson, Business Development
    • AWS Database Services: Scalable High Performance Application Storage in the Cloud
      • Services: Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache
      • [Diagram labels: Compute, Storage, AWS Global Infrastructure, Database, Application Services, Deployment & Administration, Networking]
    • EFFORT: provision, manage, scale. Differentiated?
    • Introduction to AWS Big Data Services
      • DynamoDB: Real-Time Transactions
      • Redshift: Online Analysis and Reporting
      • Elastic MapReduce: Batch Processing
      • Amazon S3: Object Storage
    • Amazon DynamoDB
    • Amazon DynamoDB
      • NoSQL database
      • Predictable performance
      • Seamless & massive scalability
      • Fully managed; zero admin
    • Amazon’s Path to DynamoDB: from RDBMS to DynamoDB
    • Amazon DynamoDB: DEVS, OPS, USERS
    • Amazon DynamoDB (DEVS, OPS, USERS): Fast Application Development. Time to Build New Applications
      • Flexible data models
      • Simple API
      • High-scale queries
      • Laptop development
    • Amazon DynamoDB (DEVS, OPS, USERS): Admin-Free (at any scale)
    • Provisioned Throughput: a request-based capacity provisioning model
      • Throughput is declared and updated via the API or the console
        • CreateTable (foo, reads/sec = 100, writes/sec = 150)
        • UpdateTable (foo, reads/sec = 10000, writes/sec = 4500)
      • DynamoDB handles the rest
        • Capacity is reserved and available when needed
        • Scaling up triggers repartitioning and reallocation
        • No impact to performance or availability
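The CreateTable and UpdateTable calls on this slide can be sketched in Python. This is a minimal sketch, assuming the boto3 DynamoDB client: it only builds the request arguments so it runs without AWS credentials. The hash key name `id` is hypothetical, and the slide's reads/sec and writes/sec figures are mapped onto read and write capacity units.

```python
from typing import Any, Dict


def create_table_request(table: str, hash_key: str, reads: int, writes: int) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 DynamoDB client's create_table call."""
    return {
        "TableName": table,
        # "id" style string hash key; the attribute name is an assumption.
        "AttributeDefinitions": [{"AttributeName": hash_key, "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }


def update_throughput_request(table: str, reads: int, writes: int) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 DynamoDB client's update_table call."""
    return {
        "TableName": table,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }


# Values taken from the slide: create at 100/150, then scale to 10000/4500.
create_req = create_table_request("foo", "id", reads=100, writes=150)
update_req = update_throughput_request("foo", reads=10000, writes=4500)

# With credentials configured, the requests could be sent like this:
#   import boto3
#   client = boto3.client("dynamodb")
#   client.create_table(**create_req)
#   client.update_table(**update_req)
```

Keeping the request construction separate from the network call makes the capacity settings easy to review and test before anything is provisioned.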
    • Amazon DynamoDB (DEVS, OPS, USERS): Durable, Low Latency
    • Durability and consistency
      • WRITES: Replicated continuously to 3 AZs; persisted to disk (custom SSD)
      • READS: Strongly or eventually consistent; no latency trade-off
    • Latest News… DynamoDB Local
      • Disconnected development
      • Full API support
      • Download from http://aws.amazon.com/dynamodb/resources/#testing
    • “Compared to similar products, DynamoDB provides an amazing feature set, including super low latencies, (literally) push-button scaling, automatic data persistence, and seamless integration with Redshift and other AWS services.” – Peter Bogunovich, RightAction Inc
    • AD SERVING
    • [Diagram labels: visitor, ad request, ad url, Ad Servers (EC2), Profiles Database (DynamoDB)]
      1. Visitor loads a web page
      2. Web page issues a request to ad servers on EC2
      3. Query to DynamoDB returns the ad to display
      4. Link is returned to visitor
      DynamoDB tables: cookie (hash=userid, range=timestamp); user-profile (hash=userid)
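The two key layouts on this slide (hash=userid with range=timestamp, and hash=userid alone) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not DynamoDB itself: the `CookieTable` class, its methods and the stored payload are hypothetical, and only the key schema dictionaries follow the shape DynamoDB expects.

```python
from bisect import insort
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Key layouts named on the slide, written in DynamoDB's KeySchema shape.
COOKIE_KEY_SCHEMA = [
    {"AttributeName": "userid", "KeyType": "HASH"},
    {"AttributeName": "timestamp", "KeyType": "RANGE"},
]
PROFILE_KEY_SCHEMA = [{"AttributeName": "userid", "KeyType": "HASH"}]


class CookieTable:
    """In-memory stand-in for a hash+range table (hypothetical helper).

    Items that share a hash key are kept sorted by range key, which is
    what makes "latest item" and "items since T" lookups cheap.
    """

    def __init__(self) -> None:
        self._items: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def put(self, userid: str, timestamp: int, payload: str) -> None:
        # insort keeps each user's items ordered by (timestamp, payload).
        insort(self._items[userid], (timestamp, payload))

    def latest(self, userid: str) -> Optional[Tuple[int, str]]:
        rows = self._items.get(userid)
        return rows[-1] if rows else None

    def since(self, userid: str, start: int) -> List[Tuple[int, str]]:
        return [row for row in self._items.get(userid, []) if row[0] >= start]


table = CookieTable()
table.put("u1", 100, "page-a")
table.put("u1", 300, "page-c")
table.put("u1", 200, "page-b")
newest = table.latest("u1")          # most recent item for the user
recent = table.since("u1", 200)      # range condition on the timestamp
```

A table keyed on userid alone, like user-profile, holds a single item per user, so a lookup by hash key is enough.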
    • Real-time bidding platform
      • [Diagram labels: Bid request, Bidder, Ad Servers (EC2), Queues and Buffer, Bid response, Profiles Database (DynamoDB), DynamoDB tables: Ads, Profiles]
      • Time intervals shown: 20 ms, 20 ms, 20 ms, 40 ms
      • Stages shown: Request network transit; Response network transit; Decision on best ad and bid price based on optimization that needs multiple data look-ups; Contingency time buffer
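The time budget on this slide can be worked through with a little arithmetic. The sketch below assumes that the four intervals pair with the four stages in the order listed, which the flattened transcript does not confirm; the total is the same either way. The helper names and the 5 ms lookup latency used in the example are hypothetical.

```python
from typing import Dict

# Intervals from the slide. Pairing each one with a stage in listed order
# is an assumption; only the four values and four stage names are given.
BUDGET_MS: Dict[str, int] = {
    "request_network_transit": 20,
    "response_network_transit": 20,
    "decision": 20,
    "contingency_buffer": 40,
}


def total_budget_ms(budget: Dict[str, int]) -> int:
    """End-to-end time available for one bid request."""
    return sum(budget.values())


def max_sequential_lookups(decision_ms: float, lookup_ms: float, compute_ms: float = 0.0) -> int:
    """How many back-to-back data look-ups fit inside the decision window."""
    if lookup_ms <= 0:
        raise ValueError("lookup_ms must be positive")
    available = decision_ms - compute_ms
    return max(0, int(available // lookup_ms))


total = total_budget_ms(BUDGET_MS)
# Hypothetical 5 ms per look-up, with no time reserved for computation.
lookups = max_sequential_lookups(BUDGET_MS["decision"], lookup_ms=5)
```

The point of the exercise is that a decision needing multiple data look-ups leaves only a few milliseconds per look-up, which is why predictable low latency matters here.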
    • [Diagram labels: visitor, ad request, ad url, Ad Servers (EC2), Profiles Database (DynamoDB), CloudFront, advertisement, impression logs, Static Repository Files (Amazon S3)]
      1. Ad files are downloaded from CloudFront
      2. Impressions captured in logs to S3
    • [Diagram labels: visitor, Elastic Load Balancing, Ad Servers on EC2 (MAZ), Profiles Database (DynamoDB), ad request, ad url, CloudFront, advertisement, impression logs, Static Repository Files (Amazon S3), Click-through Servers, click through requests, click through log files, Elastic Load Balancing]
    • Amazon Redshift
    • Amazon Redshift
      • Relational data warehouse
      • Massively parallel
      • Petabyte scale
      • Fully managed; zero admin
    • Redshift dramatically reduces I/O
      • Direct-attached storage
      • Large data block sizes
      • Columnar storage
      • Data compression
      • Zone maps
      Example table (shown as row storage and as column storage):
      Id   Age  State
      123  20   CA
      345  25   WA
      678  40   FL
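Two of the ideas on this slide, columnar storage and zone maps, can be sketched in a few lines of Python. This is an illustration of the general technique rather than Redshift's internal format: the function names are hypothetical, and the 100-value column and block size of 25 are made up for the example. The three rows are the ones shown on the slide.

```python
from typing import Dict, List, Sequence, Tuple

# The example table from the slide, stored row by row.
ROWS = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]
COLUMN_NAMES = ("id", "age", "state")


def to_columns(rows: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Pivot row storage into column storage: one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def build_zone_map(values: Sequence[int], block_size: int) -> List[Tuple[int, int]]:
    """Record the (min, max) of every fixed-size block of a column."""
    zone_map = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zone_map.append((min(block), max(block)))
    return zone_map


def blocks_to_scan(zone_map: List[Tuple[int, int]], low: int, high: int) -> List[int]:
    """Return only the blocks whose value range overlaps [low, high]."""
    return [i for i, (bmin, bmax) in enumerate(zone_map) if bmax >= low and bmin <= high]


columns = to_columns(ROWS, COLUMN_NAMES)

# Synthetic sorted column, used only to show block skipping.
ages = list(range(100))
zone_map = build_zone_map(ages, block_size=25)
hits = blocks_to_scan(zone_map, 30, 40)
```

A query that filters on age reads only the `age` list, and with the zone map it touches one block out of four. Sorted data makes the block ranges narrow, which is where a good sort key pays off.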
    • Redshift parallelizes and distributes everything
      • Load
      • Query
      • Resize
      • Backup
      • Restore
      [Diagram labels: SQL Clients / BI Tools, Client VPC, Leader Node, three Compute Nodes (16 TB each), 10 GigE (HPC), Amazon S3 (Ingestion, Backup, Restore)]
    • Start small and grow big (note: nodes not to scale)
      • Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores, 10 GigE
      • Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE. Cluster: 2-100 nodes (32 TB – 1.6 PB)
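The cluster range quoted for the HS1.8XL node follows directly from the per-node storage figure. A minimal sketch of that arithmetic is below; it counts raw storage only and ignores compression, and the helper names are hypothetical.

```python
import math

# Per-node storage in TB, as listed on the slide.
NODE_TB = {"HS1.XL": 2, "HS1.8XL": 16}


def cluster_capacity_tb(node_type: str, nodes: int) -> int:
    """Raw storage of a cluster made of identical nodes."""
    return NODE_TB[node_type] * nodes


def nodes_needed(node_type: str, data_tb: float) -> int:
    """Smallest node count whose raw storage covers data_tb."""
    return math.ceil(data_tb / NODE_TB[node_type])


smallest_8xl = cluster_capacity_tb("HS1.8XL", 2)    # 2 x 16 TB
largest_8xl = cluster_capacity_tb("HS1.8XL", 100)   # 100 x 16 TB = 1.6 PB
for_100_tb = nodes_needed("HS1.8XL", 100)
```

Two 16 TB nodes give 32 TB and one hundred give 1,600 TB, which matches the 32 TB – 1.6 PB range on the slide.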
    • Monitor query performance
    • View explain plans
    • Redshift works with existing BI tools: JDBC/ODBC connections to Amazon Redshift. More coming soon…
    • Redshift is Priced to Analyze All Your Data
      • $0.85 per hour for on-demand (2 TB)
      • $999 per TB per year (3-yr reservation)
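The two prices on this slide use different units, so it helps to put them on the same footing. The sketch below converts the on-demand hourly price to a cost per TB per year, assuming the node runs continuously for a 365-day year; that assumption and the function name are not from the slides.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the node runs all year without pause

ON_DEMAND_USD_PER_HOUR = 0.85   # from the slide, for a 2 TB node
ON_DEMAND_NODE_TB = 2
RESERVED_USD_PER_TB_YEAR = 999  # from the slide, 3-year reservation


def on_demand_usd_per_tb_year(hourly_usd: float, node_tb: float) -> float:
    """Annual on-demand cost of one TB of node storage."""
    return hourly_usd * HOURS_PER_YEAR / node_tb


on_demand = on_demand_usd_per_tb_year(ON_DEMAND_USD_PER_HOUR, ON_DEMAND_NODE_TB)
ratio = on_demand / RESERVED_USD_PER_TB_YEAR
print(f"on-demand: ${on_demand:,.0f} per TB per year, {ratio:.1f}x the reserved price")
```

Under these assumptions the on-demand price works out to roughly $3,723 per TB per year, a little under four times the reserved figure.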
    • “Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution.” – Niek Sanders, VP Engineering
      • Realized a 20x – 40x reduction in query times
      • “Redshift is the real deal”
    • Analysis
    • [Diagram labels: visitor, Elastic Load Balancing, Ad Servers on EC2 (MAZ), Profiles Database (DynamoDB), ad request, ad url, CloudFront, advertisement, impression logs, Static Repository Files (Amazon S3), Click-through Servers, click through requests, click through log files, Amazon EMR, ETL, Amazon Redshift; data flows: impressions, new requests, bid history, user history, updated profiles]
    • Optimizing with Redshift
      • Bid Optimization: drive qualified users to advertisers’ sites
        • Ad server logs
        • 3rd party data
        • Bid history
        • User history
      • Cost Optimization: optimize return on advertising expenditure
        • Impressions
        • 3rd party data
        • User history
        • Enrichment
    • Three steps to optimal data performance
      1. Describe the full lifecycle of data
         Identify data consumption patterns, expected data volumes and SLAs (latency, availability, durability) at each point on the timeline
      2. Leverage specialized options
         DynamoDB – real-time transaction processing
         Redshift – online reporting and analysis
         EMR – enrichment
         S3 – data staging
    • Three steps to optimal data performance
      3. Optimize access patterns
         Design database schemas for maximum efficiency
         DynamoDB
           » minimize payloads
           » separate hot data from cold
         Redshift
           » good distribution and sort key selection – test as needed
           » efficient ingestion (from DynamoDB and S3)
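The Redshift points on this slide, distribution and sort key selection and ingestion from DynamoDB and S3, can be sketched as SQL statements assembled in Python. Everything named here is hypothetical: the `impressions` and `user_history` tables, their columns, the bucket, the DynamoDB table and the IAM role ARN. The statements use Redshift's `DISTKEY`, `SORTKEY` and `COPY` syntax, and the key choices are examples to test, not recommendations from the talk.

```python
from typing import Sequence

# Hypothetical role ARN; replace with a role your cluster can assume.
EXAMPLE_ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"


def impressions_ddl() -> str:
    """Table that distributes rows by user and sorts them by event time."""
    return (
        "CREATE TABLE impressions (\n"
        "  user_id    VARCHAR(64) NOT NULL,\n"
        "  event_time TIMESTAMP   NOT NULL,\n"
        "  ad_id      VARCHAR(64)\n"
        ")\n"
        "DISTKEY (user_id)\n"
        "SORTKEY (event_time);"
    )


def copy_statement(table: str, source: str, iam_role: str, options: Sequence[str] = ()) -> str:
    """Assemble a COPY statement for an S3 prefix or a DynamoDB table.

    Illustration only: values are interpolated without escaping, so do
    not feed this untrusted input.
    """
    if not source.startswith(("s3://", "dynamodb://")):
        raise ValueError("source must start with s3:// or dynamodb://")
    parts = [f"COPY {table}", f"FROM '{source}'", f"IAM_ROLE '{iam_role}'", *options]
    return " ".join(parts) + ";"


# Bulk load of compressed, pipe-delimited log files staged in S3.
s3_copy = copy_statement(
    "impressions", "s3://example-bucket/impressions/", EXAMPLE_ROLE,
    ["GZIP", "DELIMITER '|'"],
)
# Load from DynamoDB, using at most half of the table's provisioned reads.
ddb_copy = copy_statement(
    "user_history", "dynamodb://UserHistory", EXAMPLE_ROLE, ["READRATIO 50"],
)
```

`READRATIO` caps the share of the DynamoDB table's provisioned read throughput that the load may consume, which keeps the ad-serving workload from being starved during ingestion.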
    • Resources
      • DynamoDB
        • Best Practices, How-Tos, and Tools: http://aws.amazon.com/dynamodb/resources/
        • Download DynamoDB Local: http://aws.amazon.com/dynamodb/resources/#testing
      • Redshift
        • Best practices for loading data: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
        • Best practices for designing tables: http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html
    • Questions