In this session, we discuss the benefits of NoSQL databases and take a tour of the main NoSQL services offered by AWS—Amazon DynamoDB and Amazon ElastiCache. Then, we hear from two leading customers, Expedia and Mapbox, about their use cases and architectural challenges, and how they addressed them using AWS NoSQL services, including design patterns and best practices. You will walk out of this session having a better understanding of NoSQL and its powerful capabilities, ready to tackle your database challenges with confidence.
2. What to expect from the session
• NoSQL
• Why managed database service?
• AWS managed services – DynamoDB and ElastiCache
• Customer use case – Expedia
• Customer use case – Mapbox
4. NoSQL vs. SQL for a new app: how to choose?
NoSQL
• Schema-less, easy reads and writes, simple data model
• Scaling is easy
• Focus on performance and availability at any scale
SQL
• Strong schema, complex relationships, transactions and joins
• Scaling is difficult
• Focus on consistency over scale and availability
7. If you host your databases on premises
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
OS installation
you
App optimization
9. If you host your databases on Amazon EC2
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
OS installation
you
App optimization
10. If you host your databases on Amazon EC2
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
you
App optimization
Power, HVAC, net
Rack & stack
Server maintenance
OS installation
11. If you choose a managed database service
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB s/w patches
Database backups
App optimization
High availability
DB s/w installs
OS installation
you
Scaling
14. Amazon DynamoDB
• Managed NoSQL database service
• Highly scalable
• Consistent, single-digit millisecond latency at any scale
• Highly durable and available—3x replication
• Accessible via simple and powerful APIs
• Supports both document and key-value data models
• No table size or throughput limits
15. Table
Table, items, attributes
Hash key
• Mandatory
• Key-value access pattern
• Determines data distribution
Range key
• Optional
• Models 1:N relationships
• Enables rich query capabilities
– All items for a hash key
– ==, <, >, >=, <=
– “begins with”
– “between”
– Sorted results
– Counts
– Top/bottom N values
– Paged responses
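The hash key and range key roles listed above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Table` class, its `put` and `query` methods, and the `experiment-42` data are hypothetical names invented for illustration, and none of this is the DynamoDB API.

```python
from collections import defaultdict


class Table:
    """Toy model of a table addressed by (hash key, range key)."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        # Overwrite any existing row that has the same range key.
        rows[:] = [(rk, it) for rk, it in rows if rk != range_key]
        rows.append((range_key, item))
        rows.sort(key=lambda row: row[0])

    def query(self, hash_key, condition=lambda rk: True, limit=None, descending=False):
        """Return items for one hash key, filtered and ordered by range key."""
        items = [it for rk, it in self._partitions.get(hash_key, []) if condition(rk)]
        if descending:
            items.reverse()
        return items if limit is None else items[:limit]


table = Table()
for day, clicks in [("2015-10-05", 12), ("2015-10-06", 7),
                    ("2015-10-07", 30), ("2015-09-30", 4)]:
    table.put("experiment-42", day, {"day": day, "clicks": clicks})

# "begins with": every October row for this hash key, in sorted order
october = table.query("experiment-42", lambda rk: rk.startswith("2015-10"))
# "between": an inclusive range of range-key values
window = table.query("experiment-42", lambda rk: "2015-10-06" <= rk <= "2015-10-07")
# top N: the single most recent row
latest = table.query("experiment-42", limit=1, descending=True)
```

Every query names exactly one hash key, and the range key only orders and filters rows within it, which is why the hash key is the mandatory part.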
16. Data types
String (S)
Number (N)
Binary (B)
String Set (SS)
Number Set (NS)
Binary Set (BS)
Boolean (BOOL)
Null (NULL)
List (L)
Map (M)
Used for storing nested JSON documents
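As a rough illustration of how these type codes wrap values, the sketch below converts plain Python values into type-tagged attribute values, with nested documents becoming Map (M) and List (L) entries. The function names `to_attribute_value` and `_set_to_attribute_value` are hypothetical helpers written for this example, not part of any AWS SDK.

```python
import base64
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a Python value in a DynamoDB-style type descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must precede the number check: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are carried as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)):
        return _set_to_attribute_value(value)
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def _set_to_attribute_value(values):
    """Sets must be homogeneous: all strings, all numbers, or all binary."""
    if not values:
        raise ValueError("sets must not be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(isinstance(v, (int, float, Decimal)) and not isinstance(v, bool)
           for v in values):
        return {"NS": [str(v) for v in sorted(values)]}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": [base64.b64encode(v).decode("ascii") for v in sorted(values)]}
    raise TypeError("a set must hold only strings, only numbers, or only binary values")


item = to_attribute_value({
    "experiment": "exp-42",
    "visits": 3,
    "active": True,
    "tags": {"a", "b"},
    "history": [1, None],
})
```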
18. Provisioned throughput model and scaling
Throughput provisioned at the table level
WCU and RCU are independent
Consumed capacity is measured per operation
Scaling is achieved through automatic partitioning
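A short worked calculation may help with sizing. The sketch below assumes the published unit definitions: one WCU covers one write per second for an item up to 1 KB, and one RCU covers one strongly consistent read per second (or two eventually consistent reads) for an item up to 4 KB. The function names are hypothetical.

```python
import math


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: each write costs one unit per started 1 KB of item size."""
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return units_per_write * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed: one unit per started 4 KB, halved for eventually consistent reads."""
    units_per_read = max(1, math.ceil(item_size_bytes / 4096))
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


# A 1500-byte item written 100 times per second rounds up to 2 KB per write.
wcu = write_capacity_units(1500, 100)
# A 6000-byte item read 100 times per second rounds up to 8 KB per read.
rcu_strong = read_capacity_units(6000, 100)
rcu_eventual = read_capacity_units(6000, 100, strongly_consistent=False)
```

Because reads and writes are provisioned independently, a write-heavy table can carry a high WCU setting without paying for read capacity it does not use.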
21. Why in-memory?
• Everything is connected - phones, tablets, cars, air
conditioners, toasters
• Demand for real-time performance – online games, ad
tech, eCommerce, social apps, etc.
• Load is spiky and unpredictable
• Database performance is often the bottleneck
22. Amazon ElastiCache
• AWS managed service that you use to easily create, use, and scale in-memory key-value stores in the cloud
• Comes in two flavors: Memcached and Redis
24. Redis HA on ElastiCache
• Asynchronous replication between Availability Zone #1 and Availability Zone #2
• Writes: use the “Primary Endpoint” from the Node Group
• Reads: use ‘replica’ endpoints from the Node Group (*can use ‘primary’ also)
• Auto-Failover
– Goes to replica with lowest replication lag
– No changes in DNS
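The routing and failover rules on this slide can be sketched as a small simulation. It is illustrative only: `Node`, `NodeGroup`, and their methods are hypothetical names, and in the real service the promotion happens inside ElastiCache while clients keep using the same primary endpoint name.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    replication_lag_ms: float = 0.0
    healthy: bool = True


class NodeGroup:
    """Toy model of one primary plus read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def endpoint_for_write(self):
        # All writes go through the primary endpoint.
        return self.primary

    def endpoints_for_read(self):
        # Reads prefer replicas; the primary can also serve them.
        return self.replicas or [self.primary]

    def fail_over(self):
        """Promote the healthy replica with the lowest replication lag."""
        candidates = [r for r in self.replicas if r.healthy]
        if not candidates:
            raise RuntimeError("no healthy replica available for promotion")
        promoted = min(candidates, key=lambda r: r.replication_lag_ms)
        self.replicas.remove(promoted)
        promoted.replication_lag_ms = 0.0
        self.primary = promoted
        return promoted


group = NodeGroup(
    primary=Node("redis-a"),
    replicas=[Node("redis-b", replication_lag_ms=40.0),
              Node("redis-c", replication_lag_ms=5.0)],
)
promoted = group.fail_over()
```

Picking the replica with the lowest lag minimizes how many recent writes are lost, since replication is asynchronous.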
25. Exciting stuff! How do I learn more?
• DynamoDB:
– DAT401 - Amazon DynamoDB Deep Dive: Schema Design, Indexing, JSON, Search, and More (will be available on YouTube)
– WRK302 - Event-Driven Programming – Wednesday, 10/7 3:15pm-5:15pm, Galileo 1006
– BDT313 - Amazon DynamoDB for Big Data – Thursday, 10/8 11am-12pm, Lando 4306
• ElastiCache
– DAT407 - Amazon ElastiCache: Deep Dive, Thursday, 10/8 11am-12pm, Lando 4301B
– “Performance at scale with Amazon ElastiCache” whitepaper.
27. What to expect
Introduction
Overview of a real-time analytics system that leverages DynamoDB and
ElastiCache
Lessons learned
Recommendations on using NoSQL offerings in AWS
28. Who I am
An Engineering Manager at Expedia, Inc.
• Lead a team of engineers who provide a self-service
platform for teams in our organization to use the
services in AWS
• Leading an effort to Dockerize our microservices
Prior to Expedia Inc., I worked at
• Shell
• Microsoft
@307redirect
29. One of the world’s leading travel companies
Our mission is to revolutionize travel through the power of technology
Passionate
We are passionate about travel.
We are a company of travelers
who come to work every day with
passion to make travel better.
Innovative
We are innovative. We use our collective
intelligence to invent technology and
create products to simplify and improve
travel for our customers and partners.
30. AWS services that we use
EC2 ECS S3
DynamoDB ElastiCache Lambda
RDS
EMR
Auto Scaling
Route 53
CloudFormation
Direct Connect
33. What is it?
• The application collects data for our test & learn experiments that run on Expedia* sites, processes it, and stores it in a database
• The application processes ~200 million messages a day
• The application uses Apache Storm, DynamoDB, and ElastiCache Redis
• Traffic patterns vary across our Expedia* sites, and the application handles burst traffic accordingly
34. What is it? (continued)
Diagram: a continuous flow of data feeds the Apache Storm infrastructure running on distributed EC2 instances, which exchanges requests and responses (hit / miss) with ElastiCache (Redis) and DynamoDB.
35. Why DynamoDB?
• We initially started out setting up a Cassandra cluster for our use case
• We spent more than a week setting up a 3-node Cassandra ring and were still nowhere close to running Cassandra in AWS with clustering and monitoring for scaling out
• With DynamoDB we were up and running in less than a day
• Setup, monitoring, and ease of scaling made us choose DynamoDB over Cassandra
• With DynamoDB there is no need for a team to maintain the database
36. Initial design
Single table
• Primary key (String) (Experiment Date)
• Secondary key (Number)
• No secondary indexes
The table was designed with the following assumptions
• ~1.5 GB/day
• Data needs to be stored for a few months
• Table growth shouldn’t affect performance significantly
37. Initial design
Row creation
• Perform a read using the primary key (strongly consistent)
• If the item doesn't exist, perform a write request to create it
Row update
• Retrieve all rows corresponding to the primary key
• Check if an update is required
• Write the updated values back to the table
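The read-before-write flow above can be sketched with a counting stand-in for the table. This is a simplified illustration, not the application's actual code: `CountingTable`, `create_if_absent`, and `update_if_changed` are hypothetical names, and each key maps to a single row rather than a set of rows.

```python
class CountingTable:
    """In-memory stand-in for a table that counts the requests it receives."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.writes += 1
        self.rows[key] = value


def create_if_absent(table, key, value):
    """Row creation: read first, write only when the item is missing."""
    if table.get(key) is None:
        table.put(key, value)


def update_if_changed(table, key, value):
    """Row update: read the current value, write back only when it differs."""
    if table.get(key) != value:
        table.put(key, value)


table = CountingTable()
# Three identical messages for the same key arrive back to back.
for _ in range(3):
    create_if_absent(table, ("2015-10-07", 1), {"variant": "B"})
```

Only the first message results in a write, but every message still costs a read against the table, so repeated messages consume provisioned throughput without changing any data.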
38. Challenges with initial design
• Request throttling increased response times, causing a backlog in the real-time analytics application
• Throttling seemed to occur more frequently as the table grew larger
• Table size grew to > 4.6 TB
• We had to increase the write throughput on the DynamoDB table to 35000 to handle burst traffic, even though 3500 was sufficient to handle the sustained traffic
39. Lessons learned
• Take a closer look at the application access patterns
• Most of the read/write requests are directed towards ‘most
recent’ data
• A lot of ‘repeat’ requests – wasting throughput
• Read the documentation carefully
• DynamoDB creates a partition for every 10 GB of data; in our case the table had 460 partitions, which increased response times in the application
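The partition count quoted above follows from simple arithmetic, sketched below. The per-partition figures rest on an assumption the slide does not state: that provisioned throughput is divided evenly across partitions. Function names are hypothetical.

```python
import math


def partition_count(table_size_gb, gb_per_partition=10):
    """Partitions implied by table size alone, at one partition per 10 GB."""
    return math.ceil(table_size_gb / gb_per_partition)


def throughput_per_partition(provisioned_units, partitions):
    """Assumption: provisioned throughput is split evenly across partitions."""
    return provisioned_units / partitions


partitions = partition_count(4600)  # 4.6 TB expressed in GB
sustained_share = round(throughput_per_partition(3500, partitions), 1)
burst_share = round(throughput_per_partition(35000, partitions), 1)
```

Under that assumption, 3500 write units spread over 460 partitions leaves roughly 7.6 units per partition, so requests concentrated on the most recent data can be throttled even when the table-level setting looks sufficient.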
40. Current design
To reduce the increased response time
• We changed the design and introduced a
caching layer for read
• We are leveraging ElastiCache (Redis) as the
caching layer
• Conditional reads are now served from Redis instead of DynamoDB
• With the caching layer in place, write throughput is now set to 3500, down from 35000
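A cache-aside layer of the kind described above can be sketched as follows. Plain dictionaries stand in for both Redis and DynamoDB, and `CachedStore` and its methods are hypothetical names. A real deployment would also need expiry and invalidation, which this sketch omits.

```python
class CachedStore:
    """Cache-aside reads in front of a slower backing store."""

    def __init__(self, database):
        self.database = database  # stands in for the DynamoDB table
        self.cache = {}           # stands in for Redis
        self.hits = 0
        self.misses = 0
        self.db_reads = 0
        self.db_writes = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later repeats
        return value

    def put_if_changed(self, key, value):
        """Write through to the database only when the value really changed."""
        if self.get(key) == value:
            return False
        self.db_writes += 1
        self.database[key] = value
        self.cache[key] = value
        return True


store = CachedStore(database={})
# Four identical messages: only the first one reaches the database.
results = [store.put_if_changed("exp-42", {"variant": "B"}) for _ in range(4)]
```

In this run the first message misses the cache and triggers one database read and one write, while the three repeats are answered from the cache alone.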
41. Why ElastiCache?
• Managed
• Fault detection and recovery
• Monitoring
• Scale up and down
• Multi-AZ
• Backup and Restore
• Supports
• Redis
• Memcached
42. Current design – performance improvements
Added a caching layer
• Repeat requests don’t even make it to DynamoDB
• Current cache hit-to-miss ratio: 3:1 (3000 hits per 1000 misses), a 75% hit rate
43. Results
• Provisioned write capacity has been reduced from 35000 to 3500, cutting the cost by 6x
• A high-performance, high-throughput application backed by DynamoDB and ElastiCache
44. Recommendations
• Take a closer look at your application's access patterns so that you can design the application appropriately
• Add a caching layer if you have repeat requests
• Configure DynamoDB throughput so that it can handle your burst traffic without slowing you down
45. How we use DynamoDB and ElastiCache for global delivery
Ian Ward, Mapbox
Mapbox global scale
46. What is Mapbox
How Mapbox uses AWS
Why we use DynamoDB
How ElastiCache improves global performance
60. API service
Components, in diagram order: Client, DNS (Route 53), CDN (CloudFront), DNS (Route 53), ELB, Application servers, Cache (ElastiCache), Object store (S3), Database (DynamoDB)
Deployment tiers shown: Global, 9 regions, 2 regions