Wrestle Your NoSQL Data with DynamoDB
Jeff Douglas @jeffdonthemic
CloudSpokes Community Architect
Rambling Talk Roadmap
Short NoSQL overview (thanks Max @ 10gen!)
Why NoSQL databases are like Mexican Wrestlers
Amazon DynamoDB in depth
Amazon DynamoDB demo and code
CloudSpokes challenge submissions for “Build an #Awesome Demo with Amazon DynamoDB”
Times they are a-changin’
Cloud applications and APIs need to be fast, flexible and scalable.
RDBMSs typically do not scale well for certain data-intensive applications.
NoSQL is cloud friendly.
“NoSQL is a rebellion against the DBAs who prevent us from …”
- James Governor, Gluecon 2012
Why is NoSQL #awesome?
Developed to manage large volumes of data that do not necessarily follow a fixed schema
Great for heavy read/write workloads
Simple to set up, configure and administer
Distributed, fault-tolerant architecture
Scale out, not up
Specialized database for the right task
Key NoSQL differences
Do not use SQL as a query language
Dynamic & schema-less
Non-relational, no JOIN operations
No complex transactions
May not give full ACID guarantees; often eventually consistent instead. Performance and real-time responsiveness take priority over consistency.
NoSQL database types
Document store (MongoDB, CouchDB)
A document-oriented database that stores, retrieves and manages semi-structured data, including XML, YAML, JSON and binary documents (PDF, DOC)
Key-value store (Cassandra, Redis)
Stores schema-less data referenced by a simple key
Graph database (Neo4j, FlockDB)
Stores the relationships between data as a graph (social relations, network …)
How to choose?
With all of the different NoSQL database types, how do you choose the “best” one?
El Toro Más Macho (The Most Macho Bull)
Stores structured data as JSON-like documents
Ad hoc queries, indexing, master-slave replication
All the “cool kids” are using it.
Node.js + MongoDB = WINNING!
JSON Document store
Embedded CouchDB with caching, clustering and high-performance storage
HTTP for an API
El Matador Misterio (The Mystery Matador)
What exactly is Redis? MAGIC!
By definition, it’s an in-memory key-value data store with optional durability.
Data model includes lists of strings, sets of strings, sorted sets of strings & hashes.
Awesome at doing set comparisons.
Fast, reliable analysis of both structured data and complex data.
Derived from Google's MapReduce and Google File System (GFS) papers. Yahoo is one of the …
Reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using MapReduce.
El Jefe Supremo (The Supreme Boss)
Massively scalable key-value store initially developed by Facebook.
BigTable data model (nested hashes) running on an Amazon Dynamo-like infrastructure.
Has some RDBMS “feel” with column families that make it a hybrid column/row store.
No single point of failure, fault-tolerant multi-data-center replication, MapReduce support.
CQL (Cassandra Query Language)
Amazon DynamoDB is a fast, fully managed key-value database service that scales seamlessly with extremely low latency and predictable performance.
Store and retrieve any amount of data
Serve any level of request traffic
Hands off administration
Pay for throughput and not storage
No hardware or software provisioning, setup and configuration, software patching, or partitioning data over multiple instances and regions.
Specify the request throughput for your table and, in the background, Amazon handles the provisioning of resources to meet the requested throughput rate.
Automatically partitions/re-partitions data and provisions additional server capacity based upon table size and throughput.
Synchronously replicates data across multiple facilities in an AWS Region, giving you high availability and data durability.
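To make “specify the request throughput for your table” concrete, here is a minimal Python sketch that builds the request for boto3's low-level `create_table` call. The parameter names (`KeySchema`, `AttributeDefinitions`, `ProvisionedThroughput`) belong to the current DynamoDB API, which differs from the API of this talk's era; the table name, key names and capacity numbers are hypothetical. The request is only built and printed, so nothing talks to AWS.

```python
import json


def build_create_table_request(table_name, hash_key, range_key,
                               read_units, write_units):
    """Build keyword arguments for boto3's client("dynamodb").create_table().

    Hypothetical helper: table/attribute names and capacities are illustrative.
    """
    return {
        "TableName": table_name,
        # Composite primary key: a HASH key plus a RANGE key.
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
            {"AttributeName": range_key, "KeyType": "RANGE"},
        ],
        # Only key attributes are declared up front; all others are schema-less.
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
            {"AttributeName": range_key, "AttributeType": "S"},
        ],
        # You state the throughput you want; AWS provisions capacity to meet it.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


request = build_create_table_request("photos", "author", "published", 10, 5)
print(json.dumps(request["ProvisionedThroughput"]))
# With boto3 installed and AWS credentials configured, you would send it with:
#   boto3.client("dynamodb").create_table(**request)
```

Note that there is no field for servers, shards or instances: throughput is the only capacity knob you turn.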
Consistent, predictable performance
Runs on a new solid-state disk (SSD) architecture for low-latency response times.
Read latencies average less than 5 milliseconds, and write latencies average less than 10 milliseconds.
No table size limits (adiós SimpleDB?)
No downtime when scaling up or down
Automatically scales machine resources in response to increases in database traffic without the need for client-side partitioning.
Modelo de datos flexible (Flexible data model)
Flexible data model with familiar tables, items and key-value pairs.
Schema-less document storage. Each item can have different attributes.
Easy to create and modify documents. Simple …
No cross-table joins. Use composite keys to …
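Because items are schema-less, two items in one table can carry different attributes. On the wire, DynamoDB's low-level API tags every value with its type (`S` for string, `N` for a number sent as a string, `SS`/`NS` for string and number sets). The Python sketch below converts plain dicts into that form; it covers only those four types (boto3 ships a fuller `TypeSerializer`), and the sample items are hypothetical.

```python
from numbers import Number


def to_attribute_value(value):
    """Convert one Python value into DynamoDB's typed attribute-value form.

    Sketch only: handles S, N, SS and NS; anything else raises TypeError.
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, Number):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
    raise TypeError(f"unsupported value: {value!r}")


def to_item(record):
    """Serialize every attribute of a plain dict."""
    return {name: to_attribute_value(v) for name, v in record.items()}


# Two hypothetical items for the same table, with different attribute sets.
photo = to_item({"author": "alice", "published": "2012-06-01T10:00:00Z",
                 "likes": 3})
post = to_item({"author": "bob", "published": "2012-06-02T09:30:00Z",
                "tags": {"ruby", "sinatra"}})
print(photo)
print(post)
```

Only the key attributes (`author`, `published`) are shared; `likes` and `tags` exist on one item each, and the table never has to be altered to allow that.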
Consistent, disk-only writes
Atomic increment/decrement (with a single API call)
Optimistic concurrency control (aka conditional writes & updates)
Item-level transactions (even in bulk)
Automatic and synchronous replication across data centers and availability zones.
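The atomic counter and optimistic-concurrency bullets each map onto a single request. The Python sketch below builds both in boto3 low-level client form using the current expression syntax (`UpdateExpression`, `ConditionExpression`), which postdates this talk. The `version` attribute used for optimistic locking is a convention assumed here, not something DynamoDB requires, and the table and key names are hypothetical. If the condition does not hold, DynamoDB rejects the write (boto3 raises `ConditionalCheckFailedException`), and the client can re-read the item and retry.

```python
def build_increment_request(table_name, key, counter_attr, amount=1):
    """update_item arguments that atomically add `amount` to a numeric attribute."""
    return {
        "TableName": table_name,
        "Key": key,
        # ADD treats a missing number attribute as 0, so the first call creates it.
        # A negative amount decrements.
        "UpdateExpression": "ADD #c :amt",
        "ExpressionAttributeNames": {"#c": counter_attr},
        "ExpressionAttributeValues": {":amt": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


def build_conditional_put(table_name, item, expected_version):
    """put_item arguments that only succeed if the stored version is unchanged.

    Assumes a hypothetical numeric `version` attribute bumped on every write.
    """
    new_item = dict(item)
    new_item["version"] = {"N": str(expected_version + 1)}
    return {
        "TableName": table_name,
        "Item": new_item,
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#v": "version"},
        "ExpressionAttributeValues": {":expected": {"N": str(expected_version)}},
    }


key = {"author": {"S": "alice"}, "published": {"S": "2012-06-01T10:00:00Z"}}
like = build_increment_request("photos", key, "likes")
edit = build_conditional_put("photos", {**key, "title": {"S": "Sunset"}}, 4)
# With boto3: client.update_item(**like); client.put_item(**edit)
```

The counter needs no read at all, while the conditional put only lands if nobody else has written since version 4 was read.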
Pay for throughput and not storage.
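Priced per hour of provisioned read/write capacity
Scales up and down well, with a free tier
Write units = item size (in 1 KB increments, rounded up) × writes/second
$0.01 per hour for 10 write units
Strongly consistent reads (mucho dinero: they use twice the read capacity of eventually consistent reads)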
Eventually consistent reads
See Amazon’s site for read throughput pricing!
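The write-pricing arithmetic on this slide can be worked through in a few lines of Python. The $0.01-per-10-units rate comes from the slide; the 1 KB unit size (item sizes rounded up to whole kilobytes) is an assumption based on launch-era pricing, and current AWS pricing differs, so treat the numbers as illustrative. Read pricing is left out, as the slide defers it to Amazon's site.

```python
import math

# From the slide: $0.01 per hour buys 10 write units.
PRICE_PER_HOUR_PER_10_WRITE_UNITS = 0.01
# Assumption: one write unit = one write/second for an item of up to 1 KB.
UNIT_SIZE_KB = 1


def write_units_needed(item_size_kb, writes_per_second):
    """Units = item size (rounded up to whole units) x writes/second."""
    return math.ceil(item_size_kb / UNIT_SIZE_KB) * writes_per_second


def hourly_write_cost(item_size_kb, writes_per_second):
    """Dollars per hour for the provisioned write throughput."""
    units = write_units_needed(item_size_kb, writes_per_second)
    return units / 10 * PRICE_PER_HOUR_PER_10_WRITE_UNITS


# Example: 2.5 KB items written 100 times per second.
units = write_units_needed(2.5, 100)
print(units, round(hourly_write_cost(2.5, 100), 4))
```

Under these assumptions a 2.5 KB item counts as 3 units per write, so 100 writes/second needs 300 units, or 30 × $0.01 = $0.30 per hour.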
Integrates with Amazon Elastic MapReduce and
Libraries, mappers and mocks for Django, Erlang, Java, .NET, Node.js, Perl, PHP, Python & …
Session-based authentication using Amazon Security Token Service
Monitoring via CloudWatch
Tables, items & attributes
Items are indexed by primary key (a single hash key or a composite hash-and-range key).
Items are collections of attributes; each attribute has a name and a value.
Unlimited number of attributes, up to 64 KB total per item.
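A rough way to check the 64 KB per-item limit is to add up the bytes of every attribute name and value, since both count toward item size. The Python sketch below does this for string attributes only, measuring UTF-8 bytes; DynamoDB applies its own sizing rules to numbers, binary values and sets, which are not modelled here, and 64 KB is the limit quoted on the slide (AWS has since raised it).

```python
MAX_ITEM_BYTES = 64 * 1024  # 64 KB per item, as quoted on the slide


def approx_item_size(item):
    """Approximate size: UTF-8 bytes of each attribute name plus its string value.

    Sketch only: non-string values raise TypeError instead of being sized.
    """
    total = 0
    for name, value in item.items():
        if not isinstance(value, str):
            raise TypeError(f"attribute {name!r}: only strings are sized here")
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
    return total


def fits_in_one_item(item):
    """True if the approximate size is within the per-item limit."""
    return approx_item_size(item) <= MAX_ITEM_BYTES


print(approx_item_size({"author": "alice", "caption": "Playa del Carmen"}))
```

Because attribute names are counted on every item, short names such as `pub` instead of `published_timestamp` leave more room for values.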
Flickr on DynamoDB
Wcheung (Canada) submitted a Grails application that caches Flickr photos in Amazon DynamoDB. You can then search for cached feed entries by primary key (author + published date/time range) or by table scan. You can also “like” a photo, resulting in the atomic “like” counter for the item in DynamoDB getting incremented.
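The lookup by author plus a published date/time range is a hash-and-range key query. Below is a Python sketch of such a request in boto3 low-level client form, using the current `KeyConditionExpression` syntax rather than the API of the time. The table name, the attribute names `author` and `published`, and the use of fixed-format ISO-8601 UTC timestamp strings (which sort chronologically as text) are assumptions, not details of the submitted app.

```python
def build_feed_query(table_name, author, start_iso, end_iso, consistent=False):
    """Query arguments: exact match on the hash key, BETWEEN on the range key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#a = :author AND #p BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#a": "author", "#p": "published"},
        "ExpressionAttributeValues": {
            ":author": {"S": author},
            # Fixed-format ISO-8601 UTC strings sort in time order.
            ":start": {"S": start_iso},
            ":end": {"S": end_iso},
        },
        # False = eventually consistent (cheaper); True = strongly consistent.
        "ConsistentRead": consistent,
        "ScanIndexForward": False,  # newest entries first
    }


query = build_feed_query("photos", "alice",
                         "2012-06-01T00:00:00Z", "2012-06-30T23:59:59Z")
# With boto3: boto3.client("dynamodb").query(**query)["Items"]
```

Unlike a table scan, this query reads only the items under one hash key, so it consumes read capacity in proportion to the matching range rather than the whole table.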
Mbleigh (US) submitted a simple, barebones Twitter-esque service created in Ruby using Sinatra. It is far from complete, but it uses a number of DynamoDB's key features, including Hash/Range Keys and Atomic Set Push Operations.
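An “atomic set push” corresponds to the `ADD` action on a set attribute: DynamoDB merges the new members into the stored set in one request, so concurrent pushes are not lost. We do not know exactly how this Ruby app issued it; the Python sketch below only shows the shape of such an `update_item` request in current boto3 low-level syntax, with a hypothetical `users` table and `followers` attribute.

```python
def build_set_push_request(table_name, key, set_attr, members):
    """update_item arguments that add members to a string set attribute."""
    unique = sorted(set(members))
    if not unique:
        raise ValueError("DynamoDB sets cannot be empty")
    return {
        "TableName": table_name,
        "Key": key,
        # ADD unions the members into the set, creating it if it does not exist.
        "UpdateExpression": "ADD #s :members",
        "ExpressionAttributeNames": {"#s": set_attr},
        "ExpressionAttributeValues": {":members": {"SS": unique}},
    }


push = build_set_push_request("users", {"username": {"S": "alice"}},
                              "followers", ["bob", "carol", "bob"])
# With boto3: boto3.client("dynamodb").update_item(**push)
```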
DynamoDB Task Manager
Darthdeus (Czech Republic) wrote his app in Ruby using Sinatra. It uses a custom ORM he wrote, called DynamoRecord, to access DynamoDB. His main idea was to bring at least some of an ActiveRecord-style API to DynamoDB using some basic …
Simple Survey
Peakpado (US) created an application using Ruby on Rails. For each table he created a sophisticated hash/range key model class, which resulted in an API very similar to ActiveRecord for DynamoDB.
Data Sets for Mumbai
Romin (India) developed an API that exposes data sets of Mumbai city in JSON
format. The solution uses Amazon DynamoDB for storing the data and a NodeJS
application that exposes the REST interface and talks to Amazon DynamoDB via
a backend Java application.