DynamoDB Gluecon 2012

¡Ay, caramba!

Wrestle Your NoSQL
Data with DynamoDB
Jeff Douglas @jeffdonthemic
CloudSpokes Community Architect

Rambling Talk Roadmap

Short NoSQL overview (thanks Max @ 10gen!)

Why NoSQL databases are like Mexican Wrestlers

Amazon DynamoDB in depth

Amazon DynamoDB demo and code

CloudSpokes challenge submissions for “Build an
#Awesome Demo with Amazon DynamoDB”

Times they are a-changin’

Cloud applications and
APIs need to be fast,
flexible and scalable.

RDBMSs typically do not
scale well for certain data-intensive applications.

NoSQL is cloud friendly.
“NoSQL is a rebellion against the DBAs who prevent us from
doing shit.”
- James Governor, Gluecon 2012

Why is NoSQL #awesome?
Developed to manage large volumes of data that
do not necessarily follow a fixed schema

Great for heavy read/write workloads

Simple to set up, configure and administer

Distributed, fault-tolerant architecture

Scale out, not up

Specialized database for the right task

Key NoSQL differences

Do not use SQL as a query language

Dynamic & schema-less

Non-relational, no JOIN operations

No complex transactions

May not give full ACID guarantees; eventually
consistent instead. Performance and real-time
responsiveness take priority over strong consistency.
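To make "eventually consistent" concrete, here is a toy Python model (a hypothetical class, not any real database's API) with one primary copy and one replica that lags behind until `replicate()` runs. A read served by the replica can return stale data until the two copies converge.

```python
from collections import deque


class EventuallyConsistentStore:
    """Hypothetical toy: a primary plus one lagging replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def put(self, key, value):
        # Writes land on the primary immediately and queue for the replica.
        self.primary[key] = value
        self.pending.append((key, value))

    def get(self, key, consistent=False):
        # A consistent read goes to the primary; otherwise the replica answers.
        source = self.primary if consistent else self.replica
        return source.get(key)

    def replicate(self):
        # Apply queued writes in order; afterwards both copies agree.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.put("user:1", "Rey")
print(store.get("user:1", consistent=True))  # Rey
print(store.get("user:1"))                   # None (stale read)
store.replicate()
print(store.get("user:1"))                   # Rey
```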

NoSQL databases are
“different”

NoSQL database types

Document store (MongoDB, CouchDB)
A document-oriented database that stores, retrieves and manages semi-structured
data, including XML, YAML, JSON and binary documents (PDF, DOC)

Key-value store (Cassandra, Redis)
Stores schema-less data referenced by a simple key

Graph database (Neo4j, FlockDB)
Stores the relationship of data as a graph (social relations, network
topologies)

How to choose?
With all of the different NoSQL database types, how
do you choose the “best” one?

El Toro Más Macho
MongoDB
Stores structured data as JSON-like
documents.

Ad hoc queries, indexing, master-slave
replication, sharding, server-side JavaScript
execution

All the “cool kids” are using it.

Node.js + MongoDB = WINNING!

Muy Guapo
Couchbase
JSON Document store

Embedded CouchDB with caching,
clustering and high-performance storage
management components.

JavaScript as its query language and
HTTP for an API

Serve HTML and JavaScript-based
“CouchApps”

El Matador Misterio
Redis
What exactly is Redis? MAGIC!

By definition, it’s an in-memory, key-value
data store with optional durability.

Data model includes lists of strings, sets of
strings, sorted sets of strings and hashes.

Awesome at set operations such as union, intersection and difference.

Comando Loco
Apache Hadoop

Fast, reliable analysis of both structured data
and complex data.

Derived from Google's MapReduce and Google File
System (GFS) papers. Yahoo! is one of the
main contributors.

Reliable data storage using the Hadoop
Distributed File System (HDFS) and high-
performance parallel data processing using
MapReduce.
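The MapReduce model mentioned above can be sketched in plain Python as the classic word count. This runs in a single process and only mimics the map, shuffle and reduce phases; Hadoop distributes the same steps across a cluster. The function names are illustrative, not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["lucha libre", "Libre data"]))
# {'lucha': 1, 'libre': 2, 'data': 1}
```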

El Jefe Supremo
Apache Cassandra
Massively scalable key-value store initially
developed by Facebook.

BigTable data model (nested hashes) running
on an Amazon Dynamo-like infrastructure.

Has some RDBMS “feel” with column families
that make it a hybrid column/row store.

No single point of failure, fault-tolerant multi
data center replication, MapReduce support.

CQL (Cassandra Query Language)
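One rough way to picture the "nested hashes" data model is as nested Python dicts: column family, then row key, then column name. This is only a mental model, not Cassandra's storage format or client API, and the sample rows are made up.

```python
# Rough mental model only: column family -> row key -> column name -> value.
keyspace = {
    "Users": {
        "user1": {"name": "Jeff", "twitter": "@jeffdonthemic"},
        "user2": {"name": "Max", "company": "10gen"},  # rows can have different columns
    },
}


def get_column(keyspace, column_family, row_key, column):
    # Three nested lookups; a missing level yields None instead of an error.
    return keyspace.get(column_family, {}).get(row_key, {}).get(column)


print(get_column(keyspace, "Users", "user1", "twitter"))  # @jeffdonthemic
print(get_column(keyspace, "Users", "user2", "twitter"))  # None
```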

¡Hola, DynamoDB!

Amazon DynamoDB is a fast, fully managed key-value
database service that scales seamlessly with extremely
low latency and predictable performance.

Store and retrieve any amount of data

Serve any level of request traffic

Hands off administration

Pay for the throughput you provision (storage is billed separately)

¡No! administración (No administration!)
No hardware or software provisioning, setup and
configuration, software patching, or partitioning data over
multiple instances and regions.

Specify the request throughput for your table and in the
background, Amazon handles the provisioning of resources to
meet the requested throughput rate.

Automatically partitions/re-partitions data and provisions
additional server capacity based upon table size & throughput.

Synchronously replicates data across multiple facilities in an
AWS Region giving you high availability and data durability.
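Automatic partitioning generally means hashing each item's key to pick a partition. AWS does not publish DynamoDB's exact scheme, so this sketch simply assumes MD5 hashing with modulo placement to show how similar-looking keys still spread out. Dynamo-style systems typically use consistent hashing instead, because plain modulo placement moves most keys whenever the partition count changes.

```python
import hashlib


def partition_for(hash_key: str, num_partitions: int) -> int:
    # Hash the key so items spread evenly even when keys look alike.
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(keys, num_partitions):
    # Group keys by the partition they hash to.
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


keys = [f"user:{i}" for i in range(1000)]
for count in (2, 4):
    sizes = [len(group) for group in distribute(keys, count).values()]
    print(count, "partitions ->", sizes)
```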

Muy rápido (Very fast)

Consistent, predictable performance

Runs on solid-state drives (SSDs)
for low-latency response times.

Read latencies average less than 5 milliseconds,
and write latencies average less than 10
milliseconds.

Muy escalable (Very scalable)

No table size limits (adiós SimpleDB?)

No downtime when scaling up or down

Unlimited storage

Automatically scales machine resources in
response to increases in database traffic without
the need for client-side partitioning.

Modelo de datos flexible (Flexible data model)

Flexible data model with familiar tables, items
and key-value pairs.

Schema-less document storage. Each item can
have different attributes.

Easy to create and modify documents. Simple
API.

No cross-table joins. Use composite keys to
model relationships.
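Here is a minimal in-memory sketch of the last two points, assuming a table is just a map from hash key to range key to item. Items carry their key plus whatever attributes they need, and a relationship such as "an author's posts over a date range" (the pattern the Flickr demo later in this deck uses) becomes an exact hash-key match plus a range condition. `CompositeKeyTable` is hypothetical, not the DynamoDB API.

```python
class CompositeKeyTable:
    """Hypothetical stand-in for a hash-and-range-key table."""

    def __init__(self):
        self._items = {}  # hash key -> {range key: item}

    def put(self, hash_key, range_key, **attributes):
        # Only the key attributes are required; everything else is free-form.
        self._items.setdefault(hash_key, {})[range_key] = {
            "hash": hash_key, "range": range_key, **attributes}

    def query(self, hash_key, range_from, range_to):
        # Exact match on the hash key, ordered range condition on the range key.
        rows = self._items.get(hash_key, {})
        return [rows[r] for r in sorted(rows) if range_from <= r <= range_to]


table = CompositeKeyTable()
table.put("jeff", "2012-05-22T10:00", title="Gluecon", likes=3)
table.put("jeff", "2012-05-23T09:30", title="DynamoDB demo", tags={"aws", "nosql"})
table.put("max", "2012-05-22T11:00", title="MongoDB talk")

for item in table.query("jeff", "2012-05-22", "2012-05-23T23:59"):
    print(item["range"], item["title"])
# 2012-05-22T10:00 Gluecon
# 2012-05-23T09:30 DynamoDB demo
```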

Duradero (Durable)

Consistent, disk-only writes

Atomic increment/decrement (with a single API call)

Optimistic concurrency control (aka conditional
writes & updates)

Item-level atomicity (batch operations are atomic per item, not per batch)

Automatic and synchronous replication across
data centers and availability zones.
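The atomic counter and conditional-write ideas can be sketched with a hypothetical single-node store. A lock makes each increment an indivisible read-modify-write, and `put_if_version` rejects a write whose expected version no longer matches, which is the optimistic-concurrency pattern. DynamoDB performs these checks server-side; this class only imitates the behavior and is not its API.

```python
import threading


class ConditionalStore:
    """Hypothetical in-memory store; not the DynamoDB API."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def increment(self, key, attribute, delta=1):
        # Read-modify-write under one lock, so concurrent callers never lose
        # an update: the idea behind a server-side atomic counter.
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attribute] = item.get(attribute, 0) + delta
            return item[attribute]

    def get(self, key):
        with self._lock:
            return dict(self._items.get(key, {}))

    def put_if_version(self, key, new_item, expected_version):
        # Optimistic concurrency: write only if the stored version still
        # matches what the caller read; otherwise reject so it can retry.
        with self._lock:
            current = self._items.get(key, {})
            if current.get("version", 0) != expected_version:
                return False
            self._items[key] = {**new_item, "version": expected_version + 1}
            return True


store = ConditionalStore()


def like_photo(times):
    for _ in range(times):
        store.increment("photo:1", "likes")


workers = [threading.Thread(target=like_photo, args=(1000,)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(store.get("photo:1")["likes"])  # 4000

seen = store.get("photo:2").get("version", 0)                     # 0: item absent
print(store.put_if_version("photo:2", {"title": "Lucha"}, seen))  # True
print(store.put_if_version("photo:2", {"title": "Stale"}, seen))  # False
```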

¿Costos? (Costs?)

Pay mainly for provisioned throughput; storage is billed separately.

Priced per hour of provisioned read/write
throughput

Scales up and down with demand, and a free tier is available

Write throughput

Units = item size (in 1 KB increments, rounded up) × writes per second

$0.01 per hour for 10 write units

Read throughput
Strongly consistent reads (mucho dinero: the more expensive option)

Eventually consistent reads (each uses half the read capacity of a strongly consistent read)

See Amazon’s site for read throughput pricing!
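A back-of-the-envelope capacity calculation, assuming the 1 KB unit size from DynamoDB's 2012 launch documentation (unit sizes have changed since, so check current pricing) and that an eventually consistent read needs half the capacity of a strongly consistent one. Only the write price comes from the slides; read prices are deliberately left out, as above. The function names are hypothetical helpers.

```python
import math

# From the slide: $0.01 per hour for every 10 write units.
WRITE_PRICE_PER_10_UNITS_PER_HOUR = 0.01


def write_units(item_size_kb: float, writes_per_sec: int) -> int:
    # Assumption: each write uses one unit per started 1 KB of item size.
    return math.ceil(item_size_kb) * writes_per_sec


def read_units(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    units = math.ceil(item_size_kb) * reads_per_sec
    # Assumption: an eventually consistent read needs half the capacity.
    return units if strongly_consistent else math.ceil(units / 2)


def hourly_write_cost(units: int) -> float:
    return units / 10 * WRITE_PRICE_PER_10_UNITS_PER_HOUR


units = write_units(3.5, 20)  # 3.5 KB rounds up to 4 units per write, 20 writes/s
print(units)                                             # 80
print(round(hourly_write_cost(units), 2))                # 0.08
print(read_units(3.5, 100, strongly_consistent=True))    # 400
print(read_units(3.5, 100, strongly_consistent=False))   # 200
```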

Other features

Integrates with Amazon Elastic MapReduce and
Hadoop.

Libraries, mappers and mocks for Django,
Erlang, Java, .NET, Node.js, Perl, PHP, Python &
Ruby.

Session-based authentication using Amazon
Security Token Service

Monitoring via CloudWatch

DynamoDB Semantics

Tables, items & attributes

Items are indexed by primary key (either a single
hash key or a composite hash-and-range key)

An item is a collection of attributes, and each
attribute is a name-value pair.

Any number of attributes, up to 64 KB total per item.
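A rough pre-flight check against the 64 KB item limit, assuming the item size counts the UTF-8 bytes of attribute names plus their values. Numbers and sets are approximated by their string form, which is an assumption; DynamoDB's own accounting differs in the details, and the limit has since been raised. `approx_item_size` and `fits` are hypothetical helpers.

```python
MAX_ITEM_BYTES = 64 * 1024  # 64 KB limit from the slide


def approx_item_size(item: dict) -> int:
    # Rough estimate: UTF-8 bytes of each attribute name plus its value.
    # Numbers and set members are measured via str(), an approximation.
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        if isinstance(value, (set, frozenset, list)):
            total += sum(len(str(v).encode("utf-8")) for v in value)
        else:
            total += len(str(value).encode("utf-8"))
    return total


def fits(item: dict) -> bool:
    return approx_item_size(item) <= MAX_ITEM_BYTES


print(approx_item_size({"id": "photo:1", "likes": 42}))  # 16
print(fits({"id": "x", "body": "a" * 70000}))            # False
```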

Simple API calls

Table operations: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables

Item operations: PutItem, GetItem, UpdateItem, DeleteItem

Query, scan & batch operations: Query, Scan, BatchGetItem, BatchWriteItem
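`BatchWriteItem` accepts at most 25 put or delete requests per call and can hand back unprocessed items, so bulk loads are usually chunked and retried. In this sketch `send_batch` is a hypothetical stand-in for whatever SDK call you use; it is assumed to return the items that were not written.

```python
from typing import Callable, Iterator, List

BATCH_WRITE_LIMIT = 25  # put/delete requests allowed per BatchWriteItem call


def chunks(items: List[dict], size: int) -> Iterator[List[dict]]:
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_write_all(items: List[dict],
                    send_batch: Callable[[List[dict]], List[dict]],
                    max_attempts: int = 5) -> None:
    """Write all items in chunks, resubmitting whatever comes back unprocessed."""
    for batch in chunks(items, BATCH_WRITE_LIMIT):
        pending = batch
        for _ in range(max_attempts):
            pending = send_batch(pending)
            if not pending:
                break
            # Real code should back off (sleep) before retrying here.
        else:
            raise RuntimeError(f"{len(pending)} items still unprocessed")


calls = []


def fake_send(batch):
    calls.append(len(batch))
    return []  # pretend everything was written


batch_write_all([{"id": i} for i in range(60)], fake_send)
print(calls)  # [25, 25, 10]
```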

Kiva loan browser

http://kivabrowser.elasticbeanstalk.com

Flickr on DynamoDB

Wcheung (Canada) submitted a Grails application that caches Flickr photos in
Amazon DynamoDB. You can then search for cached feed entries by primary key
(author + published date/time range) or by table scan. You can also “like” a
photo, which atomically increments that item's “like” counter in DynamoDB.

http://screencast.com/t/MAVgm7xeqDpr

Posterity

Mbleigh (US) submitted a simple, barebones Twitter-esque service created in
Ruby using Sinatra. It is far from complete but uses a number of DynamoDB's
key features including Hash/Range Keys and Atomic Set Push Operations.

http://www.screencast.com/t/me8hW27MYs3x

DynamoDB Task Manager

Darthdeus (Czech Republic) wrote his app in Ruby using Sinatra. It uses a custom
ORM he wrote called DynamoRecord to access DynamoDB. His main idea was to
bring at least some of the ActiveRecord-style API to DynamoDB using basic
metaprogramming.

http://www.youtube.com/watch?v=9tOzaDPP39I

Simple Survey

Peakpado (US) created an application using Ruby on Rails. For each table he
created a sophisticated hash/range key model class, which resulted in an API very
similar to ActiveRecord for DynamoDB.

http://screencast.com/t/ri1XkMxGcpnS

Data Sets for Mumbai

Romin (India) developed an API that exposes data sets of Mumbai city in JSON
format. The solution uses Amazon DynamoDB for storing the data and a Node.js
application that exposes the REST interface and talks to Amazon DynamoDB via
a backend Java application.

Thanks!

Jeff Douglas
CloudSpokes
Community Architect

@jeffdonthemic
jeff@cloudspokes.com

http://www.cloudspokes.com
http://blog.jeffdouglas.com
