¡Ay, caramba!

  Wrestle Your NoSQL
  Data with DynamoDB
 Jeff Douglas @jeffdonthemic
 CloudSpokes Community Architect
Rambling Talk Roadmap

Short NoSQL overview (thanks Max @ 10gen!)

Why NoSQL databases are like Mexican wrestlers

Amazon DynamoDB in depth

Amazon DynamoDB demo and code

CloudSpokes challenge submissions for “Build an
#Awesome Demo with Amazon DynamoDB”
Times they are a-changin’

    Cloud applications and
    APIs need to be fast,
    flexible and scalable.

    RDBMSs typically do not
    scale well for certain data-intensive applications.

    NoSQL is cloud friendly.
“NoSQL is a rebellion against the DBAs who prevent us from
                        doing shit.”
                          - James Governor, Gluecon 2012
Why is NoSQL #awesome?
Developed to manage large volumes of data that
do not necessarily follow a fixed schema

Great for heavy read/write workloads

Simple to set up, configure and administer

Distributed, fault tolerant architecture

Scale out not up

Specialized database for the right task
Key NoSQL differences

Do not use SQL as a query language

Dynamic & schema-less

Non-relational, no JOIN operations

No complex transactions

May not give full ACID guarantees; often eventually
consistent instead. Performance and real-time
responsiveness are treated as more important than consistency.
NoSQL databases are
    “different”
NoSQL database types

Document store (MongoDB, CouchDB)
 A document-oriented database that stores, retrieves, and manages
 semi-structured data, including XML, YAML, JSON and binary (PDF, DOC)


Key-value store (Cassandra, Redis)
 Stores schema-less data referenced by a simple key


Graph database (Neo4j, FlockDB)
 Stores the relationship of data as a graph (social relations, network
 topologies)
How to choose?
With all of the different NoSQL database types, how
            do you choose the “best” one?
El Toro Más Macho
                    MongoDB
       Stores structured data as JSON-like
       documents.

       Ad hoc queries, indexing, master-slave
       replication, sharding, server-side JavaScript
       execution

       All the “cool kids” are using it.

       Node.js + MongoDB = WINNING!
Muy Guapo
              Couchbase
   JSON Document store

   Embedded CouchDB with caching,
   clustering and high-performance storage
   management components.

   JavaScript as its query language and
   HTTP for an API

   Serve HTML and JavaScript-based
   “CouchApps”
El Matador Misterio
                         Redis
        What exactly is redis? MAGIC!

        By definition, it’s an in-memory, key-value
        data store with optional durability.

        Data model includes lists of strings, sets of
        strings, sorted sets of strings & hashes.

        Awesome at doing set comparisons.
Comando Loco
              Apache Hadoop

    Fast, reliable analysis of both structured data
    and complex data.

    Derived from Google's MapReduce and File
    System (GFS) papers. Yahoo is one of the
    main contributors.

    Reliable data storage using the Hadoop
    Distributed File System (HDFS) and high-
    performance parallel data processing using
    MapReduce.
El Jefe Supremo
              Apache Cassandra
     Massively scalable key-value store initially
     developed by Facebook.

     BigTable data model (nested hashes) running
     on an Amazon Dynamo-like infrastructure.

     Has some RDBMS “feel” with column families
     that make it a hybrid column/row store.

     No single point of failure, fault-tolerant multi
     data center replication, MapReduce support.

     CQL (Cassandra Query Language)
Introducing...
La Amazon DynamoDB
¡Hola DynamoDB!

Amazon DynamoDB is a fast, fully managed key-value
database service that scales seamlessly with extremely
low latency and predictable performance.

   Store and retrieve any amount of data

   Serve any level of request traffic

   Hands off administration

   Pay for throughput and not storage
¡No! administración
No hardware or software provisioning, setup and
configuration, software patching, or partitioning data over
multiple instances and regions.

Specify the request throughput for your table and in the
background, Amazon handles the provisioning of resources to
meet the requested throughput rate.

Automatically partitions/re-partitions data and provisions
additional server capacity based upon table size & throughput.

Synchronously replicates data across multiple facilities in an
AWS Region giving you high availability and data durability.
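
The idea of spreading a table's data across partitions by key can be sketched as below. This is only an illustration of hash partitioning in general, not Amazon's actual algorithm; the `partition_for` function, the use of MD5, and the `loan-N` keys are all assumptions made up for this example.

```python
import hashlib


def partition_for(hash_key: str, partition_count: int) -> int:
    """Map a hash key to a partition index in [0, partition_count).

    Illustrative only: the real service chooses and rebalances
    partitions itself, so clients never run code like this.
    """
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


# Hypothetical keys, placed onto four hypothetical partitions.
keys = ["loan-1", "loan-2", "loan-3", "loan-4"]
placement = {key: partition_for(key, 4) for key in keys}
```

The point of the managed service is that this placement, and any re-partitioning as the table grows, happens behind the API.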
Muy rápido

Consistent, predictable performance

Runs on a new solid state disk (SSD) architecture
for low-latency response times.

Read latencies average less than 5 milliseconds,
and write latencies average less than 10
milliseconds.
Muy Escalable

No table size limits (adiós SimpleDB?)

No downtime when scaling up or down

Unlimited storage

Automatically scale machine resources in
response to increases in database traffic without
the need of client-side partitioning.
Modelo de datos flexible

Flexible data model with familiar tables, items
and key-value pairs.

Schema-less document storage. Each item can
have different attributes.

Easy to create and modify documents. Simple
API.

No cross-table joins. Use composite keys to
model relationships.
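
A minimal in-memory sketch of modeling a one-to-many relationship with a composite (hash + range) key follows. It does not talk to DynamoDB; `CompositeKeyTable`, its methods, and the sample photo data are hypothetical names invented for illustration.

```python
from collections import defaultdict


class CompositeKeyTable:
    """Toy stand-in for a table keyed by (hash key, range key)."""

    def __init__(self):
        # hash key -> {range key -> item attributes}
        self._items = defaultdict(dict)

    def put_item(self, hash_key, range_key, attributes):
        # Items are schema-less: each one may carry different attributes.
        self._items[hash_key][range_key] = dict(attributes)

    def query(self, hash_key, range_start, range_end):
        """Return items for one hash key whose range key is in the span."""
        rows = self._items.get(hash_key, {})
        return [rows[k] for k in sorted(rows) if range_start <= k <= range_end]


# One author (hash key) owns many photos (range key = published time).
photos = CompositeKeyTable()
photos.put_item("alice", "2012-05-01T10:00", {"title": "Sunrise"})
photos.put_item("alice", "2012-05-03T18:30", {"title": "Sunset", "likes": 2})
photos.put_item("bob", "2012-05-02T09:15", {"title": "Coffee"})

first_week = photos.query("alice", "2012-05-01T00:00", "2012-05-07T23:59")
```

Instead of joining an authors table to a photos table, the author is folded into the key, so "all of Alice's photos this week" is a single keyed lookup.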
Duradero

Consistent, disk-only writes

Atomic increment/decrement (w/single API call)

Optimistic concurrency control (aka conditional
writes & updates)

Item level transactions (even in bulk)

Automatic and synchronous replication across
data centers and availability zones.
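
Conditional writes and atomic counters can be sketched in memory as below. This only mimics the semantics with a lock; `ItemStore`, `put_if_version`, `add`, and the `version` attribute are hypothetical names for this example, not the DynamoDB API.

```python
import threading


class ConditionFailed(Exception):
    """Raised when the stored item does not match the caller's expectation."""


class ItemStore:
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_if_version(self, key, attributes, expected_version):
        """Replace the item only if its version matches expected_version.

        expected_version=None means "the item must not exist yet".
        """
        with self._lock:
            current = self._items.get(key)
            current_version = current["version"] if current else None
            if current_version != expected_version:
                raise ConditionFailed(f"{key}: expected {expected_version}, "
                                      f"found {current_version}")
            new_version = 1 if current_version is None else current_version + 1
            self._items[key] = {**attributes, "version": new_version}
            return new_version

    def add(self, key, attribute, delta):
        """Atomically add delta to a numeric attribute and return the result."""
        with self._lock:
            item = self._items.setdefault(key, {"version": 1})
            item[attribute] = item.get(attribute, 0) + delta
            return item[attribute]


store = ItemStore()
v1 = store.put_if_version("photo-1", {"title": "Sunset"}, None)

# A writer holding a stale version is rejected instead of overwriting.
try:
    store.put_if_version("photo-1", {"title": "Oops"}, expected_version=0)
    stale_rejected = False
except ConditionFailed:
    stale_rejected = True

store.add("photo-1", "likes", 1)
likes = store.add("photo-1", "likes", 1)
```

The counter never needs a read-modify-write round trip from the client, and the conditional put is what makes optimistic concurrency safe.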
Costos?

Pay for throughput and not storage.

Priced per hour of provisioned read/write
throughput

Scales up and down well with a free tier
Write throughput

Unit = size of item x writes/second

$0.01 per hour for 10 write units
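
The arithmetic on this slide can be worked through in a few lines. The $0.01 per 10 write units figure comes from the slide; rounding the item size up to a whole KB and charging pro rata for partial blocks of 10 units are assumptions of this sketch, so check Amazon's pricing page before relying on the numbers.

```python
import math

PRICE_PER_10_WRITE_UNITS_PER_HOUR = 0.01  # USD, figure quoted on the slide


def write_units(item_size_kb, writes_per_second):
    """Unit = size of item x writes/second (size rounded up: an assumption)."""
    return math.ceil(item_size_kb) * writes_per_second


def hourly_write_cost(units):
    """Cost per hour, assuming pro-rata pricing for blocks of 10 units."""
    return round(units / 10 * PRICE_PER_10_WRITE_UNITS_PER_HOUR, 4)


# Example: 1.5 KB items written 100 times per second.
units = write_units(1.5, 100)
cost = hourly_write_cost(units)
```

Under these assumptions the example workload needs 200 write units, which comes to $0.20 per hour.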
Read throughput
Strongly consistent reads (mucho dinero)

Eventually consistent reads




       See Amazon’s site for read throughput pricing!
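
A rough sketch of why strongly consistent reads cost more: it assumes an eventually consistent read consumes half the capacity of a strongly consistent one, and that item size is rounded up to a whole block of `block_kb` kilobytes. Both the ratio and the block size are assumptions of this example, and `read_units` is a hypothetical helper; Amazon's documentation is the authority on the real figures.

```python
import math


def read_units(item_size_kb, reads_per_second, strongly_consistent, block_kb=1):
    """Estimate provisioned read units for a workload (illustrative only)."""
    blocks = math.ceil(item_size_kb / block_kb)
    units = blocks * reads_per_second
    # Assumption: eventually consistent reads need half the capacity.
    return units if strongly_consistent else math.ceil(units / 2)


strong = read_units(1, 100, strongly_consistent=True)
eventual = read_units(1, 100, strongly_consistent=False)
```

If the application can tolerate slightly stale data, the same read rate needs roughly half the provisioned throughput in this model.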
Other features

Integrates with Amazon Elastic MapReduce and
Hadoop.

Libraries, mappers and mocks for Django,
Erlang, Java, .NET, Node.js, Perl, PHP, Python &
Ruby.

Session based authentication using Amazon
Security Token Service

Monitoring via CloudWatch
DynamoDB Semantics

Tables, items & attributes

Items are indexed by primary key (a single hash key
or a composite hash-and-range key)

Items are a collection of attributes, and each attribute
has a name and a value.

Unlimited number of attributes, up to 64 KB total per item.
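
A quick way to reason about the 64k item limit is to estimate an item's size before writing it. The sketch below assumes the size is the sum of the UTF-8 byte lengths of attribute names and values, and approximates numbers by their string form; `item_size_bytes`, `fits_in_item_limit`, and the sample items are hypothetical.

```python
MAX_ITEM_BYTES = 64 * 1024  # the 64k limit mentioned on the slide


def item_size_bytes(item):
    """Approximate item size: attribute names plus values, in UTF-8 bytes."""
    return sum(
        len(name.encode("utf-8")) + len(str(value).encode("utf-8"))
        for name, value in item.items()
    )


def fits_in_item_limit(item):
    return item_size_bytes(item) <= MAX_ITEM_BYTES


small = {"id": "loan-1", "status": "funded"}
too_big = {"blob": "x" * MAX_ITEM_BYTES}

small_size = item_size_bytes(small)
```

Note that attribute names count toward the total in this model, so short names leave more room for data.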
Simple API calls

CreateTable       PutItem
UpdateTable       GetItem
DeleteTable       UpdateItem
DescribeTable     DeleteItem
ListTables

Query             BatchGetItem
Scan              BatchWriteItem
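
To give a feel for how simple the calls are, here is a sketch that builds the JSON body of a PutItem request using typed attribute values (`S` for strings, `N` for numbers sent as strings). It only constructs the payload and sends nothing; the helper names, the `Loans` table, and the sample item are assumptions for illustration, and only two attribute types are handled.

```python
import json


def to_attribute_value(value):
    """Wrap a Python value in a typed attribute value (strings and numbers only)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    raise TypeError(f"unsupported type: {type(value).__name__}")


def put_item_request(table_name, item):
    """Build the request body for a PutItem call."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
    }


# Hypothetical table and item.
body = put_item_request("Loans", {"id": "loan-1", "amount": 250})
payload = json.dumps(body)
```

In practice one of the language libraries would build, sign, and send this for you.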
Kiva loan browser




http://kivabrowser.elasticbeanstalk.com
CRUD items
Connect to DynamoDB
New Loan
Show Loan
All/Filter Loans
CloudSpokes Challenge
Flickr on DynamoDB




 Wcheung (Canada) submitted a Grails application that caches Flickr photos in
Amazon DynamoDB. You can then search for cached feed entries by primary key
  (author + published date/time range) or by table scan. You can also “like” a
 photo, resulting in the atomic “like” counter for the item in DynamoDB getting
                                   incremented.

        http://screencast.com/t/MAVgm7xeqDpr
Posterity




Mbleigh (US) submitted a simple, barebones Twitter-esque service created in
Ruby using Sinatra. It is far from complete but uses a number of DynamoDB's
 key features including Hash/Range Keys and Atomic Set Push Operations.

  http://www.screencast.com/t/me8hW27MYs3x
DynamoDB Task Manager




Darthdeus (Czech Republic) wrote his app in Ruby using Sinatra. It uses a custom
ORM he wrote called DynamoRecord to access DynamoDB. His main idea was to
 get at least some of the ActiveRecord-ish API to DynamoDB using some basic
                                metaprogramming

 http://www.youtube.com/watch?v=9tOzaDPP39I
Simple Survey




  Peakpado (US) created an application using Ruby on Rails. For each table he
created a sophisticated hash/range key model class which resulted in an API very
                     similar to ActiveRecord for DynamoDB.

         http://screencast.com/t/ri1XkMxGcpnS
Data Sets for Mumbai




 Romin (India) developed an API that exposes data sets of Mumbai city in JSON
format. The solution uses Amazon DynamoDB for storing the data and a NodeJS
application that exposes the REST interface and talks to Amazon DynamoDB via
                           a backend Java application.
Thanks!

Jeff Douglas
CloudSpokes
Community Architect

@jeffdonthemic
jeff@cloudspokes.com



             http://www.cloudspokes.com
              http://blog.jeffdouglas.com

DynamoDB Gluecon 2012
