¡Ay, caramba! Wrestle Your NoSQL Data with DynamoDB
Jeff Douglas, @jeffdonthemic
CloudSpokes Community Architect
Rambling Talk Roadmap
- Short NoSQL overview (thanks, Max @ 10gen!)
- Why NoSQL databases are like Mexican wrestlers
- Amazon DynamoDB in depth
- Amazon DynamoDB demo and code
- CloudSpokes challenge submissions for “Build an #Awesome Demo with Amazon DynamoDB”
Times they are a-changin’
Cloud applications and APIs need to be fast, flexible and scalable. Relational databases (RDBMSs) typically do not scale well for certain data-intensive applications. NoSQL is cloud-friendly.
“NoSQL is a rebellion against the DBAs who prevent us from doing shit.” - James Governor, Gluecon 2012
Why is NoSQL #awesome?
- Developed to manage large volumes of data that do not necessarily follow a fixed schema
- Great for heavy read/write workloads
- Simple to set up, configure and administer
- Distributed, fault-tolerant architecture
- Scale out, not up
- A specialized database for the right task
Key NoSQL differences
- Do not use SQL as a query language
- Dynamic and schema-less
- Non-relational; no JOIN operations
- No complex transactions
- May not give full ACID guarantees and may be only eventually consistent instead; performance and real-time responsiveness are prioritized over strict consistency
NoSQL database types
- Document store (MongoDB, CouchDB): a document-oriented database that stores, retrieves, and manages semi-structured data such as XML, YAML, JSON and binary formats (PDF, DOC)
- Key-value store (Cassandra, Redis): stores schema-less data referenced by a simple key
- Graph database (Neo4j, FlockDB): stores the relationships between data as a graph (social relations, network topologies)
How to choose?
With all of the different NoSQL database types, how do you choose the “best” one?
El Matador Misterio
Redis
What exactly is Redis? MAGIC! By definition, it’s an in-memory key-value data store with optional durability. Its data model includes lists of strings, sets of strings, sorted sets of strings and hashes. Awesome at doing set comparisons.
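Redis performs set comparisons on the server with commands such as SINTER, SUNION and SDIFF. The minimal sketch below mimics what those three commands return, using plain Python sets as a stand-in for Redis keys; it is not the Redis client API, and the key names and members are hypothetical.

```python
# Plain Python sets standing in for Redis set keys (keys and members are hypothetical).
store: dict[str, set[str]] = {
    "likes:tacos": {"alice", "bob", "carol"},
    "likes:burritos": {"bob", "dave"},
}


def sinter(*keys: str) -> set[str]:
    """Members present in every set, like Redis SINTER (missing keys count as empty)."""
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()


def sunion(*keys: str) -> set[str]:
    """Members present in at least one set, like Redis SUNION."""
    return set().union(*(store.get(key, set()) for key in keys))


def sdiff(first: str, *others: str) -> set[str]:
    """Members of the first set that appear in none of the others, like Redis SDIFF."""
    return store.get(first, set()).difference(*(store.get(key, set()) for key in others))


print(sinter("likes:tacos", "likes:burritos"))  # {'bob'}
```

The point of doing this inside Redis rather than in application code is that the comparison runs next to the data, so only the result crosses the network.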
Comando Loco
Apache Hadoop
Fast, reliable analysis of both structured and complex data. Derived from Google’s MapReduce and Google File System (GFS) papers. Yahoo is one of the main contributors. Reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using MapReduce.
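The MapReduce model splits a job into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. A minimal single-process sketch of the classic word count follows; it illustrates the model only and is not Hadoop’s Java API. In Hadoop, the map and reduce tasks would run in parallel across the cluster, reading input from HDFS.

```python
from collections import defaultdict
from collections.abc import Iterable
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all counts emitted for one word."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["Hadoop uses MapReduce", "DynamoDB integrates with Hadoop"]))
```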
El Jefe Supremo
Apache Cassandra
Massively scalable key-value store initially developed by Facebook. BigTable data model (nested hashes) running on an Amazon Dynamo-like infrastructure. Has some RDBMS “feel”, with column families that make it a hybrid column/row store. No single point of failure, fault-tolerant multi-data-center replication, MapReduce support. CQL (Cassandra Query Language).
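The “nested hashes” data model can be pictured as a map from row key to a map of column name to value, where each row may carry different columns. The sketch below models one hypothetical column family that way, plus a range read over column names in the spirit of a column slice; it is an illustration of the data model, not the Cassandra driver API, and all names and values are made up.

```python
from collections import defaultdict

# Hypothetical "users" column family: row key -> {column name -> value}.
# Rows in the same column family can hold different sets of columns.
users: defaultdict[str, dict[str, str]] = defaultdict(dict)


def insert(row_key: str, column: str, value: str) -> None:
    """Set one column in one row, creating the row if needed."""
    users[row_key][column] = value


def get_slice(row_key: str, start: str, finish: str) -> dict[str, str]:
    """Return the row's columns whose names fall in [start, finish], sorted by name."""
    row = users.get(row_key, {})
    return {name: row[name] for name in sorted(row) if start <= name <= finish}


insert("user1", "email", "user1@example.com")
insert("user1", "city", "Springfield")
insert("user2", "twitter", "@user2")  # a column that user1 does not have

print(get_slice("user1", "a", "f"))  # {'city': 'Springfield', 'email': 'user1@example.com'}
```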
¡Hola, DynamoDB!
Amazon DynamoDB is a fast, fully managed key-value database service that scales seamlessly with extremely low latency and predictable performance.
- Store and retrieve any amount of data
- Serve any level of request traffic
- Hands-off administration
- Pay primarily for provisioned throughput rather than storage
¡No administración! (No administration)
- No hardware or software provisioning, setup and configuration, software patching, or partitioning of data over multiple instances and regions.
- Specify the request throughput for your table, and in the background Amazon provisions the resources needed to meet that throughput rate.
- Automatically partitions/re-partitions data and provisions additional server capacity based on table size and throughput.
- Synchronously replicates data across multiple facilities in an AWS Region, giving you high availability and data durability.
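The slide does not say how DynamoDB partitions data internally. Amazon’s original Dynamo design (which the Cassandra slide alludes to) spread keys across nodes with consistent hashing, so the minimal sketch below uses that technique purely as an illustration of why re-partitioning does not have to reshuffle everything: when a partition is added, only the keys on the arc it takes over change owner. The partition names and the choice of MD5 are assumptions for the example.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Place a string on the ring using a 128-bit MD5 hash (illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hashing ring: a key belongs to the first node clockwise from it."""

    def __init__(self, nodes: list[str]) -> None:
        self._ring: list[tuple[int, str]] = sorted((_position(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        """Add a partition; only keys on the arc it takes over change owner."""
        bisect.insort(self._ring, (_position(node), node))

    def node_for(self, key: str) -> str:
        """Return the node that owns the key, wrapping around past the last position."""
        positions = [pos for pos, _ in self._ring]
        index = bisect.bisect_right(positions, _position(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["partition-a", "partition-b", "partition-c"])
keys = [f"item-{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add_node("partition-d")
moved = [key for key in keys if ring.node_for(key) != before[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to partition-d")
```

A naive `hash(key) % number_of_partitions` scheme would instead move most keys whenever the partition count changes, which is exactly the client-side headache a managed service hides.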
Muy rápido (Very fast)
- Consistent, predictable performance
- Runs on a new solid-state drive (SSD) architecture for low-latency response times
- Read latencies average less than 5 milliseconds, and write latencies average less than 10 milliseconds
Muy escalable (Very scalable)
- No table size limits (adiós, SimpleDB?)
- No downtime when scaling up or down
- Unlimited storage
- Automatically scales machine resources in response to increases in database traffic, without the need for client-side partitioning
Modelo de datos flexible
- Flexible data model with familiar tables, items and key-value pairs
- Schema-less document storage: each item can have different attributes
- Easy to create and modify documents; simple API
- No cross-table joins; use composite keys to model relationships
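A composite key (hash key plus range key) lets one table hold a one-to-many relationship without a join: all items sharing a hash key are stored together and ordered by range key, so a single query can fetch a range of them. The minimal in-memory sketch below models an author (hash key) with many posts (ISO-8601 timestamp as range key). The method names echo DynamoDB’s PutItem and Query operations, but the class and its signatures are hypothetical and are not the AWS SDK.

```python
from bisect import insort
from collections import defaultdict


class CompositeKeyTable:
    """In-memory stand-in for a table keyed by (hash key, range key)."""

    def __init__(self) -> None:
        # hash key -> list of (range key, item), kept sorted by range key
        self._rows: defaultdict[str, list[tuple[str, dict]]] = defaultdict(list)

    def put_item(self, hash_key: str, range_key: str, item: dict) -> None:
        """Insert or replace the item stored under the full (hash, range) key."""
        rows = self._rows[hash_key]
        rows[:] = [row for row in rows if row[0] != range_key]
        insort(rows, (range_key, item))

    def query(self, hash_key: str, start: str, end: str) -> list[dict]:
        """Items for one hash key whose range key is in [start, end], in range-key order."""
        return [item for rk, item in self._rows.get(hash_key, []) if start <= rk <= end]


# One author (hash key) has many posts, ordered by ISO-8601 timestamp (range key).
posts = CompositeKeyTable()
posts.put_item("alice", "2012-05-01T10:00:00Z", {"text": "first post"})
posts.put_item("alice", "2012-05-03T09:30:00Z", {"text": "third post"})
posts.put_item("alice", "2012-05-02T18:15:00Z", {"text": "second post"})
posts.put_item("bob", "2012-05-02T12:00:00Z", {"text": "bob's post"})

print([p["text"] for p in posts.query("alice", "2012-05-01", "2012-05-02T23:59:59Z")])
# ['first post', 'second post']
```

ISO-8601 timestamps are used as range keys because their lexicographic order matches chronological order, so a plain string range doubles as a date range.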
Duradero (Durable)
- Consistent, disk-only writes
- Atomic increment/decrement (with a single API call)
- Optimistic concurrency control (aka conditional writes and updates)
- Item-level atomic writes (even within batch operations)
- Automatic and synchronous replication across data centers and Availability Zones
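Optimistic concurrency means a writer says “only apply this if the item still looks the way I read it”; if another writer got there first, the write is rejected and the client re-reads and retries. An atomic counter avoids that round trip entirely by letting the server add to a number in one call. The minimal in-memory sketch below shows both ideas, assuming a hypothetical `version` attribute as the condition; the exception name is a hypothetical stand-in for DynamoDB’s ConditionalCheckFailedException, and none of this is the AWS SDK.

```python
import threading
from typing import Optional


class ConditionalCheckFailed(Exception):
    """Hypothetical stand-in for DynamoDB's ConditionalCheckFailedException."""


class ItemStore:
    """In-memory item store with conditional puts and atomic counters."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}
        self._lock = threading.Lock()  # makes each single-item operation atomic

    def get(self, key: str) -> dict:
        with self._lock:
            return dict(self._items.get(key, {}))

    def put(self, key: str, item: dict, expected_version: Optional[int]) -> int:
        """Write only if the stored version matches (None means no version recorded yet)."""
        with self._lock:
            current = self._items.get(key, {}).get("version")
            if current != expected_version:
                raise ConditionalCheckFailed(f"{key}: expected {expected_version}, found {current}")
            new_version = (current or 0) + 1
            self._items[key] = {**item, "version": new_version}
            return new_version

    def increment(self, key: str, attribute: str, delta: int = 1) -> int:
        """Atomically add delta to a numeric attribute in one call, with no prior read."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attribute] = item.get(attribute, 0) + delta
            return item[attribute]


def update_title(store: ItemStore, key: str, title: str) -> None:
    """Optimistic read-modify-write: retry if another writer got there first."""
    while True:
        item = store.get(key)
        try:
            store.put(key, {**item, "title": title}, expected_version=item.get("version"))
            return
        except ConditionalCheckFailed:
            continue  # the version changed since our read; re-read and try again


store = ItemStore()
store.put("photo-1", {"title": "Lucha libre"}, expected_version=None)


def like_many(times: int) -> None:
    for _ in range(times):
        store.increment("photo-1", "likes")


threads = [threading.Thread(target=like_many, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("photo-1")["likes"])  # 800
```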
¿Costos? (Costs?)
- Pay primarily for throughput rather than storage
- Priced per hour of provisioned read/write throughput
- Scales up and down well, with a free tier
Write throughput
- Unit = size of item × writes/second
- $0.01 per hour for every 10 write units
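The write formula above is easy to turn into a capacity estimate. The minimal sketch below assumes one write unit covers an item of up to 1 KB (item sizes rounded up to whole units) and that the slide’s $0.01 per 10 units per hour scales linearly; both are assumptions to check against Amazon’s current pricing page.

```python
import math

UNIT_SIZE_BYTES = 1024  # assumption: one write unit covers an item of up to 1 KB
PRICE_PER_10_WRITE_UNITS_PER_HOUR = 0.01  # USD, from the slide


def write_units(item_size_bytes: int, writes_per_second: int) -> int:
    """Units = item size (rounded up to whole units) x writes per second."""
    return math.ceil(item_size_bytes / UNIT_SIZE_BYTES) * writes_per_second


def hourly_write_cost(units: int) -> float:
    """Hourly cost, assuming the price scales linearly per 10 units."""
    return units / 10 * PRICE_PER_10_WRITE_UNITS_PER_HOUR


units = write_units(item_size_bytes=1536, writes_per_second=100)  # 1.5 KB items
print(f"{units} write units, ${hourly_write_cost(units):.2f} per hour")
# 200 write units, $0.20 per hour
```

Note how the rounding matters: a 1.5 KB item costs as much to write as a 2 KB item, so trimming items just under a unit boundary halves the provisioned capacity in this example.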
Read throughput
- Strongly consistent reads (mucho dinero: they cost more)
- Eventually consistent reads
See Amazon’s site for read throughput pricing!
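Why strongly consistent reads cost more comes down to capacity units. The minimal sketch below assumes, per AWS documentation at DynamoDB’s launch, that one read unit provides one strongly consistent read per second or two eventually consistent reads per second for an item of up to 1 KB; prices are deliberately left out, since the slide points to Amazon’s site for them.

```python
import math

UNIT_SIZE_BYTES = 1024  # assumption: one read unit covers an item of up to 1 KB


def read_units(item_size_bytes: int, reads_per_second: int, strongly_consistent: bool) -> int:
    """Read units to provision; eventually consistent reads are assumed to need half."""
    units = math.ceil(item_size_bytes / UNIT_SIZE_BYTES) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)


for consistent in (True, False):
    label = "strongly consistent" if consistent else "eventually consistent"
    print(label, read_units(3 * 1024, 80, strongly_consistent=consistent))
```

If an application can tolerate reading data that may be a moment stale, eventually consistent reads serve the same traffic with roughly half the provisioned read capacity.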
Other features
- Integrates with Amazon Elastic MapReduce and Hadoop
- Libraries, mappers and mocks for Django, Erlang, Java, .NET, Node.js, Perl, PHP, Python and Ruby
- Session-based authentication using AWS Security Token Service
- Monitoring via CloudWatch
DynamoDB Semantics
- Tables, items and attributes
- Items are indexed by primary key (either a single hash key or a composite hash-and-range key)
- Items are collections of attributes, and each attribute has a name and a value
- Unlimited number of attributes, up to 64 KB total per item
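Because the limit is on total item size rather than attribute count, it can be worth checking size before a write. The minimal sketch below approximates item size as the UTF-8 byte length of every attribute name plus its string value; that approximation is an assumption that only covers string attributes (numbers and binary values are sized differently), and the item shown is hypothetical.

```python
MAX_ITEM_BYTES = 64 * 1024  # 64 KB per-item limit cited on the slide


def approx_item_size(item: dict[str, str]) -> int:
    """Approximate size: UTF-8 bytes of every attribute name plus its string value."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8")) for name, value in item.items())


def fits_in_one_item(item: dict[str, str]) -> bool:
    return approx_item_size(item) <= MAX_ITEM_BYTES


task = {"id": "42", "title": "Piñata"}
print(approx_item_size(task), fits_in_one_item(task))
```

Attribute names count toward the limit too, so short names leave more room for data in every item.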
Flickr on DynamoDB
Wcheung (Canada) submitted a Grails application that caches Flickr photos in Amazon DynamoDB. You can then search for cached feed entries by primary key (author + published date/time range) or by table scan. You can also “like” a photo, which atomically increments the item’s “like” counter in DynamoDB.
http://screencast.com/t/MAVgm7xeqDpr
Posterity
Mbleigh (US) submitted a simple, barebones Twitter-esque service created in Ruby using Sinatra. It is far from complete, but it uses a number of DynamoDB’s key features, including hash/range keys and atomic set push operations.
http://www.screencast.com/t/me8hW27MYs3x
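An atomic set push adds members to a set-valued attribute in one server-side step, so concurrent writers never overwrite each other’s additions, and adding a member that is already present changes nothing. How Posterity itself used the feature is not described here; the minimal in-memory sketch below only illustrates the semantics (in DynamoDB this roughly corresponds to an UpdateItem call using the ADD action on a string set). The class, user, and follower names are hypothetical.

```python
import threading
from collections import defaultdict


class SetAttributeStore:
    """In-memory stand-in for atomically adding members to a string-set attribute."""

    def __init__(self) -> None:
        self._sets: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
        self._lock = threading.Lock()

    def add_to_set(self, item_key: str, attribute: str, *members: str) -> set[str]:
        """Add members in one atomic step; re-adding an existing member is a no-op."""
        with self._lock:
            target = self._sets[(item_key, attribute)]
            target.update(members)
            return set(target)


store = SetAttributeStore()


def follow(user: str, followers: list[str]) -> None:
    for follower in followers:
        store.add_to_set(user, "followers", follower)


threads = [
    threading.Thread(target=follow, args=("user1", ["ann", "ben"])),
    threading.Thread(target=follow, args=("user1", ["ben", "cat"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(store.add_to_set("user1", "followers")))  # ['ann', 'ben', 'cat']
```

Unlike appending to a list, set semantics make the operation safe to retry: sending the same “follow” twice leaves one copy.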
DynamoDB Task Manager
Darthdeus (Czech Republic) wrote his app in Ruby using Sinatra. It uses a custom ORM he wrote, called DynamoRecord, to access DynamoDB. His main idea was to bring at least some of the ActiveRecord-style API to DynamoDB using some basic metaprogramming.
http://www.youtube.com/watch?v=9tOzaDPP39I
Simple Survey
Peakpado (US) created an application using Ruby on Rails. For each table he created a sophisticated hash/range key model class, which resulted in an API for DynamoDB very similar to ActiveRecord.
http://screencast.com/t/ri1XkMxGcpnS
Data Sets for Mumbai
Romin (India) developed an API that exposes data sets for the city of Mumbai in JSON format. The solution uses Amazon DynamoDB for storing the data, and a Node.js application that exposes the REST interface and talks to Amazon DynamoDB via a backend Java application.