I’ve outgrown my basic
stack. Now what?
Thoughts and feelings about growing
with Django and NoSQL
Our common stack is built on:
Super awesome and fast (once you learn what knobs to turn).
Lots of cool features and tools: pgtune, pg_top, PgBouncer
Django is a high-level Python Web framework that encourages
rapid development and clean, pragmatic design.
In-memory key-value store for small chunks of data.
Super simple and awesome
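A minimal sketch of the usual cache-aside pattern, assuming a client that exposes get and set. TinyCache is a hypothetical dict-backed stand-in so the example runs without a memcached server; get_or_compute and the key names are made up for illustration.

```python
import time


class TinyCache:
    """Dict-backed stand-in for a memcached client (get/set with a TTL)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._data[key] = (value, expires_at)


def get_or_compute(cache, key, compute, ttl=60):
    """Cache-aside: return the cached value, or compute it and store it."""
    value = cache.get(key)
    if value is None:
        value = compute()  # e.g. the expensive database query
        cache.set(key, value, ttl)
    return value
```

The second call for the same key skips compute entirely, which is the whole point: small chunks of data served from memory instead of the database.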
People are signing up and using your
But wait, can Postgres handle this?
Can have lots of app code sometimes.
What about when you outgrow your shard key?
“Don’t shard until you have to”
- every single talk I’ve seen
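To make the shard key concrete, here is a rough sketch of application-level shard routing, assuming a stable hash of the key picks the database. The shard aliases and the shard_for helper are hypothetical, not from any particular library.

```python
import hashlib

# Hypothetical database aliases, one per shard.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(shard_key, shards=SHARDS):
    """Map a shard key (e.g. a user id) to one shard using a stable hash."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

Every query now has to know its shard key up front, which is where the extra app code comes from. Adding a shard changes len(shards), so most keys map somewhere new and the data has to move.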
“master to multiple slaves” replication
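In Django, one way to use this kind of replication is a database router, which is a plain class with db_for_read and db_for_write methods. The sketch below assumes aliases named default, replica1 and replica2 exist in DATABASES; those names are made up for illustration.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the primary and spread reads over the replicas."""

    primary = "default"                  # hypothetical alias in DATABASES
    replicas = ["replica1", "replica2"]  # hypothetical aliases in DATABASES

    def db_for_read(self, model, **hints):
        # Any replica will do; reads may lag slightly behind the primary.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations are always fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes go to the primary and replicate from there.
        return db == self.primary
```

The router would be enabled by listing its dotted path in the DATABASE_ROUTERS setting. Replication lag means a read right after a write can return stale data, so read-your-own-writes paths may still need the primary.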
Nearly all of them are based on 2 papers
Built on CAP theorem
The theorem began as a conjecture made by University of California, Berkeley computer
scientist Eric Brewer at the 2000 Symposium on Principles of Distributed Computing.
In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's
conjecture, rendering it a theorem.
Advanced key-value store
Think of it as super memcached. With union math.
All data must fit in RAM
It is often referred to as a data structure server
Keys can contain strings, hashes, lists, sets and sorted sets
Not a db solution but more of a helper.
This is now a part of our basic stack for most apps.
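The "union math" can be illustrated without a server. This sketch uses Python sets as a stand-in for Redis sets, with the matching Redis command named in each docstring; the store dict, helper functions and key names are made up for illustration.

```python
# Python sets standing in for Redis sets stored under string keys.
store = {}


def sadd(key, *members):
    """Like SADD: add members to the set stored at key."""
    store.setdefault(key, set()).update(members)


def sunion(*keys):
    """Like SUNION: members present in any of the sets."""
    return set().union(*(store.get(k, set()) for k in keys))


def sinter(*keys):
    """Like SINTER: members present in every one of the sets."""
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()


sadd("followers:alice", "bob", "carol", "dave")
sadd("followers:erin", "carol", "dave", "frank")

either = sunion("followers:alice", "followers:erin")  # follows alice or erin
both = sinter("followers:alice", "followers:erin")    # follows both
```

With a real Redis server the same questions are answered by the server itself, so the app never has to pull both sets over the wire.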
JSON-like documents with dynamic schemas
Load balancing: MongoDB scales horizontally using sharding
MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write
operation. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock.
Global write lock*
Uncompressed field names
Safe off by default
Just google “mongo problems” or “moving off mongodb”
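Two of the points above, dynamic schemas and uncompressed field names, are easy to see with plain dicts. This is only a sketch: the documents and field names are invented, and compact JSON length is used as a rough proxy for size, whereas MongoDB actually stores BSON.

```python
import json

# Two documents in the same hypothetical "users" collection; fields differ.
users = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Raj", "twitter": "@raj", "tags": ["django", "nosql"]},
]


def encoded_size(doc):
    """Rough size proxy: bytes of compact JSON (real storage is BSON)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Field names are stored inside every document, so long names cost bytes
# on each and every record.
long_names = {"first_name": "Ann", "last_login_timestamp": 1}
short_names = {"fn": "Ann", "ll": 1}

saved = encoded_size(long_names) - encoded_size(short_names)
```

The per-document saving looks small, but it is paid again for every document in the collection.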
Uses pre-defined column family format
Used by lots of people with ‘big data’ problems
Amazing workhorse for data
You need a sizable cluster
Cluster setup can be difficult
I need to personally spend more time with this
JSON to store data,
Views: embedded map/reduce
BigCouch, Couchbase, Membase?
Kind of in a dev rut...
...but just pushed a new huge upgrade
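For a feel of how a map/reduce view works, here is a small sketch in Python. Real CouchDB views are typically written in JavaScript inside a design document; the documents, map_posts_by_author and run_view below are made up for illustration.

```python
from collections import defaultdict

docs = [
    {"_id": "1", "type": "post", "author": "ann"},
    {"_id": "2", "type": "post", "author": "raj"},
    {"_id": "3", "type": "post", "author": "ann"},
    {"_id": "4", "type": "comment", "author": "ann"},
]


def map_posts_by_author(doc):
    """Map step: emit (key, value) pairs for the documents we care about."""
    if doc.get("type") == "post":
        yield doc["author"], 1


def run_view(docs, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group to one value."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


posts_per_author = run_view(docs, map_posts_by_author, sum)
```

The key design point is that the query is defined ahead of time as a view, so you need to know your queries before you need their answers.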
Based hardcore on Amazon's Dynamo paper
Key Value store
Super good about failure, “no downtime”
Map Reduce / Secondary Indexes
Built-in full text search
2 types of mapreduce
Erlang - super fast
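The Dynamo-style idea behind "no downtime" is that each key lives on several nodes chosen from a hash ring. Below is a generic consistent-hashing sketch under that assumption, not Riak's actual implementation; the Ring class, vnodes count and node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Place a string on the ring using SHA-1."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node claims several points on the ring to spread load evenly.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]
        self._node_count = len(set(nodes))

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        n = min(n, self._node_count)
        start = bisect.bisect_right(self._hashes, _hash(key))
        chosen = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == n:
                    break
        return chosen
```

For example, Ring(["a", "b", "c", "d"]).preference_list("user:42") returns three distinct nodes that would hold copies of that key. If one of them dies, the other two can still serve it.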
Key value + row-oriented = column family
Linear scalability and fault-tolerance on commodity
hardware or cloud infrastructure
Originally built at Facebook for inbox search
Has CQL3 - think SQL, kind of
Baked auto cluster AMI
Super fast writes
Compresses data that’s not accessed a lot
Can tie in to Hadoop for big map reduce
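"Key value + row-oriented = column family" can be pictured as a dict of rows, each holding its own sorted set of columns. The ColumnFamily class below is a toy in-memory model to show the shape of the data, not how Cassandra stores anything; all names are made up for illustration.

```python
from collections import defaultdict


class ColumnFamily:
    """Row key -> {column name -> value}; rows need not share columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get_slice(self, row_key, start, end):
        """Return columns with start <= name <= end, in column-name order."""
        row = self._rows.get(row_key, {})
        return [(c, row[c]) for c in sorted(row) if start <= c <= end]


timeline = ColumnFamily()
timeline.insert("user:1", "2013-01-02", "second post")
timeline.insert("user:1", "2013-01-01", "first post")
timeline.insert("user:1", "2013-02-01", "later post")

january = timeline.get_slice("user:1", "2013-01-01", "2013-01-31")
```

Using a timestamp as the column name is a common trick in this model: one row per user, with range slices over time coming back already ordered.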
Things to think about:
Is eventual consistency ok for you?
Do you know the queries you need right now?
Is your data complicated or simple?
How fast does it grow?
How long do you want that data to hang around?
Really think about trade-offs.
Every system has its good and bad
There is no “winner”, so stop searching “which is best”
Think about which fits your use case
Tons of links out there, just make sure they are relatively new