no central point of organization: no committee or standardizing body, no plan/strategy/illuminati to take down the RDBMS; lots of "in-fighting"
central tenet: there IS NO one-size-fits-all; unlike the RDBMS assumptions, each engineering effort must be evaluated for its data needs
is it “anti-RDBMS”?
not so much
will not magically solve all your data or performance problems: applications won’t magically stop crashing, data corruption won’t magically go away, etc. Big Data is still hard. These tools make it possible/affordable/approachable.
data persistence comes down to guarantees
why are we here?
"web scale" more users, content, connections more trends, insight, knowledge
Atomicity: fault-tolerance is moving to the application layer; smaller atomic units
Consistency: yes! but not necessarily immediate; "availability" (latency, reads) is more important
Isolation: smaller atomic units (multi-step transaction vs. compare-and-swap), greater availability, denormalization => reduced dependency on isolation
Durability: some things are more important than getting every last detail, e.g. latency of response, viewing data in aggregate
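A minimal sketch of the compare-and-swap style of isolation, assuming a hypothetical single-node store (`VersionedStore`, `compare_and_swap`, and `increment` are illustrative names, not any particular product's API). Each write is atomic on exactly one key, and the retry loop that keeps the counter correct lives in the application:

```python
import threading
from typing import Any, Dict, Optional, Tuple


class VersionedStore:
    """Hypothetical key-value store whose only atomic unit is a single key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}  # key -> (version, value)
        self._lock = threading.Lock()  # protects one call, never a multi-step transaction

    def get(self, key: str) -> Tuple[int, Optional[Any]]:
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_swap(self, key: str, expected_version: int, new_value: Any) -> bool:
        """Write new_value only if the key is still at expected_version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # another writer got there first; caller must retry
            self._data[key] = (current_version + 1, new_value)
            return True


def increment(store: VersionedStore, key: str) -> int:
    """Application-level retry loop: fault handling lives in the app, not the data layer."""
    while True:
        version, value = store.get(key)
        new_value = (value or 0) + 1
        if store.compare_and_swap(key, version, new_value):
            return new_value


store = VersionedStore()


def worker() -> None:
    for _ in range(1000):
        increment(store, "likes")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("likes"))  # (4000, 4000)
```

Four threads race on the same key, yet no increment is lost: a losing writer simply gets `False`, re-reads, and retries, without any lock held across its read-modify-write.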
Basically Available: is the data layer up or not? are we serving content to our users or not?
Soft State: shifting the burden of "correctness" up to the application layer. availability is more important than precision. accuracy (correct) vs. precision (repeatable).
Eventual Consistency: all operations are recorded and ordered, then played back as resources permit.
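The eventual-consistency idea (record and order every operation, play it back as resources permit) can be sketched with a shared, ordered log and replicas that replay it at different rates. `Replica`, `catch_up`, and the `budget` parameter are hypothetical; real systems add failure handling and conflict resolution that this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    """Hypothetical replica that lazily applies a shared, ordered operation log."""
    name: str
    state: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # number of log entries already played back

    def catch_up(self, log: List[Tuple[str, str]], budget: int) -> None:
        """Replay at most `budget` pending operations, strictly in log order."""
        end = min(len(log), self.applied + budget)
        for key, value in log[self.applied:end]:
            self.state[key] = value
        self.applied = end


log: List[Tuple[str, str]] = []  # every write is recorded and ordered here
log.append(("user:1:status", "online"))
log.append(("user:1:status", "away"))
log.append(("user:2:status", "online"))

fast, slow = Replica("fast"), Replica("slow")
fast.catch_up(log, budget=10)  # plenty of resources: fully caught up
slow.catch_up(log, budget=1)   # busy node: only one operation applied so far

print(fast.state == slow.state)  # False: a stale read is possible here
slow.catch_up(log, budget=10)    # later, as resources permit
print(fast.state == slow.state)  # True: the replicas have converged
```

Between the two `catch_up` calls, a reader hitting `slow` would see a stale status (soft state); once the backlog is replayed, both replicas agree because they applied the same operations in the same order.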
agile dev moves too fast for schemas and constraints; this isn’t waterfall. data models change quickly, and up-front schema modeling is akin to waterfall development: not always practical/feasible/possible. data is messy: record what you have and leave constraints up to the application
at scale, data services look like a DHT (distributed hash table) anyway: isolated independent services, introduced caching layers, data partitioned by logical and range boundaries.
app servers/sessions are self-contained and load-balanced, but the data’s in one spot: what do you do?
37signals approach, DHH: “scaling is a good thing because scaling => users => $$$”
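Routing by range boundaries can be sketched as a sorted list of split points plus a binary search. This is one assumed scheme among several (hash-based placement, as in a classic DHT, is the common alternative), and `RangePartitioner`, the split points, and the server names are hypothetical:

```python
import bisect
from typing import List


class RangePartitioner:
    """Hypothetical router that maps row keys to servers by sorted split points."""

    def __init__(self, split_points: List[str], servers: List[str]) -> None:
        # n split points define n + 1 contiguous key ranges, one per server.
        if len(servers) != len(split_points) + 1:
            raise ValueError("need exactly one more server than split points")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.servers = servers

    def server_for(self, key: str) -> str:
        # bisect_right: a key equal to a split point belongs to the range it starts.
        return self.servers[bisect.bisect_right(self.split_points, key)]


router = RangePartitioner(["g", "p"], ["data-1", "data-2", "data-3"])
for key in ["alice", "gina", "peter", "zed"]:
    print(key, "->", router.server_for(key))
```

With split points `g` and `p`, `alice` routes to `data-1`, `gina` to `data-2`, and both `peter` and `zed` to `data-3`. Range partitioning keeps neighboring keys on the same server, which makes ordered scans cheap; hashing spreads load more evenly but gives up key order.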
more users, more instances. easy!
doesn’t work for social applications: users on separate instances cannot interact; old MMOs vs. new social games
redesign the data server as “data services”: separate, independent logical components
knowing each service by name becomes “vexing”
abstractions! wouldn’t it be nice if...
Distributed Computing Made Easy Less Hard
programming model/API for parallel computing Google's MapReduce paper
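To make the MapReduce programming model concrete, here is a single-process word-count sketch: a map function emits `(word, 1)` pairs, a shuffle step groups them by key, and a reduce function sums each group. It only mimics the shape of the model; a real framework runs these phases in parallel across machines and handles the shuffle and failures itself. The function names are illustrative, not a framework API.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (the framework does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


documents = ["big data is hard", "big data is still hard", "nosql is not magic"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["big"], counts["is"], counts["magic"])  # 2 3 1
```

Because each map call sees one record and each reduce call sees one key, both phases can be split across many workers without shared state, which is what makes the model easy to parallelize.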
replicated, high-throughput, fairly UNIX-y (not POSIX) file system. Google File System paper
Distributed Group Services: coordination, synchronization, configuration, naming. Google Chubby paper
efficient, cross-language messaging: Facebook/Apache Thrift, Google Protobufs
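As a rough illustration of why schema-driven binary messaging is compact, the sketch below packs a record with Python's `struct` module and compares its size to the same record encoded as JSON. The record layout is a hypothetical fixed schema, and this is NOT the Thrift or Protocol Buffers wire format; those add field identifiers, schema-evolution rules, and generated bindings for many languages.

```python
import json
import struct
from typing import Tuple

# Hypothetical fixed schema: user_id (uint32), follower_count (uint32),
# then the name as length-prefixed UTF-8. Big-endian, no padding.
HEADER = struct.Struct(">IIH")  # two unsigned 32-bit ints, one unsigned 16-bit length


def encode(user_id: int, follower_count: int, name: str) -> bytes:
    name_bytes = name.encode("utf-8")
    return HEADER.pack(user_id, follower_count, len(name_bytes)) + name_bytes


def decode(buf: bytes) -> Tuple[int, int, str]:
    user_id, follower_count, name_len = HEADER.unpack_from(buf, 0)
    start = HEADER.size
    name = buf[start:start + name_len].decode("utf-8")
    return user_id, follower_count, name


record = encode(42, 1337, "xefyr")
as_json = json.dumps({"user_id": 42, "follower_count": 1337, "name": "xefyr"}).encode("utf-8")
print(len(record), len(as_json))  # the binary form is a fraction of the JSON size
print(decode(record))  # (42, 1337, 'xefyr')
```

The saving comes from both sides agreeing on the schema up front, so field names never travel on the wire; the cost is that every reader needs that schema, which is the problem cross-language code generation solves.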
addresses limitations of raw M/R and HDFS access:
request by key vs. HDFS sequential reads
low-latency, ms response times vs. M/R high latency
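The request-by-key access pattern can be sketched as a table kept sorted by row key: a point `get` is a binary search rather than a sequential pass over a file, while `scan` still supports ordered range reads. `SortedTable` is a hypothetical in-memory stand-in; a real store keeps this ordering on disk, split across many servers.

```python
import bisect
from typing import Iterator, List, Optional, Tuple


class SortedTable:
    """Hypothetical in-memory table kept sorted by row key."""

    def __init__(self, rows: List[Tuple[str, str]]) -> None:
        rows = sorted(rows)
        self.keys = [k for k, _ in rows]
        self.values = [v for _, v in rows]

    def get(self, key: str) -> Optional[str]:
        """Random read: O(log n) binary search instead of scanning everything."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, str]]:
        """Ordered range read over [start, stop), for batch-style access."""
        i = bisect.bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i] < stop:
            yield self.keys[i], self.values[i]
            i += 1


table = SortedTable([("user:003", "carol"), ("user:001", "alice"), ("user:002", "bob")])
print(table.get("user:002"))  # bob
print(list(table.scan("user:001", "user:003")))  # [('user:001', 'alice'), ('user:002', 'bob')]
```

Keeping rows sorted is what lets one layout serve both low-latency point reads and sequential range scans.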
Nick Dimiduk - @xefyr
Founder, Drawn to Scale
April 28, 2010
what NoSQL is not
Computer Science & Engineering at Ohio State: Artificial Intelligence, Programming Languages, Systems
Applied Technical Systems: Hierarchical, non-relational data storage and analysis systems (no-sql before there was NoSQL). Information Retrieval, Wire Serialization/RPC (before there was Thrift/Avro), Data Visualization (GBs)
Visible Technologies: Social Media Storage, Processing, Analytics. Monitoring, Engagement, Warehousing, and BI. (TBs)
Drawn to Scale: Big Data Storage, Processing, Retrieval, Analytics (TBs, PBs)