SQL or NoSQL, that is the question!
Origins of the NoSQL movement, a description of the approaches, and the trade-offs new databases make


SQL or NoSQL, that is the question! (Presentation Transcript)

  • SQL or NoSQL, that is the question! October 2011 Andraž Tori, CTO at Zemanta @andraz andraz@zemanta.com
  • Answering
    • - Why NoSQL?
    • - What is NoSQL?
    • - How does it work?
  • SQL is awesome!
    • - Structured Query Language
    • - ACID
      • Atomicity, Consistency, Isolation, Durability
    • - Predictable
    • - Schema
    • - Based on relational algebra
    • - Standardized
  • No, really, it's awesome!
    • - Hardened
    • - Free and commercial choices
      • - MySQL, PostgreSQL, Oracle, DB2, MS SQL...
    • - Commercial support
    • - Tooling
    • - Everyone knows it
    • - It's mature!
  •  
  • So this is the end, right?
  • Why the heck would someone not want SQL?
  • Why not to use SQL?
    • - Clueless self-taught programmers who use text files
    • - NIH - Not Invented Here syndrome. And I want to design my own CPU!
    • - Because it's hard!
    • - I can't afford it
    • - “This app was first ported from Clipper to DBase”
  • Some other perspectives...
  • Let's say ...
  • You are a big tech company, located on the west coast of the USA
  •  
  • You are...
    • - a big international web company based in San Francisco
    • - 5 data centers around the world
    • - Petabytes of data behind the service
    • - A day of downtime costs you at least millions
    • - And it's not a question of if, but when
  • You want to
    • - keep the service up no matter what
    • - have it fast
    • - deal with humongous amounts of data
    • - enable your engineers to make great stuff
  • You are...
  • Some interesting constraints
    • Amazon claims that just an extra one tenth of a second on their response times costs them 1% in sales.
  • So...
    • - Some pretty big and important problems
    • - And the brightest engineers in the world
    • - Who loooove to build stuff
    • - Sooner or later even an Oracle RAC cluster is not enough
  • Numbers everybody should know! (Jeff Dean, famous Stanford talk)
    • L1 cache reference: 0.5 ns
    • Branch mispredict: 5 ns
    • L2 cache reference: 7 ns
    • Mutex lock/unlock: 25 ns
    • Main memory reference: 100 ns
    • Compress 1K bytes w/ cheap algorithm: 3,000 ns
    • Send 2K bytes over 1 Gbps network: 20,000 ns
    • Read 1 MB sequentially from memory: 250,000 ns
    • Round trip within same datacenter: 500,000 ns
    • Disk seek: 10,000,000 ns
    • Read 1 MB sequentially from disk: 20,000,000 ns
    • Send packet CA->Netherlands->CA: 150,000,000 ns
  • Facebook circa 2009
    • - from 200GB (March 2008) to 4 TB of compressed new data added per day
    • - 135TB of compressed data scanned per day
    • - 7500+ Database jobs on production cluster per day
    • - 80K compute hours per day
    • - And that's just for data warehousing/analysis
      • - plus thousands of MySQL machines acting as Key/Value stores
  • Big Data
    • - Internet generates huge amounts of data
    • - First encountered by the big players: AltaVista, Google, Amazon...
    • - Need to be handled
    • - Classical storage solutions just don't fit/behave/scale anymore
  • So smart guys create solutions to these internal challenges
  • And then?
    • - Papers:
      • The Google File System (Google, 2003)
      • MapReduce: Simplified Data Processing on Large Clusters (Google, 2004)
      • Bigtable: A Distributed Storage System for Structured Data (Google, 2006)
      • Dynamo: Amazon's Highly Available Key-Value Store (Amazon, 2007)
    • - Projects (all open source):
      • Hadoop (coming out of Nutch, Yahoo, 2008)
      • Memcached (LiveJournal, 2003)
      • Voldemort (LinkedIn, 2008)
      • Hive (Facebook, 2008)
      • Cassandra (Facebook, 2008)
      • MongoDB (2007)
      • Redis, Tokyo Cabinet, CouchDB, Riak...
  • Four papers to rule them all
    • Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System”, 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003.
    • Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
    • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, “Bigtable: A Distributed Storage System for Structured Data”, OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November 2006.
    • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
  • Solving problems of big guys?
  • Total Sites Across All Domains August 1995 - October 2011, NetCraft
  • Yesterday's problem of the biggest players is today's problem of the garden-variety startup
  •  
  • And so we end up with a Cambrian explosion
  •  
  •  
  • These solutions don't have much in common, Except...
  • They definitely aren't SQL
  • Not Only SQL
  • So what are these beasts?
  • That's a hard question...
    • - There is no standard
    • - This is a new technology
      • - new research
      • - survival of the fittest
      • - experimenting
    • - They obviously fulfill some new needs
      • - but we don't yet know which are real and which superficial
    • - Most are extremely use-case specific
  • Example use-cases
    • - Shopping cart on Amazon
    • - PageRank calculation at Google
    • - Streams stuff at Twitter
    • - Extreme K/V store at bit.ly
    • - Analytics at Facebook
  • At the core, it's a different set of trade-offs and operational constraints
  • Trade-offs and operational constraints
    • - Consistent?
      • Eventually consistent?
    • - Highly available?
      • Distributed across continents?
    • - Fault tolerant?
      • Partition tolerant?
      • Tolerant to consumer grade hardware?
    • - Distributed?
      • Across 10, 100, 1000, 10000 machines?
  • More possibilities
    • - All in memory? (disk is the new tape)
    • - Batch processing?
      • - tolerant to node failures?
    • - Graph oriented?
    • - No transactions?
      • Programmer deals with inconsistencies?
    • - No schemas?
    • - BASE? (Basically Available, Soft state, Eventually Consistent)
    • - Horizontal scaling, with no downtime?
    • - Self healing?
  • A consistent topic: CAP Theorem
  • CAP theorem (Eric Brewer, 2000, Symposium on Principles of Distributed Computing)
    • - CAP = Consistency, Availability, Partition tolerance
    • - Pick any two!
    • - Distributed systems have to sacrifice something to be fast
    • - Usually you drop one of:
      • - consistency: all clients see the same data
      • - availability: the service always returns a response
    • - Sometimes can even tune the trade-offs!
  • "There is no free lunch with distributed data” – HP
  • Eventual Consistency
    • - Different clients can read and write the same data concurrently, with no locking, possibly on partitioned nodes
    • - What we know is that, given enough time, the data converges to the same state across all replicas
  • But this is horrible!
  • If your database stores how many vases you have in your shop... you already are eventually consistent! :)
  • Eventual consistency
    • - Conflict resolution:
      • - Read time
      • - Write time
      • - Asynchronous
    • - Possibilities:
      • - client timestamps
      • - vector clocks: when writing, say which data version you started from
    • - Conflict resolution can be server or client based
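The vector-clock idea can be sketched in a few lines of Python. This is a generic illustration of the technique (the function names and node labels are mine, not taken from any particular database):

```python
# Minimal vector-clock sketch: each replica keeps a counter per node,
# and a write increments the counter of the node it went through.

def increment(clock, node):
    """Return a new clock with `node`'s counter bumped by one."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def compare(a, b):
    """'before', 'after', 'equal', or 'concurrent' (a conflict to resolve)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither descends from the other -> conflict

# Two clients start from the same version and write independently:
base = increment({}, "node1")       # {"node1": 1}
v1 = increment(base, "node1")       # client A writes via node1
v2 = increment(base, "node2")       # client B writes via node2
print(compare(base, v1))            # prints "before"
print(compare(v1, v2))              # prints "concurrent" -> resolve on read
```

A "concurrent" result is exactly the case where server- or client-side conflict resolution has to kick in.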
  • There are different kinds of consistencies
    • - Read-your-writes consistency
    • - Monotonic write / monotonic read consistency
    • - Session consistency
    • - Causal consistency
  • There's not even a proper taxonomy of features different NoSQL solutions offer
  • And this presentation is too short to present the whole breadth of possibilities
  • Usual taxonomy of NoSQL
    • Usual taxonomy:
    • - Key/Value stores
    • - Column stores
    • - Document stores
    • - Graph stores
  • Other attributes
    • - In-memory / on-disk
    • - Latency / throughput (batch processing)
    • - Consistency / Availability
  • Key/Value stores
    • - a.k.a. Distributed hashtables!
    • - Amazon Dynamo
    • - Redis, Voldemort, Cassandra, Tokyo Cabinet, Riak
  • Document databases
    • - Similar to Key/Value, but value is a document
    • - JSON or something similar, flexible schema
    • - CouchDB, MongoDB, SimpleDB...
    • - May support indexing or not
    • - Usually support more complex queries
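The "flexible schema" point is easy to see with plain JSON-like documents. This is a generic sketch using Python dicts (the field names are made up, and the query helper is not the API of any particular document database):

```python
import json

# Two documents in the same "collection" need not share a schema:
doc1 = {"_id": 1, "name": "Alice", "email": "alice@example.com"}
doc2 = {"_id": 2, "name": "Bob", "tags": ["admin"], "last_login": "2011-10-01"}

collection = [doc1, doc2]

def find(coll, field, value):
    """Match documents that have `field` set to `value`; documents
    lacking the field are simply skipped, no schema required."""
    return [d for d in coll if d.get(field) == value]

print(find(collection, "name", "Bob"))   # only doc2 matches
print(json.dumps(doc2))                  # documents serialize straight to JSON
```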
  • Column stores
    • - one key, multiple attributes
    • - hybrid row/column
    • - BigTable, HBase, Cassandra, Hypertable
  • Graph Databases
    • - Neo4J, Maestro OpenLink, InfiniteGraph, HyperGraphDB, AllegroGraph
    • - Whole semantic web shebang!
  • To make the situation even more confusing...
    • - Fast pace of development
    • - In-memory stores gain on-disk support overnight
    • - Indexing capabilities are added
  • Two examples
    • - Cassandra
    • - Hadoop
      • - Hive
      • - Mahout
  •  
  • Cassandra
    • - BigTable + Dynamo
    • - P2P, horizontally scalable
    • - No SPOF
    • - Eventually consistent
    • - Tunable tradeoffs between consistency and availability
      • - number of replicas, writes, reads
  • Cassandra – writes
    • - No reads
    • - No seeks
    • - Log oriented writes
    • - Fast, atomic inside ColumnFamily
    • - Always available for writing
  • Cassandra
    • - Billions of rows
    • - MySQL:
      • ~ 300ms write
      • ~ 350ms read
    • - Cassandra:
      • ~ 0.12ms write
      • ~ 15ms read
  • Not enough time to go into data model...
  • Cassandra
    • In production at: Facebook, Digg, Rackspace, Reddit, Cloudkick, Twitter
    • - largest production cluster over 150TB and over 150 machines
    • Other stuff:
      • - pluggable partitioner (Random/OrderPreserving)
      • - rack aware, datacenter aware
  • Experiences?
    • - Works pretty well at Zemanta
      • - user preferences store
      • - extending to new use-cases
    • - Digg had some problems
    • - Don't necessarily use it as a primary store
    • - Not very easy to back-up, situation is improving
  • Cassandra - queries
    • - Column by key
    • - Slices (of columns/supercolumns)
    • - Range queries (efficient only with the OrderPreservingPartitioner)
  •  
  • Hadoop
    • - GFS + MapReduce
    • - Fault tolerant
    • - (massively) distributed
    • - massive datasets
    • - batch-processing (non real-time responses)
    • - Written in Java
    • - A whole ecosystem
  • Hadoop: Why? (Owen O'Malley, Yahoo! Inc., omalley@apache.org)
    • • Need to process 100TB datasets with multi-day jobs
    • • On 1 node:
      • – scanning @ 50MB/s = 23 days
      • – MTBF = 3 years
    • • On 1000 node cluster:
      • – scanning @ 50MB/s = 33 min
      • – MTBF = 1 day
    • • Need framework for distribution
      • – Efficient, reliable, easy to use
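The scan-time figures above are straightforward back-of-the-envelope arithmetic (decimal units, as in the slide):

```python
# 100 TB scanned at 50 MB/s per node.
dataset = 100e12   # bytes
rate = 50e6        # bytes/second per node

one_node_days = dataset / rate / 86400          # one node, sequential scan
cluster_minutes = dataset / (rate * 1000) / 60  # 1000 nodes in parallel

print(round(one_node_days, 1))     # prints 23.1 (days)
print(round(cluster_minutes, 1))   # prints 33.3 (minutes)
```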
  • Hadoop @ Facebook
    • - Use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
    • - Currently 2 major clusters:
      • A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
      • A 300-machine cluster with 2400 cores and about 3 PB raw storage.
      • Each (commodity) node has 8 cores and 12 TB of storage.
    • - Heavy users of both streaming as well as the Java APIs. They built a higher-level data warehousing framework using these features called Hive (see http://hadoop.apache.org/hive/).
  • But also at smaller startups
    • - Zemanta: 2 to 4 node cluster, 7TB
      • - log processing
    • - Hulu 13 nodes
      • - log storage and analysis
    • - GumGum 9 nodes
      • - image and advertising analytics
    • - Universities: Cornell – Generating web graphs (100 nodes)
    • - It's almost everywhere
  • Hadoop Architecture - HDFS
    • - HDFS provides a single distributed filesystem
    • - Managed by a NameNode (SPOF)
    • - Append-only filesystem
      • - distributed by blocks (for example 64MB)
    • - It's like one big RAID over all the machines
      • - tunable replication
    • - Rack aware, datacenter aware
    • - It just works, really!
  •  
  •  
  • Hadoop Architecture - MapReduce
    • - Based on an old concept from Lisp
    • - Generally it's not just map-reduce, it's:
      • Map -> shuffle (sort) -> merge -> reduce
    • - Jobs can be partitioned
    • - Jobs can be run and be restarted independently (parallelization, fault tolerance)
    • - Aware of data-locality of HDFS
    • - Speculative execution (toward the end of a job, tasks on machines that stall are restarted elsewhere)
  • Infamous word counting example
    • - “One and one is two and one is three”
    • - Two mappers: “One and one is”, “two and one is three”
    • - Pretty “stupid” mappers, just output word and “1”
    Output Mapper 1: One 1, And 1, One 1, Is 1
    Output Mapper 2: Two 1, And 1, One 1, Is 1, Three 1
    Sorter output: And 1, And 1, Is 1, Is 1, One 1, One 1, One 1, Two 1, Three 1
    Reducer output: And 2, Is 2, One 3, Two 1, Three 1
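The flow above can be simulated in a few lines of plain Python. This is a stand-in for the real Hadoop machinery, just to show the data moving through map, shuffle/sort, and reduce:

```python
from itertools import groupby
from operator import itemgetter

def mapper(text):
    # "Stupid" mapper, as the slide says: emit (word, 1) per word.
    return [(word.lower(), 1) for word in text.split()]

def shuffle(pairs):
    # Hadoop sorts mapper output by key so each reducer sees one key at a time.
    return sorted(pairs, key=itemgetter(0))

def reduce_counts(sorted_pairs):
    # Each reducer sums the counts for one key's group.
    return {key: sum(c for _, c in group)
            for key, group in groupby(sorted_pairs, key=itemgetter(0))}

# Two mappers, as in the slide:
pairs = mapper("One and one is") + mapper("two and one is three")
counts = reduce_counts(shuffle(pairs))
print(counts)   # {'and': 2, 'is': 2, 'one': 3, 'three': 1, 'two': 1}
```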
  • Important to know
    • - Mappers can output more than one output per input (or none)
    • - Bucketing for reducers happens immediately after mapping output
    • - Every reducer gets all input records for certain “key”
    • - All parts are highly pluggable – readers, mapping, sorting, reducing … it's Java
  • Hadoop
    • - You can write your jobs in Java
    • - You get used to thinking inside the constraints
    • - You can use “Hadoop Streaming” to write jobs in any language
    • - It's great not to have to think about the machines, but you can “peep” if you want to see how your job is doing.
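A Hadoop Streaming job is any pair of executables that read lines on stdin and write tab-separated key/value lines on stdout. A word-count mapper and reducer might look roughly like this; it is a local sketch, with a plain `sorted()` standing in for the shuffle phase that the real framework performs between the two programs:

```python
import sys
from itertools import groupby

def map_stream(lines):
    # Streaming mapper: emit one "word<TAB>1" line per word.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reduce_stream(sorted_lines):
    # Streaming reducer: input arrives sorted by key, so each word
    # forms one contiguous group.
    pairs = (line.rsplit("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Locally, a plain sort emulates Hadoop's shuffle between mapper and reducer:
mapped = sorted(map_stream(["One and one is", "two and one is three"]))
for line in reduce_stream(mapped):
    sys.stdout.write(line + "\n")
```

With real Hadoop you would ship the mapper and reducer as separate scripts via the hadoop-streaming jar; the stdin/stdout contract is the whole interface.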
  • Now, this is a bit wonky, right?
    • - Word counting is a really bad example
    • - However it's like “Hello world”, so get used to it
    • - When you get to real problems it gets much more logical
  • Benchmarks, 2009
    • This doesn't help me much, but...
    Bytes: 500,000,000,000 | Nodes: 1406 | Maps: 8000 | Reduces: 2600 | Replication: 1 | Time: 59 seconds
    Bytes: 1,000,000,000,000 | Nodes: 1460 | Maps: 8000 | Reduces: 2700 | Replication: 1 | Time: 62 seconds
    Bytes: 100,000,000,000,000 | Nodes: 3452 | Maps: 190,000 | Reduces: 10,000 | Replication: 2 | Time: 173 minutes
    Bytes: 1,000,000,000,000,000 | Nodes: 3658 | Maps: 80,000 | Reduces: 20,000 | Replication: 2 | Time: 975 minutes
  • Hive
  • Hive
    • - A system built on top of Hadoop that mimics SQL
    • - Hive Query Language
    • - Built at Facebook, since writing MapReduce jobs in Java is tedious for basic tasks
    • - Every operation is one or more full table scans
    • - Bunch of heuristics, query optimization
  • Hive – Why we love it at Zemanta
    • - Don't need to transform your data on “load time”
    • - Just copy your files to HDFS (preferably compressed and chunked)
    • - Write your own deserializer (50 lines in Java)
    • - And use your file as a table
    • - Plus custom User Defined Functions
  •  
  • Mahout
    • - Bunch of algorithms implemented
      • Collaborative Filtering
      • User and Item based recommenders
      • K-Means, Fuzzy K-Means clustering
      • Mean Shift clustering
      • Dirichlet process clustering
      • Latent Dirichlet Allocation
      • Singular value decomposition
      • Parallel Frequent Pattern mining
      • Complementary Naive Bayes classifier
      • Random forest decision tree based classifier
      • High-performance Java collections (previously Colt collections)
      • A vibrant community
      • and much more cool stuff to come by this summer, thanks to Google Summer of Code
  • General notes
  • Some observations
    • - Non-fixed schemas are a blessing when you have to adapt constantly
      • - that doesn't mean you should not have documentation and be thoughtful!
    • - Denormalization is the way to scale
      • - sorry guys
    • - Clients get to manage things more precisely, but also have to manage things more precisely
  • Some internals, “fun” tricks
    • - Bloom filter: Is data on this node?
      • Maybe / Definitely not
      • Maybe -> go to disk to check
    • - Vector clocks
    • - Consistent hashing
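The Bloom-filter trick ("maybe / definitely not") fits in a few lines. This is a toy sketch with made-up parameters, not a production implementation:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # a plain integer used as the bit array

    def _positions(self, key):
        # Derive k positions by salting the key; sha256 here is arbitrary.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False -> definitely not stored; True -> maybe (go check the disk).
        return all(self.bits & (1 << pos) for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-42")
print(bf.might_contain("row-42"))    # prints True (maybe on disk)
print(bf.might_contain("row-999"))   # almost certainly False: skip the seek
```

The payoff is exactly the one on the slide: a negative answer lets a node skip a 10 ms disk seek with a few in-memory bit tests.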
  • Consistent hashing
    • - key -> hash -> “coordinator node”
    • - depending on the replication factor, the key is then stored on the next N nodes along the ring
    • - When a new node is added to the ring, rebalancing is relatively easy
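A minimal sketch of the ring (node names and the hash choice are illustrative, not taken from any particular system; real implementations add virtual nodes to smooth out the load):

```python
import bisect
import hashlib

def ring_hash(key):
    # Any stable hash works for the illustration; md5 is used here.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hashing ring (no virtual nodes, for brevity)."""

    def __init__(self, nodes):
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def coordinator(self, key):
        # First node clockwise from the key's position on the ring.
        idx = bisect.bisect(self.points, (ring_hash(key), ""))
        return self.points[idx % len(self.points)][1]

    def preference_list(self, key, n=2):
        # Coordinator plus the next n-1 nodes hold the replicas.
        idx = bisect.bisect(self.points, (ring_hash(key), ""))
        return [self.points[(idx + i) % len(self.points)][1] for i in range(n)]

small = Ring(["node-a", "node-b", "node-c"])
big = Ring(["node-a", "node-b", "node-c", "node-d"])

# Adding one node only reassigns the keys that now hash between the new
# node and its predecessor; the rest keep their coordinator.
keys = [f"key-{i}" for i in range(1000)]
moved = sum(small.coordinator(k) != big.coordinator(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")  # only a fraction, not all
```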
  • And if you don't take anything else from this presentation...
  •  
  •  
  •  
  • But there's more to it
  • This is the edge today
    • - Tons of interesting research waiting to be made
    • - Ability to leverage these solutions to process terabytes of data cheaply
    • - Ability to seize new opportunities
    • - Innovation is the only thing keeping you/us ahead
    • - Are you preparing yourself for tomorrow's technologies? Tomorrow's research?
  • Images
    • http://www.flickr.com/photos/60861613@N00/3526232773/sizes/m/in/photostream/
    • http://www.zazzle.com/sql_awesome_me_tshirt-235011737217980907
    • http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html
    • http://hadoop.apache.org/common/docs/current/hdfs_design.html
    • http://www.flickr.com/photos/unitednationsdevelopmentprogramme/4273890959/