Code camp2012


This is the presentation that I gave at Silicon Valley Code Camp 2012. The deck covers various aspects of big data and the NoSQL solutions available to handle it.

Published in: Technology

Code camp2012

  1. 1. Big Data and NoSQL Landscape Sanjeev Mishra Silicon Valley Code Camp 2012 Sanjeev Mishra SVCC 2012
  2. 2. Timeline
     • 1970s – Genesis of the modern database
       o Modeling the world based on relational calculus: best for managing uniform data
     • 1980s
       o RDBMS takes over the world
     • 1990s – 2000+
       o Invention of HTML
       o Spread of Web-based technologies
  3. 3. Need for Modern Data Storage
     • Amazon
       o Managing: Shopping carts, Seller Lists, Customer Preferences, Sales Rank, Recommendations
     • Google
       o Storing and managing web scale data
     • Facebook
       o Managing social graphs
     • LinkedIn, Twitter and others
  4. 4. Data Explosion: Current
     • "Every two days now we create as much information as we did from the dawn of civilization up until 2003", about 5 exabytes of data (1 exabyte = 1K PB): Eric Schmidt
  5. 5. Data Explosion: Future
     • A telescope planned to be finished in 2024 will generate more data in a single day than the entire Internet.
  6. 6. What is Big Data?
     • A terabyte (TB) is not big data; a petabyte (PB, 1000 TB) may be.
     • Current definition of big data: zettabytes (1 ZB = 1M PB or 1G TB)
  7. 7. Nature of Big Data: Web 2.0 kind of data
     • Different from traditional RDBMS/warehouse data, which sees more reads and fewer updates
     • User Generated Content – Tweets, Reviews, Comments etc.
     • Lots of updates and lots of reads
     • Scale to millions of users
     • Not necessarily transactional
     • Compromised consistency
  8. 8. Data Explosion, So What?
     • Structural issues
       o The dynamic nature of data
     • Performance issues
       o Insertion
       o Search
     • Scaling horizontally
       o Dozens or hundreds of machines operating as a single server
  9. 9. What is NoSQL? Not Only SQL or Not Relational
     • Carlo Strozzi used the term in 1998 and then Eric Evans in 2009
     • Simple call level interface (SQL not supported)
     • Flexible schema
     • Efficient use of distributed indexes
     • Horizontal scaling of operations over many servers
     • No ACID but BASE (Basically Available, Soft state, Eventually consistent)
  10. 10. CAP Theorem (Brewer’s Theorem)
      A distributed system can satisfy at most two of the following three guarantees at the same time
      o Consistency (all nodes see the same data at the same time)
      o Availability (a guarantee that every request receives a response about whether it was successful or failed)
      o Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  11. 11. Eventual Consistency Flavors
      • Causal consistency
        o when changes are notified through events, the receiving session will always see the updated value.
      • Read your own writes
        o a session that updates the db will immediately see its own changes.
      • Monotonic consistency
        o once a session reads a value, it will never see an earlier value.
  12. 12. Consistency Tradeoffs
      o N is the # of copies of each piece of data that the db maintains
      o R is the # of copies that are read for each read
      o W is the # of copies that must be written for each write
      • Most NoSQL stores use N>W>1: more than one write must complete, but not all nodes need to update immediately.
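The N, R and W settings above can be explored with a little arithmetic. The sketch below assumes the commonly cited quorum rule R + W > N (every read set then overlaps every write set); that rule is not spelled out in the slide text, and the function names are illustrative only.

```python
def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """Return True when any R replicas must include one of the W written replicas.

    Assumption: the usual quorum condition R + W > N.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


def tolerated_write_failures(n: int, w: int) -> int:
    """Number of replicas that may be down while a write can still succeed."""
    if not 1 <= w <= n:
        raise ValueError("W must be between 1 and N")
    return n - w


# N=3, W=2 matches the slide's N > W > 1 pattern.
strong = read_overlaps_write(n=3, r=2, w=2)    # reads overlap writes
eventual = read_overlaps_write(n=3, r=1, w=2)  # a read may hit a stale replica
```

With N=3 and W=2, choosing R=2 gives overlapping reads, while R=1 trades that away for faster reads.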
  13. 13. Column Vs Row Storage
  14. 14. Row vs. Column Oriented DB
      Id  First name  Last name  SSN           DOB
      1   John        Doe        111-222-3333  8/12/1968
      2   Jane        Doe        111-332-3408  4/3/1972
      Row oriented:    1, John, Doe, 111-222-3333, 8/12/1968; 2, Jane, Doe, 111-332-3408, 4/3/1972
      Column oriented: 1, 2; John, Jane; Doe, Doe; 111-222-3333, 111-332-3408; 8/12/1968, 4/3/1972
  15. 15. Contrasting Operations on Row vs Col DB: Insert a new tuple
      Row oriented:    1, John, Doe, 111-22-3333, 8/12/1968; 2, Jane, Doe, 111-32-3408, 4/3/1972; 3, Foo, Bar, 237-23-3924, 2/3/1978
      Column oriented: 1, 2, 3; John, Jane, Foo; Doe, Doe, Bar; 111-22-3333, 111-32-3408, 237-23-3924; 8/12/1968, 4/3/1972, 2/3/1978
  16. 16. Row vs. Column Oriented DB: Create a new attribute
      Row oriented:    1, John, Doe, 111-22-3333, 8/12/1968, 408-555-1212; 2, Jane, Doe, 111-32-3408, 4/3/1972, 650-555-2323
      Column oriented: 1, 2; John, Jane; Doe, Doe; 111-22-3333, 111-32-3408; 8/12/1968, 4/3/1972; 408-555-1212, 650-555-2323
  17. 17. Row vs. Column Oriented DB
      Get all who were born in a given year
      o Row oriented: Easy, just pick all rows where the year of DOB matches the given year
      o Column oriented: Not so simple, scan the years, remember the indexes of all occurrences that match the given year, and extract based on these indexes
      Get sum of all years
      o Row oriented: A little difficult, the data does not live consecutively, so scanning through the entire dataset is needed
      o Column oriented: Easy, the data is found consecutively
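The two queries on this slide can be sketched in plain Python using the two sample records from the earlier slides. This is only an in-memory illustration of the two layouts, not how any particular engine stores data; the field and function names are made up for the example.

```python
from datetime import date

FIELDS = ("id", "first", "last", "ssn", "dob")

records = [
    (1, "John", "Doe", "111-222-3333", date(1968, 8, 12)),
    (2, "Jane", "Doe", "111-332-3408", date(1972, 4, 3)),
]

# Row-oriented layout: each record is stored as one contiguous tuple.
row_store = list(records)

# Column-oriented layout: each attribute is stored as its own list.
column_store = {name: [rec[i] for rec in records] for i, name in enumerate(FIELDS)}


def born_in_year_rows(rows, year):
    """Pick every row whose DOB year matches."""
    return [row for row in rows if row[4].year == year]


def born_in_year_columns(cols, year):
    """Scan the DOB column, remember matching positions, then rebuild the rows."""
    hits = [i for i, dob in enumerate(cols["dob"]) if dob.year == year]
    return [tuple(cols[name][i] for name in FIELDS) for i in hits]


def sum_of_years_rows(rows):
    """Must walk over every full row to reach the DOB field."""
    return sum(row[4].year for row in rows)


def sum_of_years_columns(cols):
    """Only the DOB column is touched."""
    return sum(dob.year for dob in cols["dob"])
```

Both layouts return the same answers; what differs is how much unrelated data each query has to walk past.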
  18. 18. Glossary
      • Consistent Hashing (Cassandra, Dynamo)
        o the output range of a hash function is treated as a fixed circular space or “ring” (i.e. the largest hash value wraps around to the smallest hash value)
      • Vector Clock (Riak, Dynamo, Voldemort)
        o an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations
      • Quorum (Cassandra, Dynamo (sloppy))
      • Merkle Tree (Cassandra, Riak, Dynamo)
        o a hash tree where leaves are hashes of the values of individual keys. Parent nodes higher in the tree are hashes of their respective children. The principal advantage of a Merkle tree is that each branch of the tree can be checked independently without requiring nodes to download the entire data set
      • Anti-Entropy Gossip Protocol (Cassandra, Dynamo)
        o comparing all the replicas of each piece of data that exist and updating each replica to the newest version
      • Order preserving partitioning (Cassandra, MongoDB)
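To make the Merkle tree entry concrete, here is a minimal sketch that builds a hash tree over a replica's key/value pairs and compares two replicas by root hash. SHA-256, the pairing of an odd leftover node with itself, and all function names are assumptions made for this example; real systems differ in the details.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(items: dict) -> list:
    """Return the tree as a list of levels: leaves first, root level last.

    `items` maps str keys to bytes values. Leaves are hashes of individual
    key/value pairs, taken in sorted key order so replicas agree on layout.
    """
    level = [_h(key.encode("utf-8") + b"\0" + value) for key, value in sorted(items.items())]
    if not level:
        level = [_h(b"")]  # hypothetical convention for an empty data set
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd leftover node with itself
        # Each parent is the hash of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(items: dict) -> bytes:
    return merkle_levels(items)[-1][0]


replica_a = {"k1": b"v1", "k2": b"v2", "k3": b"v3"}
replica_b = {"k1": b"v1", "k2": b"stale", "k3": b"v3"}
in_sync = merkle_root(replica_a) == merkle_root(replica_b)
```

Two replicas holding the same data produce the same root, so a single hash comparison is enough to tell whether a deeper, branch-by-branch comparison is needed.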
  19. 19. Glossary
      • MVCC
        o multi version concurrency control
      • Atomicity
        o all or nothing
      • Consistency
        o each transaction leaves the db in a valid state
      • Isolation
        o concurrent execution of transactions results in the same state as if the transactions were executed serially
      • Durability
        o committed transactions remain so even in the event of power loss, crashes or errors
      • WAL
        o Write ahead logging – changes are written to a log before they are applied (Durability)
      • Eventually consistent
        o given a sufficiently long quiet period, all updates can be expected to propagate through the system and all replicas will become consistent
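The WAL entry can be illustrated with a toy key/value store that appends every change to a log file before touching its in-memory state and replays the log on startup. The class name, the JSON-lines log format, and the choice to stop at a truncated last record are all assumptions of this sketch, not a description of any particular database.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy store: every put is logged and fsynced before it is applied."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # assume a half-written final record; ignore it
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Write the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the live state.
        self.data[key] = value
```

Creating a second `TinyStore` on the same log path simulates a restart: the replay restores every change that reached the log.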
  20. 20. Glossary
      • Sharding
        o horizontal partitioning of data, storing records on different servers according to some key
      • Tuple
        o row in RDBMS, predefined schema.
      • Document
        o contains nested documents or lists as well as scalar values. No predefined schema.
      • Extensible Record
        o hybrid between Tuple and Document, families of attributes defined in a schema but attributes can be added on a per record basis.
      • Key-value Stores
        o stores values indexed by a user defined key.
      • Document Stores
        o indexed document store
      • Extensible Record Stores aka Wide Column Stores
        o Stores extensible records partitioned vertically and horizontally across nodes.
  21. 21. NoSQL Categories
      • Key-value Stores
        o Stores values indexed by a user defined key.
      • Document Stores
        o Indexed document store
      • Extensible Record Stores (Column Stores)
        o Stores extensible records partitioned vertically and horizontally across nodes.
      • Graph Databases
  22. 22. Key-Value Stores
  23. 23. Key-Value Stores
      • A distributed cache/Hashtable
        o Inspired by Amazon Dynamo
        o Like memcached, with persistence, replication, versioning, locking, transactions, sorting etc.
        o get/put and lookups
        o No secondary indices or keys
        o Values are BLOBs or in some cases JSON documents
        o Scalability through key distribution over nodes
  24. 24. Key-Value Stores
      • Riak (Erlang/Basho/Apache)
      • Membase (C+Erlang/Couchbase/Apache)
      • Project Voldemort (Java/LinkedIn/Apache)
      • Redis (C/VMware/BSD)
      • Scalaris (Erlang/Zuse+onScale/Apache)
      • Tokyo Cabinet (C/FAL Labs/LGPL)
      • Dynamo (Java/For Amazon internal use)
      There are others Key Value / Tuple Store at
  25. 25. Amazon Dynamo
      • KV Store developed by Amazon to support
        o Best Seller Lists
        o Shopping carts
        o Customer Preferences
        o Session Management
        o Sales Rank
        o Product Catalog etc...
      • Variation of Consistent Hashing based Data Partitioning and Replication
      • Dynamic add/delete of Storage Nodes
      • Each service uses a distinct instance of Dynamo
  26. 26. Amazon Dynamo Cont...
      • Keys and values are opaque byte[]. ID = 128-bit MD5 hash of the key
      • “Always writeable”: no updates are rejected due to failures or concurrent writes
      • Simple read/write (get/put) operations on data uniquely identified by a key; the value is a binary object (BLOB)
        o get(key): returns a single object or a list of conflicting versions, along with a context
        o put(key, context, object)
      • Eventual consistency with no isolation guarantees
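The combination of a 128-bit MD5 key ID and a consistent hashing ring can be sketched as follows. This is a simplified illustration with one ring position per node; node names and the `HashRing` class are hypothetical, and Dynamo's actual variation of consistent hashing is more elaborate than this.

```python
import bisect
import hashlib


def md5_token(value: str) -> int:
    """Map a string to a position on the ring: its 128-bit MD5 hash as an integer."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring with a single token per node."""

    def __init__(self, nodes):
        self._ring = sorted((md5_token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position; the modulo wraps the
        # largest hash values around to the node with the smallest token.
        index = bisect.bisect_right(self._tokens, md5_token(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
bigger_ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
```

The useful property is that adding `node-d` only takes over keys from one neighbour: every key either stays where it was or moves to the new node, which is what makes dynamic add/delete of storage nodes cheap.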
  27. 27. Riak
      • Developed in Erlang by Basho
      • Clients: Python, JavaScript, Java, PHP, Erlang
      • Dynamo inspired, open source
        o Advanced K/V and
        o Document Store (not a full featured document store)
      • Replication and sharding by primary key hash
        o Consistent Hashing
        o Decentralized (no master node)
      • Eventually consistent
        o Tunable number of replicas for read and write
        o Tunable per-read and per-write
        o Different parts of an application can choose different trade-offs
  28. 28. Project Voldemort
      • Java based advanced Key/Value store
      • Developed at LinkedIn
      • Open source, Apache license
      • Supports MVCC for updates
      • Replicas are updated asynchronously; an up-to-date view is guaranteed if a majority of replicas is read
      • Uses optimistic locking for consistent multi-record updates
      • Versions are ordered based on Vector clocks
      • More info:
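Ordering versions with vector clocks can be sketched in a few lines. The dict-based representation and the function names below are assumptions made for illustration; they are not Voldemort's API.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: dict, b: dict) -> str:
    """Classify two versions as 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # a happened before b, so b supersedes a
    if b_le_a:
        return "after"    # b happened before a, so a supersedes b
    return "concurrent"   # neither descends from the other: a conflict


v1 = increment({}, "A")   # first write, handled by node A
v2 = increment(v1, "B")   # update of v1 handled by node B
v3 = increment(v1, "C")   # a different update of v1 handled by node C
```

Here `v2` and `v3` both descend from `v1` but not from each other, so they are reported as concurrent and the conflict has to be resolved by the application or a merge rule.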
  29. 29. Document Stores
  30. 30. Document Stores
      • Data more complex than that in K/V stores
      • Data encapsulated and encoded in
        o JSON, XML, YAML, BSON or some other standard format
      • Multiple types of documents per database
        o Documents of similar type grouped together
        o Optional metadata/schema for the document
        o Less rigid schema than that of RDBMS
      • Nested documents or collections
      • Secondary indexes
      • Complex query/update support
        o Multiple attributes, collections etc.
  31. 31. Document Example
      {
        "when": "2011-09-19T02:10:11.3Z",
        "author": "alex",
        "title": "No Free Lunch",
        "text": "This is the text of the post. It could be very long.",
        "tags": ["business", "ramblings"],
        "votes": 5,
        "voters": ["jane", "joe", "spencer", "phyllis", "li"],
        "comments": [
          { "who": "jane", "when": "2011-09-19T04:00:10.112Z", "comment": "I agree." },
          { "who": "meghan", "when": "2011-09-20T14:36:06.958Z", "comment": "You must be joking. etc etc ..." }
        ]
      }
  32. 32. Document Stores
      • MongoDB (C++/10gen/AGPL)
      • Apache CouchDB (Erlang/Apache)
      • Amazon SimpleDB (Erlang/Amazon)
      • Terrastore (Java/Terracotta/Apache)
      • RavenDB (C#/Hibernating Rhinos/AGPL)
      There are others Document Store at
  33. 33. MongoDB
  34. 34. MongoDB (from "humongous")
      • Document format: BSON (Binary JSON)
      • Supports nested documents
      • Documents are grouped in Collections
      • Supports secondary indexes
      • Scalability – auto sharding
      • Consistency – tunable per request (WriteConcern)
      • Replication – replica sets (master/slave)
      • Atomicity – document level
  35. 35. MongoDB
      Data types: String, Integer, Boolean, Double, Null, Array, Object, ObjectId, Binary, Regex, Code
      SQL term / MongoDB term
      o Database / Database
      o Table / Collection
      o Index / Index
      o Row / Document
      o Column / Field
      o Join / Embedding or Linking
      o Primary Key / _id
      SQL statement and its MongoDB shell equivalent
      o SQL:     create table users (name varchar(128), age number)
        MongoDB: db.createCollection("users")
      o SQL:     insert into users values ('bob', 32)
        MongoDB: db.users.insert({name: "bob", age: 32})
      o SQL:     select * from users
        MongoDB: db.users.find()
      o SQL:     select name, age from users
        MongoDB: db.users.find({}, {name: 1, age: 1, _id: 0})
      o SQL:     select name, age from users where age = 32
        MongoDB: db.users.find({age: 32}, {name: 1, age: 1, _id: 0})
      o SQL:     select * from users order by name asc
        MongoDB: db.users.find().sort({name: 1})
      o SQL:     select * from users limit 10 offset 20
        MongoDB: db.users.find().skip(20).limit(10)
      o SQL:     select distinct name from users
        MongoDB: db.users.distinct("name")
      o SQL:     select count(*) from users
        MongoDB: db.users.count()
      o SQL:     update users set age = 39 where name = 'bob'
        MongoDB: db.users.update({name: "bob"}, {$set: {age: 39}}, false, true)
      o SQL:     delete from users where name = 'bob'
        MongoDB: db.users.remove({name: "bob"})
  36. 36. Extensible Record Stores aka Column Stores
  37. 37. Extensible Record Stores (Column Stores)
      • Motivated by Google BigTable
      • Basic Data Model – Rows and Columns
      • Scale by splitting rows and columns over multiple nodes
        o Rows split by sharding on primary key – split by range rather than hash function
        o Columns split by column groups
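Splitting rows by key range rather than by hash can be sketched with a sorted list of split points. The class, the split points and the node names below are hypothetical; the point is only that neighbouring row keys land on the same node, so a range scan touches few nodes.

```python
import bisect


class RangePartitioner:
    """Assign row keys to nodes by key range (illustrative sketch only)."""

    def __init__(self, split_points, nodes):
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.nodes = list(nodes)

    def node_for(self, row_key: str) -> str:
        # Keys below the first split point go to the first node, and so on.
        return self.nodes[bisect.bisect_right(self.split_points, row_key)]

    def nodes_for_range(self, start: str, end: str) -> list:
        """Nodes that must be contacted to scan keys from start to end inclusive."""
        low = bisect.bisect_right(self.split_points, start)
        high = bisect.bisect_right(self.split_points, end)
        return self.nodes[low:high + 1]


partitioner = RangePartitioner(["h", "p"], ["node-1", "node-2", "node-3"])
```

With hash-based sharding the same scan would generally have to ask every node, because adjacent keys are scattered across the cluster.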
  38. 38. Extensible Record Stores
      • Cassandra (Java/Facebook/Apache)
        o Marriage of Dynamo and BigTable
      • HBase (Java/Yahoo/Apache)
        o Inspired by BigTable, uses HDFS for storage
      • Hypertable (C++/Zvents/GPL)
        o Similar to HBase/BigTable
      • Accumulo (Java/NSA/Apache)
        o Uses Hadoop, ZooKeeper, and Thrift; cell level access control
      • Google BigTable (Internal to Google)
      There are others Wide Column Store at
  39. 39. Cassandra
  40. 40. Cassandra Features
      • Decentralized
        o Data is distributed across a cluster of nodes
        o No master, any node can serve any request
        o No single point of failure
      • Fault-tolerant (configurable replication strategies)
        o Simple Strategy (the first replica is placed by the partitioner, the rest on the next nodes clockwise around the ring)
        o Network Topology Strategy: multi datacenter strategy
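The Simple Strategy placement rule (first replica chosen by the partitioner, the rest on the following nodes clockwise) can be sketched on a small ring. This illustration assumes one MD5-derived token per node and hypothetical node names; a real cluster uses its configured partitioner and token assignments.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Ring position for a node name or row key (MD5 used purely for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def replicas_for(key: str, nodes: list, replication_factor: int) -> list:
    """Return the nodes that hold `key`, walking the ring clockwise from its owner."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = replicas_for("jdoe", cluster, replication_factor=3)
```

Because every node can compute the same placement from the ring alone, any node can route a request without consulting a master.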
  41. 41. Cassandra Features Cont…
      • Failure detection and recovery
        o Based on Gossip protocol
        o Node state updated based on gossip message version
        o Per-node heartbeat threshold
      • Tunable consistency
        o Can be configured per read/write
  42. 42. Cassandra
      Data types: ascii, int, float, decimal, boolean, bigint, double, varchar, counter, timestamp, uuid, text, blob, varint
      SQL term / Cassandra term
      o Database / Keyspace
      o Table / Column Family
      o Index / Index
      o Row / Row
      o Column / Column
      o Join / (none)
      o Primary Key / Primary Key
      SQL statement and its Cassandra QL equivalent
      o SQL: create database codecamp
        CQL: CREATE KEYSPACE codecamp WITH strategy_class = 'NetworkTopologyStrategy' AND strategy_options:DC1=3
      o SQL: create table users (key varchar(128), name varchar(128), age number)
        CQL: CREATE COLUMNFAMILY users (key varchar PRIMARY KEY, name varchar, age int)
      o SQL: create index idx_name ON users(name)
        CQL: CREATE INDEX idx_name ON users(name)
      o SQL: insert into users values ('bob', 'Bob', 32)
        CQL: INSERT INTO users (KEY, name, age) VALUES ('jdoe', 'Jane Doe', 39)
      o SQL: select name, age from users where age>30
        CQL: SELECT name, age FROM users WHERE age>30
      o SQL: update users set age = 35 where key = 'bob'
        CQL: UPDATE users SET age=35 WHERE KEY='bob'
      o SQL: delete from users where key='bob'
        CQL: DELETE FROM users WHERE KEY = 'bob'
        CQL: DELETE age FROM users WHERE KEY='alice'
      o SQL: drop table users
        CQL: DROP COLUMNFAMILY users
      o SQL: drop database codecamp
        CQL: DROP KEYSPACE codecamp
  43. 43. Cassandra Column and Column Family
      • Column
        o name: byte[], value: byte[], timestamp
        o Example: name: "userid", value: "jdoe", timestamp: …
      • Super Column
        o name: byte[], value: collection of Columns
        o Example: name: homeaddress, value:
          name: "street", value: "555 Homestead Rd", timestamp: …
          name: "city", value: "Sunnyvale", timestamp: …
          name: "zip", value: "95051", timestamp: …
      • Column Family: a set of Rows, each identified by a Row Key and holding Columns
        o Row Key jdoe: (name: "userid", value: "jdoe"), (name: "name", value: "Jane Doe"), (name: "age", value: 33), each with a timestamp
        o Row Key ladams: (name: "userid", value: "ladams"), (name: "name", value: "Larry Adam"), (name: "age", value: 47), each with a timestamp
        o Row Key bdole: (name: "userid", value: "bdole"), (name: "name", value: "Bob Dole"), (name: "age", value: 67), each with a timestamp
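The column / row / column family nesting on this slide maps naturally onto nested dictionaries. The sketch below is a toy in-memory model with made-up class names; the rule that the column with the newest timestamp wins is an assumption added here to show what the per-column timestamp is for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: object
    timestamp: int


class ColumnFamily:
    """Toy model: row key -> {column name -> Column}."""

    def __init__(self, name: str):
        self.name = name
        self.rows = {}

    def insert(self, row_key: str, column: Column) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Assumed conflict rule: keep whichever column has the newest timestamp.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get(self, row_key: str, column_name: str):
        column = self.rows.get(row_key, {}).get(column_name)
        return None if column is None else column.value


users = ColumnFamily("users")
users.insert("jdoe", Column("name", "Jane Doe", timestamp=1))
users.insert("jdoe", Column("age", 33, timestamp=2))
users.insert("jdoe", Column("age", 30, timestamp=1))   # older write, ignored
users.insert("bdole", Column("name", "Bob Dole", timestamp=1))
```

Note that the `bdole` row has no `age` column at all: rows in the same column family are free to carry different sets of columns.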
  44. 44. Cassandra Keyspace
      Analogous to a database in RDBMS
      • Contains one or more Column Families, analogous to tables in RDBMS
      • A Column Family contains columns
      • A Row Key identifies a set of related columns
      • A Row is not required to have the same set of columns as other rows
      • No join between two column families:
        o Each column family is self contained to serve a query
        o A rule of thumb: one column family per query for better performance
      • Replication is controlled on a per-keyspace basis
  45. 45. Cassandra In Enterprise
      • Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Rackspace, Ooyala, and many more
      • The largest Cassandra cluster has over 300 TB of data on over 400 machines
  46. 46. HBase
      • Design influenced by Google BigTable
      • A type of NoSQL: more a data store than a database, lacks many RDBMS features such as
        o Typed columns, secondary indexes, triggers, advanced query language etc.
      • Built on top of HDFS: data is stored in HDFS as indexed “StoreFiles”
      • Strongly consistent reads/writes, not “eventually consistent”; suitable for counter aggregation
      • Auto Sharding
      • Auto Region Server Failover
      • Out of the box support for Hadoop/HDFS
      • Can be used as Source and/or Sink for MapReduce
      • Java, Thrift/REST clients
      • Supports Block Cache and Bloom Filters for high volume query optimization
      • Web management tool and JMX support
  48. 48. NoSQL Growth Trends
  49. 49. Big Data and NoSQL Landscape