NoSQL Options Compared: Different Horses for Different Courses

NoSQL Options Compared: Different Horses for Different Courses
  • The current world of NoSQL
  • The NoSQL model
  • NoSQL classes
  • Which model should you use
  • Best use (& use cases comparison)
  • Choosing the Right Horse
  • Lessons learned from actual use

The current world of NoSQL
  • RDBMS: 105+ databases
  • NoSQL: 122+ databases
  • Forecast: NoSQL market expected to reach $3.4 billion by 2018
  • NoSQL market revenue: $14 billion over 2013–2018
  • RDBMS are great and ... will be fine

Recap: RDBMS are great
  • SQL
  • ACID
  • Well understood by developers
  • Well supported by frameworks and tools
  • Backups
  • Tuning
  • Recovery support

RDBMS are great, but...
  • Difficult to handle relational schema
  • Schema changes
  • Difficult to scale writes
  • Vertical scaling is limited
  • Horizontal scaling is limited

Bridging the NoSQL/SQL divide
  • Interest in using NoSQL technology has reemerged: Oracle released a MySQL + NoSQL memcached plugin
  • More:
      - Neo4j announced a JDBC (SQL) driver
      - Cassandra + CQL
      - CouchBase + UnQL

The NoSQL model promotes
  • Schema-free approach
      - Flexible data models
      - No unneeded complexity
      - Strict data consistency might be unnecessary
  • Large data volumes and high throughput, where an RDBMS is expensive in terms of performance
  • BASE (not ACID)
      - Eventually consistent
  • Simple API

NoSQL models (share by class)
  • 31% Key Value
  • 10% Document
  • 13% Graph
  • 9% Column Family
  • 6% XML
  • 10% Object
  • 21% Other

NoSQL models
  • Column Oriented Store: each storage block contains data from only one column
  • Document Store: stores documents made up of tagged elements
  • Key Value Store: hash table of keys
  • Graph Database: stores data in the nodes and relationships of a graph

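The four model definitions above can be made concrete with a minimal sketch that uses plain Python structures as stand-ins. Nothing here is the API of Redis, CouchBase, Neo4j, or Cassandra; the variable names, helper functions, and sample records are all hypothetical and exist only to show how the shape of the data differs between the models.

```python
# Illustrative only: plain Python structures standing in for the four models.
# All names and sample records below are hypothetical.

# Key-value store: a hash table mapping opaque keys to values.
key_value = {"session:42": "user=alice;cart=3"}

# Document store: each document is made up of tagged (named) elements.
documents = {
    "user:alice": {"name": "Alice", "roles": ["admin"], "address": {"city": "Minsk"}},
}

# Column-oriented store: the values of one column are kept together.
columns = {
    "name": ["Alice", "Bob"],
    "age": [30, 25],
}

# Graph database: data lives in nodes and in the relationships between them.
nodes = {"alice": {"kind": "person"}, "bob": {"kind": "person"}}
relationships = [("alice", "FOLLOWS", "bob")]


def column_average(store, column):
    """Aggregate a single column without reading any other column."""
    values = store[column]
    return sum(values) / len(values)


def followers_of(rels, target):
    """Scan the relationships to find every node that follows the target."""
    return [src for src, kind, dst in rels if kind == "FOLLOWS" and dst == target]
```

For example, `column_average(columns, "age")` reads only the `age` list, and `followers_of(relationships, "bob")` answers a question that lives entirely in the relationships rather than in the nodes.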
NoSQL candidates to compare
  • Redis (Key Value / Tuple Store): open source; networked; in-memory; key-value; optional durability; written in ANSI C
  • CouchBase (Document Store): open source; high-performance; map/reduce; schema-free; document-oriented; written in C, C++, Erlang
  • Neo4j (Graph Database): open source; embedded or server; disk-based; transactional; data stored in graphs; written in Java
  • Cassandra (Wide Column Store / Column Families Database): open source; no single point of failure; column family store; tunable consistency; written in Java

Comparison: General Information
Property | Redis | CouchBase | Neo4j | Cassandra
Language | C | C, C++, Erlang | Java | Java
Commercial Support | Third party companies | Consulting & Support with Enterprise | Neo4j Advanced, Neo4j Enterprise | DataStax, Impetus, Acunu, Riptapo, Cubet Technologies
Customers | GitHub, Guardian Media Group | Zynga, AOL, BBC | Adobe, Cisco, StudiVZ, Deutsche Telekom, Fanbox | Twitter, Digg, Reddit, Rackspace, Facebook
Licenses | New BSD | Community & Enterprise Licenses | GPL or AGPLv3 | Apache License 2

Comparison: Queries & Operations
Feature | Redis | CouchBase | Neo4j | Cassandra
Client Libraries | Bindings + REST/HTTP | non-vBucket or vBucket | Bindings + REST/HTTP | Thrift based
C | + | + | n/a | +
C++ | n/a | + | n/a | +
C# | + | + | + | +
Java | + | + | + | +
Perl | + | + | + | +
PHP | + | + | n/a | +
Python | + | + | + | +
Ruby | + | + | + | +
Querying | - | - | - | -
Secondary Indexes | n/a | + | - | +
Map/Reduce | n/a | n/a | n/a | Hadoop supported
ACID transactions | n/a | + | + | +/- (tunable consistency)

Best Use
  • Redis
      - Real-time systems where low latency is critical (games)
      - High performance caching tier for web sites and other applications
      - Server for backed sessions or transient data
      - Service offering some real-time statistics
  • CouchBase
      - Syncing online and offline data (allows synchronization and sharing of data and applications across multiple platforms and mobile devices)
      - Social and online gaming
      - Data management layer for recommendation engine
  • Neo4j
      - Cloud/network management
      - Social, geospatial data
      - Bioinformatics
  • Cassandra
      - Managing large streams of non-transactional data: Apache logs, application logs, etc.
      - Consistent, fast response times under writes (high volume writes)
      - Real-time analytics & statistics
      - Highly available solution

Which model should you use?
  • Column Oriented Store
  • Document Store
  • Key Value Store
  • Graph Database
  • More specific: which NoSQL database?

Answer
  • This depends on your case!
  • Compare your problems to others
  • Evaluate characteristics of NoSQL storage:
      - Maturity
      - Connectivity/Querying/Operations

More aspects to consider
  • How big is your data
  • Massive read/write throughput
  • Fast key-value access
  • No single point of failure
  • Tunable Brewer's CAP trade-offs:
      - Consistency
      - Availability
      - Partition Tolerance
  • Maintainability, Administration

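One common way the "tunable" part of these CAP trade-offs shows up in Dynamo-style stores is through replica counts: with N replicas, a write acknowledged by W replicas and a read that consults R replicas must share at least one replica whenever R + W > N. The sketch below only checks that arithmetic. The function name and parameters are hypothetical and are not the API of Cassandra or any other database.

```python
def read_sees_latest_write(n_replicas: int, write_acks: int, read_replicas: int) -> bool:
    """Return True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= write_acks <= n_replicas and 1 <= read_replicas <= n_replicas):
        raise ValueError("W and R must both be between 1 and N")
    return read_replicas + write_acks > n_replicas
```

With three replicas, `read_sees_latest_write(3, 2, 2)` holds, while `read_sees_latest_write(3, 1, 1)` does not: the second setting trades consistency for lower latency and higher availability, which is the kind of choice the slide refers to.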
Lessons learned from actual use
  • Start small, but significant
  • It is not "one size fits all", but "horses for courses"
  • Consider a Hybrid Approach (NoSQL + RDBMS)

Lessons learned from actual use: Hybrid Approach
  • A Business Facade sits in front of two databases: NoSQL + RDBMS
  • Key Value Storage for Session Data + RDBMS for User Data
  • Column Storage for Reporting Data + RDBMS for User Data

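The hybrid layout above (a business facade in front of a key-value store for session data and an RDBMS for user data) can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the RDBMS, a plain dict stands in for the key-value store, and the class and method names are hypothetical rather than taken from the talk.

```python
import sqlite3
import uuid


class BusinessFacade:
    """Single entry point that hides which store holds which data."""

    def __init__(self):
        # Stand-in for the RDBMS: relational user data.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # Stand-in for the key-value store: transient session data.
        self._sessions = {}

    def register_user(self, name):
        """Store the user in the relational database and return the new id."""
        cursor = self._db.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._db.commit()
        return cursor.lastrowid

    def login(self, user_id):
        """Check the user in the RDBMS, then keep the session in the key-value store."""
        row = self._db.execute(
            "SELECT id FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown user {user_id}")
        token = uuid.uuid4().hex
        self._sessions[token] = user_id
        return token

    def current_user_name(self, token):
        """Resolve a session token to a user name, or None if the session is gone."""
        user_id = self._sessions.get(token)
        if user_id is None:
            return None
        row = self._db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def logout(self, token):
        """Drop the transient session; the user record is untouched."""
        self._sessions.pop(token, None)
```

Callers only see `register_user`, `login`, `current_user_name`, and `logout`, so either stand-in could later be swapped for a real store without changing the business code.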
Q&A
  • Thank you for your attention
  • mail: [email_address]
  • skype: siarhei_bushyk

Editor's Notes

  • #3 It is a well-known truth that we should choose the right tool for the job. Everyone says that; who can disagree? The problem is that this assertion is not helpful unless we can answer more specific questions: what jobs are the tools good at? Which NoSQL database should I choose out of the many available options? Here is the table of contents of today's workshop, which aims to help you answer this question. 1) First, I will provide some statistics and interesting numbers from the current world of NoSQL. 2) Next, you will learn about the NoSQL initiative and specifically about the classes of NoSQL databases. 3) Then I will explain the differences between the existing models and classes of NoSQL. We will look more closely at one specific database from each class. 4) For the next five minutes I will briefly cover the differences between these databases and outline the best use cases for each. 5) When speaking about how to ultimately choose the right tool, I will recap some recommendations and good approaches. 6) NoSQL clients: a few real-world examples and stories (from Renat). 7) Finally, lessons learned from the actual usage of NoSQL.
  • #4 The worldwide NoSQL market is expected to reach $3.4 billion by 2018, and the NoSQL market will generate $14 billion in revenue over the period 2013–2018. RDBMS are great, and the forecast is that they will be fine. Why?
  • #5 SQL: a rich, declarative query language. ACID: the database enforces referential integrity and ACID semantics.
  • #7 Oracle officially released a memcached daemon plugin that talks to InnoDB, so NoSQL + MySQL has become an official solution. More changes bridging the NoSQL/SQL divide: Neo4j recently announced a JDBC interface that forwards database queries to Neo4j and allows common applications to access the NoSQL database without modification; Cassandra has CQL (a structured query language); Couchbase Server 2.0 comes with a NoSQL query language called UnQL. Interest in using key-value pair (KVP) technology has reemerged to the point where the traditional RDBMS vendors are evaluating a strategy of developing in-house NoSQL solutions and integrating them into their current product offerings. It will not take long before we see acquisitions driven by emerging NoSQL technology. By the way, the database market was in the same state in the 1970s, before SQL was invented (a lot of APIs and no single standard).
  • #8 The NoSQL initiative promotes a loosely defined class of non-relational data stores that break with the ACID paradigm and with relational databases. NoSQL data management systems are inherently:
      - Schema-free: no unneeded complexity; flexible data models; the variety of features and the strict data consistency of an RDBMS might be unnecessary
      - Built for huge data volumes and high throughput: slow relational databases that are expensive in terms of performance give way to more efficient and cheaper ways of managing data; dealing with big data and web scale
      - Eventually consistent / BASE (not ACID): basically available, soft state, eventual consistency
      - Simple API
  • #9 Core NoSQL systems can be divided into these main classes:
      - Key-Value Stores (Riak, Redis, MemcacheDB)
      - Wide Column Stores / Column Families (Cassandra, HBase, Amazon SimpleDB)
      - Document Stores (MongoDB, CouchDB, Jackrabbit)
      - Graph Databases (Neo4j, InfiniteGraph)
      - XML Databases (Berkeley DB XML, eXist): communication is typically performed by means of HTTP/REST, WebDAV, SOAP, or XML-RPC, with XML-oriented query methods: XQuery, XPointer, XPath
      - Object Databases (Objectivity, db4o): one of the main goals is to provide an easy and native persistence interface for object-oriented programming languages
      - Other (unresolved and uncategorized)
  • #11 Redis is an open-source, networked, in-memory, key-value data store with optional durability. It is written in ANSI C. The development of Redis is sponsored by VMware. CouchBase is an open-source, schema-free document database that provides JavaScript-based map/reduce indexing to query and analyze data; peer-based replication; GeoCouch for creating location-aware applications; and binary packages for Red Hat and Ubuntu Linux, Windows, and Mac OS X. It combines CouchDB, Membase, and Memcached. Neo4j is an open-source, disk-based, fully transactional Java persistence engine that runs either embedded or as a standalone server with a REST API. It stores data with multiple relationships and connections in graphs rather than in tables. Cassandra is an open-source distributed database management system designed to handle large amounts of data spread across many servers. It provides a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010.
  • #12 Commercial support: Redis: -. CouchBase: depends on whether it is the community edition license or the enterprise license. Neo4j: -. Cassandra: third-party companies provide commercial support and commercial distributions of Cassandra. Customers and some notable users: Redis: the online hosting service GitHub and the British Guardian Media Group. CouchBase: organizations including Zynga, AOL, the BBC and thousands of others power their interactive web applications with Couchbase. Neo4j: Adobe, Cisco, StudiVZ (the largest social network in Europe), Fanbox (a social networking website).
  • #13 Client libraries (accessing your data should be easy): CouchBase offers non-vBucket ("classic" Memcached) clients or vBucket-aware Type 2 Membase clients. A vBucket is defined as the "owner" of a subset of the key space of a Membase cluster. Every key "belongs" to a vBucket, and a mapping function is used to calculate the vBucket to which a given key belongs. Cassandra's client API is built entirely on top of Thrift for different programming languages, including Python, Java, .NET, Ruby, PHP, Perl, and C++. Map/Reduce (generally available parallel computing might be important): Cassandra enables certain Hadoop functionality against Cassandra's data. ACID transactions: Redis is not a "durable" datastore, in the sense of the "D" in ACID. CouchBase supports ACID transaction semantics. Neo4j supports ACID transactions: the default isolation level is read committed, locks are acquired at the Node and Relationship level, and deadlock detection is built into the core transaction management.
  • #14 Redis: Service offering some real-time statistics. A good example is Hurl, a tool for debugging HTTP requests that was built on Redis in 48 hours by Leah Culver and Chris Wanstrath. Transient data: any transient data used by your application is also a good fit for Redis. CouchBase: Sync mobile data to cloud data: not all iPhones, iPads or iPod Touch devices are online all the time, or even within range of Internet connectivity, but the devices and software must be useful whether online or offline. Social and online gaming: CouchBase can be a good option for the data management layer in social and online gaming, where predictable latency, responsiveness and automated data caching are required. Data management layer for a recommendation engine: a recommendation engine targets ads and offers. Targeting algorithms and approaches can change and often require changes in input data. With schema-free data there is no need to define a database schema before inserting data. Neo4j: Social, geospatial data: Neo4j allows queries to find target nodes or shortest paths. It allows indexing on node/relationship properties.
  • #16 Maturity:
      - Some databases are not as proven
      - Incomplete NoSQL solutions
      - You write a larger data management tier
      - You maintain your business code and infrastructure code
      - You have to customize management and deployment technology and procedures
    Connectivity/querying:
      - APIs for .NET, Java, Perl, Python, etc.
      - Some solutions have no querying
      - When available, query languages differ
      - Lack of general ad-hoc querying ("no" SQL)
  • #17 A distributed system can support only two of the following characteristics at the same time: Consistency (all nodes see the same data at the same time), Availability (every operation must terminate in an intended response), Partition tolerance (operations will complete, even if individual components are unavailable).
  • #18 Start small, but significant: you should focus on the problem you are trying to solve with NoSQL.