The Rise of NoSQL
Arnd Kleinbeck - September 2013

After a brief introduction to the history of database management systems, the different types of NoSQL data stores are characterized. Theoretical background on sharding mechanisms, horizontal scaling and the CAP theorem is explained.
After a comparison of different NoSQL stores, you will know the pros and cons of the different approaches and learn how to choose the best-fitting database for your project.

Published in: Technology

The Rise of NoSQL

  1. The Rise of NoSQL, Arnd Kleinbeck, September 2013
  3. History
  4. Timeline (1980, 1990, 2000, 2010): Rise of RDBMS
  5. RDBMS: Persistence; Integration; SQL; ACID Transactions; Tooling
  6. Order: 4711; Customer: Max; Payment: Credit Card; Line items: 405 235 001, 540 987 326
  8. Impedance Mismatch
  9. Timeline (1980, 1990, 2000, 2010): Rise of RDBMS; Rise of OODBMS
  10. Timeline (1980, 1990, 2000, 2010): Rise of RDBMS; Rise of OODBMS; RDBMS Dominance
  11. A New Era
  12. 600,000,000 tweets per day
  13. 1,100,000,000 active users per month
  14. Not only size matters: data volumes grow exponentially; data gets more connected; semi-structured / unstructured data
  15. Lots of Traffic
  18. Scaling up vs. scaling out
  19. BigTable; Dynamo
  20. Timeline (1980, 1990, 2000, 2010): Rise of RDBMS; Rise of OODBMS; RDBMS Dominance; Rise of NoSQL
  21. Definition
  22. "Not only SQL"
  23. Characteristics: non-relational; schemaless; open source; cluster friendly; 21st century web; no joins
  24. Differences: data model; APIs; consistency; data distribution; persistence
  25. Data Models
  27. Document; Column Family; Graph; Key-Value
  28. Key-Value
  29. Key-Value: 153245, 153246, 153247, ...
  30. Key-Value (http://www.oredev.org/videos/nosql--the-new-generation-of-agile-databases)
  31. Key-Value Store Characteristics: simplest data model; DB does not care about data types; similar to a persistent hash map; fast lookups; easy to distribute; inspired by the Amazon Dynamo paper; restricted querying possibilities
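
The "persistent hash map" characterization can be illustrated with a minimal sketch. This is a toy, in-memory model and not any product's API; the class name `KeyValueStore` and its methods are hypothetical, and persistence is left out for brevity. The point is that the store treats values as opaque bytes, so lookups by key are trivial while querying by content is not available.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy key-value store (hypothetical): values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only files it under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup by key is the only access path.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("153245", b'{"customer": "Max", "payment": "Credit Card"}')
order = store.get("153245")
missing = store.get("999999")
```

Finding "all orders paid by credit card" would require the application to fetch and decode every value itself, which is the restricted querying the slide mentions.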
  32. Redis: open source advanced key-value store; in-memory store with optional durability; knows types like strings, hashes, lists, sets; BSD license; implemented in C; very small footprint (20k LOC for rel. 2.2); APIs for C/C++, C#, Clojure, Lisp, Erlang, Go, Haskell, Java, JavaScript, Objective-C, Perl, PHP, Python, Ruby, ...; used at Twitter, Instagram, Flickr, Stack Overflow, ...
  33. Riak: open source key-value store; highly available and fault-tolerant; Basho Technologies; Apache license; implemented in Erlang; APIs for Java, Erlang, Ruby, PHP, Python, Clojure, C#, C/C++, HTTP, Node.js, Perl, Scala, Smalltalk, ...; used at Mozilla, Comcast, AOL
  34. Voldemort: open source key-value store; big, distributed, persistent, fault-tolerant hash table; developed by LinkedIn; implemented in Java; Apache 2.0 license; Dynamo scale-out; used at LinkedIn
  35. Document
  36. Document (three example JSON documents):
      { "id": "993174208" "tex": "texture wood pile" "in_reply_to_screen_name": "akleinbe", "in_reply_to_status_id_str": null, "id_str": "54691802283900928", "entities": { "user_mentions": [ { "indices": [ 3, 19 ], "screen_name": "PostGradProblem", "id_str": "271572434", "name": "PostGradProblems", "id": 271572434 } ], "urls": [ ], "hashtags": [ ] } }
      { "id": "596229751" "customer_id": "RT @PostGradProblem: In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball team during ...", "truncated": true, "in_reply_to_user_id": null, "in_reply_to_status_id": null, "favorited": false, "source": "<a href="http://twitter.com/" rel="nofollow">Twitter for iPhone</a>", "in_reply_to_screen_name": null, "in_reply_to_status_id_str": null, "id_str": "54691802283900928", "entities": { "user_mentions": [ { "indices": [ 3, 19 ], "screen_name": "PostGradProblem", "id_str": "271572434", "name": "PostGradProblems", "id": 271572434 } ], "urls": [ ], "hashtags": [ ] } }
      { "id": "3452094105" "user": { "notifications": null, "profile_use_background_image": true, "statuses_count": 31, "profile_background_color": "C0DEED", "followers_count": 3066, "profile_image_url": "http://a2.twimg.com/profile_images/1285770264/PGP_normal.jpg", "listed_count": 6, "profile_background_image_url": "http://a3.twimg.com/a/1301071706/images/themes/theme1/bg.png", "description": "", "screen_name": "PostGradProblem", "default_profile": true, "verified": false, "time_zone": null, "profile_text_color": "333333", "is_translator": false, "profile_sidebar_fill_color": "DDEEF6", "location": "" } }
  37. Document Store Characteristics: you can query into the document structure; you can use natural aggregates as documents; you can retrieve portions of a document; you can update portions of a document; you can have links between documents; compared to the key-value data model, the document is more transparent; no schema / implicit schema; some queries are a pain in the neck!
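
To make "query into the document structure" concrete, here is a small sketch using plain Python dictionaries in place of a real document store. The helpers `get_path` and `find` and the dotted path syntax are hypothetical illustrations, not the query language of any product; the sample documents reuse field names from the tweet example above.

```python
from typing import Any, Dict, List


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as 'entities.user_mentions.0.screen_name'."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict):
            current = current.get(part)
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return None  # path does not exist in this document
    return current


def find(docs: List[Dict[str, Any]], path: str, value: Any) -> List[Dict[str, Any]]:
    """Return all documents whose value at `path` equals `value`."""
    return [doc for doc in docs if get_path(doc, path) == value]


tweets = [
    {
        "id_str": "54691802283900928",
        "entities": {
            "user_mentions": [{"screen_name": "PostGradProblem", "id": 271572434}],
            "hashtags": [],
        },
    },
    {"id_str": "1", "entities": {"user_mentions": [], "hashtags": []}},
]

first_mention = get_path(tweets[0], "entities.user_mentions.0.screen_name")
matches = find(tweets, "entities.user_mentions.0.id", 271572434)
```

Unlike the opaque values of a key-value store, the nested fields are visible to the query, and a portion of a document can be retrieved without the application decoding the whole value.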
  38. MongoDB: open source document store; "most popular NoSQL database"; stores JSON-like documents; implemented in C++; GNU AGPL license; APIs for C/C++, C#, Go, Erlang, Java, JavaScript, Node.js, Perl, PHP, Python, Ruby, Scala, HTTP/REST; used at Craigslist, eBay, Foursquare, SourceForge, NYT, ...
  39. CouchDB: open source document store; ease of use; no update locks; stores JSON-like documents; implemented in Erlang; Apache license; APIs for JavaScript, MapReduce, HTTP/REST; used at BBC, Credit Suisse, Meebo, ...
  40. Couchbase: open source distributed document store; optimized for interactive applications; merged from Membase and CouchDB; implemented in C++, Erlang, C; Apache license / proprietary; APIs for Java, .NET, PHP, Ruby, Python, C; used at AOL, Cisco, LinkedIn, Salesforce.com, Zynga, ...
  41. Schemaless
  42. Schemaless: schemaless is one of the main reasons for interest in NoSQL databases; schemaless reduces ceremony; schemaless increases flexibility; BUT...
  43. Schemaless means implicit schema: to query specific attributes you have to know their names; schema management is shifted from the DB to the code (http://martinfowler.com/articles/schemaless/)
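
A short sketch of what "schema management is shifted from the DB to the code" can look like in practice. The documents and the field names `customer` and `customer_name` are made up for illustration: the store accepts all three shapes, so the reading code has to know every attribute name that was ever used.

```python
from typing import Any, Dict, Optional

# Three documents in the same collection with different shapes (hypothetical data).
orders = [
    {"order": 4711, "customer": "Max", "payment": "Credit Card"},
    {"order": 4712, "customer_name": "Erika"},  # older field name
    {"order": 4713},                            # field missing entirely
]


def customer_of(order: Dict[str, Any]) -> Optional[str]:
    """The implicit schema lives here: the code must know both field names."""
    if "customer" in order:
        return order["customer"]
    if "customer_name" in order:
        return order["customer_name"]
    return None


names = [customer_of(order) for order in orders]
```

Nothing in the database rejects the second or third document, so every reader of the data carries this knowledge instead.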
  44. Column Family
  45. Column-Family (http://www.oredev.org/videos/nosql--the-new-generation-of-agile-databases)
  46. Column Family Characteristics: more complicated data model; rich structure; single key (row key); easy / fast access to columns / column families in a row; rows can contain 100s or 1000s of columns; aggregate oriented
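
One way to picture the column-family model is as a nested map: row key, then column family, then column name. The sketch below is a toy model under that assumption; `ColumnFamilyStore` and its methods are hypothetical and do not correspond to the API of Cassandra, HBase or any other product.

```python
from collections import defaultdict
from typing import Any, Dict


class ColumnFamilyStore:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, Any]:
        # All columns of one family in one row come back in a single lookup.
        return dict(self._rows.get(row_key, {}).get(family, {}))


store = ColumnFamilyStore()
store.put("4711", "customer", "name", "Max")
store.put("4711", "customer", "payment", "Credit Card")
store.put("4711", "items", "405 235 001", 1)
store.put("4711", "items", "540 987 326", 2)

customer = store.get_family("4711", "customer")
items = store.get_family("4711", "items")
```

Rows are sparse in this model: a row only stores the columns it actually has, which is why hundreds or thousands of columns per row are workable.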
  47. Cassandra: open source wide column store; supports multi data center replication; good for distributed DBs with massive write loads; implemented in Java; Apache License 2.0; APIs for C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala; used at CERN, Facebook, Netflix, Rackspace, SoundCloud, Twitter, ...
  48. HBase: open source column-oriented database; part of Hadoop, inspired by Google's BigTable; implemented in Java; Apache License 2.0; APIs for RESTful HTTP, Thrift, C/C++, C#, Groovy, Java, PHP, Python, Scala; used at Amazon, Adobe, AOL, Cloudspace, eBay, Facebook, IBM, Last.fm, LinkedIn, Spotify, Yahoo!, ...
  49. Graph
  50. Graph (http://www.neo4j.org/learn/graphdatabase)
  52. Graph DB Characteristics: graph DBs disassemble things into fragments and relations; you can do very interesting queries on graph structures, things you cannot even think of in SQL; good for complex graph-structured data; fast lookups, fast traversing; whiteboard friendly
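
As an illustration of the kind of traversal query that is awkward in SQL, here is a breadth-first search over a small adjacency map. The graph data and the function `within_hops` are invented for this sketch; a real graph database would expose its own traversal or query language.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical "knows" relationships between people.
graph: Dict[str, Set[str]] = {
    "Max": {"Anna", "Ben"},
    "Anna": {"Cleo"},
    "Ben": {"Cleo", "Dora"},
    "Cleo": set(),
    "Dora": {"Max"},
}


def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Return every node reachable from `start` in at most `max_hops` steps."""
    seen = {start}
    reached: Set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return reached


friends = within_hops(graph, "Max", 1)
friends_of_friends = within_hops(graph, "Max", 2)
```

In a relational schema each additional hop typically means another self-join, whereas the traversal above only follows stored relationships.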
  53. Neo4j: open source graph database; embedded, disk-based, fully transactional; implemented in Java; GPLv3 and AGPLv3 / commercial; APIs for .NET, Clojure, Go, Groovy, Java, JavaScript, Perl, PHP, Python, Ruby, Scala; used at Adobe, Cisco, Telekom, ...
  54. OrientDB: open source document database with graph-oriented extensions; supports SQL (without join) as query language; supports ACID transactions; implemented in Java; Apache License 2.0; commercial support available; APIs for HTTP/REST, Java, JavaScript, Scala, PHP, Ruby, .NET, Clojure, Node.js, Python, ...; used at SKY, Spielo, UltraDNS, ...
  55. Scaling out
  56. Replication (diagram): Master; Slave 1, Slave 2, Slave 3; write; read
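
The replication diagram can be sketched as follows, assuming the common arrangement in which writes go to the master and reads are spread over the slaves. The slide itself only shows the labels, so that routing, the synchronous copy step, and the class `ReplicatedStore` are assumptions made for this example.

```python
import itertools
from typing import Any, Dict, List, Optional


class ReplicatedStore:
    """Toy master/slave replication: write to master, read from slaves."""

    def __init__(self, slave_count: int = 3) -> None:
        self.master: Dict[str, Any] = {}
        self.slaves: List[Dict[str, Any]] = [{} for _ in range(slave_count)]
        self._next_slave = itertools.cycle(range(slave_count))

    def write(self, key: str, value: Any) -> None:
        # Assumption: the master copies each write to all slaves immediately.
        self.master[key] = value
        for slave in self.slaves:
            slave[key] = value

    def read(self, key: str) -> Optional[Any]:
        # Reads rotate over the slaves, so read capacity grows with their number.
        slave = self.slaves[next(self._next_slave)]
        return slave.get(key)


store = ReplicatedStore(slave_count=3)
store.write("4711", "Max")
reads = [store.read("4711") for _ in range(3)]
```

This setup scales reads but not writes, since every write still passes through the single master.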
  57. Sharding (diagram): Shard 1, Shard 2, Shard 3; Router; write, read
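
A minimal sketch of the router in the sharding diagram, assuming it picks a shard from a hash of the key. `ShardRouter` and `stable_hash` are hypothetical names; MD5 is used only because Python's built-in `hash()` for strings changes between interpreter runs.

```python
import hashlib
from typing import Any, Dict, List, Optional


def stable_hash(key: str) -> int:
    """Hash that gives the same result in every process."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ShardRouter:
    """Toy router: each key lives on exactly one shard."""

    def __init__(self, shard_count: int = 3) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, Any]:
        return self.shards[stable_hash(key) % len(self.shards)]

    def write(self, key: str, value: Any) -> None:
        self._shard_for(key)[key] = value

    def read(self, key: str) -> Optional[Any]:
        return self._shard_for(key).get(key)


router = ShardRouter(shard_count=3)
for number in range(100):
    router.write(f"order-{number}", number)

total_stored = sum(len(shard) for shard in router.shards)
value = router.read("order-42")
```

In contrast to replication, each shard holds only part of the data, so both reads and writes are spread across machines.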
  58. Hashing Problems: a common way of choosing a server is server = hash(key) mod n. What happens if a server goes down? n changes, and almost every object gets hashed to a new location!
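
The effect can be measured with a short experiment: assign keys with hash(key) mod n for n = 4, drop one server so that n = 3, and count how many keys change servers. The key names and the use of MD5 as a stable hash are assumptions for this sketch. For a uniform hash, a key keeps its server only when hash mod 4 equals hash mod 3, which holds for about a quarter of all hash values, so roughly three quarters of the keys should move.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash that gives the same result in every process."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def server_for(key: str, server_count: int) -> int:
    # The scheme from the slide: server = hash(key) mod n
    return stable_hash(key) % server_count


keys = [f"object-{number}" for number in range(10_000)]
moved = sum(1 for key in keys if server_for(key, 4) != server_for(key, 3))
fraction_moved = moved / len(keys)
```

Each moved key means data that has to be copied or a cache entry that is lost, which is the problem consistent hashing addresses.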
  59. Consistent Hashing: use the same hash function for both objects and servers; shards: A, B, C; objects: 1, 2, 3, 4 (http://www.tom-e-white.com/2007/11/consistent-hashing.html)
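
A minimal sketch of a consistent hash ring, assuming the usual construction: shards and objects are hashed onto the same circle, and each object belongs to the first shard found clockwise from its position. The class `HashRing` is hypothetical, MD5 stands in for the shared hash function, and virtual nodes (which real systems commonly add to even out the load) are omitted.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def stable_hash(value: str) -> int:
    """Same hash function for shards and objects, as the slide suggests."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring without virtual nodes."""

    def __init__(self, shards: Iterable[str]) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (stable_hash(shard), shard) for shard in shards
        )

    def shard_for(self, key: str) -> str:
        positions = [position for position, _ in self._ring]
        # First shard clockwise from the key; wrap around at the end of the ring.
        index = bisect.bisect_right(positions, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]

    def remove(self, shard: str) -> None:
        self._ring = [(pos, name) for pos, name in self._ring if name != shard]


ring = HashRing(["A", "B", "C"])
keys = [str(number) for number in range(1, 1001)]
before = {key: ring.shard_for(key) for key in keys}

ring.remove("B")  # shard B goes down
after = {key: ring.shard_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Only the objects that lived on the removed shard are reassigned; everything stored on A and C stays where it was, unlike with hash(key) mod n.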
  60. CAP Theorem: Consistency (C), Availability (A), Partition Tolerance (P) (http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)
  61. BASE (vs. ACID): Basically Available; Soft State; Eventual Consistency (http://www.allthingsdistributed.com/2008/12/eventually_consistent.html, http://www.infoq.com/articles/pritchett-latency)
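
Eventual consistency can be illustrated with a toy store whose replicas lag behind the primary. Everything here is an assumption made for the sketch: the class `EventuallyConsistentStore`, the explicit `sync` step standing in for asynchronous replication, and the single-primary layout. Real systems differ considerably in how replicas converge.

```python
from typing import Any, Dict, List, Optional, Tuple


class EventuallyConsistentStore:
    """Toy model: writes reach replicas only when sync() runs."""

    def __init__(self, replica_count: int = 2) -> None:
        self.primary: Dict[str, Any] = {}
        self.replicas: List[Dict[str, Any]] = [{} for _ in range(replica_count)]
        self._pending: List[Tuple[str, Any]] = []  # writes not yet replicated

    def write(self, key: str, value: Any) -> None:
        # The write is acknowledged before the replicas have seen it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_replica(self, index: int, key: str) -> Optional[Any]:
        return self.replicas[index].get(key)

    def sync(self) -> None:
        # Stand-in for background replication catching up.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("4711", "shipped")
stale = store.read_replica(0, "4711")   # replica has not caught up yet
store.sync()
fresh = store.read_replica(0, "4711")   # replicas have converged
```

The store stays available for reads and writes throughout, at the price that a read may briefly return an out-of-date result.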
  62. Wrap Up
  63. RDBMS will not die: use a relational database unless you have good reason not to
  64. RDBMS have their limits: vertical scaling is expensive and has hard limits; horizontal scaling is not possible / limited; joins on big and distributed tables are too expensive / too slow; a rigid schema is inappropriate for semi-structured / dynamic data (sparse tables); consistency is rated higher than availability
  65. NoSQL comes to the rescue: distribution and scalability are fundamental design goals of NoSQL DBs; tradeoff between consistency, availability and horizontal scalability (CAP Theorem, BASE); small footprint in favor of ease of use; outstandingly proven in practice (Google, Amazon, Facebook, LinkedIn, Twitter, ...)
  66. There are cons too: the broad spectrum of products is difficult to understand; you have to get used to designing models for key-value or column family stores; mostly no ad hoc queries; no standards, no portability; sometimes poor documentation; few commercial support offers
  67. RDBMS vs. NoSQL: think about data vs. think about queries; redundancy is bad vs. redundancy is ok; indexes managed by DB vs. manage own indexes; query over relations vs. no joins; always exact results vs. results may be out of date; SQL vs. proprietary APIs
  68. Size vs. complexity (diagram): Key-Value, Column Family, Document, Graph, RDBMS
  69. What's next?
  70. Polyglot Persistence: NoSQL will break the relational dominance, unlike the OODBMSs in the 80s; RDBMS is not the one and only option any more; select the storage technology that best fits your current situation; enterprises will use different storage technologies for different kinds of data; the DB is no integration point any more; apps talk via web services and encapsulate their individual data storage technologies
  71. NewSQL: the answer of traditional RDBMS vendors to the great success of NoSQL; improved RDBMS offer more features and better scalability; Oracle launches Oracle NoSQL, their own NoSQL DB based upon a revised Berkeley DB; Oracle, Microsoft, Sybase, IBM, Greenplum, Pervasive already have a tight Hadoop integration; "Can't fight it? Embrace it!"
  72. Links
  73. Recommended Reads: Amazon Dynamo Paper (http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf); Google Big Table Paper (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//archive/bigtable-osdi06.pdf); NoSQL Archive (http://nosql-database.com); DB Engines Ranking (http://db-engines.com/en/ranking)
  74. Thx! Arnd Kleinbeck, Senior Software Architect, Business Division Applications, @akleinbe
