
NoSQL and Architectures


This presentation shows the influence of NoSQL databases on software architectures. It discusses different NoSQL flavors and products and shows how software architects can get the maximum benefit from those databases.

Transcript of "NoSQL and Architectures"

  1. NoSQL & Architectures
     Eberhard Wolff - @ewolff
  2. About me: Eberhard Wolff
     ► Freelance consultant
     ► Head of the technology advisory board at adesso
     ► Speaker
     ► Author
     ► Blog: http://ewolff.com
     ► Twitter: @ewolff
  3. Back in the Days…
  4. NoSQL Is All About the Persistence Question
  5. Key-Value Stores
     Diagram: Key 42 → Value "Some data"
     ► Maps keys to values
     ► Just a large globally available Map
     ► i.e. not a very powerful data model
     ► No complex queries or indices
     ► Just access by key
     ► Might add e.g. a full text engine
     ► Redis: Cache + Persistence
     ► Riak: Massive scale + Solr queries
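The "large globally available Map" idea can be sketched in a few lines. This is a toy in-memory stand-in, not the API of Redis or Riak; the class name `KeyValueStore` and its methods are assumptions made for illustration only.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque, access is by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # There is no query language: the caller must know the key.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("42", "Some data")   # the key and value shown on the slide
print(store.get("42"))
print(store.get("unknown"))    # a missing key yields the default
```

Anything beyond lookup by key, such as finding all values that contain a word, has to come from an add-on like a full text engine.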
  6. Wide Column
     ► Add any "column" you like to a row
     ► key-(column-value)
     ► Column families like tables
     ► E.g. in the "Users" column family
       > "someuser" → ("username" → "someuser"), ("email" → "someuser@example.com")
     ► Columns named: indexing possible
     ► So fast queries possible
     ► Apache Cassandra
     ► Amazon SimpleDB
     ► Apache HBase
     ► All tuned for large data sets
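A minimal sketch of the wide column model, assuming a plain nested dictionary as the storage: row key, then column name, then value. The second row and the helper `rows_with_column` are hypothetical additions for illustration; none of this is the API of Cassandra, SimpleDB or HBase.

```python
from collections import defaultdict

# Column family "Users": row key -> {column name -> value}
users = defaultdict(dict)
users["someuser"]["username"] = "someuser"
users["someuser"]["email"] = "someuser@example.com"

# Another row may carry different columns; no table-wide schema is required.
users["otheruser"]["username"] = "otheruser"
users["otheruser"]["phone"] = "555-0100"


def rows_with_column(family, column):
    """Return the row keys that have a value for the named column."""
    return sorted(key for key, columns in family.items() if column in columns)


print(rows_with_column(users, "email"))
print(rows_with_column(users, "username"))
```

Because columns are named, a real store can index them; the scan above only stands in for such an index.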
  7. Document Stores
     ► Aggregates are typically stored as "documents" (key-value collection)
     ► JSON quite common
     ► No fixed schema
     ► Indexes possible
     ► Queries possible
       > E.g. "find all baskets that contain the product 123"
     ► Still great horizontal scalability
     ► Relations might be modeled as links
     ► MongoDB, CouchDB
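The basket query from the slide can be illustrated with plain Python dictionaries standing in for JSON documents. The field names (`items`, `product`, `qty`, `coupon`) and the sample data are assumptions; a real document store would answer this through its own query language and an index rather than a scan.

```python
# Each basket is one self-contained document; fields may differ per document.
baskets = [
    {"_id": 1, "customer": "alice",
     "items": [{"product": 123, "qty": 2}, {"product": 7, "qty": 1}]},
    {"_id": 2, "customer": "bob",
     "items": [{"product": 9, "qty": 1}]},
    {"_id": 3, "customer": "carol",
     "items": [{"product": 123, "qty": 1}], "coupon": "WELCOME"},
]


def baskets_containing(docs, product_id):
    """Find all baskets that contain the given product."""
    return [
        doc for doc in docs
        if any(item["product"] == product_id for item in doc.get("items", []))
    ]


matching_ids = [doc["_id"] for doc in baskets_containing(baskets, 123)]
print(matching_ids)
```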
  8. Graph
     ► Nodes with properties
     ► Typed relationships with properties
     ► Ideal e.g. to model relations in a social network
     ► Easy to find number of followers, degree of relation etc.
     ► Hard to scale out
     ► Neo4j
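The two queries named on the slide, number of followers and degree of relation, can be sketched on a small adjacency map. The sample users and the function names are hypothetical, and the breadth-first search below is a generic technique, not how Neo4j executes queries.

```python
from collections import deque

# Directed "follows" relationships: user -> set of users they follow.
follows = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": set(),
    "dave": {"bob", "carol"},
}


def follower_count(graph, user):
    """Count how many users follow the given user."""
    return sum(1 for targets in graph.values() if user in targets)


def degree_of_relation(graph, start, goal):
    """Shortest number of 'follows' hops from start to goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


print(follower_count(follows, "bob"))
print(degree_of_relation(follows, "alice", "carol"))
```

Traversals like this touch many nodes that may live anywhere in the data set, which is one way to read the slide's remark that graphs are hard to scale out.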
  9. NoSQL Benefits
     Diagram labels: Dev, Ops
     Costs
     •  Scale out instead of scale up
     •  Cheap hardware
     •  Usually Open Source
     Flexibility
     •  Schema in code, not in database
     •  Easier to upgrade schema
     •  Easier to handle heterogeneous data
     No object/relational impedance mismatch
     •  NoSQL databases are more OO-like
  10. Drivers
     Diagram labels: Exponential Data Growth; Scale Out; Key Value; Wide Column; Semi Structured Data; Document; More Connected Data; Graph; Cost; Flexibility
  11. Document-oriented Databases are the best NoSQL database
     For at least one definition of "best"
  12. Document-oriented databases
     Diagram labels: Cost, Flexibility
     ► Offer scale out
       > Unless you need huge amounts of data
     ► Offer a rich and flexible data model
       > …and queries
     ► Other databases have other sweet spots
       > Huge data sets
       > Graph structures
       > Analyzing data
     ► Niches or mainstream?
  13. Financial System
     ► Different financial products
     ► Mapping objects / database
     ► Inheritance
  14. E/R Model
     Diagram entities: Investment, Zero Bond, Stock, Option, Country, Currency
     > 20 database tables
     Up to 25 attributes
  15. #SRLSY??
  16. Investment (document model)
     Diagram: Investment: Type, ID, Price, Country; Country: Currency; Zero Bond: Interest Rate; Fixed Rate Bond: Interest Rate; Stock; Option; …; Preferred; Underlying asset
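As a sketch of how the inheritance hierarchy might collapse into documents: every investment carries the common fields plus whatever its type needs. Which attribute belongs to which product is one reading of the flattened diagram ("Preferred" for stocks, "Underlying asset" for options), and all values are invented for illustration.

```python
# One collection, one document per investment; "type" replaces the
# inheritance hierarchy and type-specific fields sit next to common ones.
investments = [
    {"id": 1, "type": "ZeroBond", "price": 95.0,
     "country": {"name": "DE", "currency": "EUR"}, "interest_rate": 0.02},
    {"id": 2, "type": "FixedRateBond", "price": 101.5,
     "country": {"name": "DE", "currency": "EUR"}, "interest_rate": 0.035},
    {"id": 3, "type": "Stock", "price": 42.0,
     "country": {"name": "US", "currency": "USD"}, "preferred": False},
    {"id": 4, "type": "Option", "price": 3.5,
     "country": {"name": "US", "currency": "USD"}, "underlying_asset": 3},
]

COMMON_FIELDS = {"id", "type", "price", "country"}


def by_type(docs, type_name):
    """Select all documents of one product type."""
    return [doc for doc in docs if doc["type"] == type_name]


def specific_fields(doc):
    """Fields a document has beyond the common investment fields."""
    return sorted(set(doc) - COMMON_FIELDS)


for doc in investments:
    print(doc["type"], specific_fields(doc))
```

No joins across 20 tables are needed to load one product; the price is that the application code, not the database schema, knows which fields each type carries.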
  17. Polyglot Persistence in Ecommerce Application
     ► Financial Data → RDBMS: Needs transactions & reports. Data fit well in tables.
     ► Product Catalog → Document Store: Complex document-like data structures and complex queries
     ► Shopping Cart → Key / Value: High performance & scalability. No complex queries.
     ► Recommendation → Graph: Based on friends, their purchases and reviews
  18. The NoSQL Game
     2700: High Score!
     ► Financial Data → RDBMS (0 points): Needs transactions & reports. Data fit well in tables.
     ► Product Catalog → Document Store (1000 points): Complex document-like data structures and complex queries
     ► Shopping Cart → Key / Value (900 points): High performance & scalability. No complex queries.
     ► Recommendation → Graph (800 points): Based on friends, their purchases and reviews
  19. Just Like the Patterns Game!
     Points for each Pattern used
     Extra points if one class implements multiple Patterns
  20. This is not how Software Architecture works.
  21. Why not? More is worse!
     ► More hardware
     ► More Developer Skills
       > Not necessarily bad
     ► More Ops Trouble
       •  Installation
       •  Backup
       •  Disaster Recovery
       •  Monitoring
       •  Optimizations
  22. But: Polyglot Persistence Has a Point
     ► Object-oriented Databases did it wrong
     ► Strategy: Replace RDBMS
     ► Enterprises will stick to RDBMS
     ► Pure technology migration basically never happens
     ► …only vendors think differently
  23. Archive
     ► Classic approach for current data
     ► NoSQL for the archive
     Diagram: Current Data → RDBMS; Archive → Document Store
  24. Archives for Insurances
     ► Legacy migration
     ► Querying and visualizing data that has not been migrated
     ► i.e. old contracts
     ► Legacy hard- and software can be switched off
     ► Flexibility: Host data formats
     ► Cost: Inexpensively handling large data volumes
  25. Complex Document Processing System
     ► MongoDB (document-oriented): Documents
     ► Redis (key/value, in memory): Meta data for quick access
     ► elasticsearch (search engine): Search index
  26. Alternative: Only elasticsearch
     •  Stores original documents as well
     •  (like a key/value store)
     •  Support for complex queries
     •  Very powerful features also for data mining / analytics
     •  Not well suited for update heavy operations
     •  Backup / disaster recovery?
     •  Written in Java
  27. Scaling elasticsearch
     Diagram: Shard 1, Shard 2, Shard 3 and Replica 1, Replica 2, Replica 3 spread across three servers
  28. Alternative: Only MongoDB
     •  Now with (limited beta) fulltext search
     •  Excellent support for updates
     •  Quite fast: memory mapped files
     •  Also fast for updates
     •  Disaster recovery possible
     •  Map/Reduce support
     •  Written in C++
  29. Scaling MongoDB
     Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2, Replica 3
  30. Scaling MongoDB
     Diagram: Shard 1, Shard 2 and Shard 3, each with Replica 1, Replica 2, Replica 3
  31. What about Redis?
     •  MongoDB uses memory mapped files
       – Why Redis?
     •  Like a Swiss Knife
     •  Cache
     •  Messaging
     •  Central coordination in a distributed environment
     •  Written in C
  32. Scaling Redis
     Asynchronous replication built in
     Diagram: one server with two replicas
  33. Alternative: Riak
     •  Key / value store
     •  But includes Solr for fulltext search
     •  What is the difference to a document store then?
     •  Map/reduce possible
     •  Written in Erlang
     •  Smart scaling
  34. Scaling Riak
     Diagram: Servers A, B, C and D, each holding three shard copies; each of Shard1 to Shard4 is stored three times
  35. Scaling Riak
     Diagram: Servers A, B, C and D, each holding three shard copies; each of Shard1 to Shard4 is stored three times
  36. Scaling Riak
     Diagram: a New Server joins Servers A, B, C and D; each of Shard1 to Shard4 is still stored three times
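The "smart scaling" shown in the Riak diagrams rests on consistent hashing: when a server joins, only some of the data has to move, and it moves to the newcomer. The ring below is a generic consistent-hashing sketch with virtual nodes, not Riak's actual implementation; the class `HashRing`, the `vnodes` parameter and the key names are assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Stable hash so results do not vary between runs."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point on the ring, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server claims several points to spread load evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def owner(self, key):
        # The first point clockwise from the key's hash owns the key.
        points = [point for point, _ in self._ring]
        index = bisect.bisect(points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["Server A", "Server B", "Server C", "Server D"])
keys = [f"key-{i}" for i in range(1000)]
before = {key: ring.owner(key) for key in keys}

ring.add("New Server")
after = {key: ring.owner(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
print({after[key] for key in moved})
```

Every key that changes owner ends up on the new server; data that stays on A to D is never shuffled among them, which keeps rebalancing cheap.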
  37. Key/Value!
     Document-oriented Databases are the best NoSQL database
     For at least one definition of "best"
  38. MongoDB, Redis, riak, elasticsearch
     Your Choice – a trade off!
     Typical architecture decision
  39. Data Access: RDBMS
     Roles in diagram: DBA; Architect / Developer
     Optimizations (no need to change code)
     •  Indices
     •  Table spaces
     •  …
     Data Model
     •  Schema
     •  Stored Procedures
     Data Access
     •  Queries
     •  Other code
  40. RDBMS separates data from data access
     ► Indices
     ► Joins and normalization allow flexible data access patterns
  41. Sacrifice Joins for Scalability
     ► Join: Combine tables to retrieve results
     ► Need transactions spanning multiple tables
     ► Example: Customer table + addresses
     ► Inserts need locks and consistency across both tables
       > Limits scalability
     ► Global and distributed locks are nasty
     ► Consistency limits either availability or partition tolerance
  42. CAP Theorem
     ► Consistency
       > All nodes see the same data
       > Not the ACID Consistency
     ► Availability
       > Node failures do not prevent survivors from operating
     ► Partition Tolerance
       > System continues to operate despite arbitrary message loss
     ► Can have at most two of C, A and P
     ► Or rather: If the network fails, choose A or C.
  43. CAP Theorem
     Diagram: triangle of Consistency, Availability and Partition Tolerance, with Quorum, DNS, Replication, RDBMS and 2 Phase Commit placed on it
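The "Quorum" label in the CAP diagram can be made concrete with a short check. Assume N replicas, a write that is acknowledged by W of them and a read that asks R of them; these names are not on the slide. A read is guaranteed to see the latest write when every possible read set overlaps every possible write set, which holds exactly when R + W > N. The sketch verifies the rule by brute force for small clusters.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """Rule of thumb: read and write quorums overlap iff r + w > n."""
    return r + w > n


def always_intersect(n, w, r):
    """Brute force: does every write set share a node with every read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 5)]:
    print(n, w, r, quorums_overlap(n, w, r), always_intersect(n, w, r))
```

With N = 3, choosing W = 2 and R = 2 keeps reads consistent while tolerating one unreachable node; W = 1 and R = 1 is faster and more available but may return stale data.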
  44. BASE
     ► Basically Available, Soft state, Eventually consistent
     ► I.e. trade consistency for availability
     ► Pun concerning ACID…
     ► Not the same C, however!
  45. BASE
     ► Eventually consistent
     ► If no updates are sent for a while all previous updates will eventually propagate through the system
     ► Then all replicas are consistent
     ► Can deal with network partitioning: Message will be transferred later
     ► All replicas are always available
     ► Pun concerning ACID…
     ► Not the same C, however!
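Eventual consistency as described above can be simulated in a few lines: each replica accepts writes on its own, updates for the other replicas wait in an inbox (standing in for a network partition), and once the messages are delivered all replicas agree. The `Replica` class, the version numbers and the last-write-wins rule are assumptions chosen to keep the sketch small; real systems use more elaborate conflict resolution.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}    # key -> (version, value)
        self.inbox = []   # updates received but not yet applied

    def apply(self, key, version, value):
        # Last write wins: keep the update with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def deliver_pending(self):
        for update in self.inbox:
            self.apply(*update)
        self.inbox.clear()


def write(replicas, coordinator, key, version, value):
    """Apply locally at once; queue the update for all other replicas."""
    coordinator.apply(key, version, value)
    for replica in replicas:
        if replica is not coordinator:
            replica.inbox.append((key, version, value))


a, b, c = Replica("a"), Replica("b"), Replica("c")
cluster = [a, b, c]

write(cluster, a, "balance", 1, 100)
write(cluster, b, "balance", 2, 80)

# During the partition every replica still answers, but answers differ.
stale_read = c.data.get("balance")
print(a.data.get("balance"), b.data.get("balance"), stale_read)

# The partition heals: the messages are transferred later.
for replica in cluster:
    replica.deliver_pending()
print([replica.data["balance"] for replica in cluster])
```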
  46. Banking is BASE
     ► ATMs relax rules on providing cash if network partitioned
     ► Your account is only guaranteed to be consistent by the end of the year
  47. No Joins - What now?
     ► Customer and addresses must be consistent!
     ► Solution: Store both as one entity
     ► Atomic changes easily possible
     ► Queries might be distributed across multiple nodes
     ► "NoSQL does not support transactions / ACID" is wrong
       > "NoSQL does not support joins" is better
       > Atomic changes still possible
       > Schema design different
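Storing customer and addresses as one entity might look like the sketch below. The `DocumentCollection` class is a hypothetical in-memory stand-in, not MongoDB's API; the point is only that replacing one document is a single atomic step, so name and addresses can never be seen half-updated.

```python
import copy
import threading


class DocumentCollection:
    """Toy collection: each save replaces a whole document atomically."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def save(self, doc_id, doc):
        with self._lock:
            # Copy so later changes by the caller do not leak in.
            self._docs[doc_id] = copy.deepcopy(doc)

    def load(self, doc_id):
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])


customers = DocumentCollection()
customers.save("c1", {
    "name": "Jane Doe",
    "addresses": [{"type": "billing", "city": "Berlin"}],
})

# Change the customer and an address together, then store both at once.
doc = customers.load("c1")
doc["name"] = "Jane Smith"
doc["addresses"].append({"type": "shipping", "city": "Hamburg"})
customers.save("c1", doc)

print(customers.load("c1"))
```

In a relational schema the same change would touch two tables inside a transaction; here the aggregate boundary takes over that role, which is why schema design differs.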
  48. Data Access: MongoDB
     Roles in diagram: DBA; Architect / Developer
     Optimizations
     •  Only basic indices
     •  Other optimizations must be done in code
     Data Model
     •  Influences access patterns
     Data Access
     •  Write Concerns: how much do you love your data?
     •  Shard key
     •  Consistency
  49. Cluster: RDBMS
     Role in diagram: DBA
     ► Transparent to developers
     ► How many nodes?
     ► A special setup of hardware and RDBMS software
  50. Cluster: MongoDB
     Role in diagram: Architect / Developer
     ► CAP theorem
       > If the network is down choose
       > Consistency xor
       > Availability
     ► Deals with replication
     ► MongoDB has master / slave replication
     ► Write Concerns:
       > Unacknowledged
       > Acknowledged
       > Journaled
       > Some nodes in the replica set
     ► Queries might go to master only or also slaves
     ► Influences consistency
  51. More Power and more Responsibility
     Diagram: Architect, DB Admin
  52. Architects
     ► Architecture has always been a multidimensional problem
     ► Need to choose persistence technology
     ► Need to think about operations
     ► Need to do DBA work
  53. NoSQL Is All About the Persistence Question