NoSQL and Architectures

This presentation shows the influence of NoSQL databases on software architectures. It discusses different NoSQL flavors and products and shows how software architects can get the maximum benefit from those databases.


Usage Rights

© All Rights Reserved

NoSQL and Architectures: Presentation Transcript

  • NoSQL & Architectures Eberhard Wolff @ewolff Eberhard Wolff - @ewolff
  • About me: Eberhard Wolff
    ► Freelance consultant
    ► Head of the technology advisory board at adesso
    ► Speaker
    ► Author
    ► Blog: http://ewolff.com
    ► Twitter: @ewolff
  • Back in the Days…
  • NoSQL Is All About the Persistence Question
  • Key-Value Stores
    ► Map keys to values (example: key 42 → value "Some data")
    ► Just a large, globally available Map
    ► i.e. not a very powerful data model
    ► No complex queries or indices
    ► Just access by key
    ► Might add e.g. a full-text engine
    ► Redis: Cache + Persistence
    ► Riak: Massive scale + Solr queries
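To make "just a large Map" concrete, here is a minimal in-memory sketch in Python. The class and method names are hypothetical and do not correspond to the Redis or Riak APIs. The point is only that every operation goes through the key.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store: access by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the value; it is opaque data.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put(42, "Some data")
value = store.get(42)
missing = store.get("unknown-key")
```

There is deliberately no `find` or `query` method: anything beyond lookup by key would have to be built on top, for example with a separate full-text engine.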
  • Wide Column
    ► Add any "column" you like to a row
    ► key-(column-value)
    ► Column families are like tables
    ► E.g. in the "Users" column family
      > "someuser" → ("username" → "someuser"), ("email" → "someuser@example.com")
    ► Columns are named: indexing possible
    ► So fast queries are possible
    ► Apache Cassandra
    ► Amazon SimpleDB
    ► Apache HBase
    ► All tuned for large data sets
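The key-(column-value) structure can be sketched with nested dictionaries. This is an illustration only, not the Cassandra or HBase API. The helper names and the "phone" column are assumptions, added to show that rows in one column family need not share columns.

```python
# Column family "Users": row key -> {column name -> value}
users = {}

def put_column(family, row_key, column, value):
    """Add or overwrite one named column in a row; rows are created on demand."""
    family.setdefault(row_key, {})[column] = value

def get_row(family, row_key):
    return family.get(row_key, {})

def find_by_column(family, column, value):
    """Naive scan; a real store would use an index on the named column."""
    return [key for key, columns in family.items() if columns.get(column) == value]

put_column(users, "someuser", "username", "someuser")
put_column(users, "someuser", "email", "someuser@example.com")
put_column(users, "otheruser", "username", "otheruser")
put_column(users, "otheruser", "phone", "555-0100")  # hypothetical column that only this row has

row = get_row(users, "someuser")
hits = find_by_column(users, "email", "someuser@example.com")
```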
  • Document Stores
    ► Aggregates are typically stored as "documents" (key-value collection)
    ► JSON quite common
    ► No fixed schema
    ► Indexes possible
    ► Queries possible
      > E.g. "find all baskets that contain the product 123"
    ► Still great horizontal scalability
    ► Relations might be modeled as links
    ► MongoDB, CouchDB
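The basket query from the slide can be sketched over plain Python dictionaries standing in for JSON documents. The field names (`items`, `product`, `coupon`) and the data are assumptions for illustration. A real document store would run this query on the server, ideally backed by an index.

```python
# Each basket is one aggregate stored as a document; there is no fixed schema.
baskets = [
    {"_id": 1, "customer": "alice",
     "items": [{"product": 123, "qty": 2}, {"product": 456, "qty": 1}]},
    {"_id": 2, "customer": "bob",
     "items": [{"product": 789, "qty": 1}], "coupon": "WELCOME"},
    {"_id": 3, "customer": "carol",
     "items": [{"product": 123, "qty": 1}]},
]

def baskets_containing(collection, product_id):
    """Find all baskets that contain the given product."""
    return [
        doc for doc in collection
        if any(item["product"] == product_id for item in doc.get("items", []))
    ]

matching_ids = [doc["_id"] for doc in baskets_containing(baskets, 123)]
```

The query reaches inside the document, which is what a pure key-value store cannot do.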
  • Graph
    ► Nodes with properties
    ► Typed relationships with properties
    ► Ideal e.g. to model relations in a social network
    ► Easy to find number of followers, degree of relation etc.
    ► Hard to scale out
    ► Neo4j
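The two example queries, number of followers and degree of relation, can be sketched over an adjacency map. This is plain Python rather than Neo4j or its query language. The users and the single "follows" relationship type are made up for illustration.

```python
from collections import deque

# Typed relationship "follows": follower -> set of users they follow
follows = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": set(),
    "dave": {"bob", "carol"},
}

def follower_count(graph, user):
    """Count the users that have a 'follows' edge pointing at `user`."""
    return sum(1 for followed in graph.values() if user in followed)

def degree_of_relation(graph, start, target):
    """Breadth-first search: hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None
```

For example, `follower_count(follows, "bob")` counts alice and dave, and `degree_of_relation(follows, "alice", "carol")` walks alice → bob → carol.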
  • NoSQL Benefits
    ► Costs
      • Scale out instead of scale up
      • Cheap hardware
      • Usually open source
    ► Flexibility
      • Schema in code, not in the database
      • Easier to upgrade schema
      • Easier to handle heterogeneous data
    ► No object/relational impedance mismatch
      • NoSQL databases are more OO-like
    (Diagram labels: Dev, Ops)
  • Drivers
    ► Drivers: Exponential Data Growth, Scale Out, Semi-Structured Data, More Connected Data, Cost, Flexibility
    ► Database types: Key Value, Wide Column, Document, Graph
  • Document-oriented databases are the best NoSQL databases. For at least one definition of “best”.
  • Document-oriented databases
    ► Offer scale out
      > Unless you need huge amounts of data
    ► Offer a rich and flexible data model
      > …and queries
    ► Other databases have other sweet spots
      > Huge data sets
      > Graph structures
      > Analyzing data
    ► Niches or mainstream?
    (Diagram labels: Cost, Flexibility)
  • Financial System
    ► Different financial products
    ► Mapping objects / database
    ► Inheritance
  • E/R Model
    ► Entities: Investment, Zero Bond, Stock, Option, Country, Currency
    ► > 20 database tables
    ► Up to 25 attributes
  • #SRLSY??
  • Investment (diagram of the document model)
    ► Investment: Type, ID, Price, Country
    ► Country: Currency
    ► Zero Bond: Interest Rate
    ► Fixed Rate Bond: Interest Rate
    ► Stock, Option, …: Preferred, Underlying asset
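As a rough sketch of why this model avoids the many tables of the E/R version, the different products can live in one collection of documents that share a few common fields and add their own. The attribute names follow the diagram. All values, the exact spelling of the keys, and which product carries which extra field are assumptions.

```python
# One collection, heterogeneous documents: no table per subclass, no joins.
investments = [
    {"type": "ZeroBond", "id": 1, "price": 95.0, "country": "DE", "interest_rate": 0.02},
    {"type": "FixedRateBond", "id": 2, "price": 101.5, "country": "FR", "interest_rate": 0.03},
    {"type": "Stock", "id": 3, "price": 40.0, "country": "US", "preferred": True},
    {"type": "Option", "id": 4, "price": 3.5, "country": "US", "underlying_asset": 3},
]

def by_type(documents, investment_type):
    """Select all documents of one product type."""
    return [doc for doc in documents if doc["type"] == investment_type]

# Fields every document has, i.e. what the common "Investment" part consists of.
common_fields = set.intersection(*(set(doc) for doc in investments))

bond_ids = [doc["id"] for doc in by_type(investments, "ZeroBond")]
```

The "schema" (which product has which fields) now lives in application code instead of the database.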
  • Polyglot Persistence in Ecommerce Application
    ► Financial Data → RDBMS: Needs transactions & reports. Data fit well in tables.
    ► Product Catalog → Document Store: Complex document-like data structures and complex queries
    ► Shopping Cart → Key / Value: High performance & scalability. No complex queries
    ► Recommendation → Graph: Based on friends, their purchases and reviews
  • The NoSQL Game
    ► Financial Data → RDBMS: 0 points
    ► Product Catalog → Document Store: 1000 points
    ► Shopping Cart → Key / Value: 900 points
    ► Recommendation → Graph: 800 points
    ► 2700: High Score!
  • Just Like the Patterns Game!
    ► Points for each pattern used
    ► Extra points if one class implements multiple patterns
  • This is not how Software Architecture works.
  • Why not? More is worse!
    ► More hardware
    ► More developer skills
    ► Not necessarily bad
    ► More ops trouble
      • Installation
      • Backup
      • Disaster recovery
      • Monitoring
      • Optimizations
  • But: Polyglot Persistence Has a Point
    ► Object-oriented databases did it wrong
    ► Strategy: Replace RDBMS
    ► Enterprises will stick to RDBMS
    ► Pure technology migration basically never happens
    ► …only vendors think differently
  • Archive
    ► Current Data → RDBMS: Classic approach for current data
    ► Archive → Document Store: NoSQL for the archive
  • Archives for Insurances
    ► Legacy migration
    ► Querying and visualizing data that was not migrated
    ► i.e. old contracts
    ► Legacy hardware and software can be switched off
    ► Flexibility: Host data formats
    ► Cost: Inexpensively handling large data volumes
  • Complex Document Processing System
    ► MongoDB (document-oriented): Documents
    ► Redis (key/value, in memory): Meta data for quick access
    ► elasticsearch (search engine): Search index
  • Alternative: Only elasticsearch
    • Stores original documents as well
    • (like a key/value store)
    • Support for complex queries
    • Very powerful features, also for data mining / analytics
    • Not well suited for update-heavy operations
    • Backup / disaster recovery?
    • Written in Java
  • Scaling elasticsearch (diagram: Shards 1 to 3 and Replicas 1 to 3 distributed across three servers)
  • Alternative: Only MongoDB
    • Now with (limited beta) fulltext search
    • Excellent support for updates
    • Quite fast: memory-mapped files
    • Also fast for updates
    • Disaster recovery possible
    • Map/Reduce support
    • Written in C++
  • Scaling MongoDB (diagram: Shard 1 and Shard 2, each with Replicas 1 to 3)
  • Scaling MongoDB (diagram: a third shard is added; Shards 1 to 3, each with Replicas 1 to 3)
  • What about Redis?
    • MongoDB uses memory-mapped files, so why Redis?
    • Like a Swiss Army knife
    • Cache
    • Messaging
    • Central coordination in a distributed environment
    • Written in C
  • Scaling Redis
    ► Asynchronous replication built in
    (Diagram: one server with two replicas)
  • Alternative: Riak
    • Key / value store
    • But includes Solr for fulltext search
    • What is the difference to a document store then?
    • Map/reduce possible
    • Written in Erlang
    • Smart scaling
  • Scaling Riak (diagram: Servers A to D each hold several of the Shards 1 to 4; every shard is stored on three of the four servers)
  • Scaling Riak (diagram: a New Server joins Servers A to D)
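The idea behind this kind of scaling can be sketched as follows: keys are hashed onto a fixed number of partitions, and a joining server takes over only a fair share of them, so most data stays where it is. This is a simplified illustration, not Riak's actual ring or claim algorithm. The function names, the partition count of 16 and the server names are assumptions, and the three-fold replication shown in the diagram is left out.

```python
import hashlib

def partition_for(key, partitions):
    """Hash a key onto one of a fixed number of partitions."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % partitions

def assign(partitions, servers):
    """Initial round-robin mapping: partition -> server."""
    return {p: servers[p % len(servers)] for p in range(partitions)}

def add_server(assignment, new_server):
    """Hand partitions to the new server until it holds a fair share.

    All other partitions keep their owner, so only a fraction of the data moves.
    """
    servers = sorted(set(assignment.values())) + [new_server]
    target = len(assignment) // len(servers)
    counts = {server: 0 for server in servers}
    for owner in assignment.values():
        counts[owner] += 1
    result = dict(assignment)
    for partition in sorted(result):
        if counts[new_server] >= target:
            break
        owner = result[partition]
        if counts[owner] > target:
            result[partition] = new_server
            counts[owner] -= 1
            counts[new_server] += 1
    return result

before = assign(16, ["A", "B", "C", "D"])
after = add_server(before, "New")
moved = sorted(p for p in before if before[p] != after[p])
```

Because the key-to-partition mapping never changes, clients only need the updated partition-to-server table after a server joins.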
  • Key/Value! (annotation on the earlier claim: Document-oriented databases are the best NoSQL databases. For at least one definition of “best”.)
  • MongoDB, Redis, Riak, elasticsearch
    ► Your choice: a trade-off!
    ► Typical architecture decision
  • Data Access: RDBMS
    ► Optimizations (no need to change code)
      • Indices
      • Tablespaces
      • …
    ► Data Model
      • Schema
      • Stored Procedures
    ► Data Access
      • Queries
      • Other code
    (Roles shown in the diagram: DBA, Architect / Developer)
  • RDBMS separate data from data access
    ► Indices
    ► Joins and normalization allow flexible data access patterns
  • Sacrifice Joins for Scalability
    ► Join: Combine tables to retrieve results
    ► Need transactions spanning multiple tables
    ► Example: Customer table + addresses
    ► Inserts need locks and consistency across both tables
    ► Limits scalability
    ► Global and distributed locks are nasty
    ► Consistency limits either availability or partition tolerance
  • CAP Theorem
    ► Consistency
      > All nodes see the same data
      > Not the ACID Consistency
    ► Availability
      > Node failures do not prevent survivors from operating
    ► Partition Tolerance
      > System continues to operate despite arbitrary message loss
    ► Can have at most two of C, A and P
    ► Or rather: If the network fails, choose A or C.
  • CAP Theorem (diagram: triangle of Consistency, Partition Tolerance and Availability, annotated with Quorum, DNS, Replication, RDBMS and 2 Phase Commit)
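The slide names quorums without elaborating. As a hedged illustration of the usual rule: with N replicas, a write acknowledged by W of them and a read that asks R of them must share at least one replica whenever R + W > N. The brute-force check below is a sketch with made-up function names, not taken from any product.

```python
from itertools import combinations

def read_always_sees_write(n, r, w):
    """True if every possible read set of size r overlaps every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )

def is_quorum(n, r, w):
    """The usual shortcut for the same property."""
    return r + w > n

three_two_two = read_always_sees_write(3, 2, 2)
three_one_one = read_always_sees_write(3, 1, 1)
```

With N = 3, choosing R = W = 2 guarantees the overlap, while R = W = 1 is faster but a read may hit a replica that has not seen the write.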
  • BASE
    ► Basically Available, Soft state, Eventually consistent
    ► I.e. trade consistency for availability
    ► Pun concerning ACID…
    ► Not the same C, however!
  • BASE: Eventually consistent
    ► If no updates are sent for a while, all previous updates will eventually propagate through the system
    ► Then all replicas are consistent
    ► Can deal with network partitioning: Messages will be transferred later
    ► All replicas are always available
    ► Pun concerning ACID…
    ► Not the same C, however!
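A toy simulation may help to picture this. Each replica accepts writes locally and queues them; once messages flow again, the queued updates are delivered and all replicas converge. Conflict resolution by "highest timestamp wins" is an assumption made for this sketch, and real systems use a variety of strategies. All names are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}    # key -> (timestamp, value)
        self.outbox = []  # updates not yet delivered to the other replicas

    def write(self, key, value, timestamp):
        """Always succeeds locally, even while the network is partitioned."""
        self.apply(key, value, timestamp)
        self.outbox.append((key, value, timestamp))

    def apply(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)  # last write wins (assumption)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(replicas):
    """The partition has healed: deliver every pending update to every peer."""
    for source in replicas:
        for update in source.outbox:
            for target in replicas:
                if target is not source:
                    target.apply(*update)
        source.outbox.clear()


a, b = Replica("a"), Replica("b")
a.write("balance", 100, timestamp=1)
b.write("balance", 80, timestamp=2)
stale_read = a.read("balance")   # a has not seen b's newer write yet
sync([a, b])
converged = (a.read("balance"), b.read("balance"))
```

Before `sync`, both replicas answer reads but disagree; afterwards they hold the same value.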
  • Banking is BASE
    ► ATMs relax rules on providing cash if the network is partitioned
    ► Your account is only guaranteed to be consistent by the end of the year
  • No Joins - What now?
    ► Customer and addresses must be consistent!
    ► Solution: Store both as one entity
    ► Atomic changes easily possible
    ► Queries might be distributed across multiple nodes
    ► “NoSQL does not support transactions / ACID” is wrong
      > “NoSQL does not support joins” is better
      > Atomic changes still possible
      > Schema design is different
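Storing customer and addresses as one entity can be sketched like this. A dictionary stands in for the database, and the field names are assumptions. The point is that the whole aggregate is replaced under a single key, so no reader can observe a customer without the matching addresses.

```python
import copy

customers = {}  # stand-in for a collection: customer id -> whole aggregate

def save_customer(store, customer):
    """Write customer and addresses together as one entity."""
    store[customer["id"]] = copy.deepcopy(customer)

def add_address(store, customer_id, address):
    """Build the new version off to the side, then swap it in with one assignment."""
    updated = copy.deepcopy(store[customer_id])
    updated["addresses"].append(address)
    store[customer_id] = updated

save_customer(customers, {
    "id": 7,
    "name": "Jane Doe",
    "addresses": [{"city": "Berlin", "street": "Example Street 1"}],
})
add_address(customers, 7, {"city": "Hamburg", "street": "Sample Road 2"})
cities = [address["city"] for address in customers[7]["addresses"]]
```

No join and no multi-table lock is involved: reading the customer by key returns the addresses with it.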
  • Data Access: MongoDB
    ► Optimizations
      • Only basic indices
      • Other optimizations must be done in code
    ► Data Model
      • Influences access patterns
    ► Data Access
      • WriteConcerns: how much do you love your data?
      • Shard key
      • Consistency
    (Roles shown in the diagram: DBA, Architect / Developer)
  • Cluster: RDBMS
    ► Transparent to developers
    ► How many nodes?
    ► A special setup of hardware and RDBMS software
    (Role shown in the diagram: DBA)
  • Cluster: MongoDB
    ► CAP theorem
      > If the network is down, choose
      > Consistency xor
      > Availability
    ► Deals with replication
    ► MongoDB has master / slave replication
    ► Write Concerns:
      > Unacknowledged
      > Acknowledged
      > Journaled
      > Some nodes in the replica set
    ► Queries might go to master only or also slaves
    ► Influences consistency
    (Role shown in the diagram: Architect / Developer)
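The four write concerns differ in how much has happened before the client's write call returns. The following is a simplified model of that trade-off in plain Python. It does not use the MongoDB driver, and the enum, the function and the step descriptions are hypothetical names chosen for illustration.

```python
from enum import Enum

class WriteConcernLevel(Enum):
    UNACKNOWLEDGED = "unacknowledged"
    ACKNOWLEDGED = "acknowledged"
    JOURNALED = "journaled"
    REPLICA_ACKNOWLEDGED = "some nodes in the replica set"

def steps_before_return(level, replicas_required=2):
    """List what the client waits for before its write call returns (simplified)."""
    steps = ["send write to master"]
    if level is WriteConcernLevel.UNACKNOWLEDGED:
        return steps  # fire and forget: fastest, but errors go unnoticed
    steps.append("master confirms it applied the write")
    if level is WriteConcernLevel.JOURNALED:
        steps.append("master confirms the write is in its on-disk journal")
    if level is WriteConcernLevel.REPLICA_ACKNOWLEDGED:
        steps.append(f"{replicas_required} replica set members confirm the write")
    return steps

waits = {level: len(steps_before_return(level)) for level in WriteConcernLevel}
```

More steps mean higher latency per write and a smaller chance of losing it, which is the "how much do you love your data?" decision the architect or developer now has to make per operation.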
  • More Power and More Responsibility (diagram labels: Architect, DB Admin)
  • Architects
    ► Architecture has always been a multidimensional problem
    ► Need to choose persistence technology
    ► Need to think about operations
    ► Need to do DBA work
  • NoSQL Is All About the Persistence Question