To be relational, or not to be relational? That's NOT the question!
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

To be relational, or not to be relational? That's NOT the question!

on

  • 9,832 views

With distributed computing being a reality, whether between clouds or within data center walls, programmers and architects are facing new challenges. In traditional enterprise development, we?ve been ...

With distributed computing being a reality, whether between clouds or within data center walls, programmers and architects are facing new challenges. In traditional enterprise development, we?ve been relying on the database to be the keeper of our data and constraints, but since the pattern didn't scale, NoSQL recently came to our rescue. Now what does all that mean? And more importantly, what does it mean to you and your application? This session aims at guiding you on the journey from using a single relational database server, to making it as scalable as possible and choosing proper alternative solutions: explaining pros and cons of tools available depending on your needs, their impact on your architecture, the concepts and algorithms under the hood, and when NoSQL should really be Not Only SQL. This session should be attended by developers and architects that plan on delivering applications in the next couple of years, as the future is now and you can't get struggled with hamletic questions!

Statistics

Views

Total Views
9,832
Views on SlideShare
7,137
Embed Views
2,695

Actions

Likes
11
Downloads
100
Comments
2

26 Embeds 2,695

http://thetechnicalweb.blogspot.in 1364
http://www.codespot.net 963
http://fuzzytolerance.info 204
http://thetechnicalweb.blogspot.com 61
http://gisahmedabad.blogspot.in 45
http://paper.li 11
http://thetechnicalweb.blogspot.co.uk 9
http://feeds.feedburner.com 6
http://translate.googleusercontent.com 5
http://www.planetgs.com 4
http://thetechnicalweb.blogspot.de 3
https://www.google.com 3
http://thetechnicalweb.blogspot.ca 2
http://thetechnicalweb.blogspot.sg 2
http://thetechnicalweb.blogspot.com.au 2
http://thetechnicalweb.blogspot.co.il 1
http://www.thetechnicalweb.blogspot.in 1
http://thetechnicalweb.blogspot.com.es 1
http://www.tecnotic.com 1
http://10.237.125.81 1
http://thetechnicalweb.blogspot.ch 1
http://10.225.161.111 1
http://a0.twimg.com 1
http://grumpy.junta.com.au 1
http://www.onlydoo.com 1
http://gisahmedabad.blogspot.ae 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • In our blog about SQL vs NoSQL we discuss advantages on SQL (Relational) and NoSQL models, in the broader sense and ecosystem:
    http://blog.xeround.com/2010/12/nosql-the-sequel
    Are you sure you want to
    Your message goes here
    Processing…
  • Awesome slide set :)
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

To be relational, or not to be relational? That's NOT the question! Presentation Transcript

  • 1. To be relational, or not ? Thats not the question!@sbtourist aka sergio bossa@alexsnaps aka alex snaps sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 2. AgendaAct 1 – Relational databases: a difficult love affair.Interlude 1 – Problems in the modern era: big data and the CAP theorem.Act 2 : Non-relational databases: love is in the air.Interlude 2 : Relational or non-relational? Not the correct question.Act 3 : Building scalable apps.The End sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 3. About usSergio Bossa Software Engineer at Bwin Italy. Long time open source contributor. (Micro)Blogger - @sbtourist.Alex Snaps Software engineer at Terracotta … … after 10 years of industry experience. www.codespot.net — @alexsnaps sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 4. Act I Relational databasesA difficult love affair … sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 5. The relational modelDefines constraints Finite model aka relation variable, relation or table Candidate keys Foreign keysQueries are relations themselves Heading & bodySQL differs slightly from the "pure" relational model sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 6. The ACID guaranteesLet people easily reason about the problem. Atomic. We see all changes, or no changes at all. Consistent. Changes respect our rules and constraints. Isolated. We see all changes as independently happening. Durable. We keep the effect of our changes forever.Fits a simplified model of our reality. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 7. The SQL ubiquitySQL is everywhere Still trying to figure out why my blog uses a relational database to be honestSQL is known by everyone Raise your hand if youve never written a SQL query... and if you dont want to ORM are there for you ActiveRecord if youre into RailsSimple persistence for all our objects! sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 8. Interlude 1Problems of  the Modern Era sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 9. Big DataWhats Big Data for you?Not a question of quantity. Gigabytes? Terabytes? Petabytes?Its all about supporting your business growth. Growth in terms of schema evolution. Growth in terms of data storage. Growth in terms of data processing.Is your data stack capable of handling such a growth? sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 10. CAP TheoremYear 2000: Formulated by Eric Brewer.Year 2002: Demonstrated by Lynch and Gilbert.Nowadays: A religion for many distributed system guys.CAP Theorem in a few words: Consistency. Availability. Partition-Tolerance. Pick (at most) two.More later. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 11. Act IINon-relational databasesLove is in the air sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 12. Non-relational databases From the origins to the current explosion... How do they differ? sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 13. Data Model (1)Column-family. Key-identified rows with a sparse number of columns. Columns grouped in families. Multiple families for the same key. Dynamically add/remove columns. Efficiently access same-family columns sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 14. Data Model (2)Graph. Vertices represent your data. Edges represent meaningful relations between nodes. Key/Value properties attached to both. Indexed properties. Efficient traversal. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 15. Data Model (3)Document. Schemaless documents. With denormalized data. Stored as a whole unit. Clients can update/query contents. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 16. Data Model (4)Key/Value. Opaque values. Maximize flexibility. Efficiently store and retrieve by key. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 17. Consistency Model (1)Strict (Sequential) Consistency. Every read/write operation act on either: The last value read. The last value written. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 18. Consistency Model (2)Eventual Consistency. All read/write operations will eventually reach consistent state. Stale data may be served. Versions may diverge. Repair may be needed. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 19. Partitioning (1)Client-side partitioning. Every server is self-contained. Clients partition data per-request. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 20. Partitioning (2)Server-side partitioning. Servers automatically partition data. Consistent-hashing, ring-based is the most used. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 21. Replication (1)Master/Slave. Master propagates changes to slaves. Log replication. Statement replication. Slaves may take over master in case of failures. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 22. Replication (2)N-Replication. Nodes are all peers. No master, no slaves. Each node replicates its data to a subset (N) of nodes. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 23. ProcessingA distributed system is built by: Moving data toward its behavior. ... or ... Moving behavior toward its data.An efficient distributed system is built by: Moving behavior toward its data.Map/Reduce is the most common and efficient. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 24. CAP - ProblemCAP Theorem. Consistency. Availability. Partition tolerance. Pick two. Do you remember?Makes sense only when dealing with partitions/failures ... sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 25. CAP - Trade-offsConsistency + Availability. Requests will wait until partitions heal. Full consistency. Full availability. Max latency.Consistency + Partition tolerance. Some requests will act on partial data. Some requests will be refused. Full consistency. Reduced availability. Min latency.Availability + Partition tolerance. All requests will be fulfilled. Sacrifice consistency. Max availability. Min latency. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 26. Interlude IIRelational or non-relational? Not the correct question! sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 27. Relational or non-relational? Not the correct question!Freedom to build the right solution for your problems.Freedom to scale your solution as your problems scale.Know your use case. Understand the problem domain, then choose technology.Know your data. Understand your data and data access patterns, then choose technology.Know your tools. Understand available tools, dont go blind.Pick the simple solution. Choose the simpler technology that works for your problem.Build on that. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 28. Act IIIBuilding scalable apps sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 29. Building scalable appsRelational databases sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 30. Scaling out...Adding app servers will work... ... for a little while!But at some point we need to scale the database as well RDBMS WRITE READ PowerBook G4 PowerBook G4 PowerBook G4 sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 31. Master-SlaveOne master gets all the writes replicates to slavesSlaves (& master) gets the read operationsWe didnt really scale writes Static topologySPOF remains sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 32. Master-MasterMultiple masters Writes & reads all participantsWrites are replicated to mastersSynchronously Expensive 2PCAsynchronously Conflicts have to be resolvedComplexStatic topologyLimited performance againSolved SPOF, sort of… sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 33. Vertical PartitioningData is split across multiple database servers based on The functional areaJoins are moved to the application Not relational anymoreSPOF backWhat about one functional area growing "out of hand" ? sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 34. Horizontal partitioningData is split across multiple database servers based on Key shardingJoins are moved to the application Not relational anymoreSPOF backWhat about one functional area growing "out of hand" ?Routing required Wheres Order#123 ? sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 35. Building scalable appsCaching sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 36. CachingIf going to the database is so expensive... ... we should just avoid it!Put a cache in front of the databaseData remains closer to processing unitIf needed, distribute sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 37. Distributed CachingWhat if data isnt perfectly partitioned ?How do we keep this all in sync ? RDBMS WRITEPeer-to-peer ? READ Cache Cache Cache PowerBook G4 PowerBook G4 PowerBook G4 sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 38. Distributed CachingCached data remains close to the processing unit RDBMSCentral unit L2 orchestrates it all L1 L1 L1 PowerBook G4 PowerBook G4 PowerBook G4 sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 39. Distributed Caching RDBMS L2 L1 L1 L1 PowerBook G4 PowerBook G4 PowerBook G4 sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 40. Distributed Caching SLOWER LARGER RAM L2 L1 L1 L1 Core Core Core FASTER SMALLER sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 41. Distributed Caching SLOWER LARGER RDBMS L2 L1 L1 L1 FASTER PowerBook G4 PowerBook G4 PowerBook G4 SMALLER sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 42. Distributed Caching Weve scaled our reads But what about writes ? SLOWER LARGER RDBMS L2 L1 L1 L1 FASTER PowerBook G4 PowerBook G4 PowerBook G4 SMALLER sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 43. Write-behind CacheRather than write changes directly to the slowest participant Write to fastest durable store (persistent queue) required for recovery in the face of failure Only write to database later in batches and/or coalesced while still controlling the lag the load on the databaseIn a distributed environment handling failures  we enforce happens at least once  loosens the contract vs. "once and only once"! sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 44. Write-through RDBMS Cache Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 45. Write-through RDBMS Cache Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 46. Write-through RDBMS Cache Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 47. Write-through RDBMS Cache Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 48. Write-behind RDBMS Cache Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 49. Write-behind RDBMS Cache Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 50. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 51. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 52. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 53. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 54. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 55. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 56. Write-behind RDBMS Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 57. Write-behind RDBMS Writer Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 58. Write-behind RDBMS Writer Cache Writer Application code sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 59. Building scalable appsNon-relational databases sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 60. The case for non-relationalsWeve seen how to scale our relational database.Weve seen how to add caching.Weve seen how to make caching scale.So why going non-relational? A matter of use case sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 61. Rich DataFrequent schema changes. Relational databases cannot easily handle table modifications. Column, Document and Graph databases can.Tracking/Processing of large relations. Relational databases keep and traverse table relations by foreign-key joins. Joins are expensive. Graph databases provide cheap and fast traversal operations. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 62. Runtime DataPoorly structured data. Relational model doesnt fit unstructured data. Unless you want BLOBs. Non-relational databases provide more flexible models.Throw-away data. Relational databases provide maximum data durability. But not all kind of data are the same. When you can afford to possibly lose some data, non-relational databases let you trade durability for higher flexibility and performance. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 63. Massive DataLarge quantity of data. Relational databases are typically single instance. Otherwise cost too much. Choose a partitioned non-relational database.High volume of writes. Scaling writes with relational databases is difficult. Even if using write-behind caching. Choose a partitioned non-relational database. Eventual consistency may also help.Complex, aggregated, queries. Complex aggregations and queries are expensive in standard relational databases. Remember joins? Choose a non-relational database supporting distributed data processing. Hint: Map/Reduce. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 64. Building scalable appsMulti-paradigm sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 65. A case for multi-paradigm: CQRS sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 66. CQRS - ExplainedCommand Store. Strictly modeled after the problem domain. Commands model domain changes. Domain changes cause events publishing.Query Store. Fed by published events. Strictly modeled after the user interface. Possibly denormalized to accommodate queries.Offline Store. Fed by published events. Strictly modeled after processing needs. Business Intelligence. Reporting. Statistical aggregations, … sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 67. What about stores implementation? sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 68. Introducing EhcacheStarted in 2003 by Greg LuckApache 2.0 LicenseIntegrated by lots of projects, productsHibernate Provider implemented 2003Web Caching 2004Distributed Caching 2006REST and SOAP APIs 2008Acquired by Terracotta Sept. 2009 sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 69. Introducing Ehcache 2.4New Search APINew consistency modesNew transactional modesStill: Small memory footprint Cache Writers JTA Bulk loadGrows with your application with only two lines of configuration Scale Up - BigMemory (100’s of Gig, in process, NO GC) Scale Out - Clustering Platform (Up to 2 Terabytes, HA) sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 70. Ehcache & TerracottaTerabyte scale with minimal footprintServer array with striping for linear scaleHigh availabilityHigh performance data persistenceIn-memory performance as you scale sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 71. Introducing TerrastoreDocument Store. Ubiquitous. Consistent. Distributed. Scalable.Written in Java.Based on Terracotta.Open Source.Apache-licensed. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 72. Terrastore Architecture sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 73. CQRS - Implemented sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 74. CQRS - ImplementedCommand Store. Express your domain as an object model. Process and store it with Ehcache and Terracotta. Optionally write it back to a relational database.Query Store. Map queries to user (screen) views. Map views to JSON documents. Store and get them back with Terrastore.Offline Store. Collect data from input commands. Keep data denormalized. Aggregate and process with Terrastore Map/Reduce. sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 75. The endBe a polyglot! sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 76. ConclusionBe polyglot Go out and play with all this Know your options (all of them)Understand your domain It’s current and future requirementsBe wise, and choose well! sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 77. Creative Commons by wwworks sergio bossa & alex snaps - @sbtourist & @alexsnaps
  • 78. More infowww.ehcache.orgwww.terracotta.orgcode.google.com/p/terrastore sergio bossa & alex snaps - @sbtourist & @alexsnaps