© Zühlke 2013
Michael Lehmann &
Roman Kuczynski
Polyglot Persistence with
NoSQL
Advanced software architecture by using mu...
Abstract
Alternative data persistence technologies like NoSQL have been emerging for more than
10 years, but we developers hesitate...
SELECT *
FROM dbo.Presentation
WHERE Title LIKE 'Polyglot pers%'
http://mechiz.deviantart.com/art/India-32-327938771
Our holy cow!
http://www.deviantart.com/art/swiss-army-knife-185060119
One size fits all
Michael Lehmann @lehmamic
Senior Software Engineer @Zühlke since 2012
.Net enterprise and cloud applications
Roman Kuczyns...
Borat @BoratNoSQL
Why should I change? It worked for me until
now!
http://memod.deviantart.com/art/Racing-Lights-12889655
Listen to the business
RDBMS
Volume
RDBMS
Volume
Velocity
RDBMS
Volume
Velocity
Variability
RDBMS
Volume
Agility
Velocity
Variability
Borat @BoratNoSQL
Sounds plausible, but what options do we
have?
#NoSQL
Increasing performance through scale out
Roger Federer
Rafael Nadal
Andy Murray
N. Djokovic
Scaling by sharding
Roger Federer Roger Federer
Scaling by replication
Roger Federer
N. Djokovic N. Djokovic N. Djokovic
Impedance mismatch using relational databases
public class BlogPost
{
public int Id { get; set; }
public string Content { ...
Design for the relational model
public class BlogPost
{
public int Id { get; set;}
public List<Tag> Tags { get; set; }
}
p...
NoSQL databases increase productivity
var post = new BlogPost
{
Id = 1,
Content = "Any text content",
Tags = new [] { "NoS...
Data integrity cannot be enforced
NoSQL databases are eventually consistent
NY ZH
Free Free
We look for a hotel room
NY ZH
booked Free
We book the room
NY ZH
booked Free
Inconsistency
Inconsistency window
NY ZH
booked Free
Inconsistency
Someone else books the same room
NY ZH
booked booked
Inconsistency
Conflict!
Why not handle such cases by business?
performance consistency
What do we have in our toolbox?
A lot of database products
Borat @BoratNoSQL
I feel swamped, how can I differentiate
these products?
Key-value stores
Document stores
{
"playerId": 1,
"firstName": "Roger",
"lastName": "Federer",
"ranking": "#1",
"address": { "city": "Wollerau" },
"sponsors...
Column-family stores
Row 1 FirstName:Roger LastName:Federer
Row 2 FirstName:Andy LastName:Murray
Row 3 NickName:Rafa LastName:Nadal
Row n-1 Fru...
Graph databases
The graph data model
Node [1]
Name = ‘John’
Node [2]
Name = ‘Sara’
Node [5]
Name = ‘Joe’
Node [3]
Name = ‘Maria’
Node [4]
...
http://vallo29.deviantart.com/art/The-choice-150871274
Decisions, decisions…
Borat @BoratNoSQL
With every database I have to take tradeoffs
into account, I don’t want to choose only one!
Pol·y·glot – Adjective
Knowing or using several languages
Per·sist·ence – Noun
The continued or prolonged existence of som...
Retail Store
Recommendations
Neo4J
Product Catalog
Raven DB
Financial Data
MSSQL
Shopping Cart
Redis
Polyglot persistence i...
Borat @BoratNoSQL
Sounds great! But where is the catch?
We need appropriate skills
http://www.deviantart.com/art/architecture-71406568
Invest in software architecture
Integration databases have been used for years
Database
Application
A
Application
B
Polyglot persistence doesn’t work here
Database
1
Application
A
Application
B
Database
2
Application databases do not share their data
Database
1
Application
A
Application
B
Database
2
Borat @BoratNoSQL
Fine! But I have not only one
application.
Application database with SOA
Database
1
Application
A
Application
B
Database
2
Service
It’s all about layers
Well known layers
Presentation Layer
Domain Layer
Resource Access Layer (Data Access Layer)
Resources
Common data tier design
Presentation
Domain
DAL
Resources RDBMS
Search
Transactions
Caching
Blobs
Triggers
Reporting
User ...
http://gundross.deviantart.com/art/Chair-72928743
The truth of reusability
Data access with reusable and seamless services
Presentation
Domain
DAL
Resources RDBMS
Search
Transactions
Caching
Blobs
...
Putting it all together
Key-Value Document RDBMS
Search
Caching
Reporting
Domain
Services
User Interface
Database Tier
Middle...
Use the right tool!
Resources
NoSQL Distilled
Author: Martin Fowler, Pramod J. Sadalage
ISBN: 978-0321826626
Making Sense of NoSQL
Author: Dan...
Thank you!
NoSQL Search Roadshow Zurich 2013 - Polyglot persistence with no sql
Published in: Technology
  • Do you know this? Put your hands up! Almost everybody is comfortable with SQL. (Roman)
  • We developers are familiar with relational databases and know what we get when we use them. We are used to: SQL (Structured Query Language), which is a standard; ACID operations, i.e. transactions; a relational schema that implies referential integrity, with constraints (PK, FK) to ensure data integrity; and the expectation that data consistency must be ensured. SQL is our holy cow: untouchable, established and accepted. (Roman)
  • Today relational databases such as SQL Server are a de facto standard. For (almost) every data persistence solution we choose between products (SQL Server, Oracle, MySQL...), not between technologies. This reminds us of the Swiss Army knife that can be used for everything. But what would you say if you were building a house and the electrician arrived with only a Swiss Army knife? (Michael)
  • Before we dive deeper, let us introduce ourselves. Michael has been a senior software engineer at Zühlke since 2012 and focuses on enterprise and cloud application development in .NET. Roman has been a senior software engineer at Zühlke since 2011 and focuses on data(base) architectures and technologies, including BI and Big Data. Agenda: for the next 40 minutes we will give a short briefing on alternative NoSQL technologies and explain what it means to use NoSQL databases in a polyglot persistence environment. (Both)
  • You may well ask: why should we choose anything other than a SQL database? In almost every case we are fine using SQL. What reasons do we have to use another technology for which everybody has to learn something new? (Michael)
  • You are right to challenge our suggestion to use technologies other than SQL. But circumstances today differ from those of a few years ago (cloud computing, Big Data). There are business drivers that challenge the RDBMS. We did not invent these drivers. Let's have a look at the most relevant ones. (Michael)
  • One of the main drivers is Big Data, hence our first driver is volume: the amount of data grows from MB to TB to PB, and more users, applications and devices access the data. RDBMS reach their physical or financial limits, so we need scalable and affordable solutions. (Roman)
  • Our second driver is velocity: the frequency of incoming data increases. We always want to work with the newest, up-to-date data without delay (think of a tweet arriving too late), moving from daily data to "real time". The speed at which data arrives and leaves increases; in other words, the whole data lifecycle accelerates. (Roman)
  • Another aspect is variability: data originates from different devices, from users, or from other sources in general (mobile data, web content, sensors…). Much of it is unstructured; the share of structured data is relatively small. (Roman)
  • Agility is our last driver and, besides Big Data, the second main aspect: productivity of development work, responsiveness to change, time to market and low costs. (Roman)
  • All these business drivers challenge relational databases and put pressure on them. Let's see what options we have to counter these problems. (Michael)
  • NoSQL databases emerged in the market to meet those business drivers, so their characteristics address these problems. There are two main reasons to use NoSQL: scaling and productivity. (Michael)
  • Why scaling? First of all, most NoSQL technologies are built to scale out well. Scaling out means that we build a database cluster based on (cheap) commodity hardware instead of spending all our money on a single-box solution. In return we get availability, capacity, and better performance through load balancing. (Michael)
  • There are two techniques for scaling. The first is sharding: we distribute our data across different nodes, e.g. customers [a-m] on node #1 and customers [n-z] on node #2. Some NoSQL databases even provide auto-sharding, which means we can simply add new nodes to the cluster and the database redistributes the data. (Michael)
  • The second technique for scaling is replication: we duplicate our data on different nodes. This gives us better failover and increases performance through parallel access. (Michael)
  • As developers, we want to store our in-memory object structures. With relational databases we have to map our objects to a relational model. This is the so-called impedance mismatch. A simple example is a list of strings.
  • In general, NoSQL databases are schemaless, which gives us an advantage in development productivity. Data is usually stored in XML or JSON format, which allows us to store our objects in a straightforward way. The data format is version tolerant: if we change our data structures in code (i.e. add or remove a property), a NoSQL database does not care, whereas in a relational database we have to update the schema and migrate the data as well. Schemaless does not mean having no schema: we do have an implicit schema, but it is not enforced by the database! (Second slide: schemaless and version tolerant.) (Michael)
  • Let's talk about consistency. Scaling out, together with the absence of a schema, affects data consistency. First, having no schema means that the database cannot enforce data integrity for us. (Roman)
  • NoSQL databases generally don't support ACID operations (transactions). Instead, they provide eventual consistency. (Roman)
  • Let's illustrate this with an example: we want to book a hotel room, and we see that the room is free. (Roman)
  • We book the room. On the ZH server the room is still free, because the update has not been processed yet. (Roman)
  • The data is inconsistent for a short period of time. We call this the inconsistency window! As soon as the updates are processed on ZH, the data becomes consistent again (eventual consistency). This assumes that nobody else has booked the room on the ZH server before the updates were processed! (Roman)
  • BUT: what happens if someone else books the room on ZH before synchronization? (Roman)
  • To avoid conflicts, we either wait for ZH to commit before completing the update, or we define one server as the master and let only the master accept changes. BUT: do we really need strong consistency for such a conflict? What is more important: avoiding conflicts, or avoiding the risk of losing customers? (Roman)
  • We may handle conflicts at the business level, e.g. with a discount or spare rooms. (Roman)
  • To avoid conflicts we have to accept latency. We have to reconsider where strong consistency is really required; hence we have to balance performance against strong consistency. (Roman)
  • At the beginning we spoke about the Swiss Army knife. Now we want to discover the tools available to build our solution. Let's open the toolbox. (Roman)
  • We have a great variety of NoSQL database products, for example Riak, MongoDB, Cassandra, HBase, Neo4J, etc. (Roman)
  • There are so many products that there is no way to know all of them in detail. But we can classify them by their main characteristics. Four well-known categories have become generally accepted in the NoSQL community. (Michael)
  • We start with the simplest databases, the so-called key-value stores, originally developed for distributed caches (e.g. for web sites). Typical characteristics: the data is stored and accessed by a unique key, comparable to a C# dictionary; the value is just a bucket (or set) of arbitrary data, which generally cannot be queried; reads and writes are very fast; and they are easy to scale. Typical products: Memcached, Redis, Riak. (Michael)
  • Among NoSQL databases, document stores are the most widely used. Typical products: CouchDB, MongoDB, RavenDB (written in .NET!), and OrientDB (a hybrid that is also a graph database). As the name implies, these databases store data as documents. A whole document is a serialized object tree (an aggregate), which makes this kind of database very intuitive and easy to work with. (Michael)
  • Here is an example of such a document. Documents are generally stored in XML or JSON (BSON) format. These databases support queries, so we can search for the value of a given property in hierarchical documents, and we can index data fields to optimise the queries. (Michael)
  • Column-family stores come closest to the tables of relational databases. They are also known as wide-column databases or "big tables". Typical products: Cassandra, HBase, Hypertable. (Roman)
  • Column-family stores are semi-structured: you can think of a column-family store as one huge table with lots of columns. These columns are organised in so-called column families, which are roughly comparable to tables in an RDBMS. Data is stored in rows and accessed by a row key and a column name. Not every row has to contain the same columns; moreover, columns can be added to or removed from a row dynamically. (Roman)
  • Last but not least come graph databases, which are a bit exotic. Typical products: Neo4J, InfiniteGraph, OrientDB. (Michael)
  • Relational databases and the NoSQL categories discussed so far are not strong at modelling complex relationships. Imagine you have a graph like this (see example). How would you model and query the nodes and relationships in SQL? There are limitations: queries that traverse a graph would decrease performance drastically, and in an RDBMS the data model defines how the relationships can be queried. Graph databases take a different architectural approach and focus on the relationships in the data. Data is modelled as a graph with nodes and edges. Edges are the relationships between the nodes and can carry data as well. Special query languages such as Cypher for Neo4J allow us to traverse the graph intuitively. Example: see illustration. (Michael)
  • After this excursion, we have a variety of databases to store our data: not only NoSQL databases, but also relational databases. Yes, they still have a right to exist, and they have a place in our tool belt. Examples: Facebook, eBay: Cassandra; CERN, for the ATLAS detector of the Large Hadron Collider: Cassandra; Forbes.com, for articles: MongoDB; Salesforce Marketing Cloud: MongoDB; Adobe, HP: Neo4J. An RDBMS can still be used! (Michael)
  • But which database is the best for my application? Every database has advantages and disadvantages for a given scenario. (Roman)
  • Do we really have to pick only one database? In any scenario we have to take trade-offs into account. What we actually want is to use the most suitable database for every job. (Roman)
  • Why don't we do this? We can use different databases within one application for different storage scenarios. Martin Fowler calls this polyglot persistence. (Michael)
  • Title: Use the right tool for the job. Webshop scenario: caching: Redis (KV); session storage: Redis (KV); shopping cart: Redis (KV); product catalog: RavenDB (Doc); recommendation engine: Neo4J (Graph); financial transactions: MS SQL Server (RDBMS); reporting: MS SQL Server (RDBMS); event logging: Cassandra (CF). (Michael)
  • This is nice, but it introduces some new issues. (Roman)
  • First of all, we need people who know how to develop with these databases (APIs, detailed characteristics, advantages & disadvantages) and how to operate them (administration, installing & upgrading, monitoring, backup & restore, performance tuning, storage management). (Roman)
  • Besides the skills, polyglot persistence affects our architecture. Using multiple databases in the same application (system) increases complexity in the code and in the architecture. We need to design the architecture to handle this complexity. (Roman)
  • In the past, SQL databases were used as so-called integration databases: relational databases have a common, platform-independent interface (SQL), and multiple applications accessed the same database, so the database acted as the integration platform. The database was the master of the data, and the model ensured data quality (consistency, integrity). With NoSQL databases the circumstances are different: a NoSQL database cannot enforce the schema or strong consistency. The application therefore has to become the master of the data and is responsible for ensuring schema and consistency. (Roman)
  • Polyglot persistence does not fit the integration database idea: we no longer have a master database. This means that we cannot ensure data quality across multiple databases if they are accessed by multiple applications. (Roman)
  • To overcome this problem, we need a central unit that is the master of the data. This is where application databases come in: the application owns a database and is the master of its data. Hence the data is accessed by exactly one application, and no other application accesses the database. (Roman)
  • In an enterprise environment we do not have one single application, but multiple applications using the same data. Since application databases don't work with multiple applications, we need to extend that design! (Michael)
  • We all know the term SOA. A service takes the role of the application in the "application database" scenario: the service is the master of the data and can ensure schema and consistency. Multiple applications can access the service, but they never access the data(base) directly! (Michael)
  • We have discussed how to ensure data quality and schema with polyglot persistence. Now let's have a look at how we can avoid a mess in our code. The key is to structure our code in layers. (Michael)
  • For this we use the well-known layered architecture: presentation, business, resource access (DAL), and resources (databases). (Michael)
  • Usually the data access layer contains all of the logic to access the data. It is bloated, heavy and application-specific, not reusable, and expensive to maintain. (Michael)
  • Example: reusability of a chair. Object-related vs. interface-related: the interface can be reused, not the object! Interfaces are reusable, objects are not. (Michael)
  • Especially with NoSQL databases we should strive for small, reusable data services, each providing functionality that accesses only one database. Example: a caching service. Because these services provide not only data access but also (domain-independent) business logic, they are part of the business layer. Hence the formerly bloated data access layer is split into several independent services and moved to the business layer. In fact, each service has its own DAL, which is why we placed the services across the layers. Data services are domain-independent: they do not implement domain-specific logic! (Michael)
  • We now have several independent data services. They can be used by multiple applications. Each data service ensures data consistency and schema in the database it owns, but consistency across several data services is not guaranteed by the data services themselves. Example: webshop (catalog / shopping cart / order system / recommendation engine / financial transactions). A few minutes ago we talked about a central unit that ensures data consistency across several databases. To meet this requirement for our data services, we introduce business services: they provide the domain logic, use one or more data services, and are responsible for guaranteeing data consistency. (Michael)
  • Relational databases give us confidence because they are established, proven and robust. But that does not mean that we should use them for everything. At the very beginning we compared our affection for relational databases with a Swiss Army knife (one tool fits all). Now we have a toolbox full of individual tools to do our job like a professional. That means we have a variety of database technologies and products, we know their strengths, and we know how to use them. REMEMBER: use the right tool for the job! (Roman)
  • NoSQL Distilled; Making Sense of NoSQL; http://www.datastax.com/documentation/gettingstarted/index.html; http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//archive/bigtable-osdi06.pdf; http://nosql-database.org/; http://en.wikipedia.org/wiki/NoSQL (Roman)
  • Thank you. (Zühlke logo, Twitter names)
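The split into data services and business services described in the notes above can be made concrete with a small sketch. It is not the speakers' implementation: the three store classes are in-memory stand-ins for a key-value store, a document store and an RDBMS, and every class and method name (`CartService`, `CatalogService`, `OrderService`, `CheckoutService`, `checkout`, ...) is hypothetical. Each data service touches exactly one store, and the business service is the only place where data from several stores is combined.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store such as Redis (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class DocumentStore:
    """In-memory stand-in for a document store such as RavenDB (assumption)."""

    def __init__(self):
        self._documents = {}

    def save(self, doc_id, document):
        self._documents[doc_id] = dict(document)

    def load(self, doc_id):
        return dict(self._documents[doc_id])


class RelationalStore:
    """In-memory stand-in for a single RDBMS table (assumption)."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(dict(row))

    def select_all(self):
        return [dict(row) for row in self._rows]


class CartService:
    """Data service: the only code that accesses the key-value store."""

    def __init__(self, store):
        self._store = store

    def add_item(self, customer_id, product_id, quantity=1):
        cart = dict(self._store.get(customer_id, {}))
        cart[product_id] = cart.get(product_id, 0) + quantity
        self._store.put(customer_id, cart)

    def items(self, customer_id):
        return dict(self._store.get(customer_id, {}))

    def clear(self, customer_id):
        self._store.put(customer_id, {})


class CatalogService:
    """Data service: the only code that accesses the document store."""

    def __init__(self, store):
        self._store = store

    def add_product(self, product_id, name, price_cents):
        self._store.save(product_id, {"name": name, "price_cents": price_cents})

    def price_cents(self, product_id):
        return self._store.load(product_id)["price_cents"]


class OrderService:
    """Data service: the only code that accesses the relational store."""

    def __init__(self, store):
        self._store = store

    def record(self, customer_id, total_cents):
        self._store.insert({"customer_id": customer_id, "total_cents": total_cents})

    def orders(self):
        return self._store.select_all()


class CheckoutService:
    """Business service: domain logic that coordinates several data services."""

    def __init__(self, cart, catalog, orders):
        self._cart, self._catalog, self._orders = cart, catalog, orders

    def checkout(self, customer_id):
        items = self._cart.items(customer_id)
        if not items:
            raise ValueError("cart is empty")
        total = sum(self._catalog.price_cents(pid) * qty for pid, qty in items.items())
        # No transaction spans the stores: record the order first, then clear the cart.
        self._orders.record(customer_id, total)
        self._cart.clear(customer_id)
        return total
```

Applications would call `CheckoutService.checkout` and never the stores directly. Because no transaction spans the three stores, the order of the two writes in `checkout` is a deliberate choice in this sketch: a failure between them leaves a stale cart, not a lost order.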
Transcript of "NoSQL Search Roadshow Zurich 2013 - Polyglot persistence with no sql"

    1. © Zühlke 2013. Michael Lehmann & Roman Kuczynski. Polyglot Persistence with NoSQL: Advanced software architecture by using multiple persistence technologies. 19 September 2013
    2. Abstract: Alternative data persistence technologies such as NoSQL have been emerging for more than 10 years, but we developers hesitate to broaden our horizons towards these new approaches. Why should we? Relational databases have dominated the IT industry for a long time and have served us very well. Everybody knows SQL and is used to the relational data model with all its advantages and disadvantages. But those who look beyond their borders will find a wealth of NoSQL technologies and products. Every product has its own properties and characteristics. How can we differentiate them? Is it all about smart decisions, or do we have more possibilities? We will enter the world of NoSQL and explain the different kinds of NoSQL products, when to use them, and what polyglot persistence is all about.
    3. SELECT * FROM dbo.Presentation WHERE Title LIKE 'Polyglot pers%'
    4. Our holy cow! (http://mechiz.deviantart.com/art/India-32-327938771)
    5. One size fits all (http://www.deviantart.com/art/swiss-army-knife-185060119)
    6. Michael Lehmann (@lehmamic): Senior Software Engineer @Zühlke since 2012; .NET enterprise and cloud applications. Roman Kuczynski (@qtschi): Senior Software Engineer @Zühlke since 2011; data(base) architectures, BI and Big Data.
    7. Borat @BoratNoSQL: Why should I change? It worked for me until now!
    8. Listen to the business (http://memod.deviantart.com/art/Racing-Lights-12889655)
    9. RDBMS: Volume
    10. RDBMS: Volume, Velocity
    11. RDBMS: Volume, Velocity, Variability
    12. RDBMS: Volume, Velocity, Variability, Agility
    13. Borat @BoratNoSQL: Sounds plausible, but what options do we have?
    14. #NoSQL
    15. Increasing performance through scale out
    16. Scaling by sharding: Roger Federer | Rafael Nadal | Andy Murray | N. Djokovic
    17. Scaling by replication: Roger Federer, Roger Federer, Roger Federer | N. Djokovic, N. Djokovic, N. Djokovic
    18. Impedance mismatch using relational databases
        public class BlogPost
        {
            public int Id { get; set; }
            public string Content { get; set; }
            public List<string> Tags { get; set; }
        }
    19. Design for the relational model
        public class BlogPost
        {
            public int Id { get; set; }
            public List<Tag> Tags { get; set; }
        }
        public class Tag
        {
            public int Id { get; set; }
            public BlogPost BelongsTo { get; set; }
            public string Name { get; set; }
        }
        Table BlogPost: Id (int), Content (varchar)
        Table Tag: Id (int), BlogPostId (int), Name (varchar)
    20. NoSQL databases increase productivity
        var post = new BlogPost
        {
            Id = 1,
            Content = "Any text content",
            // Tags is declared as List<string> on slide 18, so a list is used here
            Tags = new List<string> { "NoSQL", "Cloud", "PolyglotPersistence" }
        };
        collection.Insert(post);
    21. Data integrity cannot be enforced
    22. NoSQL databases are eventually consistent
    23. We look for a hotel room (NY: free, ZH: free)
    24. We book the room (NY: booked, ZH: free)
    25. Inconsistency window (NY: booked, ZH: free; inconsistency)
    26. Someone else books the same room (NY: booked, ZH: free; inconsistency)
    27. Conflict! (NY: booked, ZH: booked; inconsistency)
    28. Why not handle such cases by business?
    29. Performance / consistency
    30. What do we have in our toolbox?
    31. A lot of database products
    32. Borat @BoratNoSQL: I feel swamped, how can I differentiate these products?
    33. Key-value stores
    34. Document stores
    35. The document store data model
        {
          "playerId": 1,
          "firstName": "Roger",
          "lastName": "Federer",
          "ranking": "#1",
          "address": { "city": "Wollerau" },
          "sponsors": [
            { "id": 1, "name": "Nike", "amount": "16’000 SFR" },
            { "id": 2, "name": "Lindt", "amount": "5’000 SFR" },
            { "id": 3, "name": "Credit Suisse", "amount": "13’000 SFR" }
          ]
        }
    36. Column-family stores
    37. The column-family data model
        Column-Family: Players
          Row 1    FirstName:Roger  LastName:Federer
          Row 2    FirstName:Andy   LastName:Murray
          Row 3    NickName:Rafa    LastName:Nadal
        Column-Family: Fruits
          Row n-1  Fruit:Apple      Price:1.40$
          Row n    Fruit:Cherry     Price:2.60$
    38. Graph databases
    39. The graph data model: Node [1] Name = ‘John’; Node [2] Name = ‘Sara’; Node [3] Name = ‘Maria’; Node [4] Name = ‘Steve’; Node [5] Name = ‘Joe’; four edges labelled "friend"
    40. Decisions, decisions… (http://vallo29.deviantart.com/art/The-choice-150871274)
    41. Borat @BoratNoSQL: With every database I have to take tradeoffs into account, I don’t want to choose only one!
    42. Pol·y·glot (adjective): knowing or using several languages. Per·sist·ence (noun): the continued or prolonged existence of something.
    43. Polyglot persistence illustrated: Retail Store; Recommendations: Neo4J; Product Catalog: RavenDB; Financial Data: MSSQL; Shopping Cart: Redis
    44. Borat @BoratNoSQL: Sounds great! But where is the catch?
    45. We need appropriate skills
    46. Invest in software architecture (http://www.deviantart.com/art/architecture-71406568)
    47. Integration databases have been used for years (diagram: Database; Application A; Application B)
    48. Polyglot persistence doesn’t work here (diagram: Database 1; Application A; Application B; Database 2)
    49. Application databases do not share their data (diagram: Database 1; Application A; Application B; Database 2)
    50. Borat @BoratNoSQL: Fine! But I have not only one application.
    51. Application database with SOA (diagram: Database 1; Application A; Application B; Database 2; Service)
    52. It’s all about layers
    53. Well known layers: Presentation Layer; Domain Layer; Resource Access Layer (Data Access Layer); Resources
    54. Common data tier design: Presentation; Domain; DAL; Resources; RDBMS; Search; Transactions; Caching; Blobs; Triggers; Reporting; User Interface; Relational-Object / Object-Relational
    55. The truth of reusability (http://gundross.deviantart.com/art/Chair-72928743)
    56. Data access with reusable and seamless services. Before: Presentation; Domain; DAL; Resources; RDBMS; Search; Transactions; Caching; Blobs; Triggers; Reporting; User Interface; Relational-Object / Object-Relational. After: Presentation; Domain; DAL; Resources; User Interface; Search; Transactions; Caching; Blobs; Batch; Reporting; Key-Value; Document; RDBMS
    57. Putting it all together: Key-Value; Document; RDBMS; Search; Caching; Reporting; Domain Services; User Interface; Database Tier; Middle Tier
    58. Use the right tool!
    59. Resources. NoSQL Distilled, authors: Martin Fowler, Pramod J. Sadalage, ISBN: 978-0321826626. Making Sense of NoSQL, authors: Dan McCreary, Ann Kelly, ISBN: 978-1617291074. Links: http://nosql-database.org/ and http://en.wikipedia.org/wiki/NoSQL
    60. Thank you!
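The hotel booking sequence on slides 23 to 27 can be replayed with a minimal sketch of two replicas that accept writes locally and exchange them later. Everything here is an assumption made for illustration: the `Replica` class, the `synchronize` function, the room number and the guest names are hypothetical, and real databases use far more elaborate replication and conflict-resolution schemes.

```python
class Replica:
    """One server (e.g. NY or ZH) holding its own copy of the room table."""

    def __init__(self, name, room_ids):
        self.name = name
        self.rooms = {room: None for room in room_ids}  # None means "free"
        self.outbox = []  # local bookings not yet sent to the other replica

    def is_free(self, room):
        return self.rooms.get(room) is None

    def book(self, room, guest):
        """Accept the booking locally if this replica believes the room is free."""
        if not self.is_free(room):
            return False
        self.rooms[room] = guest
        self.outbox.append((room, guest))
        return True


def synchronize(a, b):
    """Exchange pending bookings in both directions.

    Returns a dict mapping each conflicting room to the sorted list of guests
    who both hold a local booking for it. Conflicts are only reported here;
    resolving them is left to the business (discount, spare room, ...).
    """
    conflicts = {}
    for source, target in ((a, b), (b, a)):
        for room, guest in source.outbox:
            current = target.rooms.get(room)
            if current is None:
                target.rooms[room] = guest  # update arrives, replicas converge
            elif current != guest:
                conflicts[room] = sorted({guest, current})
    a.outbox.clear()
    b.outbox.clear()
    return conflicts


# Slides 23-25: booking on NY, inconsistency window, then convergence.
ny, zh = Replica("NY", [101]), Replica("ZH", [101])
ny.book(101, "alice")
window_open = zh.is_free(101)          # ZH still shows the room as free
no_conflicts = synchronize(ny, zh)     # ZH now also shows the booking

# Slides 26-27: someone else books on ZH before synchronization.
ny2, zh2 = Replica("NY", [101]), Replica("ZH", [101])
ny2.book(101, "alice")
zh2.book(101, "bob")
conflicts = synchronize(ny2, zh2)      # both replicas hold a booking for room 101
```

The sketch only detects the double booking. Whether to prevent it (a single master, or waiting for the remote commit) or to compensate for it afterwards is exactly the performance-versus-consistency trade-off from slide 29.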