NoSQL Search Roadshow Zurich 2013 - Polyglot persistence with no sql
  • Do you know this? Put your hands up! Almost everybody is comfortable with SQL. (Roman)
  • We developers are familiar with relational databases, and we know what we get when we use them. We are familiar with: SQL (Structured Query Language) → SQL is a standard; ACID operations → transactions; a relational schema with referential integrity and its constraints (PK, FK) to ensure data integrity; data consistency must be ensured. SQL is our holy cow (untouchable, established and accepted). (Roman)
  • Today relational databases such as SQL Server are the de facto standard for (almost) every data persistence solution. We choose from products (SQL Server, Oracle, MySQL...), not from technologies. This reminds us of the Swiss Army knife that can be used for everything. But what would you say if you were building a house and the electrician arrived with nothing but a Swiss Army knife? (Michael)
  • Before we dive deeper, let’s introduce ourselves. Michael has been a senior software engineer at Zühlke since 2012 and focuses on enterprise and cloud application development in .NET. Roman has been a senior software engineer at Zühlke since 2011 and focuses on data(base) architectures and technologies, including BI and Big Data. Agenda: for the next 40 minutes we are going to talk about alternative NoSQL technologies in a short briefing, and about what it means to use NoSQL databases in a polyglot persistence environment. (Both)
  • Sure, you may ask: why should we choose anything other than a SQL database? In almost every case we are fine using SQL. What reasons do we have to use another technology that everybody has to learn first? (Michael)
  • You are right to challenge our suggestion to use technologies other than SQL. But nowadays the circumstances differ from those of a few years ago (with cloud computing and Big Data we have..). There are some business drivers that challenge RDBMS. We didn’t invent these business drivers. Let’s look at the most important ones. (Michael)
  • One of the main drivers is Big Data. Hence, our first driver is volume: the amount of data grows from MB → TB → PB; more users, applications or devices access the data; RDBMS have reached their physical or financial limits → we need scalable and affordable solutions. (Roman)
  • Our second driver is velocity: the frequency of incoming data increases. We want to work with the newest, up-to-date data at all times, without delays (think of a tweet arriving too late) → from daily data to “real-time”. The speed at which data arrives and leaves, in other words the whole data lifecycle, is accelerating. (Roman)
  • Another aspect is variability: data originates from different devices, from users or from other sources (mobile data, web content, sensors...). It is often unstructured; the amount of structured data is relatively small. (Roman)
  • Agility is our last driver and the second main aspect besides Big Data: productive development work, responsiveness to change, time to market and low costs. (Roman)
  • All these business drivers challenge relational databases and put pressure on them. Let’s see what options we have to address these problems. (Michael)
  • NoSQL databases emerged in the market to meet these business drivers, so their characteristics address exactly these problems. There are two main reasons to use NoSQL: scaling and productivity. (Michael)
  • Why scaling? First of all, most NoSQL technologies are built to scale out well. Scaling out means we build a database cluster based on cheap commodity hardware instead of spending our money on a single-box solution. Hence, we gain availability, better performance due to load balancing, and capacity. (Michael)
  • There are two techniques for scaling. With sharding we distribute our data across different nodes, e.g. customers [a-m] on node #1 and customers [n-z] on node #2. Some NoSQL databases even provide auto-sharding, which means we can simply add new nodes to the cluster and the database redistributes the data. (Michael)
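    To make range-based sharding concrete, here is a minimal C# sketch. It assumes that two in-memory dictionaries stand in for node #1 and node #2. The class name `RangeShardRouter` and its methods are hypothetical and are not the API of any NoSQL product. Keys are routed by the first letter of the customer name, as in the [a-m]/[n-z] example.

    ```csharp
    using System;
    using System.Collections.Generic;

    var router = new RangeShardRouter();
    router.Save("Federer", "Roger Federer");
    router.Save("Nadal", "Rafael Nadal");
    Console.WriteLine(router.NodeFor("Federer")); // 1
    Console.WriteLine(router.NodeFor("Nadal"));   // 2

    // Range-based sharding: each node owns a contiguous key range,
    // so every customer is stored on exactly one node.
    public class RangeShardRouter
    {
        private readonly Dictionary<string, string> node1 = new(); // keys up to 'm'
        private readonly Dictionary<string, string> node2 = new(); // keys after 'm'

        // Picks the node from the first letter of the customer key.
        public int NodeFor(string customerKey) =>
            char.ToLowerInvariant(customerKey[0]) <= 'm' ? 1 : 2;

        public void Save(string customerKey, string data) => Shard(customerKey)[customerKey] = data;

        // Reads go to the same node the key was routed to on write.
        public string Load(string customerKey) => Shard(customerKey)[customerKey];

        public int Count(int node) => node == 1 ? node1.Count : node2.Count;

        private Dictionary<string, string> Shard(string customerKey) =>
            NodeFor(customerKey) == 1 ? node1 : node2;
    }
    ```

    A real auto-sharding database would also move existing keys when a node is added. This sketch keeps the split fixed so that it shows only the routing decision.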
  • The second scaling technique is replication: we duplicate our data on different nodes. This gives us better failover and increases performance through parallel access. (Michael)
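    A similarly minimal sketch of replication follows. It assumes an in-memory list of dictionaries as the replicas, and `ReplicatedStore` is a hypothetical name. Writes are copied to every replica, and reads rotate over the replicas that are still alive. This illustrates both the parallel-access argument and the failover argument.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    var store = new ReplicatedStore(replicaCount: 3);
    store.Write("player:1", "Roger Federer");
    store.FailReplica(0);
    for (int i = 0; i < 3; i++)
        Console.WriteLine(store.Read("player:1")); // still served; replica 0 is skipped

    // Replication: every write is copied to all replicas, so any live replica
    // can answer a read (load balancing) and a failed one can be skipped (failover).
    public class ReplicatedStore
    {
        private readonly List<Dictionary<string, string>> replicas;
        private readonly HashSet<int> failed = new();
        private int nextReader; // round-robin position for spreading reads

        public ReplicatedStore(int replicaCount) =>
            replicas = Enumerable.Range(0, replicaCount)
                                 .Select(_ => new Dictionary<string, string>())
                                 .ToList();

        // Copies synchronously to every replica, for simplicity.
        public void Write(string key, string value)
        {
            foreach (var replica in replicas)
                replica[key] = value;
        }

        public void FailReplica(int index) => failed.Add(index);

        public string Read(string key)
        {
            for (int attempt = 0; attempt < replicas.Count; attempt++)
            {
                int index = nextReader;
                nextReader = (nextReader + 1) % replicas.Count;
                if (!failed.Contains(index))
                    return replicas[index][key];
            }
            throw new InvalidOperationException("No live replica available.");
        }
    }
    ```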
  • As developers, we want to store our in-memory object structures. With relational databases, we have to map our objects to a relational model. This is the so-called impedance mismatch → a simple example: a list of strings.
  • In general, NoSQL databases are schemaless, which helps us gain development productivity. Usually data is stored in XML or JSON format, which allows us to store our objects in a straightforward way. The data format is version-tolerant: if we change our data structures in code (e.g. add or remove a property), the NoSQL database does not care. In a relational database we would have to update the schema and migrate the data as well. Schemaless does not mean having no schema: we do have an implicit schema, but it is not enforced by the database! Second slide: schemaless and version-tolerant. (Michael)
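    The version tolerance can be sketched with `System.Text.Json`. It is assumed here as a stand-in for whatever serializer a document store uses. An object written by an older version of the code is read back into a changed class, and the `BlogPostV1`/`BlogPostV2` classes are hypothetical. By default, `System.Text.Json` ignores JSON properties that the target class does not have and leaves missing properties at their default values.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.Json;

    // A document written by an older release of the application.
    string storedV1 = JsonSerializer.Serialize(
        new BlogPostV1 { Id = 1, Content = "Any text content", Author = "Roman" });

    // The current code reads it with a changed class: Author removed, Tags added.
    // The unknown "Author" property in the stored JSON is simply ignored.
    BlogPostV2 post = JsonSerializer.Deserialize<BlogPostV2>(storedV1)!;
    Console.WriteLine(post.Content);    // Any text content
    Console.WriteLine(post.Tags.Count); // 0 (missing property keeps its default)

    public class BlogPostV1
    {
        public int Id { get; set; }
        public string Content { get; set; } = "";
        public string Author { get; set; } = "";
    }

    public class BlogPostV2
    {
        public int Id { get; set; }
        public string Content { get; set; } = "";
        public List<string> Tags { get; set; } = new(); // added in version 2
    }
    ```

    The implicit schema now lives in these classes. If `BlogPostV2` required `Author`, nothing in the database would stop an old document from lacking it.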
  • Let’s talk about consistency. Scaling out and the absence of a schema both influence data consistency. 1. Having no schema means we cannot enforce data integrity in the database. (Roman)
  • NoSQL databases generally don’t support ACID operations (transactions); instead, they provide eventual consistency. (Roman)
  • Let’s illustrate this with an example: we want to book a hotel room, and we see that the room is free. (Roman)
  • We book the room. On the ZH server the room is still free, because the update has not been processed there yet. (Roman)
  • The data is inconsistent for a short period of time. We call this the inconsistency window! As soon as the updates are processed on ZH, the data is eventually consistent, provided nobody else has booked the room on the ZH server before the updates were processed! (Roman)
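    The inconsistency window from the hotel example can be simulated in a few lines. This sketch rests on simplifying assumptions. NY and ZH become two instances of a hypothetical `Replica` class, and replication is triggered by an explicit `ReplicateTo` call instead of running in the background. The room name and guest are made up.

    ```csharp
    using System;
    using System.Collections.Generic;

    var ny = new Replica();
    var zh = new Replica();

    ny.Book("Room 12", "Alice");             // accepted locally in NY
    Console.WriteLine(zh.Status("Room 12")); // Free: ZH has not seen the update yet
    ny.ReplicateTo(zh);                      // replication catches up
    Console.WriteLine(zh.Status("Room 12")); // booked by Alice: eventually consistent

    // Each replica accepts writes locally and ships them to its peer later;
    // until then, readers on the peer see stale data.
    public class Replica
    {
        private readonly Dictionary<string, string> bookings = new();
        private readonly Queue<(string Room, string Guest)> outbox = new();

        public void Book(string room, string guest)
        {
            bookings[room] = guest;
            outbox.Enqueue((room, guest)); // update still pending for the peer
        }

        public string Status(string room) =>
            bookings.TryGetValue(room, out var guest) ? $"booked by {guest}" : "Free";

        // Drains pending updates into the peer, closing the inconsistency window
        // (this sketch assumes exactly one peer per replica).
        public void ReplicateTo(Replica peer)
        {
            while (outbox.Count > 0)
            {
                var (room, guest) = outbox.Dequeue();
                peer.bookings[room] = guest;
            }
        }
    }
    ```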
  • BUT: what happens if someone else books the room on ZH before synchronization? (Roman)
  • To avoid conflicts, we can wait for ZH to commit before completing the update, or we can define one server as the master and let only the master accept changes. BUT: in case of such a conflict, do we really need strong consistency? What is more important: no conflicts, or the risk of losing customers? (Roman)
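    The two options in this note can be combined in one sketch: a single master accepts all changes, and a write completes only after the replicas have committed. `SingleMasterBookings` is a hypothetical class. Synchronous replication is modeled as a simple loop over in-memory replicas named after the slide's NY and ZH servers.

    ```csharp
    using System;
    using System.Collections.Generic;

    var hotel = new SingleMasterBookings("NY", "ZH");
    Console.WriteLine(hotel.TryBook("Room 12", "Alice")); // True
    Console.WriteLine(hotel.TryBook("Room 12", "Bob"));   // False: the master already holds a booking
    Console.WriteLine(hotel.Read("ZH", "Room 12"));       // Alice

    // Only the master accepts changes. It checks each booking against its own
    // authoritative state, then copies it to every replica before returning.
    public class SingleMasterBookings
    {
        private readonly Dictionary<string, string> master = new();
        private readonly Dictionary<string, Dictionary<string, string>> replicas = new();

        public SingleMasterBookings(params string[] replicaNames)
        {
            foreach (var name in replicaNames)
                replicas[name] = new Dictionary<string, string>();
        }

        public bool TryBook(string room, string guest)
        {
            if (master.ContainsKey(room))
                return false;          // rejected instead of creating a conflict
            master[room] = guest;
            foreach (var replica in replicas.Values)
                replica[room] = guest; // "wait for the commit" of every replica
            return true;
        }

        public string Read(string replicaName, string room) =>
            replicas[replicaName].TryGetValue(room, out var guest) ? guest : "Free";
    }
    ```

    The conflict disappears, but every booking now waits for all replicas. That waiting is the performance cost paid for strong consistency.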
  • We may handle conflicts on the business side → discounts, spare rooms. (Roman)
  • To avoid conflicts, we take the “latency time” into account. We have to reconsider when strong consistency is really required. Hence, we have to balance performance against strong consistency. (Roman)
  • At the beginning we spoke about the Swiss Army knife. Now we want to discover the tools available to build our solution. Let’s open the toolbox. (Roman)
  • We have a great variety of NoSQL database products, for example Riak, MongoDB, Cassandra, HBase, Neo4J, etc. (Roman)
  • There are so many products that there is no way to know all of them in detail. But we can classify them by their main characteristics. Nowadays, four well-known NoSQL categories are generally accepted in the NoSQL community. (Michael)
  • We start with the simplest databases, the so-called key-value stores, originally developed for distributed caches (e.g. for websites). Typical characteristics: the data is stored and accessed by a unique key, comparable to a C# dictionary; the value is just a bucket (or a set) of arbitrary data, which generally cannot be queried; reading and writing are very fast; they are easy to scale. Typical products: Memcached, Redis, Riak. (Michael)
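    Here is a minimal sketch of the key-value model. It assumes an in-memory `Dictionary` as the store, and the `KeyValueStore` class is hypothetical (this is not the Redis or Memcached API). Values are opaque byte arrays, so the store can only look them up by key.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    var store = new KeyValueStore();
    store.Put("session:42", Encoding.UTF8.GetBytes("{\"user\":\"roman\",\"cart\":[3,7]}"));

    if (store.TryGet("session:42", out byte[] value))
        Console.WriteLine(Encoding.UTF8.GetString(value));

    // Data is stored and found by key only, like a C# dictionary. The value is
    // an opaque blob: the store cannot answer "which sessions contain item 7?"
    // unless the client reads and decodes every value itself.
    public class KeyValueStore
    {
        private readonly Dictionary<string, byte[]> data = new();

        public void Put(string key, byte[] value) => data[key] = value;

        public bool TryGet(string key, out byte[] value) => data.TryGetValue(key, out value);

        public bool Delete(string key) => data.Remove(key);
    }
    ```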
  • The most widely used databases are document stores. Typical products: CouchDB, MongoDB, RavenDB (written in .NET!), OrientDB (a hybrid that is also a graph database). As the name implies, these databases store data as documents. The whole document is a serialized object tree (an aggregate), which makes this kind of database very intuitive and easy to work with. (Michael)
  • Here is an example of such a document. Documents are generally stored in XML or JSON (BSON) format. These databases are query-enabled, so we can search for the value of a given property in hierarchical documents, and we can create indexes on data fields to optimize the queries. (Michael)
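    A hypothetical in-memory `DocumentStore` illustrates the two capabilities from the note: querying a nested property by path, and speeding that query up with an index. Documents are JSON strings parsed with `System.Text.Json`. The dotted-path syntax (`address.city`) and the class API are assumptions for illustration; they are not the query language of MongoDB, CouchDB or RavenDB. The sample documents reuse the player example from the slides.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.Json;

    var players = new DocumentStore();
    players.Insert("1", "{\"firstName\":\"Roger\",\"lastName\":\"Federer\",\"address\":{\"city\":\"Wollerau\"}}");
    players.Insert("2", "{\"firstName\":\"Andy\",\"lastName\":\"Murray\",\"address\":{\"city\":\"London\"}}");
    players.CreateIndex("address.city");
    Console.WriteLine(string.Join(", ", players.Find("address.city", "Wollerau"))); // 1

    public class DocumentStore
    {
        private readonly Dictionary<string, string> documents = new();
        // path -> value -> ids of the documents holding that value
        private readonly Dictionary<string, Dictionary<string, List<string>>> indexes = new();

        public void Insert(string id, string json)
        {
            documents[id] = json;
            foreach (var (path, index) in indexes)
                AddToIndex(index, path, id, json); // keep existing indexes up to date
        }

        public void CreateIndex(string path)
        {
            var index = new Dictionary<string, List<string>>();
            foreach (var (id, json) in documents)
                AddToIndex(index, path, id, json);
            indexes[path] = index;
        }

        public List<string> Find(string path, string value)
        {
            if (indexes.TryGetValue(path, out var index))
                return index.TryGetValue(value, out var ids) ? new List<string>(ids) : new List<string>();

            var result = new List<string>(); // no index: scan every document
            foreach (var (id, json) in documents)
                if (ValueAt(json, path) == value)
                    result.Add(id);
            return result;
        }

        private static void AddToIndex(Dictionary<string, List<string>> index, string path, string id, string json)
        {
            string value = ValueAt(json, path);
            if (value == null) return; // document has no such property
            if (!index.TryGetValue(value, out var ids))
                index[value] = ids = new List<string>();
            ids.Add(id);
        }

        // Follows a dotted path such as "address.city" through nested objects.
        private static string ValueAt(string json, string path)
        {
            using JsonDocument doc = JsonDocument.Parse(json);
            JsonElement element = doc.RootElement;
            foreach (string segment in path.Split('.'))
            {
                if (element.ValueKind != JsonValueKind.Object || !element.TryGetProperty(segment, out JsonElement child))
                    return null;
                element = child;
            }
            return element.ValueKind == JsonValueKind.String ? element.GetString() : element.GetRawText();
        }
    }
    ```

    Without an index, `Find` has to parse every document. With an index, it becomes a dictionary lookup, which is the kind of optimization the note refers to.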
  • Column-family stores come closest to the tables of relational databases. They are also known as wide-column databases or big tables. Typical products: Cassandra, HBase, Hypertable. (Roman)
  • Column-family stores are semi-schematic: you can think of a column-family store as one huge table with lots of columns. These columns are organized in so-called column families, which correspond to tables in an RDBMS. Data is stored as rows and accessed by a row key and the column name. Not every row has to contain the same columns; columns can even be added to or removed from a row dynamically. (Roman)
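    The column-family model maps naturally onto nested dictionaries (column family → row key → column name → value). The sketch below is a hypothetical in-memory `ColumnFamilyStore` that uses data from the column-family slide; it is not the Cassandra or HBase client API.

    ```csharp
    using System;
    using System.Collections.Generic;

    var db = new ColumnFamilyStore();
    db.Put("Players", "row1", "FirstName", "Roger");
    db.Put("Players", "row1", "LastName", "Federer");
    db.Put("Players", "row3", "NickName", "Rafa"); // rows may have different columns
    db.Put("Players", "row3", "LastName", "Nadal");
    db.Put("Fruits", "rowN", "Fruit", "Cherry");
    db.Put("Fruits", "rowN", "Price", "2.60$");

    Console.WriteLine(db.Get("Players", "row3", "NickName"));           // Rafa
    Console.WriteLine(string.Join(",", db.Columns("Players", "row1"))); // FirstName,LastName

    // column family -> row key -> column name -> value
    public class ColumnFamilyStore
    {
        private readonly Dictionary<string, Dictionary<string, SortedDictionary<string, string>>> families = new();

        // Creates the family, row and column on the fly: no schema to declare up front.
        public void Put(string family, string rowKey, string column, string value)
        {
            if (!families.TryGetValue(family, out var rows))
                families[family] = rows = new();
            if (!rows.TryGetValue(rowKey, out var columns))
                rows[rowKey] = columns = new();
            columns[column] = value;
        }

        public string Get(string family, string rowKey, string column) => families[family][rowKey][column];

        public IEnumerable<string> Columns(string family, string rowKey) => families[family][rowKey].Keys;

        public bool RemoveColumn(string family, string rowKey, string column) =>
            families[family][rowKey].Remove(column);
    }
    ```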
  • Last but not least, graph databases, which are a bit exotic. Typical products: Neo4J, InfiniteGraph, OrientDB. (Michael)
  • Relational databases and the NoSQL categories we have discussed so far are not strong at modeling complex relationships. Imagine you have a graph like this (see example). How would you model and query the nodes and relationships in SQL? You see there are some limitations: queries that traverse a graph would drastically decrease performance. With an RDBMS, the data model defines how the relationships can be queried. Graph databases take a different architectural approach and focus on the relationships in the data. Data is modeled as a graph of nodes and edges; edges are the relationships between the nodes and can carry data as well. Special query languages like Cypher for Neo4J allow us to traverse the graph intuitively. Example: see illustration. (Michael)
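    The kind of traversal that is awkward in SQL can be sketched as a breadth-first search over an adjacency list. This is a hand-written illustration, not Neo4J's API or Cypher. The `Graph` class, the `Edge` record and the `since` edge property are hypothetical. The slide does not say which people are connected, so the friendship edges between John, Sara, Joe, Maria and Steve are assumed.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    var g = new Graph();
    // Assumed edge layout; each relationship carries a label and some data.
    g.Connect("John", "Sara", "friend", since: 2005);
    g.Connect("John", "Joe", "friend", since: 2010);
    g.Connect("Sara", "Maria", "friend", since: 2008);
    g.Connect("Maria", "Steve", "friend", since: 2012);

    // Friends and friends-of-friends of John (traversal depth 2).
    Console.WriteLine(string.Join(", ", g.Traverse("John", "friend", maxDepth: 2))); // Sara, Joe, Maria

    public record Edge(string To, string Label, int Since);

    public class Graph
    {
        private readonly Dictionary<string, List<Edge>> adjacency = new();

        // Friendship is mutual, so the edge is stored in both directions.
        public void Connect(string a, string b, string label, int since)
        {
            Out(a).Add(new Edge(b, label, since));
            Out(b).Add(new Edge(a, label, since));
        }

        // Breadth-first traversal that follows only edges with the given label.
        public IEnumerable<string> Traverse(string start, string label, int maxDepth)
        {
            var visited = new HashSet<string> { start };
            var frontier = new List<string> { start };
            for (int depth = 1; depth <= maxDepth; depth++)
            {
                var next = new List<string>();
                foreach (var node in frontier)
                    foreach (var edge in Out(node).Where(e => e.Label == label))
                        if (visited.Add(edge.To))
                            next.Add(edge.To);
                foreach (var found in next)
                    yield return found;
                frontier = next;
            }
        }

        private List<Edge> Out(string node) =>
            adjacency.TryGetValue(node, out var edges) ? edges : (adjacency[node] = new List<Edge>());
    }
    ```

    In a graph database the engine performs this walk along stored relationships, and a query language such as Cypher lets you state the pattern declaratively instead of writing the loop.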
  • After this excursion, we now have a variety of databases to store our data: not only NoSQL databases, but also relational databases. Yes, they still have a right to exist, and they have a place in our toolbelt. Examples: Facebook, eBay → Cassandra; CERN, ATLAS detector at the Large Hadron Collider (for articles) → MongoDB; Salesforce Marketing Cloud → MongoDB; Adobe, HP → Neo4J. RDBMS can still be used! (Michael)
  • But now, which database is the best for my application? Every database has its advantages and disadvantages for a given scenario. (Roman)
  • Do we really have to pick only one database? In any scenario, we have to take trade-offs into account. Ideally, we want to use the most suitable database for every job. (Roman)
  • Why don’t we do that? We can use different databases in one application for different storage scenarios. Martin Fowler calls this polyglot persistence. (Michael)
  • Title: Use the right tool for the job. Scenario web shop: caching: Redis (KV); session storage: Redis (KV); shopping cart: Redis (KV); product catalog: RavenDB (document); recommendation engine: Neo4J (graph); financial transactions: MS SQL Server (RDBMS); reporting: MS SQL Server (RDBMS); event logging: Cassandra (column family). (Michael)
  • This is nice, but it introduces some new issues. (Roman)
  • First of all, we need people who know how to develop with these databases (APIs, detailed characteristics, advantages and disadvantages) and how to operate them (administration, installation and upgrades, monitoring, backup and restore, performance tuning, storage management). (Roman)
  • Besides the skills, polyglot persistence impacts our architecture. Using multiple databases in the same application (system) increases the complexity of the code and the architecture. We need to design the architecture to handle this complexity. (Roman)
  • In the past, SQL databases were used as so-called integration databases: relational databases have a common, platform-independent interface → SQL. Multiple applications accessed the same databases, so the databases acted as integration platforms. The database was the master of the data, and the model ensured data quality (consistency, integrity). With NoSQL databases, the circumstances are different: a NoSQL database cannot enforce the schema and strong consistency. This requires the application to become the master of the data and to take responsibility for ensuring schema and consistency. (Roman)
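    When the database no longer enforces the schema, the application has to. Here is a minimal sketch that assumes a hypothetical `CustomerRepository` over an in-memory dictionary standing in for a schemaless document store. Validation runs in code before anything is serialized and written; the specific rules are illustrative assumptions.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.Json;

    var customers = new CustomerRepository();
    customers.Save(new Customer { Id = "c1", Name = "Roger Federer", Email = "roger@example.com" });
    try
    {
        customers.Save(new Customer { Id = "c2", Name = "", Email = "no-at-sign" });
    }
    catch (ArgumentException ex)
    {
        Console.WriteLine(ex.Message); // rejected by the application, not by the database
    }

    public class Customer
    {
        public string Id { get; set; } = "";
        public string Name { get; set; } = "";
        public string Email { get; set; } = "";
    }

    // The store itself would accept any JSON, so the application, as master of
    // the data, checks the implicit schema before anything is written.
    public class CustomerRepository
    {
        private readonly Dictionary<string, string> store = new(); // stand-in for a document store

        public void Save(Customer customer)
        {
            if (string.IsNullOrWhiteSpace(customer.Id))
                throw new ArgumentException("Customer.Id is required.");
            if (string.IsNullOrWhiteSpace(customer.Name))
                throw new ArgumentException("Customer.Name is required.");
            if (!customer.Email.Contains('@'))
                throw new ArgumentException("Customer.Email must contain '@'.");
            store[customer.Id] = JsonSerializer.Serialize(customer);
        }

        public Customer Load(string id) => JsonSerializer.Deserialize<Customer>(store[id])!;
    }
    ```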
  • Polyglot persistence does not fit the integration database idea: we no longer have a master database. This means we cannot ensure data quality across multiple databases if they are accessed by multiple applications. (Roman)
  • To overcome this problem, we need a central unit that is the master of the data. This is where application databases come in: the application owns a database and is the master of its data. Hence, the data is accessed by exactly one application; no other application accesses the database. (Roman)
  • In an enterprise environment we don’t have just one application, but multiple applications using the same data. Since application databases don’t work with multiple applications, we need to extend that design! (Michael)
  • We all know the term SOA. A service takes the role of the application in the “application databases” scenario: the service is the master of the data and can ensure schema and consistency. Multiple applications can access the service, but they never access the data(base) directly! (Michael)
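    Here is a sketch of the service idea in C#. The hypothetical `ProductCatalogService` is the only code that touches its (here in-memory) database. The two hypothetical applications reach the data exclusively through the `IProductCatalogService` interface. In a real SOA the interface would sit behind a remote endpoint; an in-process call keeps the example self-contained.

    ```csharp
    using System;
    using System.Collections.Generic;

    IProductCatalogService catalog = new ProductCatalogService();

    // Two independent applications share the data only through the service.
    var backOffice = new BackOfficeApp(catalog);
    var webShop = new WebShopApp(catalog);

    backOffice.ListProduct("p1", "Tennis racket", 199.00m);
    Console.WriteLine(webShop.Describe("p1")); // Tennis racket (p1)

    public record Product(string Id, string Name, decimal Price);

    // The contract other applications see; the storage behind it stays private.
    public interface IProductCatalogService
    {
        void Upsert(Product product);
        Product Get(string id);
    }

    public class ProductCatalogService : IProductCatalogService
    {
        // Only this service touches its database (an in-memory stand-in here).
        private readonly Dictionary<string, Product> database = new();

        public void Upsert(Product product)
        {
            // As master of the data, the service enforces the rules.
            if (product.Price <= 0)
                throw new ArgumentException("Price must be positive.");
            database[product.Id] = product;
        }

        public Product Get(string id) => database[id];
    }

    public class BackOfficeApp
    {
        private readonly IProductCatalogService catalog;
        public BackOfficeApp(IProductCatalogService catalog) => this.catalog = catalog;
        public void ListProduct(string id, string name, decimal price) =>
            catalog.Upsert(new Product(id, name, price));
    }

    public class WebShopApp
    {
        private readonly IProductCatalogService catalog;
        public WebShopApp(IProductCatalogService catalog) => this.catalog = catalog;
        public string Describe(string id)
        {
            var product = catalog.Get(id);
            return $"{product.Name} ({product.Id})";
        }
    }
    ```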
  • We have discussed how to ensure data quality and schema with polyglot persistence. Now let’s look at how we can avoid a mess in our code. The key is to structure our code in layers. (Michael)
  • For this we use the well-known layered architecture: Presentation, Business, Resource Access (DAL), Resources (databases). (Michael)
  • Usually, the data access layer contains all of the logic to access the data: it is bloated, heavy and application-specific, not reusable, and expensive to maintain. (Michael)
  • Example of reusability, a chair: object-related vs. interface-related → the interface can be reused, not the object! Interfaces are reusable, not objects! (Michael)
  • Especially with NoSQL databases, we should aim for small, reusable data services that provide functionality accessing only one database, e.g. a caching service. Because these services provide not only data access but also (domain-independent) business logic, they are part of the business layer. Hence, the former bloated data access layer is split into several independent services and moved to the business layer. In fact, the services have their own DAL, which is why we placed them across the layers. Data services are domain-independent: they don’t implement domain-specific logic! (Michael)
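    A small, domain-independent data service might look like the hypothetical `CachingService` below. It owns a single store (an in-memory dictionary here; in the web shop scenario it might sit on Redis) and contains a bit of generic logic (expiry). It knows nothing about products or orders. The injectable clock is an assumption that makes the expiry testable.

    ```csharp
    using System;
    using System.Collections.Generic;

    var cache = new CachingService(TimeSpan.FromMinutes(5), () => DateTime.UtcNow);
    int loads = 0;
    string Load() { loads++; return "catalog page"; }

    cache.GetOrAdd("page:home", Load);
    cache.GetOrAdd("page:home", Load);
    Console.WriteLine(loads); // 1: the second call is served from the cache

    // Domain-independent: stores strings under keys with a time-to-live and can
    // be reused by any business service that needs caching.
    public class CachingService
    {
        private readonly Dictionary<string, (string Value, DateTime Expires)> entries = new();
        private readonly TimeSpan timeToLive;
        private readonly Func<DateTime> clock;

        public CachingService(TimeSpan timeToLive, Func<DateTime> clock)
        {
            this.timeToLive = timeToLive;
            this.clock = clock;
        }

        // Returns the cached value, or calls load() and caches its result.
        public string GetOrAdd(string key, Func<string> load)
        {
            var now = clock();
            if (entries.TryGetValue(key, out var entry) && entry.Expires > now)
                return entry.Value;
            var value = load();
            entries[key] = (value, now + timeToLive);
            return value;
        }

        public void Invalidate(string key) => entries.Remove(key);
    }
    ```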
  • We now have several independent data services: they can be used by multiple applications; each data service ensures data consistency and schema in the database it owns; data consistency across several data services is not guaranteed by the data services themselves. Example: web shop (catalog / shopping cart / order system / recommendation engine / financial transactions). A few minutes ago we talked about a central unit to ensure data consistency across several databases. To meet this requirement with our data services, we introduce business services: they provide the domain logic, use one or more data services, and are responsible for guaranteeing data consistency. (Michael)
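    Here is a sketch of a business service on top of two data services; all names are hypothetical. `ShoppingCartService` and `OrderService` each guard only their own store. `CheckoutService` owns the domain rule that placing an order empties the cart, and it keeps both stores consistent by undoing the first write if the second one fails (a compensating action). The in-memory stores are stand-ins for the databases the note mentions.

    ```csharp
    using System;
    using System.Collections.Generic;

    var carts = new ShoppingCartService();
    var orders = new OrderService();
    var checkout = new CheckoutService(carts, orders);

    carts.Add("session:42", "racket");
    carts.Add("session:42", "balls");
    string orderId = checkout.PlaceOrder("session:42");
    Console.WriteLine($"{orderId}: {string.Join(", ", orders.Items(orderId))}"); // order-1: racket, balls
    Console.WriteLine(carts.Items("session:42").Count);                         // 0

    // Data service 1 (could sit on a key-value store): knows only carts.
    public class ShoppingCartService
    {
        private readonly Dictionary<string, List<string>> carts = new();

        public void Add(string cartId, string item)
        {
            if (!carts.TryGetValue(cartId, out var items))
                carts[cartId] = items = new List<string>();
            items.Add(item);
        }

        public List<string> Items(string cartId) =>
            carts.TryGetValue(cartId, out var items) ? new List<string>(items) : new List<string>();

        public void Clear(string cartId) => carts.Remove(cartId);
    }

    // Data service 2 (could sit on an RDBMS): knows only orders.
    public class OrderService
    {
        private readonly Dictionary<string, List<string>> orders = new();
        private int nextNumber;

        public string Create(List<string> items)
        {
            string id = $"order-{++nextNumber}";
            orders[id] = items;
            return id;
        }

        public IReadOnlyList<string> Items(string orderId) => orders[orderId];

        public void Delete(string orderId) => orders.Remove(orderId);
    }

    // Business service: owns the rule "a placed order empties the cart" and
    // keeps both data services consistent, compensating if the second step fails.
    public class CheckoutService
    {
        private readonly ShoppingCartService carts;
        private readonly OrderService orders;

        public CheckoutService(ShoppingCartService carts, OrderService orders)
        {
            this.carts = carts;
            this.orders = orders;
        }

        public string PlaceOrder(string cartId)
        {
            List<string> items = carts.Items(cartId);
            if (items.Count == 0)
                throw new InvalidOperationException("Cart is empty.");

            string orderId = orders.Create(items);
            try
            {
                carts.Clear(cartId); // a remote store could fail here
            }
            catch
            {
                orders.Delete(orderId); // compensating action: undo the order
                throw;
            }
            return orderId;
        }
    }
    ```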
  • Relational databases give us confidence because they are established, proven and robust. But that does not mean we should use them for everything. At the very beginning, we compared our affection for relational databases to a Swiss Army knife (one tool fits all). Now we have a toolbox full of individual tools to do our job like a professional: a variety of database technologies and products, knowledge of their strengths, and of how to use them → REMEMBER: use the right tool for the job! (Roman)
  • NoSQL Distilled / Making Sense of NoSQL
  • Thank you / Zühlke logo / Twitter names
  • Transcript

    • 1. © Zühlke 2013 Michael Lehmann & Roman Kuczynski Polyglot Persistence with NoSQL Advanced software architecture by using multiple persistence technologies 19. September 2013
    • 2. Abstract Alternative data persistence technologies like NoSQL have been emerging for more than 10 years, but we developers hesitate to broaden our horizons to these new approaches. Why should we? Relational databases have dominated the IT industry for a long time and served us very well. Everybody knows SQL and is used to the relational data model with all its advantages and disadvantages. But those who look beyond their borders will find a wealth of NoSQL technologies and products. Every product has its own properties and characteristics. How can we differentiate them? Is it all about smart decisions, or do we have more possibilities? We will go into the world of NoSQL and explain the different kinds of NoSQL products, when to use them and what polyglot persistence is all about.
    • 3. SELECT * FROM dbo.Presentation WHERE Title LIKE 'Polyglot pers%'
    • 4. Our holy cow!
    • 5. One size fits all
    • 6. Michael Lehmann @lehmamic Senior Software Engineer @Zühlke since 2012 .Net enterprise and cloud applications Roman Kuczynski @qtschi Senior Software Engineer @Zühlke since 2011 Data(base) architectures, BI and Big Data
    • 7. Borat @BoratNoSQL Why should I change? It worked for me until now!
    • 8. Listen to the business
    • 9. RDBMS Volume
    • 10. RDBMS Volume Velocity
    • 11. RDBMS Volume Velocity Variability
    • 12. RDBMS Volume Agility Velocity Variability
    • 13. Borat @BoratNoSQL Sounds plausible, but what options do we have?
    • 14. #NoSQL
    • 15. Increasing performance through scale out
    • 16. Roger Federer Rafael Nadal Andy Murray N. Djokovic Scaling by sharding
    • 17. Roger Federer Roger Federer Scaling by replication Roger Federer N. Djokovic N. Djokovic N. Djokovic
    • 18. Impedance mismatch using relational databases public class BlogPost { public int Id { get; set; } public string Content { get; set; } public List<string> Tags { get; set; } }
    • 19. Design for the relational model public class BlogPost { public int Id { get; set; } public string Content { get; set; } public List<Tag> Tags { get; set; } } public class Tag { public int Id { get; set; } public BlogPost BelongsTo { get; set; } public string Name { get; set; } } BlogPost - Id (int) - Content (varchar) Tag - Id (int) - BlogPostId (int) - Name (varchar)
    • 20. NoSQL databases increase productivity var post = new BlogPost { Id = 1, Content = "Any text content", Tags = new List<string> { "NoSQL", "Cloud", "PolyglotPersistence" } }; collection.Insert(post);
    • 21. Data integrity cannot be enforced
    • 22. NoSQL databases are eventually consistent
    • 23. NY ZH Free Free We look for a hotel room
    • 24. NY ZH booked Free We book the room
    • 25. NY ZH booked Free Inconsistency Inconsistency window
    • 26. NY ZH booked Free Inconsistency Someone else books the same room
    • 27. NY ZH booked booked Inconsistency Conflict!
    • 28. Why not handle such cases by business?
    • 29. performance consistency
    • 30. What do we have in our toolbox?
    • 31. A lot of database products
    • 32. Borat @BoratNoSQL I feel swamped, how can I differentiate these products?
    • 33. Key-value stores
    • 34. Document stores
    • 35. { "playerId": 1, "firstName": "Roger", "lastName": "Federer", "ranking": "#1", "address": { "city": "Wollerau" }, "sponsors": [ { "id": 1, "name": "Nike", "amount": "16’000 SFR" }, { "id": 2, "name": "Lindt", "amount": "5’000 SFR" }, { "id": 3, "name": "Credit Suisse", "amount": "13’000 SFR" } ] } The document store data model
    • 36. Column-family stores
    • 37. Row 1 FirstName:Roger LastName:Federer Row 2 FirstName:Andy LastName:Murray Row 3 NickName:Rafa LastName:Nadal Row n-1 Fruit:Apple Price:1.40$ Row n Fruit:Cherry Price:2.60$ Column-Family: Players Column-Family: Fruits The column-family data model
    • 38. Graph databases
    • 39. The graph data model Node [1] Name = ‘John’ Node [2] Name = ‘Sara’ Node [5] Name = ‘Joe’ Node [3] Name = ‘Maria’ Node [4] Name = ‘Steve’ friend friend friend friend
    • 40. Decisions, decisions…
    • 41. Borat @BoratNoSQL With every database I have to take tradeoffs into account, I don’t want to choose only one!
    • 42. Pol·y·glot – Adjective Knowing or using several languages Per·sist·ence – Noun The continued or prolonged existence of something
    • 43. Retail Store Recommendations Neo4J Product Catalog RavenDB Financial Data MSSQL Shopping Cart Redis Polyglot persistence illustrated
    • 44. Borat @BoratNoSQL Sounds great! But where is the catch?
    • 45. We need appropriate skills
    • 46. Invest in software architecture
    • 47. Integration databases have been used for years Database Application A Application B
    • 48. Polyglot persistence doesn’t work here Database 1 Application A Application B Database 2
    • 49. Application databases do not share their data Database 1 Application A Application B Database 2
    • 50. Borat @BoratNoSQL Fine! But I have not only one application.
    • 51. Application database with SOA Database 1 Application A Application B Database 2 Service
    • 52. It’s all about layers
    • 53. Well known layers Presentation Layer Domain Layer Resource Access Layer ( Data Access Layer) Resources
    • 54. Common data tier design Presentation Domain DAL Resources RDBMS Search Transactions Caching Blobs Triggers Reporting User Interface Relational-Object / Object-Relational
    • 55. The truth of reusability
    • 56. Data access with reusable and seamless services Presentation Domain DAL Resources RDBMS Search Transactions Caching Blobs Triggers Reporting User Interface Relational-Object / Object-Relational Presentation Domain DAL Resources User Interface Search Transactions Caching Blobs Batch Reporting Key-Value Document RDBMS
    • 57. Putting all together Key-Value Document RDBMS Search Caching Reporting Domain Services User Interface Database Tier Middle Tier
    • 58. Use the right tool!
    • 59. Resources NoSQL Distilled Author: Martin Fowler, Pramod J. Sadalage ISBN: 978-0321826626 Making Sense of NoSQL Author: Dan McCreary, Ann Kelly ISBN: 978-1617291074 Links
    • 60. Thank you!