2015-02-09 - NoSQL Lecture, Mosbach

Lecture

Published in: Technology

  1. 1. 09.02.2015 Dipl-Inf. (FH) Johannes Hoppe
  2. 2. Trends! 1. Data 2. Connectivity 3. Individualization
  3. 3. Scale-up: vertical scaling. Tune the server for more performance.
  4. 4. Scale-out: horizontal scaling. Add nodes (machines).
  5. 5. No relational data model (no SQL); distributed and horizontally scalable; schema-free / weak schema restrictions; a different consistency model
  6. 6. Schema-free: no ALTER TABLE, no maintenance window * data versioning in code! * sleep in in the morning
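
"Data versioning in code" can be read as: since the database does not enforce a schema, the application upgrades old documents when it reads them. The following is a minimal sketch of that idea in plain JavaScript, not code from the lecture. The `schemaVersion` field, the two document shapes and the `upgrade` helper are all assumptions made for illustration.

```javascript
// Hypothetical document shapes: version 1 stored a single "name" field,
// version 2 splits it into "firstName" and "lastName".
const CURRENT_VERSION = 2;

// One upgrade step per old version; each step returns a new document.
const upgrades = {
  1: (doc) => {
    const [firstName, ...rest] = doc.name.split(' ');
    return { _id: doc._id, schemaVersion: 2, firstName, lastName: rest.join(' ') };
  },
};

// Upgrade a document on read until it reaches the current version.
// Documents without a schemaVersion field are treated as version 1.
function upgrade(doc) {
  let current = doc;
  while ((current.schemaVersion || 1) < CURRENT_VERSION) {
    current = upgrades[current.schemaVersion || 1](current);
  }
  return current;
}

// Made-up sample data.
const oldDoc = { _id: 1, name: 'Johannes Hoppe' };
const newDoc = { _id: 2, schemaVersion: 2, firstName: 'Klaus', lastName: 'Meier' };

console.log(upgrade(oldDoc));
console.log(upgrade(newDoc));
```

Old and new documents can then live side by side in one collection, which is what makes the maintenance window unnecessary.
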
  7. 7. Requirements for a distributed system: Consistency, Availability, Partition Tolerance
  8. 8. CAP Theorem. 2000: E. Brewer (conjecture); 2002: S. Gilbert, N. Lynch (proof). You can satisfy at most 2 out of the 3 requirements.
  9. 9. Consistency: The system is in a consistent state after an operation. All clients see the same data. Strong consistency (ACID) vs. eventual consistency (BASE). ACID: Atomicity, Consistency, Isolation and Durability. BASE: Basically Available, Soft state, Eventually consistent.
  10. 10. Availability: System is “always on”, no downtime. Node failure tolerance: all clients can find some available replica. Software/hardware upgrade tolerance.
  11. 11. Partition tolerance: System continues to function even when split into disconnected subsets (network disruption). Not only for reads, but writes as well.
  12. 12. CAP Theorem: CA › Single site clusters (easier to ensure all nodes are always in contact) › When a partition occurs, the system blocks › e.g. usable for two-phase commits (2PC) which already require/use blocks
  13. 13. CAP Theorem: CA › Single site clusters (easier to ensure all nodes are always in contact) › When a partition occurs, the system blocks › e.g. usable for two-phase commits (2PC) which already require/use blocks. Obviously, any horizontal scaling strategy is based on data partitioning; therefore, we are forced to decide between consistency and availability.
  14. 14. CAP Theorem: CP › Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate › e.g. sharded database
  15. 15. CAP Theorem: AP › System is still available under partitioning, but some of the data returned may be inaccurate › Need some conflict resolution strategy › e.g. Master/Slave replication
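
The slides do not name a particular conflict resolution strategy. One common and simple choice is "last write wins", sketched below in plain JavaScript under the assumption that every write carries a comparable timestamp. The replica layout, the timestamps and the `mergeLastWriteWins` helper are made up for this illustration.

```javascript
// Each replica keeps, per key, a value plus the timestamp of its last write.
// The timestamps are made-up logical clock values.
function mergeLastWriteWins(replicaA, replicaB) {
  const merged = {};
  const keys = new Set([...Object.keys(replicaA), ...Object.keys(replicaB)]);
  for (const key of keys) {
    const a = replicaA[key];
    const b = replicaB[key];
    if (!a) merged[key] = b;            // only replica B knows the key
    else if (!b) merged[key] = a;       // only replica A knows the key
    else merged[key] = b.timestamp > a.timestamp ? b : a; // newer write wins
  }
  return merged;
}

// Two replicas that accepted different writes during a partition.
const replicaA = {
  'note1:title': { value: 'Mittag', timestamp: 5 },
  'note2:title': { value: 'Test', timestamp: 2 },
};
const replicaB = {
  'note1:title': { value: 'Lunch', timestamp: 7 },
};

const merged = mergeLastWriteWins(replicaA, replicaB);
console.log(merged);
```

The trade-off is visible in the sample data: the older write to `note1:title` is silently discarded, which is why the merged result can be "inaccurate" from the point of view of the client that wrote it.
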
  16. 16. “Drum prüfe, wer sich ewig bindet.” (roughly: “Consider carefully before you bind yourself forever.”) Friedrich Schiller
  17. 17. Classification: Key-value stores (Redis); Document stores (MongoDB & RavenDB); Wide-column stores; Graph databases; and many more
  18. 18. Redis
  19. 19. Caching Queuing Counting views
  20. 20. Speed
  21. 21. + Persistence: snapshot or journal
  22. 22. key value
  23. 23. customer_2 2 String, binary safe key value
  24. 24. customer_2 2 key value. Value types: strings, lists, sets, sorted sets, hashes (string pairs)
  25. 25. GET & SET in the shell › SET note1:title "Mittag" › SET note1:message "nicht vergessen" › KEYS note1:* › GET note1:title › DEL note1:title note1:message
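
To make the semantics of the four commands above concrete, here is a toy in-memory store in plain JavaScript. It is not a Redis client and does not talk to a server. The `ToyStore` class is an invention for this sketch, and its `keys` method understands only the `*` wildcard.

```javascript
// A toy in-memory key-value store that mimics SET, GET, KEYS and DEL.
class ToyStore {
  constructor() {
    this.data = new Map();
  }

  // Values are stored as strings, like simple Redis string values.
  set(key, value) {
    this.data.set(key, String(value));
    return 'OK';
  }

  // Returns null for a missing key (Redis replies with nil).
  get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }

  // Supports only the "*" wildcard; everything else is matched literally.
  keys(pattern) {
    const escaped = pattern
      .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
      .replace(/\*/g, '.*');
    const regex = new RegExp('^' + escaped + '$');
    return [...this.data.keys()].filter((key) => regex.test(key));
  }

  // Returns the number of keys that were actually removed.
  del(...keys) {
    return keys.filter((key) => this.data.delete(key)).length;
  }
}

const store = new ToyStore();
store.set('note1:title', 'Mittag');
store.set('note1:message', 'nicht vergessen');
console.log(store.keys('note1:*'));
console.log(store.get('note1:title'));
console.log(store.del('note1:title', 'note1:message'));
console.log(store.get('note1:title'));
```

The `note1:` prefix is only a naming convention: the store itself is flat, and grouping happens through key patterns.
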
  26. 26. GET with C# / .NET
  27. 27. Live Demo https://github.com/JohannesHoppe/WebNoteNoSQL
  28. 28. RavenDB
  29. 29. JSON Transactional LINQ Lucene .NET first AGPL / dual
  30. 30. RavenDB: Written by Oren Eini aka Ayende Rahien › Hibernating Rhinos › Rhino Mocks & Rhino.ServiceBus. Written in C#
  31. 31. Deployment Get it via NuGet Change defaults in Raven.Server.exe.config › It’s safe by default Just run the Raven.Server.exe in the /server/ folder
  32. 32. Units › Documents › Collections › Indexes › Attachments
  33. 33. Safe by default. Useful defaults › E.g. Limited page size – No Accidental SELECT * ACID (Transactional) *
  34. 34. Designed to “just work” Schema Free › Hardly any mapping required › dynamic (C# 4) yields great power
  35. 35. Designed to “just work” (with .NET) Fluent API Unit of Work Pattern Extensible – Plugin Support
  36. 36. Makes developers happy › Testable › Interfaces all over › In-Memory Database › Extensible – Plugin Support
  37. 37. In Memory Instance Embedded Mode using (var documentStore = new EmbeddableDocumentStore { RunInMemory = true }.Initialize()) { using (var session = documentStore.OpenSession()) { // Run complex test scenarios } }
  38. 38. APIs › Native .NET Client API › HTTP API (Pseudo REST) Indexes › Written as Linq Queries › Indexed with Lucene .NET › Lucene Syntax for querying
  39. 39. “While being RESTful is a goal of the HTTP API, it is secondary to the goal of exposing easy to use and powerful functionality” Ayende Rahien on the HTTP API - http://ravendb.net/documentation/docs-http-api-restful
  40. 40. HTTP API › Caching › E-Tags › Lucene Queries possible C:\>curl -X GET http://localhost:8080/docs/Categories/1 -i HTTP/1.1 200 OK Content-Type: application/json; charset=utf-8 ETag: 00000000-0000-0200-0000-000000000004 { "Name" : "Normal Importance", "Color" : "green" }
  41. 41. MongoDB
  42. 42. data
  43. 43. Scale-out: horizontal scaling. Add nodes (machines).
  44. 44. Database Timeline
      1966: IBM’s IMS
      1969: CODASYL model published
      1970: Codd publishes relational model paper
      1973: INGRES
      1974: SQL invented
      1977: Oracle founded
      1985: Term “object-oriented database” appears
      1990’s: Agile becoming more popular
      2000: Brewer’s CAP born
      2004: Google BigTable
      2007: Amazon Dynamo; 10gen founded
      2008: Apache Cassandra initial release
      2009: MongoDB initial release; NoSQL Movement
  45. 45. NoSQL MongoDB Quick Reference Cards http://www.10gen.com/reference
  46. 46. BSON Master/Slave JavaScript C# Driver Sharding GNU AGPL*
  47. 47. “Deployment” › Create the default directory: c:\data\db › Start the server: mongod.exe › Shell: mongo.exe
  48. 48. CRUD – Create. In the shell › use WebNote › db.Notes.save( { Title: 'Mittag', Message: 'nicht vergessen' } ); To see how the command works (prints its source) › db.Notes.save
  49. 49. CRUD – Create …with a bit JavaScript for(i=0; i<1000; i++) { ['quiz', 'essay', 'exam'].forEach(function(name) { var score = Math.floor(Math.random() * 50) + 50; db.scores.save({student: i, name: name, score: score}); }); } db.scores.count();
  50. 50. CRUD – Read. Queries are likewise specified in document style › db.Notes.find(); › db.Notes.find({ Title: /Test/i }); › db.Notes.find( { "Categories.Color": "red"}).limit(1);
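
How a query document selects documents can be sketched without a database. The plain JavaScript below is a simplified model, not MongoDB's implementation: it handles only exact values, regular expressions and dotted paths that run through arrays of subdocuments. The helper names `valuesAt` and `matches` and the sample notes are assumptions for this illustration.

```javascript
// Collect all values reachable via a dotted path. Arrays are traversed
// element by element, which is roughly what "Categories.Color" relies on.
function valuesAt(value, parts) {
  if (Array.isArray(value)) return value.flatMap((item) => valuesAt(item, parts));
  if (parts.length === 0) return [value];
  if (value === null || typeof value !== 'object') return [];
  return valuesAt(value[parts[0]], parts.slice(1));
}

// A document matches if every field condition is met by at least one value.
function matches(doc, query) {
  return Object.entries(query).every(([path, condition]) =>
    valuesAt(doc, path.split('.')).some((value) =>
      condition instanceof RegExp
        ? typeof value === 'string' && condition.test(value)
        : value === condition));
}

// Made-up sample notes.
const notes = [
  { Title: 'Test 1', Categories: [{ Color: 'red' }] },
  { Title: 'Mittag', Categories: [{ Color: 'green' }] },
  { Title: 'another test', Categories: [] },
];

// Counterparts of the three shell queries on the slide.
const all = notes.filter((note) => matches(note, {}));
const byTitle = notes.filter((note) => matches(note, { Title: /Test/i }));
const firstRed = notes
  .filter((note) => matches(note, { 'Categories.Color': 'red' }))
  .slice(0, 1); // stands in for .limit(1)

console.log(all.length, byTitle.length, firstRed);
```

An empty query document matches everything, which is why `find()` without arguments returns the whole collection.
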
  51. 51. CRUD – Update › db.Notes.update({Title: 'Test'}, {'$set': {Categories: []}}); › db.Notes.update({Title: 'Test'}, {'$push': { Categories: {Color: 'Red'} } });
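
The effect of the two update operators can be modelled on a plain object. This is a simplified sketch in plain JavaScript, not the driver or server logic: the hypothetical `applyUpdate` helper handles only `$set` and `$push` on top-level fields and, unlike MongoDB, does not raise an error when `$push` targets a non-array field.

```javascript
// Apply a small subset of update operators to a copy of a document.
function applyUpdate(doc, update) {
  const result = { ...doc };
  // $set replaces (or creates) the field with the given value.
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  // $push appends the given value as ONE new array element.
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value];
  }
  return result;
}

let note = { Title: 'Test' };
note = applyUpdate(note, { $set: { Categories: [] } });
note = applyUpdate(note, { $push: { Categories: { Color: 'Red' } } });
console.log(JSON.stringify(note));

// Pushing an array appends the array itself, so the result is nested.
const nested = applyUpdate({ interests: [] }, { $push: { interests: ['chess'] } });
console.log(JSON.stringify(nested));
```

The second call shows a common pitfall: to append a plain string, push the string itself rather than an array that contains it.
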
  52. 52. CRUD – Delete › db.dropDatabase(); › db.Notes.drop(); › db.Notes.remove({});
  53. 53. C# Driver
  54. 54. Live Demo https://github.com/JohannesHoppe/WebNoteNoSQL
  55. 55. Consistency
  56. 56. Requirements for a distributed system: Consistency, Availability, Partition Tolerance
  57. 57. C# Driver: Strong consistency, Eventual consistency. Read
  58. 58. Strong Consistency (diagram: the C# driver writes to and reads from the primary; two secondaries)
  59. 59. Eventual Consistency (diagram: the C# driver writes to the primary and reads from the two secondaries)
  60. 60. Sharding Primary C# Driver Primary Primary
  61. 61. C# Driver, write options: fire and forget; wait for error; wait for fsync; wait for journal sync; wait for replication
  62. 62. Atomic!
  63. 63. No relational data model (no SQL); distributed and horizontally scalable; schema-free / weak schema restrictions; a different consistency model
  64. 64. Hands ON!
  65. 65. Data Import (hands-on.zip) cd dump_training mongorestore -d training -c scores scores.bson cd dump_digg mongorestore -d digg -c stories stories.bson
  66. 66. Test (in the shell) use digg db.stories.findOne();
  67. 67. Exercises 1. Find all scores less than 65. 2. Find the lowest quiz score. Find the highest quiz score. 3. Write a query to find all digg stories where the view count is greater than 1000. 4. Query for all digg stories whose media type is either 'news' or 'images' and where the topic name is 'Comedy'. 5. Find all digg stories where the topic name is 'Television' or the media type is 'videos'. Skip the first 5 results, and limit the result set to 10.
  68. 68. CRUD – Update › use digg; › db.people.update({name: 'Smith'}, {'$set': {interests: []}}); › db.people.update({name: 'Smith'}, {'$push': {interests: 'chess'}});
  69. 69. Exercises 1. Set the proper 'grade' attribute for all scores. For example, users with scores greater than 90 get an 'A.' Set the grade to ‘B’ for scores falling between 80 and 90. 2. You're being nice, so you decide to add 10 points to every score on every “final” exam whose score is lower than 60. How do you do this update?
  70. 70. “MapReduce is the Uzi of aggregation tools. Everything described with count, distinct and group can be done with MapReduce, and more.” Kristina Chodorow, Michael Dirolf in MongoDB – The Definitive Guide
  71. 71. MapReduce (diagram: input data → MAP → intermediate data → REDUCE → output data)
  72. 72. MapReduce To use map-reduce, you first write a map function. var map = function() { emit(this.user.name, {diggs: this.diggs, posts: 1}); };
  73. 73. MapReduce The reduce function then aggregates those docs by key. It sums the emitted counts, so the result stays correct when reduce is skipped for a key with a single value or re-run on partial results. var reduce = function(key, values) { var diggs = 0; var posts = 0; values.forEach(function(doc) { diggs += doc.diggs; posts += doc.posts; }); return {diggs: diggs, posts: posts}; };
  74. 74. MapReduce Now both are used to perform custom aggregation. db.stories.mapReduce(map, reduce, {out: 'digg_users'}); db.digg_users.find();
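
What the `mapReduce` call does with the two functions can be simulated in plain JavaScript. The sketch below is a teaching model, not MongoDB's implementation: the `mapReduce` helper is written for this example, `emit` is passed to `map` as an argument (in the Mongo shell it is a global), and the sample stories are made up. The map function emits `posts: 1` so that the totals are right even for a key with only one value, for which MongoDB does not call reduce.

```javascript
// Made-up sample stories; the field names follow the slides.
const stories = [
  { user: { name: 'anna' }, diggs: 10 },
  { user: { name: 'ben' }, diggs: 3 },
  { user: { name: 'anna' }, diggs: 5 },
];

// map: runs once per document, with `this` bound to the document.
const map = function (emit) {
  emit(this.user.name, { diggs: this.diggs, posts: 1 });
};

// reduce: combines all values emitted for one key into a single value.
const reduce = function (key, values) {
  let diggs = 0;
  let posts = 0;
  values.forEach((doc) => {
    diggs += doc.diggs;
    posts += doc.posts;
  });
  return { diggs, posts };
};

// Simplified model: group emitted values by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc, emit));

  const out = [];
  for (const [key, values] of groups) {
    // A key with a single value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    out.push({ _id: key, value });
  }
  return out;
}

const diggUsers = mapReduce(stories, map, reduce);
console.log(JSON.stringify(diggUsers));
```

Each output entry has the shape `{ _id: key, value: reducedValue }`, which mirrors the documents found in the output collection.
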
  75. 75. Careful, my friend!
  76. 76. “MapReduce is slower and is not supposed to be used in ‘real time’. You run MapReduce as a background job.” Kristina Chodorow, Michael Dirolf in MongoDB – The Definitive Guide
  77. 77. Schema Design
  78. 78. BSON http://bsonspec.org JSON
  79. 79. JSON → BSON. All JSON documents are stored in a binary format called BSON. BSON supports a richer set of types than JSON. http://bsonspec.org
  80. 80. Terminology
      RDBMS         | MongoDB
      Table         | Collection
      Row(s)        | JSON Document
      Index         | Index
      Join          | Embedding & Linking
      Partition     | Shard
      Partition Key | Shard Key
  81. 81. Schema Design: relational database
  82. 82. Schema Design: document-based DB
  83. 83. Schema Design: document-based DB (embedding)
  84. 84. Schema Design: document-based DB (embedding, linking)
  85. 85. Patterns
  86. 86. Inheritance
  87. 87. Inheritance - table
      id | type   | area | radius | length | width
      1  | circle | 3.14 | 1      | NULL   | NULL
      2  | square | 4    | NULL   | 2      | NULL
      3  | rect   | 10   | NULL   | 5      | 2
  88. 88. Inheritance - document > db.shapes.find() › { _id: "1", type: "c", area: 3.14, radius: 1} › { _id: "2", type: "s", area: 4, length: 2} › { _id: "3", type: "r", area: 10, length: 5, width: 2} // Find shapes with radius > 0 > db.shapes.find( { radius: { $gt: 0 } } )
  89. 89. One to Many
  90. 90. One to Many, embedded array blogs: { author : "Johannes", date : ISODate("2011-09-18T09:56:06.298Z"), comments : [ { author : "Klaus", date : ISODate("2011-09-19T09:56:06.298Z"), text : "toller Artikel" } ] }
  91. 91. is allowed!
  92. 92. One to Many, normalized (2 collections) blogs: { _id: 1000, author: "Johannes", date: ISODate("2011-09-18"), comments: [ { comment: 1 } ] } comments: { _id: 1, blog: 1000, author: "Klaus", date: ISODate("2011-09-19") } > blog = db.blogs.findOne({ text: "Destination Moon" }); > db.comments.find( { blog: blog._id } );
  93. 93. Many - Many
  94. 94. Many - Many // Each product links the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }
  95. 95. Many - Many // Each product links the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Each category links the IDs of its products categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] } categories: { _id: 21, name: "movie", product_ids: [ 10 ] }
  96. 96. Many - Many // Each product links the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Each category links the IDs of its products categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] } categories: { _id: 21, name: "movie", product_ids: [ 10 ] } // All categories for a product > db.categories.find( { product_ids: 10 } )
  97. 97. is allowed!
  98. 98. Alternative: Many - Many // Each product links the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Categories contain no associations categories: { _id: 20, name: "adventure"}
  99. 99. Alternative: Many - Many // Each product links the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Categories contain no associations categories: { _id: 20, name: "adventure"} // All products for a category > db.products.find( { category_ids: 20 } )
  100. 100. Alternative: Many - Many // Each product links the IDs of its categories products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] } // Categories contain no associations categories: { _id: 20, name: "adventure"} // All products for a category > db.products.find( { category_ids: 20 } ) // All categories for a product > product = db.products.findOne( { _id: some_id } ) > db.categories.find({_id: {$in : product.category_ids}})
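
The two lookups of the alternative schema can be tried out on plain arrays. The sketch below uses plain JavaScript instead of the Mongo shell. The second product, the second category and the helper names `productsFor` and `categoriesFor` are hypothetical additions for this illustration.

```javascript
// Only products carry the links; categories hold no associations.
const products = [
  { _id: 10, name: 'Destination Moon', category_ids: [20, 30] },
  { _id: 11, name: 'Hypothetical Title', category_ids: [20] }, // made up
];
const categories = [
  { _id: 20, name: 'adventure' },
  { _id: 30, name: 'hypothetical category' }, // made up
];

// All products for a category: one query on the products collection,
// comparable to db.products.find({ category_ids: 20 }).
function productsFor(categoryId) {
  return products.filter((product) => product.category_ids.includes(categoryId));
}

// All categories for a product: two steps, first load the product,
// then select the categories whose _id is in its list (like $in).
function categoriesFor(productId) {
  const product = products.find((candidate) => candidate._id === productId);
  if (!product) return [];
  return categories.filter((category) => product.category_ids.includes(category._id));
}

console.log(productsFor(20).map((product) => product.name));
console.log(categoriesFor(10).map((category) => category.name));
```

The design choice is the usual one for linking: the association is stored once, so a write touches a single document, at the price of a second query in one direction.
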
  101. 101. JSON = BSON. BSON in, BSON inside, BSON out. Embedding or linking: everything is allowed *
  102. 102. Software Tests
  103. 103. your code is broken …until proven otherwise!
  104. 104. Unit Test Checklist
  105. 105. Most important things to test: 1. deserialize 2. map-reduce 3. queries
  106. 106. External Dependencies: Integration Tests
  107. 107. “Integration Tests are a Scam” J.B. Rainsberger
  108. 108. Usual Problems with Integration Tests 1. false red: unpredictable, network down, software updates…
  109. 109. Usual Problems with Integration Tests 2. long running: slow feedback, no feedback, false security
  110. 110. Usual Problems with Integration Tests 3. bad design: excessive setup, AAA → AAAAAAA, hides defects
  111. 111. Usual Problems with Integration Tests 4. comfortable: managers, business constraints, pragmatic solutions, own laziness… Bugs will come back to haunt you!
  112. 112. Solutions Or better: how to reduce the amount of problems
  113. 113. 1. false red: Express
  114. 114. 2. long running, 3. bad design: In Memory Instance, Embedded Mode
  115. 115. In Memory Instance Embedded Mode using (var documentStore = new EmbeddableDocumentStore { RunInMemory = true }.Initialize()) { using (var session = documentStore.OpenSession()) { // Run complex test scenarios } }
  116. 116. Thank you very much!
  117. 117. NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken; MongoDB: The Definitive Guide; MongoDB in Action; RavenDB Mythology Documentation https://s3.amazonaws.com/daily-builds/RavenDBMythology-11.pdf
  118. 118. Image credits: Bug © 123RF Stock Foto; Cloud web © vege – Fotolia.com; Race car - red and black © braverabbit – Fotolia.com; PC - Computerkomponenten - Icons Nr. 1 © vanhorden – Fotolia.com; Der Ordner © beermedia – Fotolia.com; Ausgewählter Ordner © Spectral-Design – Fotolia.com; funny cartoon builder © artenot – Fotolia.com; 3D rendering of an architecture model 2 © Franck Boston – Fotolia.com. All logos and trademarks used are the property of their registered owners.
  119. 119. BREAK!
