
50 Shades of Data – how, when and why Big, Relational, NoSQL, Elastic, Graph, Event (CodeOne 2018, San Francisco)


Data has been, and will remain, the key ingredient of enterprise IT. What is changing is the nature, scope, and volume of data and its place in the IT architecture. Big data, unstructured data, and nonrelational data stored on Hadoop, in NoSQL databases, and in Elasticsearch, caches, and message queues complement the data in the enterprise RDBMS. Trends such as microservices that contain their own data, BASE, CQRS, and event sourcing have changed the way we store, share, and govern data. This session introduces patterns, technologies, and hypes for storing, processing, and retrieving data with products such as Oracle Database, Cassandra, MySQL, Neo4J, Kafka, Redis, Elasticsearch, Blockchain (Hyperledger) and Hadoop/Spark, running locally, in containers, and in the cloud.

Published in: Software

  1. 1. 50 Shades of Data: how, when and why Big, Fast, Relational, NoSQL, Elastic, Event, CQRS • On the many types of data, data stores and data usages • Lucas Jellema, CTO of AMIS • CodeOne 2018, San Francisco, USA
  2. 2. Lucas Jellema • Architect / Developer • 1994: started in IT at Oracle • 2002: joined AMIS • Currently CTO & Solution Architect • Implementing Microservices on Oracle Cloud: Open, Manageable, Polyglot, and Scalable
  3. 3. Overview • Multiple types of data • Stored and processed in different ways • Same data sometimes used in multiple, different ways • Stored and processed multiple times, optimized for each use case • The meaning of some terms cannot be taken too literally • Real Time and Fresh • Integrity and Truth • Consistency and transactions • Understand your data • Meta: What does it mean? • Master: Where is the source?
  4. 4. Tweet! #codeone
  5. 5. Select from a stream of tweet events: select text, author, timestamp from tweets where tag = 'codeone' (here tweets is streaming data)
  6. 6. Select Running Count from a stream of tweet events: select tag, count(*) tweet_count from tweets group by tag
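The two streaming queries above can be approximated in plain Python as generators over an iterable of tweet events. This is only a sketch: the field names follow the columns in the slide queries, while the function names and the sample tweets are made up for illustration and do not come from the talk.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Tweet = Dict[str, str]


def select_by_tag(tweets: Iterable[Tweet], tag: str) -> Iterator[Tweet]:
    """Roughly: select text, author, timestamp from tweets where tag = :tag."""
    for tweet in tweets:
        if tweet["tag"] == tag:
            yield {key: tweet[key] for key in ("text", "author", "timestamp")}


def running_tag_counts(tweets: Iterable[Tweet]) -> Iterator[Tuple[str, int]]:
    """Roughly: select tag, count(*) from tweets group by tag, emitted per event."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts[tweet["tag"]] += 1
        # Emit the updated count as soon as the event arrives.
        yield tweet["tag"], counts[tweet["tag"]]


# Hypothetical sample events, standing in for a real tweet stream.
sample_tweets = [
    {"text": "Keynote starting", "author": "ann", "timestamp": "2018-10-22T09:00", "tag": "codeone"},
    {"text": "Streams everywhere", "author": "bob", "timestamp": "2018-10-22T09:01", "tag": "java"},
    {"text": "Great demo", "author": "cid", "timestamp": "2018-10-22T09:02", "tag": "codeone"},
]

if __name__ == "__main__":
    for tag, count in running_tag_counts(sample_tweets):
        print(tag, count)
```

Unlike a query on a table, the running count never finishes: every new event produces a new result row, which is what a downstream topic such as TWEET_COUNT would carry.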
  7. 7. Tweets on #JEEConf #java #oraclecode • Tweets Topic and TWEET_COUNT Topic on Oracle Cloud Event Hub • Running Tweets Aggregation in Application Container • Clients • Other event sources: IoT metrics from hundreds of devices; user actions & click events from webshop; live traffic events; microservices chatter; social media events (Facebook, Whatsapp, …); IT Operations monitoring metrics
  9. 9. Real Time live | fresh | instantaneous | on line | synchronous
  13. 13. Response time bands (labels: Machine Response, Human Reaction) • < 10 ms • < 100 ms • < 500 ms • < 3 secs • > 3 secs
  14. 14. Response time bands (labels: Machine Response, Human Reaction) • < 10 ms • < 100 ms • < 500 ms • < 3 secs • > 3 secs
  15. 15. Integrity • Madelon's card • Real world vs World of Databases • Relax! • Anomaly detection
  16. 16. Data Constraints to protect integrity • Allowable values • Mandatory attributes • (Foreign Key) References • NULL • Constraints on • type • length • format • Spelling • Character encoding
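Several of the constraint types listed above (allowable values, mandatory attributes, foreign key references, optional NULL, length) can be declared directly in a relational schema. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical and chosen only for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE country (code TEXT PRIMARY KEY CHECK (length(code) = 2))"  # length
    )
    conn.execute(
        """CREATE TABLE customer (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,                                    -- mandatory attribute
               status TEXT NOT NULL
                   CHECK (status IN ('active', 'inactive')),          -- allowable values
               country_code TEXT REFERENCES country (code)           -- reference, NULL allowed
           )"""
    )


def try_insert(conn: sqlite3.Connection, name, status, country_code) -> bool:
    """Return True if the row satisfies all constraints, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, status, country_code) VALUES (?, ?, ?)",
                (name, status, country_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.execute("INSERT INTO country (code) VALUES ('NL')")

if __name__ == "__main__":
    print(try_insert(conn, "Jane Doe", "active", "NL"))   # valid row
    print(try_insert(conn, "Bob", "unknown", "NL"))       # status not allowed
```

Constraints on spelling or character encoding are harder to express declaratively and usually end up in application code or data quality tooling.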
  17. 17. Data is a representation of the known real world • How useful is it to enforce data integrity?
  18. 18. Data Integrity • Why? • Is it about truth? • About regulations and by-the-book? • Allow IT systems to run smoothly and not get confused? • About auditability and non-repudiation? • What about the real world? • Data in IT is just a representation; if the world is not by the book – what should IT do?
  20. 20. Anomaly Detection • Find fishy values and derive business integrity rules by scanning data 50 Shades of Data 22
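One very simple way to "find fishy values" is frequency profiling: values that occur often become a candidate allowable-values rule, and rare values are flagged for review. The sketch below assumes this approach; the function name, the 5% threshold, and the sample data are illustrative choices, not something taken from the talk.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def profile_column(values: Sequence[str], min_share: float = 0.05) -> Tuple[List[str], List[str]]:
    """Split the distinct values of a column into (candidate_allowed, fishy).

    A value is 'fishy' when it makes up less than min_share of all rows.
    """
    if not values:
        return [], []
    counts = Counter(values)
    total = len(values)
    allowed = sorted(v for v, n in counts.items() if n / total >= min_share)
    fishy = sorted(v for v, n in counts.items() if n / total < min_share)
    return allowed, fishy


# Hypothetical status column: 100 rows, two of them suspicious.
statuses = ["active"] * 60 + ["inactive"] * 38 + ["actve", "INACTIVE"]

if __name__ == "__main__":
    allowed, fishy = profile_column(statuses)
    print("candidate rule: status IN", allowed)
    print("values to review:", fishy)
```

The derived rule is only a proposal: a rare value may be a typo, or it may be a legitimate but unusual case in the real world.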
  21. 21. BOL - CQRS 50 Shades of Data 23
  22. 22. Books Online - WebShop • Products • Product updates • firewall • Data manipulation • Data Quality (enforcement) • < 10K transactions • Batch jobs next to online • Speed is nice • Read only • On line • Speed is crucial • XHTML & JSON • > 5M visits • Webshop visits: searches, product details, orders
  23. 23. Products • Webshop visits: searches, product details, orders • firewall • Data manipulation • Data Quality (enforcement) • < 10K transactions • Batch jobs next to online • Speed is nice • Read only • On line • Speed is crucial • XHTML & JSON • > 1M visits • DMZ • Read only • JSON documents • Images • Text Search • Scale Horizontally • Stale but consistent • Products • Nightly generation • Product updates
  24. 24. How do you integrate applications and data? • Products • Data Manipulation • Data Retrieval
  25. 25. How do you integrate applications and data? • Special Products • Product Clusters • Products • Data Manipulation • Data Retrieval • Food Stuff • Toys • Quick Product Search Index • Product Store in SaaS app
  26. 26. Command Query Responsibility Segregation = CQRS • Special Products • Product Clusters • Products • Data Manipulation • Data Retrieval • Food Stuff • Toys • Quick Product Search Index • Product Store in SaaS app • Detect changes • Extract Data • Transport Data • Convert Data • Apply Data
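The five synchronization steps on the CQRS slide (detect changes, extract, transport, convert, apply) can be sketched with two in-memory stores. Everything here is a simplified assumption: the class names, the change log, and the document format are hypothetical, and a plain dictionary copy stands in for a real transport such as a queue or topic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommandStore:
    """Write side: holds the product rows and records which ones changed."""
    products: Dict[int, dict] = field(default_factory=dict)
    change_log: List[int] = field(default_factory=list)

    def upsert(self, product_id: int, name: str, price_cents: int) -> None:
        self.products[product_id] = {"id": product_id, "name": name, "price_cents": price_cents}
        self.change_log.append(product_id)


@dataclass
class QueryStore:
    """Read side: documents shaped for fast searching."""
    documents: Dict[int, dict] = field(default_factory=dict)

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return sorted(d["title"] for d in self.documents.values() if term in d["title"].lower())


def to_document(row: dict) -> dict:
    # Convert: reshape the relational row into the read model's format.
    return {"title": row["name"], "price": f"{row['price_cents'] / 100:.2f}"}


def synchronize(command_store: CommandStore, query_store: QueryStore) -> int:
    # Detect changes: distinct changed ids, in order of first change.
    changed_ids = list(dict.fromkeys(command_store.change_log))
    command_store.change_log.clear()
    for product_id in changed_ids:
        row = command_store.products[product_id]   # Extract
        message = dict(row)                        # Transport (a copy stands in for a queue)
        query_store.documents[product_id] = to_document(message)  # Convert + Apply
    return len(changed_ids)


command_store = CommandStore()
query_store = QueryStore()

if __name__ == "__main__":
    command_store.upsert(1, "Lego Castle", 4999)
    command_store.upsert(2, "Toy Car", 999)
    print(query_store.search("toy"))   # read side is still stale here
    synchronize(command_store, query_store)
    print(query_store.search("toy"))
```

How often `synchronize` runs determines how stale the query store may get, which is exactly the trade-off between the command and query sides.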
  27. 27. From C to Q • How quickly? • How frequently? • How reliably? • How atomically? • Products • Quick Product Search Index
  29. 29. From C to Q • How quickly? • How frequently? • How reliably? • How atomically? • Data Authorization Considerations • Locations & Connectivity • Full resynch | restore of Query Store • Products • Quick Product Search Index
  30. 30. [let go of] The Holy Grail of Normalization • Normalize to prevent • data redundancy • discrepancies (split brain) • storage waste 50 Shades of Data 32
  31. 31. CQRS is not new 50 Shades of Data 33
  32. 32. Event Sourcing Driving CQRS • Events • Event Store • Current State: accountId: 123, amount: 10, Owner: Jane Doe
  33. 33. Event Sourcing Driving CQRS • Events • Event Store • Current State • Other State • Aggregate
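With event sourcing, the event store is the source of truth and every state is a projection obtained by replaying events. The sketch below rebuilds a current state like the one on the slide (accountId 123, amount 10, owner Jane Doe) and a second, different projection from the same events. The event types and amounts are hypothetical; the slide only shows the resulting state.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical event store contents, oldest event first.
events: List[dict] = [
    {"type": "AccountOpened", "accountId": 123, "owner": "Jane Doe"},
    {"type": "Deposited", "accountId": 123, "amount": 25},
    {"type": "Withdrawn", "accountId": 123, "amount": 15},
]


def current_state(event_log: List[dict]) -> Dict[int, dict]:
    """Projection 1: the current balance and owner per account."""
    state: Dict[int, dict] = {}
    for event in event_log:
        account_id = event["accountId"]
        if event["type"] == "AccountOpened":
            state[account_id] = {"owner": event["owner"], "amount": 0}
        elif event["type"] == "Deposited":
            state[account_id]["amount"] += event["amount"]
        elif event["type"] == "Withdrawn":
            state[account_id]["amount"] -= event["amount"]
    return state


def activity_summary(event_log: List[dict]) -> Dict[str, int]:
    """Projection 2: an aggregate over the same events, counting events per type."""
    return dict(Counter(event["type"] for event in event_log))


if __name__ == "__main__":
    print(current_state(events))
    print(activity_summary(events))
```

Because both projections are derived, either one can be thrown away and rebuilt, or a new one added later, without touching the events themselves.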
  34. 34. Distributed Database with Event Sourcing & Current State • World State
  35. 35. SQL is not good at anything • But it sucks at nothing
  36. 36. Graph Database • Natural fit during development • Superior (10-1000 times better) performance • Example: find people liked by anyone liked by Bob
  37. 37. From relational SQL to Graph query
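The example question, "find people liked by anyone liked by Bob", is a two-hop traversal. In a relational model it needs a self-join on a likes table; in a graph model it is a walk along two edges. The sketch below shows the traversal over a plain adjacency map; the names and the sample graph are invented, and this illustrates the query shape only, not how a graph database executes it.

```python
from typing import Dict, Set

# Hypothetical "likes" edges: person -> people they like.
likes: Dict[str, Set[str]] = {
    "Bob": {"Alice", "Carol"},
    "Alice": {"Dave"},
    "Carol": {"Dave", "Erin"},
    "Dave": {"Bob"},
}


def liked_by_anyone_liked_by(graph: Dict[str, Set[str]], person: str) -> Set[str]:
    """Follow two 'likes' edges starting from person."""
    result: Set[str] = set()
    for liked in graph.get(person, set()):      # first hop: people that person likes
        result |= graph.get(liked, set())       # second hop: people they like in turn
    return result


if __name__ == "__main__":
    print(sorted(liked_by_anyone_liked_by(likes, "Bob")))
```

Each extra hop adds one more loop here and one more self-join in SQL, which is where the development-time fit of a graph model becomes noticeable.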
  38. 38. SQL vs NoSQL
  39. 39. SQL vs NoSQL ACID vs BASE Relational vs …
  40. 40. Relational Databases • Based on relational model of data (E.F. Codd), a mathematical foundation • Uses SQL for query, DML and DDL • Transactions are ACID (Atomicity, Consistency, Isolation, Durability) • Atomicity: all or nothing • Consistency: constraint compliant • Isolation: individual experience [in a multi-session environment] (aka concurrency) • Durability: down does not hurt
  41. 41. ACID comes at a cost – performance & scalability • Transaction results have to be persisted [before the transaction completes] in order to guarantee D • Concurrency requires some degree of locking (and multi-versioning) in order to have I • Constraint compliance (unique key, foreign key) means all data hangs together (as do all transactions) in order to have C • Two-phase commit (across multiple participants) introduces complexity, dependencies and delays, yet required for A
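Atomicity and constraint compliance can be seen in a few lines with Python's built-in sqlite3 module. In this sketch a transfer that would violate a CHECK constraint is rolled back as a whole, including the half that had already succeeded. The account table and the `transfer` helper are hypothetical names used only for this illustration.

```python
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(db: sqlite3.Connection, source: int, target: int, amount: int) -> bool:
    """Move amount from source to target; all or nothing."""
    try:
        with db:  # one transaction: commit on success, rollback on any error
            db.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target))
            # If this debit breaks the CHECK constraint, the credit above is undone too.
            db.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(db: sqlite3.Connection) -> Dict[int, int]:
    return dict(db.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    print(transfer(conn, 1, 2, 30), balances(conn))
    print(transfer(conn, 1, 2, 500), balances(conn))
```

A single in-memory database hides most of the cost discussed above; the locking, persistence, and two-phase commit overhead shows up once there are concurrent sessions and multiple participants.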
  43. 43. Types of NoSQL
  45. 45. NoSQL is not "No SQL"
  47. 47. When things were simple • RDBMS • SQL • ACID • Data files • Log Files • Backup • SAN
  48. 48. And then stuff happened • Middle Tier: Java EE (Stateful) application • Client Tier: Browser • Mobile App (offline) • Data Warehouse • OO, XML, JSON • Content Management • Big Data • Fast Data • API • µ • λ
  50. 50. Oracle Database • SQL • RDBMS • ACID
  51. 51. http • IoT Fast Data Ingestion • Sharding • Machine Learning • No SQL • Big Data SQL • Multitenant (Pluggable Database) Architecture • Flashback
  58. 58. Oracle Database XE (eXpress Edition) • Current version: XE 11gR2 • Coming in October 2018: XE 18c, with yearly releases (19c, 20c, …) • All functionality of single instance Oracle Database Enterprise Edition plus Extra Options (including R, Machine Learning, Spatial, Compression, Multi Tenant, Partitioning) • Code and Data Compatible with other editions, including plug/unplug • Resource Limitations for 18c: • 2 CPUs • 2 GB of memory • 12 GB of disk space (with Compression, effectively 40 GB of data) • No patches or support
  59. 59. Final Demo • Microservice 50 Shades of Data 68
  60. 60. Microservices • Agile | Flexible | Scalable | (Re)Deployable • Independent | Decoupled | Isolated • Communicate asynchronously, via events • Have their own private bounded context – the data they require to function • Their lifeblood 50 Shades of Data 69
  61. 61. Microservices State • Cache • RDBMS • Document Store • NoSQL • Event Hub • Big Data • Block Storage • LDAP • Generic Platform for running microservices
  62. 62. Bounded context of microservices • A microservice needs to be able to run independently • It needs to contain & own all data required to run • It cannot depend on other microservices • Diagram: Customer (API) • Order (API, UI) • CustomerModified event
  63. 63. Order Microservice Demo: Maintaining Derived Data in Bounded Context • Customer Microservice (Application Container) • Customers Topic (Event Hub) • Order Microservice (Application Container) • DBaaS
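The idea behind the demo can be sketched as follows: the Order microservice keeps its own copy of the customer data it needs and updates that copy whenever a CustomerModified event arrives, so it never has to call the Customer microservice at order time. This is not the demo code from the talk. The classes are hypothetical, and a synchronous in-process hub stands in for a real event hub such as a Kafka topic.

```python
from typing import Callable, Dict, List


class EventHub:
    """Tiny in-process stand-in for a topic-based event hub."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.subscribers.get(topic, []):
            handler(event)


class CustomerService:
    """Owns the customer data and announces every change on the customers topic."""

    def __init__(self, hub: EventHub) -> None:
        self.hub = hub
        self.customers: Dict[int, dict] = {}

    def modify(self, customer_id: int, name: str, city: str) -> None:
        self.customers[customer_id] = {"name": name, "city": city}
        self.hub.publish(
            "customers",
            {"type": "CustomerModified", "customerId": customer_id, "name": name, "city": city},
        )


class OrderService:
    """Keeps a derived, local copy of customer names inside its own bounded context."""

    def __init__(self, hub: EventHub) -> None:
        self.customer_names: Dict[int, str] = {}
        self.orders: List[dict] = []
        hub.subscribe("customers", self.on_customer_event)

    def on_customer_event(self, event: dict) -> None:
        if event["type"] == "CustomerModified":
            self.customer_names[event["customerId"]] = event["name"]

    def place_order(self, customer_id: int, product: str) -> dict:
        # Uses only local data: no call to CustomerService is needed.
        order = {
            "customerId": customer_id,
            "customerName": self.customer_names[customer_id],
            "product": product,
        }
        self.orders.append(order)
        return order


if __name__ == "__main__":
    hub = EventHub()
    customers = CustomerService(hub)
    orders = OrderService(hub)
    customers.modify(1, "Jane Doe", "Utrecht")
    print(orders.place_order(1, "book"))
```

With a real, asynchronous event hub the local copy can lag behind the Customer microservice for a short time, which is the staleness the Order microservice accepts in exchange for independence.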
  64. 64. Wrap Up
  66. 66. Total Cost of Data Ownership • usage • authorization • distribution • format • volatility • volume • ACID demands • availability • freshness requirements (staleness allowance) • location • speed • ownership • required consistency • integrity • query patterns
  68. 68. Summary • Multiple types of data • Stored and processed in different ways • Same data sometimes used in multiple, different ways • Stored and processed multiple times, optimized for each use case • The meaning of some terms cannot be taken too literally • Real Time and Fresh • Integrity and Truth • Consistency and transactions • Understand your data • Meta: What does it mean? • Master: Where is the source?
  71. 71. Wrap Up • DATA DATA DATA
  72. 72. Thank you Dank je wel • Blog: technology.amis.nl • Email: lucas.jellema@amis.nl • : @lucasjellema • : lucas-jellema • : www.amis.nl, info@amis.nl https://github.com/lucasjellema