Oracle Week 2016 - Modern Data Architecture

A talk by Arthur Gimpel, Director of DataZone, on Modern Data Architecture: polyglot persistence, NoSQL, and other trends in the Big Data world.

Published in: Technology

  1. 1. Modern Operational Data Architecture Arthur Gimpel, DataZone
  2. 2. About Me • Name: Arthur Gimpel • Position: Technology Evangelist, Solutions Architect, Trainer • Tech Stack: MongoDB, SQL Server, Couchbase, Elastic Stack, Redis, Kafka, Python, .NET
  3. 3. Relational Databases • The first RDBMSs were introduced in the late 1970s • They exist in all possible flavors but share one thing: ACID • They still dominate the database market
  4. 4. RDBMS In Theory • Atomicity: All-or-nothing approach, transactions • Consistency: Hard state, every transaction moves the database from one valid state to another • Isolation: Transactions cannot interfere with each other • Durability: Every committed transaction is persisted
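The atomicity property on this slide can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the account names, and the `transfer` helper are hypothetical and are not part of the talk; the point is only that both updates commit together or neither does.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the rollback undoes both updates above.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

# Hypothetical in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer, the balances reflect only the first, committed transfer.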
  5. 5. RDBMS Is Not Perfect • Everything is persisted, synchronously. Limited by IO performance • All data is bound to a tabular schema, hard to make changes in big databases • ACID makes horizontal scaling nearly* impossible • Complex schema slows down aggregations and queries drastically
  6. 6. NoSQL • Distributed / Horizontal Scalability • Mostly Open Source • Mostly schemaless: • Key - Value • Document • Graph • Serves specific purposes
  7. 7. NoSQL - Key Value Stores • Key: • Usually string, equivalent to primary key in a relational database • Value: • Simple values: Int, Float, DateTime • Complex values: Array, Binary, XML, JSON
  8. 8. Key Value - Characteristics • A database is usually a set of unique keys and their values • KV data stores are usually easy to distribute • Key-value access is usually VERY fast • Indexing and querying values is usually challenging
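These characteristics can be sketched with a toy in-memory store. The `ShardedKV` class below is a hypothetical illustration, not any product's API: hashing the key picks a shard, so distribution and key lookups are simple, while a query on values has to scan every shard.

```python
import hashlib

class ShardedKV:
    """Toy key-value store that spreads keys across in-memory shards."""

    def __init__(self, shard_count=3):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        # Key access touches exactly one shard.
        return self._shard_for(key).get(key, default)

    def find_by_value(self, predicate):
        # Querying by value has no index to use: every shard is scanned.
        return sorted(
            key
            for shard in self.shards
            for key, value in shard.items()
            if predicate(value)
        )

store = ShardedKV()
store.put("session:42", {"user": "ann", "cart": 3})
store.put("session:43", {"user": "bob", "cart": 0})

ann_session = store.get("session:42")
non_empty_carts = store.find_by_value(lambda v: v["cart"] > 0)
```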
  9. 9. Key Value - Use Cases • Distributed caching • Session / temporary user data • Ad tech: Impressions • Ad tech: Serving data - profiles, segments • Recommendation engines - main data store
  10. 10. NoSQL - Graph Stores “In computing, a graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data” (Wikipedia)
  11. 11. Graph - Characteristics • Nodes are entities - for example a person • Properties describe nodes - for example age, name • Edges are relations between nodes and/or properties
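As a rough sketch of the node / property / edge model, the snippet below keeps nodes and their properties in a dictionary and edges in adjacency sets, then runs a simple link-analysis query (friends of friends). The people and the `friends_of_friends` helper are made up for illustration; a real graph database would expose its own query language.

```python
from collections import defaultdict

# Nodes are entities; each node carries a dictionary of properties.
nodes = {
    "ann": {"name": "Ann", "age": 31},
    "bob": {"name": "Bob", "age": 28},
    "cara": {"name": "Cara", "age": 35},
    "dan": {"name": "Dan", "age": 40},
}

# Edges are relations between nodes, stored as undirected adjacency sets.
edges = defaultdict(set)

def add_edge(a, b):
    edges[a].add(b)
    edges[b].add(a)

add_edge("ann", "bob")
add_edge("bob", "cara")
add_edge("cara", "dan")

def friends_of_friends(person):
    """Return nodes two hops away that are not already direct neighbours."""
    direct = edges[person]
    candidates = set()
    for friend in direct:
        candidates |= edges[friend]
    return sorted(candidates - direct - {person})

suggestions = friends_of_friends("ann")
```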
  12. 12. Graph - Use Cases • Fraud detection • Recommendation engines - link analysis • Intelligence systems • Social Networks • Medical Research
  13. 13. NoSQL - Document Stores • Document databases usually store JSON • Used to store object oriented data • Usually used to avoid relational - object mismatch • Document stores have the highest adoption rate among NoSQL databases
  14. 14. Document Store - Characteristics • Information is stored in JSON variations • Some document stores support secondary indexes for easier querying • Documents are usually divided into logical groups (collections, buckets, types - instead of RDBMS tables)
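A minimal sketch of a collection with a secondary index follows. The `Collection` class and its method names are hypothetical and insert-only (it does not handle updates or deletes); it only shows why an index makes querying by a field cheaper than scanning every document.

```python
from collections import defaultdict

class Collection:
    """Toy document collection: JSON-like dicts keyed by a document id."""

    def __init__(self):
        self.docs = {}
        self.indexes = {}  # field name -> {field value -> set of document ids}

    def create_index(self, field):
        # Build the secondary index from the documents already stored.
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc_id, doc):
        if doc_id in self.docs:
            raise KeyError(f"duplicate document id: {doc_id}")
        self.docs[doc_id] = doc
        # Keep every existing index up to date on write.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self.indexes:
            # Indexed lookup: jump straight to the matching ids.
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index: fall back to scanning the whole collection.
        return [d for _, d in sorted(self.docs.items()) if d.get(field) == value]

users = Collection()
users.insert("u1", {"name": "Ann", "city": "Haifa", "tags": ["admin"]})
users.create_index("city")
users.insert("u2", {"name": "Bob", "city": "Haifa"})
users.insert("u3", {"name": "Cara", "city": "Eilat"})

haifa_names = [d["name"] for d in users.find("city", "Haifa")]
by_name = users.find("name", "Cara")  # unindexed field, full scan
```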
  15. 15. Document Store - Use Cases • “Relational” use cases where there is a need for high scale (volume, velocity, variety) • Hierarchical data - aggregations • Search use cases
  16. 16. NoSQL - Challenges • Every data store has its purpose. There is no single solution to all database needs • NoSQL does not implement all of RDBMS’s abilities (CDC, Jobs, Stored Procedures, Triggers) • Every data store has its own language and APIs. There is no ANSI SQL
  17. 17. Not Only SQL
  18. 18. Polyglot Persistence Sample Use Cases • Add search capabilities to your database • Split session / temporary data processing to key value stores • Add Graph analysis capabilities to your operational database
  19. 19. Search Use Case
  20. 20. Search: Architecture #1
  21. 21. Search: Architecture #2
  22. 22. Architecture Comparison (Architecture #1 vs. Architecture #2) • Data distribution strategy: #1 data store based; #2 application based • Data distribution component: #1 data pipeline; #2 message queue • Implementation team: #1 Data Engineers / DevOps; #2 DevOps / Developers • Implementation complexity: #1 low (data pipeline development); #2 high (data access layer refactor) • Scalability: #1 limited to RDBMS scale; #2 fully scalable regardless of RDBMS
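The architecture diagrams themselves are not part of this transcript, so the following is only a guess at the shape of the application-based approach (Architecture #2): the data access layer writes to the primary store and publishes an event to a message queue, and a separate indexer consumes the queue to feed the search engine. Plain dictionaries and `queue.Queue` stand in for the real database, message broker, and search engine; all names are hypothetical.

```python
import queue

primary_store = {}      # stand-in for the operational database
search_index = {}       # stand-in for the search engine: token -> set of ids
events = queue.Queue()  # stand-in for the message queue

def save_product(product_id, name):
    """Data access layer: write to the primary store, then publish an event."""
    primary_store[product_id] = {"name": name}
    events.put({"id": product_id, "name": name})

def run_indexer():
    """Consumer: drain the queue and update the search index."""
    while not events.empty():
        event = events.get()
        for token in event["name"].lower().split():
            search_index.setdefault(token, set()).add(event["id"])
        events.task_done()

def search(term):
    return sorted(search_index.get(term.lower(), set()))

save_product("p1", "Red Running Shoes")
save_product("p2", "Blue Running Jacket")
run_indexer()

running_hits = search("running")
red_hits = search("red")
```

In this sketch the search side scales independently of the primary store, at the cost of changing the data access layer, which matches the trade-off listed in the comparison.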
  23. 23. Summary • Choose the relevant database engine for the right mission - replacing databases is not easy • Do not hesitate to use more than one database engine in your operational application; a single point of truth will be created in the analytical stack • Sizing is no replacement for benchmarking. Check your deployment carefully
  24. 24. DataZone Advanced Data Solutions Enterprise Search Data Flow Management Centralized Logging Operational Analytics Polyglot Persistence Business Analytics
  25. 25. DataZone Scale With Confidence Troubleshooting & Tuning Technological Evaluation Training Services Architecture Review Cost Management End-to-End Implementations Infrastructure Support / DevOps
  26. 26. Our Ecosystem
  27. 27. Keep in touch: contact@DataZone.io
