Session 1: Foundations
Introducing Concepts & Principles
Pascal Desmarets
Why is Data Modeling a key success factor?
Schemas are everywhere! (even in schemaless DBs…)
Before
Now
Polyglot Persistence and Data Pipelines
[Figure: Relational, Document, Columnar, Property Graph, Semantic Web / Triple Stores, Key-Value, APIs & Storage formats]
Schemas aren’t limited to relational databases
■ for data storage (data-at-rest)
● Databases: RDBMS and some NoSQL
● File formats: Avro, JSON, Parquet, YAML, …
■ for data exchanges (data-in-motion)
● APIs (Application Programming Interfaces) in web services:
Representational State Transfer (REST), including Swagger, OpenAPI,
AsyncAPI; GraphQL
● RPCs (remote procedure calls): ProtoBuf, Thrift, …
● MQs (message-passing systems): Kafka, Azure Event Hubs, Pulsar, …
● ETLs (Extract-Transform-Load)
Copyright © 2016-2023 Hackolade
Schema changes happen frequently, and often without
warning, resulting in both ugly and unmaintainable code.
Architecture complexity
Volume and pace of changes
Data gets lost and out-of-sync
Diverging requirements and
objectives
Data Modeling is a Key Success Factor
Data models and schemas are
perhaps the most important part of
developing software, because they
have such a profound effect:
■ not only on how the software is
written,
■ but also on how we think about
the problem that we are solving.
Martin Kleppmann,
Designing Data-Intensive Applications
Data Modeling for NoSQL
Different databases require different types of data models
[Figure: database families across SQL and NoSQL: Relational, Analytics, Key-Value, Document, Property Graph, Knowledge Graph, Columnar, Storage Formats]
Objectives of a good NoSQL data model
■ Achieve good performance by leveraging features of NoSQL
■ Maximize developer productivity by allowing agile schema evolution
■ Minimize total cost of ownership of the whole solution
Mindshift from application-agnostic to application-specific modeling
■ Relational: Data → Data Model → Application
■ NoSQL: Application design, access patterns & queries → Data Model → Data
Multiple data modeling stages
■ Relational data modeling: Conceptual, Logical, Physical
■ NoSQL/SQL/API/… data modeling: Domain-Driven Data Modeling, Polyglot
Data Model, Target-specific Data Model
Modern process with Domain-Driven Data Modeling
■ DDDM allows you to
● focus on your core ("domain and sub-domains")
● break down complex problems into smaller ones ("bounded context")
● use data modeling as a communication tool ("ubiquitous language")
● keep together what belongs together ("aggregates")
● reach a shared understanding between business and tech ("collaboration of domain
experts and developers")
● iterate and evolve ("continuous refinement")
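As an illustration of "keep together what belongs together", here is a minimal sketch of an aggregate modeled as one nested structure. The `Order` and `OrderLine` names and their fields are hypothetical and chosen only for this example; they do not come from the slides.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class OrderLine:
    sku: str
    quantity: int
    unit_price: float


@dataclass
class Order:
    """Aggregate root: the order and its lines are read and written together."""
    order_id: str
    customer_name: str
    lines: list = field(default_factory=list)

    def total(self) -> float:
        # Business rule lives next to the data it needs.
        return sum(line.quantity * line.unit_price for line in self.lines)


order = Order(
    "o-1",
    "Ada",
    [OrderLine("sku-1", 2, 5.0), OrderLine("sku-2", 1, 3.5)],
)

# asdict() yields one nested document that could be stored as a single unit.
document = asdict(order)
print(order.total())
print(document)
```

The nested `document` is the shape an application-specific model would typically persist, instead of splitting orders and order lines into separate normalized tables.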
Migrating relational database structures to ScyllaDB
RDBMS ScyllaDB
Benefits of data modeling
■ While traditional data modeling may be perceived to
get in the way of development and take too much
time…
■ Next-gen data modeling tools such as Hackolade
Studio are recognized to:
● facilitate Agile development
● reduce development time
● increase application quality
● implement consistent definitions of data
● improve data quality
● enable better data governance and compliance
● facilitate documentation and communication
To leverage the dynamic schema of ScyllaDB, data modeling
turns out to be even more important than with relational
databases
Presenter
Pascal Desmarets
Founder & CEO
Hackolade
pascal@hackolade.com
@hackolade
Download free trial at
https://hackolade.com
Original slides
not used
It may be misleading that…
■ ScyllaDB tables look like RDBMS tables
■ CQL looks like SQL
The ideal ScyllaDB application has
the following characteristics
■ Writes exceed reads by a large margin
■ Data is rarely updated, and when updates are made, they are
idempotent (the result of a successfully performed operation is
independent of the number of times it is executed)
■ Read access is by a known primary key
■ Data can be partitioned via a key that allows the database to be
spread evenly across multiple nodes
■ There is no need for joins or aggregates
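The idempotency requirement can be illustrated with a small sketch. The plain dictionaries below are a hypothetical stand-in for a table keyed by primary key; nothing here talks to a real ScyllaDB cluster, and the function names are assumptions made for the example.

```python
def upsert_status(table: dict, order_id: str, status: str) -> None:
    """Idempotent: replaying the same write leaves the same final state."""
    table[order_id] = {"status": status}


def increment_views(table: dict, page_id: str) -> None:
    """Not idempotent: every replay changes the stored value again."""
    row = table.setdefault(page_id, {"views": 0})
    row["views"] += 1


orders: dict = {}
for _ in range(3):  # simulate a client retrying the same write after a timeout
    upsert_status(orders, "order-42", "SHIPPED")

pages: dict = {}
for _ in range(3):  # the same retries now produce a different result each time
    increment_views(pages, "home")

print(orders)
print(pages)
```

After three retries the order row is unchanged, while the counter has drifted. That is why a write that may be retried is safer when it sets a value than when it modifies one relative to its previous state.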
Excellent ScyllaDB Use Cases
■ Transaction logging: purchases, test scores, movies watched, and
the latest playback position in a movie
■ Recommendation and personalization engines
■ Fraud detection
■ Tracking pretty much anything, including order status, packages, etc.
■ Storing time series data (as long as you do your own aggregates)
● Health tracker data
● Weather service history
● Internet of Things status and event history
● Sensor data in general
■ Messaging systems: chats, collaboration, and instant messaging
apps, etc.
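"Doing your own aggregates" for time series data can be sketched as a rollup computed in the application. The sample readings, the hour-bucket string format, and the `hourly_rollup` helper are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

# (sensor_id, hour bucket, temperature): hypothetical sample data
readings = [
    ("s1", "2024-01-01T10", 20.0),
    ("s1", "2024-01-01T10", 22.0),
    ("s1", "2024-01-01T11", 19.0),
]


def hourly_rollup(rows):
    """Group raw readings by (sensor, hour) and compute min/max/avg in the app."""
    grouped = defaultdict(list)
    for sensor_id, hour, value in rows:
        grouped[(sensor_id, hour)].append(value)
    return {
        key: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for key, vals in grouped.items()
    }


rollups = hourly_rollup(readings)
print(rollups[("s1", "2024-01-01T10")])
```

In practice such summaries would usually be written to their own table keyed by sensor and hour, so dashboards read precomputed rows instead of scanning raw readings.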
Differences between ScyllaDB and relational databases
■ Denormalization is expected
■ Writes are (almost) free
■ No DB-level joins
■ No referential integrity
■ Indexing useful in specific circumstances
ScyllaDB Data Model Principles (1 of 3)
■ Keyspace: container for tables in a ScyllaDB data model (the same concept as in Cassandra)
■ Table: container for an ordered collection of rows
■ Rows: made of a primary key plus an ordered set of columns,
themselves made of name/value pairs.
■ No need to store a value for every column each time a new row is
stored.
ScyllaDB Data Model Principles (2 of 3)
■ Primary key: a composite made of a partition key plus an
optional set of clustering columns.
● Partition key: responsible for data distribution across the nodes. It
determines which node will store a given row. It can consist of one or more columns.
● Clustering columns: responsible for sorting the rows within the partition.
There can be zero or more of them.
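The two roles of the primary key can be sketched with a toy in-memory model. This is an illustration only: the node names, the `ToyTable` class, and the hash-modulo placement are assumptions made for the example, and a real cluster uses its own partitioner, a token ring, and replication.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the partition key (simplified placement)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


class ToyTable:
    """Rows grouped by partition key, returned in clustering-column order."""

    def __init__(self):
        # partition key -> {clustering value -> row values}
        self.partitions = defaultdict(dict)

    def insert(self, sensor_id, reading_time, values):
        # Writing the same primary key twice overwrites the earlier row.
        self.partitions[sensor_id][reading_time] = values

    def read_partition(self, sensor_id):
        rows = self.partitions.get(sensor_id, {})
        return [(t, rows[t]) for t in sorted(rows)]


table = ToyTable()
table.insert("sensor-1", "2024-01-01T10:05", {"temp": 21.5})
table.insert("sensor-1", "2024-01-01T10:00", {"temp": 21.0})
table.insert("sensor-2", "2024-01-01T10:00", {"temp": 18.2})

print(node_for("sensor-1"))              # node that would own this partition
print(table.read_partition("sensor-1"))  # rows sorted by reading_time
```

Here `sensor_id` plays the partition key (where the data lives) and `reading_time` plays the clustering column (how rows are ordered inside the partition).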
ScyllaDB Data Model Principles (3 of 3)
■ Data type: defined to constrain the values stored in a column. Data
types include character and numeric types, collections, and
user-defined types. A column also has other attributes: timestamps and
time-to-live.
■ Secondary index: an index on a column that is not part of the
primary key. Secondary indexes are not recommended on columns
with high cardinality or very low cardinality, or on columns that are
frequently updated or deleted.
■ Joins: cannot be performed at the database level. If a join is
needed, either it must be performed at the application level or, preferably,
the data model should be adapted to create a denormalized table
that represents the join results.
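A denormalized table that represents the join results can be sketched as follows. The `users` and `orders` sample data and the `build_orders_by_user` helper are hypothetical; dictionaries stand in for tables.

```python
users = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 12},
    {"order_id": "o3", "user_id": "u1", "total": 7},
]


def build_orders_by_user(users, orders):
    """Write path: copy the user name into each order row, grouped by user_id."""
    table = {}
    for order in orders:
        row = {
            "order_id": order["order_id"],
            "total": order["total"],
            # Duplicated on purpose so that reads need no join.
            "user_name": users[order["user_id"]]["name"],
        }
        table.setdefault(order["user_id"], []).append(row)
    return table


orders_by_user = build_orders_by_user(users, orders)

# Read path: a single lookup by key returns everything the screen needs.
print(orders_by_user["u1"])
```

The cost moves to the write path: if a user name changes, every copied `user_name` has to be rewritten by the application, which is the trade-off that denormalization accepts.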
Data modeling for ScyllaDB is a balancing act
■ Two primary rules of data modeling in ScyllaDB:
● each partition should hold roughly the same amount of data
● read operations should access as few partitions as possible, ideally only one
■ These two principles often conflict, so you
have to find a balance between them based on domain
understanding and business needs
■ Anticipate growth: a data model that makes sense at a
particular transaction volume may no longer make sense when
that volume is multiplied 100x or 1000x
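One common way to balance the two rules is to add a time bucket to the partition key. The sketch below assumes a per-day bucket and UTC timestamps; the function names and the choice of bucket size are illustrative, not taken from the slides.

```python
from datetime import datetime, timedelta, timezone


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Compound partition key: sensor plus calendar day (timestamps assumed UTC)."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def partitions_for_range(sensor_id: str, start: datetime, end: datetime) -> list:
    """List the partitions that a read over [start, end] has to touch."""
    keys = []
    day = start.date()
    while day <= end.date():
        keys.append((sensor_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


start = datetime(2024, 1, 1, 22, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)

print(partition_key("s1", start))
print(partitions_for_range("s1", start, end))
```

A smaller bucket keeps each partition bounded as volume grows, but a query over a long time range then touches more partitions. A larger bucket does the opposite, so the bucket size is where the balance is struck.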
More Related Content

What's hot

Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
confluent
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
ScyllaDB
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
Yoshinori Matsunobu
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
Yoshinori Matsunobu
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
Databricks
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Anant Corporation
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
Alkin Tezuysal
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Databricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
No sqlpresentation
No sqlpresentationNo sqlpresentation
No sqlpresentation
Salma Gouia
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
Ivo Andreev
 

What's hot (20)

Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
No sqlpresentation
No sqlpresentationNo sqlpresentation
No sqlpresentation
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 

Similar to NoSQL Data Modeling Foundations — Introducing Concepts & Principles

Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
ScyllaDB
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
ScyllaDB
 
Big Data Pitfalls
Big Data PitfallsBig Data Pitfalls
Big Data Pitfalls
Alex Meadows
 
Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...
Marco Tusa
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Aaron Saray
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
DataStax
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS Options
ScyllaDB
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Edunomica
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
Bethmi Gunasekara
 
How to get started in Big Data for master's students
How to get started in Big Data for master's studentsHow to get started in Big Data for master's students
How to get started in Big Data for master's students
Mohamed Nadjib MAMI
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Daniel Zivkovic
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
Mostafa Majidpour
 
From Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsFrom Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data Applications
Databricks
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j
 
Introduction to asdfghjkln b vfgh n v
Introduction to asdfghjkln b vfgh n    vIntroduction to asdfghjkln b vfgh n    v
Introduction to asdfghjkln b vfgh n v
23mz02
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Kent Graziano
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
ScyllaDB
 
AWS User Group October
AWS User Group OctoberAWS User Group October
AWS User Group October
PolarSeven Pty Ltd
 

Similar to NoSQL Data Modeling Foundations — Introducing Concepts & Principles (20)

Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Big Data Pitfalls
Big Data PitfallsBig Data Pitfalls
Big Data Pitfalls
 
Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS Options
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
How to get started in Big Data for master's students
How to get started in Big Data for master's studentsHow to get started in Big Data for master's students
How to get started in Big Data for master's students
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
From Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsFrom Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data Applications
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You
 
Introduction to asdfghjkln b vfgh n v
Introduction to asdfghjkln b vfgh n    vIntroduction to asdfghjkln b vfgh n    v
Introduction to asdfghjkln b vfgh n v
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
AWS User Group October
AWS User Group OctoberAWS User Group October
AWS User Group October
 

More from ScyllaDB

Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
ScyllaDB
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
ScyllaDB
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
ScyllaDB
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
ScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
ScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
ScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
ScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
ScyllaDB
 


NoSQL Data Modeling Foundations — Introducing Concepts & Principles

  • 1. Session 1: Foundations. Introducing Concepts & Principles. Pascal Desmarets
  • 6. Why is Data Modeling a key success factor?
  • 7. Schemas are everywhere! (even in schemaless DBs…) Diagram labels: Before, Now
  • 8. Polyglot Persistence and Data Pipelines. Diagram labels: Relational, Document, Columnar, Property Graph, Semantic Web Triple Stores, Key-Value, APIs & Storage formats
  • 9. Schemas aren’t limited to relational databases ■ for data storage (data-at-rest) ● Databases: RDBMS and some NoSQL ● File formats: Avro, JSON, Parquet, YAML, … ■ for data exchanges (data-in-motion) ● APIs (Application Programming Interfaces) in web services: Representational State Transfer (REST), including Swagger, OpenAPI, AsyncAPI; GraphQL ● RPCs (remote procedure calls): ProtoBuf, Thrift, … ● MQs (message-passing systems): Kafka, Azure EventHub, Pulsar, … ● ETLs (Extract-Transform-Load) Copyright © 2016-2023 Hackolade
  • 10. Schema changes happen frequently, and often without warning, resulting in both ugly and unmaintainable code. ■ Architecture complexity ■ Volume and pace of changes ■ Data gets lost and out-of-sync ■ Diverging requirements and objectives
  • 12. Data Modeling is a Key Success Factor Data models and schemas are perhaps the most important part of developing software, because they have such a profound effect: ■ not only on how the software is written, ■ but also on how we think about the problem that we are solving. Martin Kleppmann, Designing Data-Intensive Applications
  • 13. Data Modeling for NoSQL
  • 14. Different databases require different types of data models. Diagram labels: Relational, Analytics, Key-Value, Document, Property Graph, Knowledge Graph, Columnar Storage Formats; SQL, NoSQL
  • 15. Objectives of a good NoSQL data model ■ Achieve good performance by leveraging features of NoSQL ■ Maximize developer productivity by allowing agile schema evolution ■ Minimize total cost of ownership of the whole solution
  • 16. Mindshift from application-agnostic to application-specific modeling ■ Relational: Data → Data Model → Application ■ NoSQL: Application Design → Access patterns & Queries → Data Model → Data
  • 17. Multiple data modeling stages ■ Relational data modeling: Conceptual, Logical, Physical ■ NoSQL/SQL/API/… data modeling: Domain Driven Data Modeling, Polyglot Data Model, Target-specific Data Model
  • 18. Modern process with Domain-Driven Data Modeling ■ DDDM allows you to ● focus on your core ("domain and sub-domains") ● break down complex problems into smaller ones ("bounded context") ● use data modeling as a communication tool ("ubiquitous language") ● keep together what belongs together ("aggregates") ● reach a shared understanding between business and tech ("collaboration of domain experts and developers") ● iterate and evolve ("continuous refinement")
  • 19. Migrating relational database structures to ScyllaDB. Diagram labels: RDBMS, ScyllaDB
  • 20. Benefits of data modeling ■ While traditional data modeling may be perceived to get in the way of development and take too much time… ■ Next-gen data modeling tools such as Hackolade Studio are recognized to: ● facilitate Agile development ● reduce development time ● increase application quality ● implement consistent definitions of data ● improve data quality ● enable better data governance and compliance ● facilitate documentation and communication ■ To leverage the dynamic schema of ScyllaDB, data modeling turns out to be even more important than with relational databases
  • 21. Presenter: Pascal Desmarets, Founder & CEO, Hackolade (pascal@hackolade.com, @hackolade). Download free trial at https://hackolade.com
  • 23. It may be misleading that… ■ ScyllaDB tables look like RDBMS tables ■ CQL looks like SQL
  • 24. The ideal ScyllaDB application has the following characteristics ■ Writes exceed reads by a large margin ■ Data is rarely updated and, when updates are made, they are idempotent (the result of a successfully performed operation is independent of the number of times it is executed) ■ Read access is by a known primary key ■ Data can be partitioned via a key that allows the database to be spread evenly across multiple nodes ■ There is no need for joins or aggregates
  • 25. Excellent ScyllaDB Use Cases ■ Transaction logging: purchases, test scores, movies watched and movie latest location ■ Recommendation and personalization engines ■ Fraud detection ■ Tracking pretty much anything including order status, packages, etc. ■ Storing time series data (as long as you do your own aggregates) ● Health tracker data ● Weather service history ● Internet of things status and event history ● Sensor data in general ■ Messaging systems: chats, collaboration, and instant messaging apps, etc.
  • 26. Differences between ScyllaDB and relational databases ■ Denormalization is expected ■ Writes are (almost) free ■ No DB-level joins ■ No referential integrity ■ Indexing useful in specific circumstances
  • 27. ScyllaDB Data Model Principles (1 of 3) ■ Keyspace: container for tables in a Cassandra data model ■ Table: container for an ordered collection of rows ■ Row: made of a primary key plus an ordered set of columns, themselves made of name/value pairs ■ There is no need to store a value for every column each time a new row is stored
  • 28. ScyllaDB Data Model Principles (2 of 3) ■ Primary key: a composite made of a partition key plus an optional set of clustering columns. ● Partition key: responsible for data distribution across the nodes. It determines which node will store a given row. It can be one or more columns. ● Clustering columns: responsible for sorting the rows within the partition. There can be zero or more of them.
  • 29. ScyllaDB Data Model Principles (3 of 3) ■ Data type: defined to constrain the values stored in a column. Data types include character and numeric types, collections, and user-defined types. A column also has other attributes: timestamps and time-to-live. ■ Secondary index: an index on any column that is not part of the primary key. Secondary indexes are not recommended on columns with high cardinality or very low cardinality, or on columns that are frequently updated or deleted. ■ Joins: cannot be performed at the database level. If there is a need for a join, either it must be performed at the application level or, preferably, the data model should be adapted to create a denormalized table that represents the join results.
  • 30. Data modeling for ScyllaDB is a balancing act ■ Two primary rules of data modeling in ScyllaDB: ● each partition should hold roughly the same amount of data ● read operations should access as few partitions as possible, ideally only one ■ These two rules often conflict, so you have to find a balance between them based on domain understanding and business needs ■ Anticipate growth: a data model that makes sense at a particular transaction volume may no longer make sense when that volume is multiplied 100x or 1000x
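
The balancing act on slide 30 can be illustrated with a small Python sketch. This is not ScyllaDB code: the sensor workload, the row counts, and the helper names `partition_sizes` and `partitions_read` are assumptions made up for illustration. It compares two candidate partition keys for the same time-series data, one per sensor and one per sensor and day.

```python
from collections import Counter
from datetime import date, timedelta


def by_sensor(row):
    """Candidate partition key 1: one partition per sensor."""
    return (row["sensor"],)


def by_sensor_day(row):
    """Candidate partition key 2: one partition per sensor and day."""
    return (row["sensor"], row["day"])


def partition_sizes(rows, key_fn):
    """Count how many rows land in each partition for a given key function."""
    return Counter(key_fn(r) for r in rows)


def partitions_read(days, sensor, key_fn):
    """Set of partitions a query for one sensor over `days` has to touch."""
    return {key_fn({"sensor": sensor, "day": d}) for d in days}


# Hypothetical workload: sensor "a" writes 1000 rows a day, sensor "b" only 10,
# over 30 days.
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(30)]
rows = [
    {"sensor": sensor, "day": day}
    for day in days
    for sensor, per_day in (("a", 1000), ("b", 10))
    for _ in range(per_day)
]

sizes_sensor = partition_sizes(rows, by_sensor)
sizes_sensor_day = partition_sizes(rows, by_sensor_day)

last_week = days[-7:]
print("largest partition, key=(sensor):     ", max(sizes_sensor.values()))
print("largest partition, key=(sensor, day):", max(sizes_sensor_day.values()))
print("partitions read for a week, key=(sensor):     ",
      len(partitions_read(last_week, "a", by_sensor)))
print("partitions read for a week, key=(sensor, day):",
      len(partitions_read(last_week, "a", by_sensor_day)))
```

In this made-up workload, adding the day to the partition key caps how large any single partition can grow, but a one-week query now touches seven partitions instead of one. Which side of that trade-off matters more depends on the access patterns, which is the point the slide makes.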
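
The primary key rules on slide 28 and the idempotent writes mentioned on slide 24 can be sketched with a toy in-memory table. Everything here is a simplification for illustration: the class `ToyTable`, its methods, and the hash-modulo node placement are hypothetical, and a real cluster places data through its own partitioner and token ring rather than a plain modulo. The sketch only shows the roles of the two key parts: the partition key picks where a row lives, and the clustering columns keep rows sorted inside the partition.

```python
import bisect
import hashlib
from collections import defaultdict


class ToyTable:
    """Toy model of a table with a partition key and clustering columns."""

    def __init__(self, partition_key, clustering_columns=(), nodes=3):
        self.partition_key = tuple(partition_key)
        self.clustering_columns = tuple(clustering_columns)
        self.nodes = nodes
        # partition key value -> list of (clustering value, row), kept sorted
        self.partitions = defaultdict(list)

    def node_for(self, pk_value):
        """Map a partition key value to a node (simplified: hash modulo nodes)."""
        digest = hashlib.sha256(repr(pk_value).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.nodes

    def insert(self, row):
        """Write a row; a second write with the same primary key overwrites it."""
        pk = tuple(row[c] for c in self.partition_key)
        ck = tuple(row[c] for c in self.clustering_columns)
        partition = self.partitions[pk]
        keys = [k for k, _ in partition]
        i = bisect.bisect_left(keys, ck)
        if i < len(keys) and keys[i] == ck:
            partition[i] = (ck, row)  # same primary key: replace, do not duplicate
        else:
            partition.insert(i, (ck, row))  # keep clustering order

    def read_partition(self, *pk):
        """Return the rows of one partition in clustering order."""
        return [row for _, row in self.partitions.get(tuple(pk), [])]


table = ToyTable(partition_key=["user_id"], clustering_columns=["ts"])
table.insert({"user_id": "u1", "ts": 3, "event": "c"})
table.insert({"user_id": "u1", "ts": 1, "event": "a"})
table.insert({"user_id": "u1", "ts": 1, "event": "a"})  # repeated write, same result

events = [row["event"] for row in table.read_partition("u1")]
node = table.node_for(("u1",))
print(events, "stored on node", node)
```

Repeating the write for `ts = 1` leaves the partition unchanged, which is what makes retries safe. The toy replaces the whole row on overwrite, which is coarser than what a real database does per column.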

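The advice on joins from slide 29 (prefer a denormalized table over an application-level join) can be sketched the same way. The `users` and `orders` data, and the names `app_level_join`, `record_order` and `orders_by_user`, are hypothetical and stand in for whatever tables an application would actually have.

```python
from collections import defaultdict

# Normalized data, as it might come from a relational model (made-up values).
users = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}}
orders = [
    {"order_id": 1, "user_id": "u1", "total": 30},
    {"order_id": 2, "user_id": "u2", "total": 12},
    {"order_id": 3, "user_id": "u1", "total": 7},
]


def app_level_join(user_id):
    """Application-level join: scan every order, then look up the user."""
    return [
        {"name": users[o["user_id"]]["name"], **o}
        for o in orders
        if o["user_id"] == user_id
    ]


# Denormalized alternative: a second "table" keyed by the access pattern
# (orders for one user), filled in at write time.
orders_by_user = defaultdict(list)


def record_order(order):
    """Write the order together with the user data the read path will need."""
    user = users[order["user_id"]]
    orders_by_user[order["user_id"]].append({"name": user["name"], **order})


for order in orders:
    record_order(order)

joined = app_level_join("u1")
denormalized = orders_by_user["u1"]
print(denormalized == joined)
```

The read of `orders_by_user["u1"]` is a single lookup by key, at the cost of storing the user's name with every order and keeping those copies up to date when the name changes.
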
Editor's Notes

  1. The benefits of data modeling are felt at all levels: for end users, because the delivered application will more closely match their expectations; for management, because it reduces risk and is more productive and efficient; and for developers, because collaboration brings clearer requirements, less frustrating rework, and better performance.