SlideShare a Scribd company logo
Semi Formal Model for Document
Oriented Databases
Daniel Coupal
Universia.com
1
Agenda
1.Why Having a Model?
2.Modeling Steps
3.Capturing the Model
4.Tools
2
Why having a Model?
• Documentation, common language
• Repeatable process
• Abstraction from database implementations
• Support for tools
• A document DB is supposed to be “schemaless”!
• No! Having a schema is a good thing.
Need to declare everything is the problem.
3
What if you have many apps?
Info about the schema is in
the code of Application A
Application B wants to read
the data in the DB.
Where is the description of
what it can read, write, ...?
4
Why we choose NoSQL?
• Rewards
• Huge amount of data
• Cheap hardware
• Blazing fast
5
Why we choose NoSQL?
• Rewards
• Huge amount of data
• Cheap hardware
• Blazing fast
• Compromises
• No joins, no transactions, less integrity
• Not as mature technology
• Less tools
6
Tradeoff between Performance and Data Integrity
NoSQL Little Secrets
• No experience on maintaining
databases and apps over the
years, which is the most
expensive activity in software
development.
• Not all the same vendors will
be there in few years.
• What if your DB is not
maintained anymore?
• What if there is a better DB
available?
7
NoSQL State of the Art
• Designing by Example
• Used in most tutorials
• Works well on small examples, like blogs
• Database with more tables needs a better way
to capture the design
8
{
"_id" : ObjectId("508d27069cc1ae293b36928d"),
"title" : "This is the title",
"body" : "This is the body text.",
"tags" : [
"chocolate",
"spleen",
"piano",
"spatula"
],
"created_date" : ISODate("2012-10-28T12:41:39.110Z"),
"author_id" : ObjectId("508d280e9cc1ae293b36928e"),
"category_id" : ObjectId("508d29709cc1ae293b369295"),
"comments" : [
{
"subject" : "This is comment 1",
"body" : "This is the body of comment 1.",
"author_id" : ObjectId("508d345f9cc1ae293b369296"),
"created_date" : ISODate("2012-10-28T13:34:23.929Z")
},
{
"subject" : "This is comment 2",
"body" : "This is the body of comment 2.",
"author_id" : ObjectId("508d34739cc1ae293b369297"),
"created_date" : ISODate("2012-10-28T13:34:43.192Z")
},
]
}
9
NoSQL State of the Art
Complex ER Diagram
10
Northwind ER Diagram
11
Northwind Doc Diagram
11 tables in those 5 collections
No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they are N-to-N relationships,
and don’t contain any data
Products
Suppliers
OrdersEmployees Customers
Customer
Demographics
Shippers
OrderDetails
Region
Categories
12
Territories
That was a bad example...
• Why?
13
That was a bad example...
• Why?
• With a document database, you don’t model
data as your first step!
• Data is modeled based on the usage
• SQL’s model first approach leads to bad
performance for every app.
NOSQL does the opposite.
14
Modeling Steps
SQL NoSQL
Goal
Answer to
Step 1
Step 2
Step 3
Step 4
general usage current usage
what answer do I have? what questions do I have?
model data write queries
write application add indexes
write queries model data
add indexes write application
15
Step 1: Write Queries
• Basic fields to retrieve
• Frequency of the query, requested speed
• Criticality of the query for the system
• Design notes
➡ Sort the queries by importance
16
Step 2: Add Indexes
• Which indexes do you need for the queries to go
fast?
• Attributes of your indexes
17
Step 3: Model Data
• List the collections
• How many documents per collection?
➡ NoSQL is all about size and performance, no?
• Attributes on the collections (capped, ...)
• List the fields, their types, constraints
➡ Only for the important fields
18
Step 4: Write Application
• Integration code/driver/queries/database
• Balance between using the product functionality and
isolating the layer that deals with the database.
• Interesting new tools to normalize to a common
query language: JSONiq, BigSQL, ...
19
Capturing the Model
• JSON is a cool format!
• Your document database is a cool storage facility!
• Language for the model: JSON Schema
• supports things like: types, cardinality, references, acceptable values, ...
20
JSON Schema
{
"address": {
"streetAddress": "21 2nd Street",
"city":"New York"
},
"phoneNumber":
[
{
"type":"home",
"number":"212 555-1234"
}
]
}
{
"type": "object",
"properties": {
"address": {
"type": "object",
"properties": {
"city": {
"type": "string"
},
"streetAddress": {
"type": "string"
}
}
},
"phoneNumber": {
"type": "array",
"items": {
"properties": {
"number": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
21
Model: Query
• Use:
• the native DB notation
• or use SQL (everyone can read SQL)
• Avoid joins!!!
• Example:
• Product by ProductID, ProductName, SupplierID
• Order by OrderID, CustomerID, ContactName
• Customer by CustomerID, ContactName, OrderID
22
Example
23
{
! "id" : "REQ002",
! "name" : "Get product by name",
! "n" : “20000/day”,
“t” : “2 ms”,
! "notes" : [
! ! "User asking about a product availability by product name"
! ],
! "sqlquery" : "select * from product where product.ProductName = abcde",
! "mongoquery" : {
! ! "ProductName" : "abcde"
! }
}
Model: Index
• Again, use the native DB notation
• Example:
• Product.ProductID, .ProductName, .SupplierID
• Order.OrderID, .CustomerID, .ContactName
• Customer by .CustomerID, .ContactName, .OrderID
• Why is it useful, it looks so trivial?
• If written a tool can validate it or create estimates
24
Example
25
{
! "id" : "REQ002",
! "name" : "Get product by name",
! "n" : “20000/day”,
“t” : “2 ms”,
! "notes" : [
! ! "User asking about a product availability by product name"
! ],
! "sqlquery" : "select * from product where product.ProductName = abcde",
! "mongoquery" : {
! ! "ProductName" : "abcde"
! },
! "index" : {
! ! "collection" : "Products",
! ! "field" : "ProductName"
! }
}
Model: Data
• Collection
• One JSON-Schema document per collection
• Fields for collection and database
• Optionally, add a version number
26
Example for ‘Orders’
27
{
“database” : “northwind”,
“collection” : “Orders”,
“version” : 1,
"type":"object",
"$schema": “http://json-schema.org/draft-03/schema”,
"id": "http://jsonschema.net",
“properties”: {
"CustomerID": {
"type":"string",
"id": "http://jsonschema.net/CustomerID"
},
“Details”: {
"type":"array",
"id": "http://jsonschema.net/Details",
"items":
{
“type”: “object”,
"id": "http://jsonschema.net/Details/0",
“required”: [ “ProductID”, “Quantity” ],
"properties": {
"ProductID": {
"type":"number",
"id": "http://jsonschema.net/Details/0/ProductID"
},
"Quantity": {
“type”: “number",
},
Simpler...
28
{
“database” : “northwind”,
“collection” : “Orders”,
“version” : 1,
"type":"object",
"properties": {
"CustomerID": {
"type":"string"
},
"Details": {
"type":"array",
"items":
{
"type":"object",
"properties": {
"ProductID": {
"type":"number"
},
"Quantity": {
"type":"number"
},
...
Model: Versioning
• Each modified version of a
collection is a new document
• db.<database>.find({“version:2”})
➡shows all collections for version
‘2’ of the schema for the DB.
29
Partial Schema
• Example: you just want to validate the ‘version’
field which has values as ‘string’ and as ‘number’
30
{
"type": "object",
"properties": {
"version": {
"type": "string",
}
}
}
{
"version": 1.0,
...
},
{
"version": “1.0.1”,
...
}
JSON SchemaJSON
Tools
• Get some JSON Schema from JSON:
• http://www.jsonschema.net/
• Validate your schema
• http://jsonschemalint.com/
• https://github.com/dcoupal/godbtools.git
• Validate/edit JSON
• http://jsonlint.com/ or RoboMongo
• Import SQL into NoSQL
• Pentaho, Talend
31
Tools considerations
• NoSQL often relies on data being in RAM.
Scanning all your data can make your dataset in
memory “cold”, instead of “hot”
• running incremental validations work better, ensure
you have timestamps on insertions and updates
32
Document Validator
33
Schema
(JSON Schema)
Collection
(JSON)
Validator
“Eventual Integrity”
• NoSQL have eventual consistency
• With tools that validate and fix the data according
to a set of rules, we get “eventual integrity”
34
Tools to be developed
• UI to manipulate a schema graphically
• More Complete Validators:
• constraints
• relationships
• Per language library to validate inserted/updated
documents
35
Conclusion: Take Aways
• Design in this order:
queries, indexes, data,
application.
• Capture your model
outside the application.
• Not having a schema is
not a good thing!
Use the attribute
‘schemaless’ wisely!
36
NoSQL
Goal
Answer to
Step 1
Step 2
Step 3
Step 4
current usage
what questions do I have?
write queries
add indexes
model data
write application
Questions?
• dcoupal@universia.com
37

More Related Content

What's hot

Multi-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresMulti-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated Polystores
Jiaheng Lu
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.
Keshav Murthy
 
The Right (and Wrong) Use Cases for MongoDB
The Right (and Wrong) Use Cases for MongoDBThe Right (and Wrong) Use Cases for MongoDB
The Right (and Wrong) Use Cases for MongoDB
MongoDB
 
Data Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineData Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane Fine
MongoDB
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
MongoDB
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesUnderstanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune Queries
Keshav Murthy
 
OrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data RelationshipsOrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data Relationships
Fabrizio Fortino
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
Neo4j
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model database
Mahdi Atawneh
 
Jumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema DesignJumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema Design
MongoDB
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
Vital.AI
 
MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
Maxime Beugnet
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB
MongoDB
 
Getting Started with NoSQL
Getting Started with NoSQLGetting Started with NoSQL
Getting Started with NoSQL
Aaron Benton
 
Webinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance ImplicationsWebinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance Implications
MongoDB
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
MongoDB
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
Michael Hackstein
 
MongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema Design
MongoDB
 

What's hot (20)

Multi-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresMulti-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated Polystores
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.N1QL workshop: Indexing & Query turning.
N1QL workshop: Indexing & Query turning.
 
The Right (and Wrong) Use Cases for MongoDB
The Right (and Wrong) Use Cases for MongoDBThe Right (and Wrong) Use Cases for MongoDB
The Right (and Wrong) Use Cases for MongoDB
 
Data Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane FineData Analytics with MongoDB - Jane Fine
Data Analytics with MongoDB - Jane Fine
 
Indexing Strategies to Help You Scale
Indexing Strategies to Help You ScaleIndexing Strategies to Help You Scale
Indexing Strategies to Help You Scale
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesUnderstanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune Queries
 
OrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data RelationshipsOrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data Relationships
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model database
 
Jumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema DesignJumpstart: Introduction to Schema Design
Jumpstart: Introduction to Schema Design
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
 
MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
 
How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB How Insurance Companies Use MongoDB
How Insurance Companies Use MongoDB
 
Getting Started with NoSQL
Getting Started with NoSQLGetting Started with NoSQL
Getting Started with NoSQL
 
Webinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance ImplicationsWebinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance Implications
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
MongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema Design
 
Nosql
NosqlNosql
Nosql
 

Similar to Semi Formal Model for Document Oriented Databases

The Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsThe Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
Matias Cascallares
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling database
MUG Perú
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Keshav Murthy
 
No SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDBNo SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDB
Ken Cenerelli
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdf
jill734733
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
David Peyruc
 
Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0
Anuj Sahni
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
IBM Cloud Data Services
 
NoSQL Data Modeling using Couchbase
NoSQL Data Modeling using CouchbaseNoSQL Data Modeling using Couchbase
NoSQL Data Modeling using Couchbase
Brant Burnett
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
MongoDB
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
MongoDB
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
NoSQLmatters
 
Crafting Evolvable Api Responses
Crafting Evolvable Api ResponsesCrafting Evolvable Api Responses
Crafting Evolvable Api Responses
darrelmiller71
 
Inferring Versioned Schemas from NoSQL Databases and its Applications
Inferring Versioned Schemas from NoSQL Databases and its ApplicationsInferring Versioned Schemas from NoSQL Databases and its Applications
Inferring Versioned Schemas from NoSQL Databases and its Applications
Diego Sevilla Ruiz
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
Norberto Leite
 
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features
Andrew Liu
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
Uwe Printz
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB MongoDB
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
Mike Broberg
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalog
MongoDB
 

Similar to Semi Formal Model for Document Oriented Databases (20)

The Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsThe Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling database
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
No SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDBNo SQL, No Problem: Use Azure DocumentDB
No SQL, No Problem: Use Azure DocumentDB
 
Data_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdfData_Modeling_MongoDB.pdf
Data_Modeling_MongoDB.pdf
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And WhentranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
tranSMART Community Meeting 5-7 Nov 13 - Session 2: MongoDB: What, Why And When
 
Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
NoSQL Data Modeling using Couchbase
NoSQL Data Modeling using CouchbaseNoSQL Data Modeling using Couchbase
NoSQL Data Modeling using Couchbase
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
 
Crafting Evolvable Api Responses
Crafting Evolvable Api ResponsesCrafting Evolvable Api Responses
Crafting Evolvable Api Responses
 
Inferring Versioned Schemas from NoSQL Databases and its Applications
Inferring Versioned Schemas from NoSQL Databases and its ApplicationsInferring Versioned Schemas from NoSQL Databases and its Applications
Inferring Versioned Schemas from NoSQL Databases and its Applications
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalog
 

More from Daniel Coupal

MongoDB.Live 2020 - Advanced Schema Design Patterns
MongoDB.Live 2020  - Advanced Schema Design PatternsMongoDB.Live 2020  - Advanced Schema Design Patterns
MongoDB.Live 2020 - Advanced Schema Design Patterns
Daniel Coupal
 
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDBMongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
Daniel Coupal
 
Silicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in productionSilicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in production
Daniel Coupal
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
MMS: The Easiest Way to Run MongoDB
MMS: The Easiest Way to Run MongoDBMMS: The Easiest Way to Run MongoDB
MMS: The Easiest Way to Run MongoDB
Daniel Coupal
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDB
Daniel Coupal
 

More from Daniel Coupal (6)

MongoDB.Live 2020 - Advanced Schema Design Patterns
MongoDB.Live 2020  - Advanced Schema Design PatternsMongoDB.Live 2020  - Advanced Schema Design Patterns
MongoDB.Live 2020 - Advanced Schema Design Patterns
 
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDBMongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
MongoDB World 2019 - A Complete Methodology to Data Modeling for MongoDB
 
Silicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in productionSilicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in production
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
MMS: The Easiest Way to Run MongoDB
MMS: The Easiest Way to Run MongoDBMMS: The Easiest Way to Run MongoDB
MMS: The Easiest Way to Run MongoDB
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDB
 

Recently uploaded

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 

Recently uploaded (20)

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 

Semi Formal Model for Document Oriented Databases

  • 1. Semi Formal Model for Document Oriented Databases Daniel Coupal Universia.com 1
  • 2. Agenda 1.Why Having a Model? 2.Modeling Steps 3.Capturing the Model 4.Tools 2
  • 3. Why having a Model? • Documentation, common language • Repeatable process • Abstraction from database implementations • Support for tools • A document DB is supposed to be “schemaless”! • No! Having a schema is a good thing. Need to declare everything is the problem. 3
  • 4. What if you have many apps? Info about the schema is in the code of Application A Application B wants to read the data in the DB. Where is the description of what it can read, write, ...? 4
  • 5. Why we choose NoSQL? • Rewards • Huge amount of data • Cheap hardware • Blazing fast 5
  • 6. Why we choose NoSQL? • Rewards • Huge amount of data • Cheap hardware • Blazing fast • Compromises • No joins, no transactions, less integrity • Not as mature technology • Less tools 6 Tradeoff between Performance and Data Integrity
  • 7. NoSQL Little Secrets • No experience on maintaining databases and apps over the years, which is the most expensive activity in software development. • Not all the same vendors will be there in few years. • What if your DB is not maintained anymore? • What if there is a better DB available? 7
  • 8. NoSQL State of the Art • Designing by Example • Used in most tutorials • Works well on small examples, like blogs • Database with more tables needs a better way to capture the design 8
  • 9. { "_id" : ObjectId("508d27069cc1ae293b36928d"), "title" : "This is the title", "body" : "This is the body text.", "tags" : [ "chocolate", "spleen", "piano", "spatula" ], "created_date" : ISODate("2012-10-28T12:41:39.110Z"), "author_id" : ObjectId("508d280e9cc1ae293b36928e"), "category_id" : ObjectId("508d29709cc1ae293b369295"), "comments" : [ { "subject" : "This is comment 1", "body" : "This is the body of comment 1.", "author_id" : ObjectId("508d345f9cc1ae293b369296"), "created_date" : ISODate("2012-10-28T13:34:23.929Z") }, { "subject" : "This is comment 2", "body" : "This is the body of comment 2.", "author_id" : ObjectId("508d34739cc1ae293b369297"), "created_date" : ISODate("2012-10-28T13:34:43.192Z") }, ] } 9 NoSQL State of the Art
  • 12. Northwind Doc Diagram 11 tables in those 5 collections No need for: - CustomerCustomerDemographics - EmployeeTerritories because they are N-to-N relationships, and don’t contain any data Products Suppliers OrdersEmployees Customers Customer Demographics Shippers OrderDetails Region Categories 12 Territories
  • 13. That was a bad example... • Why? 13
  • 14. That was a bad example... • Why? • With a document database, you don’t model data as your first step! • Data is modeled based on the usage • SQL’s model first approach leads to bad performance for every app. NOSQL does the opposite. 14
  • 15. Modeling Steps SQL NoSQL Goal Answer to Step 1 Step 2 Step 3 Step 4 general usage current usage what answer do I have? what questions do I have? model data write queries write application add indexes write queries model data add indexes write application 15
  • 16. Step 1: Write Queries • Basic fields to retrieve • Frequency of the query, requested speed • Criticality of the query for the system • Design notes ➡ Sort the queries by importance 16
  • 17. Step 2: Add Indexes • Which indexes do you need for the queries to go fast? • Attributes of your indexes 17
  • 18. Step 3: Model Data • List the collections • How many documents per collection? ➡ NoSQL is all about size and performance, no? • Attributes on the collections (capped, ...) • List the fields, their types, constraints ➡ Only for the important fields 18
  • 19. Step 4: Write Application • Integration code/driver/queries/database • Balance between using the product functionality and isolating the layer that deals with the database. • Interesting new tools to normalize to a common query language: JSONiq, BigSQL, ... 19
  • 20. Capturing the Model • JSON is a cool format! • Your document database is a cool storage facility! • Language for the model: JSON Schema • supports things like: types, cardinality, references, acceptable values, ... 20
  • 21. JSON Schema { "address": { "streetAddress": "21 2nd Street", "city":"New York" }, "phoneNumber": [ { "type":"home", "number":"212 555-1234" } ] } { "type": "object", "properties": { "address": { "type": "object", "properties": { "city": { "type": "string" }, "streetAddress": { "type": "string" } } }, "phoneNumber": { "type": "array", "items": { "properties": { "number": { "type": "string" }, "type": { "type": "string" } } } } } } 21
  • 22. Model: Query • Use: • the native DB notation • or use SQL (everyone can read SQL) • Avoid joins!!! • Example: • Product by ProductID, ProductName, SupplierID • Order by OrderID, CustomerID, ContactName • Customer by CustomerID, ContactName, OrderID 22
  • 23. Example 23 { ! "id" : "REQ002", ! "name" : "Get product by name", ! "n" : “20000/day”, “t” : “2 ms”, ! "notes" : [ ! ! "User asking about a product availability by product name" ! ], ! "sqlquery" : "select * from product where product.ProductName = abcde", ! "mongoquery" : { ! ! "ProductName" : "abcde" ! } }
  • 24. Model: Index • Again, use the native DB notation • Example: • Product.ProductID, .ProductName, .SupplierID • Order.OrderID, .CustomerID, .ContactName • Customer by .CustomerID, .ContactName, .OrderID • Why is it useful, it looks so trivial? • If written a tool can validate it or create estimates 24
  • 25. Example 25 { ! "id" : "REQ002", ! "name" : "Get product by name", ! "n" : “20000/day”, “t” : “2 ms”, ! "notes" : [ ! ! "User asking about a product availability by product name" ! ], ! "sqlquery" : "select * from product where product.ProductName = abcde", ! "mongoquery" : { ! ! "ProductName" : "abcde" ! }, ! "index" : { ! ! "collection" : "Products", ! ! "field" : "ProductName" ! } }
  • 26. Model: Data • Collection • One JSON-Schema document per collection • Fields for collection and database • Optionally, add a version number 26
  • 27. Example for ‘Orders’ 27 { “database” : “northwind”, “collection” : “Orders”, “version” : 1, "type":"object", "$schema": “http://json-schema.org/draft-03/schema”, "id": "http://jsonschema.net", “properties”: { "CustomerID": { "type":"string", "id": "http://jsonschema.net/CustomerID" }, “Details”: { "type":"array", "id": "http://jsonschema.net/Details", "items": { “type”: “object”, "id": "http://jsonschema.net/Details/0", “required”: [ “ProductID”, “Quantity” ], "properties": { "ProductID": { "type":"number", "id": "http://jsonschema.net/Details/0/ProductID" }, "Quantity": { “type”: “number", },
  • 28. Simpler... 28 { “database” : “northwind”, “collection” : “Orders”, “version” : 1, "type":"object", "properties": { "CustomerID": { "type":"string" }, "Details": { "type":"array", "items": { "type":"object", "properties": { "ProductID": { "type":"number" }, "Quantity": { "type":"number" }, ...
  • 29. Model: Versioning • Each modified version of a collection is a new document • db.<database>.find({“version:2”}) ➡shows all collections for version ‘2’ of the schema for the DB. 29
  • 30. Partial Schema • Example: you just want to validate the ‘version’ field which has values as ‘string’ and as ‘number’ 30 { "type": "object", "properties": { "version": { "type": "string", } } } { "version": 1.0, ... }, { "version": “1.0.1”, ... } JSON SchemaJSON
  • 31. Tools • Get some JSON Schema from JSON: • http://www.jsonschema.net/ • Validate your schema • http://jsonschemalint.com/ • https://github.com/dcoupal/godbtools.git • Validate/edit JSON • http://jsonlint.com/ or RoboMongo • Import SQL into NoSQL • Pentaho, Talend 31
  • 32. Tools considerations • NoSQL often relies on data being in RAM. Scanning all your data can make your dataset in memory “cold”, instead of “hot” • running incremental validations work better, ensure you have timestamps on insertions and updates 32
  • 34. “Eventual Integrity” • NoSQL have eventual consistency • With tools that validate and fix the data according to a set of rules, we get “eventual integrity” 34
  • 35. Tools to be developed • UI to manipulate a schema graphically • More Complete Validators: • constraints • relationships • Per language library to validate inserted/updated documents 35
  • 36. Conclusion: Take Aways • Design in this order: queries, indexes, data, application. • Capture your model outside the application. • Not having a schema is not a good thing! Use the attribute ‘schemaless’ wisely! 36 NoSQL Goal Answer to Step 1 Step 2 Step 3 Step 4 current usage what questions do I have? write queries add indexes model data write application