Christian Amor Kvalheim (MongoDB Staff Engineer)
From SQL to
MongoDB
How to get from A to B in a
reasonably ordered fashion
What's Up
❖ The Challenge
❖ Explicit Schema
❖ Implicit Schema
❖ Rules of Thumb
❖ Summary
The Challenge
Take an existing SQL Schema
and pick an Appropriate
MongoDB Schema
Our Example SQL Schema
Explicit Schema
❖ Table structure definition
❖ Primary Key definition
❖ Foreign Key relationships
❖ 1:n
❖ 1:1
❖ n:m
Implicit Schema
❖ The SQL Schema as expressed by the following
operations and their associated metadata
❖ Insert operation
❖ Update operations
❖ Select Operations
❖ Join relationships
With Explicit Schema Only
Relationships
[Diagram: the example SQL schema with each foreign key relationship labeled 1:n]
No duplication of Data
[Diagram: the example schema with each relationship labeled 1:n]
Collections
❖ Customers
❖ Payments [array]
❖ Orders
❖ Orderdetails
❖ Employees
❖ Products
❖ Productlines
❖ Offices
What if we allow duplication?
[Diagram: the example schema with each relationship labeled 1:n]
Collections
❖ Customers
❖ Payments [array]
❖ Orders
❖ Orderdetails [array]
❖ Products [document]
❖ Productlines [document]
❖ Offices
❖ Products
❖ Productlines
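As a rough sketch of what the duplicated layout could look like, the Python snippet below builds one order document with its order details embedded as an array, each carrying a snapshot of its product and product line. All field names and values are illustrative assumptions, not taken from the example schema, and `build_order_document` is a hypothetical helper.

```python
import json

# Hypothetical rows, as they might come out of the relational tables.
productline = {"productLine": "Classic Cars", "textDescription": "Vintage models"}
product = {"productCode": "S10_1949", "productName": "1952 Alpine Renault 1300",
           "productLine": "Classic Cars"}
order = {"orderNumber": 1, "customerNumber": 103, "status": "Shipped"}
details = [{"orderNumber": 1, "productCode": "S10_1949", "quantityOrdered": 2}]


def build_order_document(order, details, products, productlines):
    """Embed order details, each with a snapshot of product and product line."""
    doc = dict(order)
    doc["orderdetails"] = []
    for d in details:
        # Copy the product so the source row is left untouched.
        p = dict(products[d["productCode"]])
        p["productline"] = dict(productlines[p.pop("productLine")])
        # The foreign keys are implied by the nesting, so they are dropped.
        item = {k: v for k, v in d.items() if k not in ("orderNumber", "productCode")}
        item["product"] = p
        doc["orderdetails"].append(item)
    return doc


order_doc = build_order_document(
    order, details,
    {product["productCode"]: product},
    {productline["productLine"]: productline},
)
print(json.dumps(order_doc, indent=2))
```

The standalone Products and Productlines collections remain the source of truth; the embedded copies are snapshots taken when the order document is built.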
Important Notes
❖ Foreign key relationships are in most cases not representative of application-level queries
❖ Cannot discover the degree of mutability by looking at the SQL schema in isolation
❖ Cannot know the average size of n in the 1:n relationships
Implicit Schema
❖ The Implicit Schema represents the SQL operations executed against the relational schema (Application Schema)
❖ Can vary hugely from the foreign key relationships
❖ Expresses read vs. write ratios for tables
❖ Can be used to deduce entity mutability
❖ Can be used to estimate n in the 1:n relationships
Example - SELECT
❖ SELECT * FROM orders, orderdetails, products WHERE … [1000]
❖ SELECT * FROM offices, employees WHERE … [100]
❖ SELECT * FROM productlines, products WHERE … [2000]
❖ SELECT * FROM products WHERE … [4000]
❖ SELECT * FROM employees, customers WHERE … [200]
❖ SELECT * FROM customers, orders WHERE … [200]
What We Can Learn
❖ The frequency of the SQL operations
❖ The Application Schema relationships, by studying the join relationships
❖ If the logs include the number of rows returned, we can estimate the size of n in the 1:n relationships
❖ We can also calculate the rate of growth of n over time
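The points above can be sketched in a few lines of Python. This is a minimal illustration, assuming the log has already been reduced to pairs of SQL text and rows returned, and that each logged join fetches the children of a single parent, so the rows returned approximate n. The sample entries, the regular expression (which only handles the comma-style joins used in this deck), and the `summarize` helper are all hypothetical.

```python
from collections import defaultdict
import re

# Hypothetical log entries: (SQL text, rows returned). Real logs need parsing first.
log = [
    ("SELECT * FROM customers, orders WHERE customers.customerNumber = 103", 5),
    ("SELECT * FROM customers, orders WHERE customers.customerNumber = 112", 7),
    ("SELECT * FROM products WHERE productCode = 'S10_1949'", 1),
]

FROM_RE = re.compile(r"\bFROM\s+(.+?)\s+WHERE\b", re.IGNORECASE)


def summarize(entries):
    """Count how often each table combination is read and average the rows returned."""
    counts = defaultdict(int)
    rows = defaultdict(int)
    for sql, returned in entries:
        match = FROM_RE.search(sql)
        if not match:
            continue
        # Sort the table names so "a, b" and "b, a" count as the same join.
        tables = tuple(sorted(t.strip() for t in match.group(1).split(",")))
        counts[tables] += 1
        rows[tables] += returned
    return {t: (counts[t], rows[t] / counts[t]) for t in counts}


summary = summarize(log)
for tables, (frequency, avg_rows) in summary.items():
    print(tables, frequency, avg_rows)
```

Table combinations that appear together often are candidates for the application-level relationships, and the average row count gives a first estimate of n.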
[Diagram: the example schema annotated with the estimated size of n for each 1:n relationship: ~5 (+1 every 100 min), ~12, ~10001, ~15, ~10, ~20, ~6]
Example - INSERT/UPDATE
❖ INSERT INTO orders (…) VALUES (…)
❖ INSERT INTO orderdetails (…) VALUES (…)
❖ UPDATE orders SET … WHERE orderNumber = 1
Data Islands
Single Item Mutability Rate
❖ Single Item Mutability Rate (SIMR)
❖ How much an entity mutates in a given time period
❖ A low mutation rate
❖ Entity reaches a stable state and is a good candidate for rolling up into a single document
❖ Duplication of data is OK, as the document is a snapshot in time
❖ A high mutation rate
❖ Entity does not reach a stable state, keeps mutating, and might not be a good candidate for rollup
Order Life Span Example
❖ Order life span example
❖ An order gets created at T=0
❖ 10 order details are created at T+1
❖ Order is filled and the order record is updated at T+10
❖ Order is shipped and the order record is updated at T+15
❖ Past T+15 there are no more mutations
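A small Python sketch of how the order life span could be turned into numbers. The deck does not define a formula for the mutation rate, so the definitions here (mutations per time unit in a window, and "static" meaning no mutation for a chosen quiet period) are assumptions, as is counting each of the ten order details as one mutation.

```python
# Mutation timestamps for the order in the example (T=0, T+1, T+10, T+15).
# The ten order details created at T+1 are counted as ten mutations (an assumption).
mutations = [0] + [1] * 10 + [10, 15]


def mutation_rate(timestamps, start, end):
    """Mutations per time unit in the half-open window [start, end)."""
    count = sum(1 for t in timestamps if start <= t < end)
    return count / (end - start)


def is_static(timestamps, now, quiet_period):
    """True if no mutation happened during the last `quiet_period` time units."""
    return now - max(timestamps) >= quiet_period


active_rate = mutation_rate(mutations, 0, 16)     # while the order is being processed
settled_rate = mutation_rate(mutations, 16, 100)  # after the order has shipped
print(active_rate, settled_rate)
print(is_static(mutations, now=100, quiet_period=30))
```

The quiet period of 30 time units is a placeholder; how long an entity must stay unchanged before it counts as stable is a judgment call.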
Order Life Span Example
[Timeline]
T = 0: Order created
T = 1: 10 order details added
T = 10: Order fulfilled
T = 15: Order shipped
[Diagram: the example schema annotated with the estimated size of n for each 1:n relationship: ~5, ~12, ~10001, ~15, ~10, ~20, ~6]
Customer Life Span Example
❖ Customer[1:n] -> Payments relationship
❖ A payment is created at T=5 and T=50
❖ Customer[1:n] -> Orders relationship
❖ An order is created at T=0, T=15, T=20, and T=45
❖ Unbounded relationships
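To make the idea of an unbounded relationship concrete, the sketch below estimates how fast n grows from the creation times in the customer example and projects it forward. The linear projection and the helper names `growth_per_unit` and `projected_n` are assumptions made for illustration; real growth need not be linear.

```python
def growth_per_unit(creation_times):
    """Average number of related rows added per time unit over the observed span."""
    if len(creation_times) < 2:
        return 0.0
    span = max(creation_times) - min(creation_times)
    return (len(creation_times) - 1) / span if span else float("inf")


def projected_n(creation_times, horizon):
    """Linear projection of n at time `horizon`, assuming the observed rate continues."""
    rate = growth_per_unit(creation_times)
    return len(creation_times) + rate * (horizon - max(creation_times))


orders = [0, 15, 20, 45]   # order creation times from the customer example
payments = [5, 50]         # payment creation times from the customer example

print(growth_per_unit(orders), projected_n(orders, 450))
print(growth_per_unit(payments), projected_n(payments, 450))
```

A relationship whose projected n keeps climbing with the horizon never settles, which is what makes it a poor fit for embedding as an ever-growing array.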
Customer Life Span Example
[Timeline]
T = 0: Order created
T = 5: Payment created
T = 15: Order created
T = 20: Order created
[Diagram: the example schema annotated with the estimated size of n for each 1:n relationship: ~5 (+1 every 5 days), ~12, ~10001, ~15, ~10, ~20, ~6]
And The Rest?
❖ The recursive relationship for Employees makes it unsuitable for rolling up
❖ The same recursive relationship also affects the Offices -> Employees relationship
❖ The ProductLines -> Products relationship is big and possibly unbounded
Rules of Thumb
Levels Of Information
1. SQL Schema + Foreign Key Relationships
❖ Only the explicit relationships and table definitions are available
2. SQL Operations Logs (MySQL general log)
❖ Contains only SQL operations (no result set size)
3. Full SQL Operations Logs (MySQL slow log)
❖ Contains SQL operations plus result set sizes and latencies
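For the third level, the sketch below pulls the latency and result set size out of text laid out like a MySQL slow query log entry. The sample entry is made up, the regular expression only covers the `Query_time`, `Lock_time`, `Rows_sent`, and `Rows_examined` fields, and the hypothetical `parse_slow_log` helper assumes each statement fits on one line, which real logs do not guarantee.

```python
import re

# Sample text in the layout MySQL uses for its slow query log; the values are made up.
SAMPLE = """\
# Time: 2015-06-01T10:00:00.000000Z
# User@Host: app[app] @ localhost []  Id:     5
# Query_time: 0.002031  Lock_time: 0.000103 Rows_sent: 12  Rows_examined: 340
SET timestamp=1433152800;
SELECT * FROM customers, orders WHERE customers.customerNumber = 103;
"""

STATS_RE = re.compile(
    r"# Query_time: ([\d.]+)\s+Lock_time: [\d.]+\s+Rows_sent: (\d+)\s+Rows_examined: (\d+)"
)


def parse_slow_log(text):
    """Yield (sql, query_time, rows_sent) for single-line statements."""
    stats = None
    for line in text.splitlines():
        match = STATS_RE.match(line)
        if match:
            # Remember the statistics until the statement they describe shows up.
            stats = (float(match.group(1)), int(match.group(2)))
        elif stats and line and not line.startswith(("#", "SET timestamp")):
            yield line.rstrip(";"), stats[0], stats[1]
            stats = None


entries = list(parse_slow_log(SAMPLE))
print(entries)
```

With only the general log (level 2), the same statements are available but the row counts and latencies are not, so n has to be estimated some other way.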
Analysis Steps
1. Use SELECTs with joins to draw the new relationships
2. Establish the average n for each join relationship
3. Establish the mutation rate over time
❖ Does the relationship go static?
❖ Are the relationships unbounded? (growing n)
Algorithms
1. Roll up relationships
   1. If the entity relationship reaches a static state
   2. If the rate of growth of n is slow enough for the relationship to be treated as static (analyst discretion)
2. Don’t roll up relationships
   1. If the rate of mutability is high
   2. If the average size of n is huge
   3. If the mutation rate of the entity is large
   4. If an entity has a recursive relationship
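The rules above could be expressed as a simple predicate, sketched here in Python. The thresholds are placeholders (the deck leaves them to analyst discretion), and the `Relationship` fields and sample figures are hypothetical values chosen only to exercise each rule.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    name: str
    avg_n: float            # average size of n in the 1:n relationship
    growth_per_day: float   # how fast n grows
    mutation_rate: float    # mutations per day once the entity has been created
    recursive: bool = False


# Thresholds are placeholders; choosing them is left to analyst discretion.
MAX_N = 1000
MAX_GROWTH_PER_DAY = 1.0
MAX_MUTATION_RATE = 0.1


def should_roll_up(rel):
    """Apply the don't-roll-up rules first; anything that passes is a candidate."""
    if rel.recursive:
        return False
    if rel.avg_n > MAX_N:
        return False
    if rel.mutation_rate > MAX_MUTATION_RATE:
        return False
    return rel.growth_per_day <= MAX_GROWTH_PER_DAY


candidates = [
    Relationship("orders -> orderdetails", avg_n=10, growth_per_day=0.0, mutation_rate=0.0),
    Relationship("productlines -> products", avg_n=10001, growth_per_day=2.0, mutation_rate=0.5),
    Relationship("employees -> employees", avg_n=6, growth_per_day=0.0, mutation_rate=0.0,
                 recursive=True),
]
decisions = {rel.name: should_roll_up(rel) for rel in candidates}
for name, decision in decisions.items():
    print(name, decision)
```

Checking the exclusion rules first keeps the predicate conservative: a relationship is only proposed for rollup when nothing speaks against it.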
Applying It
[Diagram: the example schema annotated with the estimated size of n for each 1:n relationship: ~5 (+1 every 5 days), ~12, ~10001, ~15, ~10, ~20, ~6]
Collapsing and duplicating Products and Productlines
[Diagram: the same annotated schema]
Collapsing and duplicating Products and Productlines
Collapsing Payments into Customers, as they are never queried separately
Collections
❖ Customers
❖ Payments [array]
❖ Orders
❖ Orderdetails [array]
❖ Products [document]
❖ Productlines [document]
❖ Offices
❖ Products
❖ Productlines
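As an illustration of the transformation step behind this layout, the sketch below rolls payment rows up into an array on each customer document. The rows, field names, and the generic `roll_up` helper are hypothetical; a real migration would read from the SQL database and write the resulting documents with a MongoDB driver.

```python
from collections import defaultdict

# Hypothetical rows from the customers and payments tables.
customers = [
    {"customerNumber": 103, "customerName": "Atelier graphique"},
    {"customerNumber": 112, "customerName": "Signal Gift Stores"},
]
payments = [
    {"customerNumber": 103, "checkNumber": "HQ336336", "amount": 6066.78},
    {"customerNumber": 103, "checkNumber": "JM555205", "amount": 14571.44},
]


def roll_up(parents, children, key, field):
    """Embed each child row into its parent under `field`, dropping the foreign key."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child[key]].append({k: v for k, v in child.items() if k != key})
    # Parents without children get an empty array rather than a missing field.
    return [{**parent, field: grouped.get(parent[key], [])} for parent in parents]


customer_docs = roll_up(customers, payments, "customerNumber", "payments")
print(customer_docs)
```

The same helper could be applied to orders and order details, since both are plain 1:n rollups keyed on a single column.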
What’s Cooking
Tooling
1. We are working on building tooling to help
   1. Analyze your relational schema
   2. Propose schema recommendations
   3. Load and transform your data
2. Push the whole subject of schema transformation forward, doing something never done before
Tons of Open Questions
1. Are operation latencies important for recommending a schema?
2. Can one quantify a schema recommendation (is recommendation A better than B, and if so, why)?
3. Can machine learning produce better recommendations?
4. … etc.
Come Work With Us
Are you ready to build a new team, to build a brand new product, and to create a whole new category of products for the most popular NoSQL database?
MongoDB, the leader in NoSQL databases, is building a new team in Dublin. This team will develop products that help our customers adopt our technology by analyzing their legacy relational systems. We need someone who is going to participate in the research, partner with our staff engineers who are prototyping solutions, write production-ready code, and build a team.
This person will report to the Director of Integrations at MongoDB.
http://grnh.se/ge1rfp1
Q/A
http://grnh.se/ge1rfp1
