SlideShare a Scribd company logo
Automated Schema Design for
NoSQL Databases
Michael J. Mior
Supervised by Kenneth Salem
Schema optimization
For a given workload, construct the optimal
physical schema to answer queries in the
workload for a given storage budget
What is NoSQL?
● Not SQL?
● NoACID?
● Not only SQL
Types of NoSQL Databases
● Document stores
● Key-value stores
● Graph databases
● …
● Wide column stores
Cassandra data model
Cassandra data model
Queries specify
A set of row keys
and either
A list of column keys
or
A prefix of the column keys with a range
Cassandra schema design
Physical schema
POIsWithHotels
POI17
Hotel3
“Holiday Inn”
…
…
No single obvious mapping from the conceptual schema
Conceptual schema
Hotel
PointOfInterest
Near
Hotel8
“Motel 6”
Cassandra schema design
POIsWithHotels
POI17
Hotel3
“Holiday Inn”
…
…
HotelsWithPOIs
Hotel3
POI17
“Liberty Bell”
…
…
● Multiple choices of logical and
physical schema are possible
for a given conceptual schema
● Different choices are more
suitable for different queries
● Harder to add additional
structures in the future
Cassandra schema design
POIs
POI17
Name
“Liberty Bell”
…
…
Hotels
Hotel3
Name
“Holiday Inn”
…
…
HotelToPOI
Hotel3
POI17
(null)
…
…
POIToHotel
POI17
Hotel3
(null)
…
…
Cassandra schema design
POIToHotel
POI17
Hotel3
“Holiday Inn”
…
…
HotelToPOI
Hotel3
POI17
“Liberty Bell”
…
…
HotelToPOI
HotelID
POIID
(null)
…
…
POIToHotel
POIID
HotelID
(null)
…
…
Challenges
● Storage costs
● View maintenance
● Transactions
● Data distribution
Relational schema design
● Commercial database vendors have existing tools
○ Microsoft AutoAdmin for SQL Server
○ DB2 Design Advisor
○ Oracle SQL Tuning Advisor
● These tools don’t solve all the problems for NoSQL databases
● Usually an existing schema is assumed, and tools suggest secondary
indices and materialized views
Relational schema design
Conceptual schema Logical schema Physical schema
Hotel
CREATE TABLE Hotel(…);
CREATE TABLE POI(…);
CREATE TABLE HotelToPOI(…,
FOREIGN KEY (HotelID)
REFERENCES Hotel(HotelID)
FOREIGN KEY (POOID)
REFERENCES POI(POIID)
);
CREATE MATERIALIZED VIEW
POIsWithHotels AS SELECT
POI.*, Hotel.* FROM
POI, HotelToPOI, Hotel
WHERE HotelToPOI.HotelID
= Hotel.HotelID
AND HotelToPOI.POIID
= POI.POIID;
Mapping from conceptual to logical schema is straightforward
Near
PointOfInterest
Cassandra schema design
Physical schema
POIsWithHotels
POI17
Hotel3
“Holiday Inn”
…
…
No single obvious mapping from the conceptual schema
Conceptual schema
Hotel
Near
Hotel8
“Motel 6”
PointOfInterest
Abstract example
Query paths
Get all points of interests near
hotels a guest has stayed at
SELECT Name FROM POI WHERE
POI.HotelToPOI
.Hotel.Room.Reservation
.Guest.GuestID = ?
Guest Reservation
Room
Hotel
Point of
Interest
Index paths
● Queries can “skip” entities
on a path using an index
● Indices to the final entity
are equivalent to
materialized views
● Schema design is now the
selection of these indices
Guest Reservation
Room
Hotel
Point of
Interest
Problem definition
Input:
● ER diagram with entity counts and cardinalities
● Weighted queries over the diagram
● Storage constraint
● Cost model for target DB
Output:
● Suggested index configuration
● Plans for executing each query
Proposed solution
1. Enumerate possible useful indices for all queries
2. Determine the benefit of a given index for each query
3. Select the optimal indices for a given space constraint
Future Work
● Retargeting to other NoSQL DBs
● Automatic implementation of query plans
● Handling workloads which include updates
● Richer query language
● Exploitation of DBMS-specific features
Summary
● Schema design in NoSQL databases
presents unique challenges
● A workload-driven approach is necessary
● Conceptual schemata and queries are viable
abstractions for NoSQL schema design tools
Query language
SELECT [attributes] FROM [entity]
WHERE [path](=|<|<=|>|>=) [value] AND …
ORDER BY [path] LIMIT [count]
Conjunctive queries with simple predicates and ordering
Higher level queries can be composed by the application
Solution
Index
Enumerator
Query Planner
Index Selection
Index benefit
analysis
Query Plans
Selected IndicesWorkload
Input Output
Implementation
● Focus on Cassandra as target system
● Simple cost model for read-only in-memory workloads
● DBMS-independent index enumerator and query planner
● Index selection ILP implemented with GLPK
Evaluation - TBD
● Re-implement existing workloads in the target DBMS (e.
g. RUBiS, RUBBoS)
● Develop new benchmarks
● Comparison with a human DBA
● Random ER diagrams/queries with realistic properties

More Related Content

What's hot

Big Data
Big DataBig Data
Big Data
Fernando Parra
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
SQL & NoSQL
SQL & NoSQLSQL & NoSQL
Copy of MongoDB .pptx
Copy of MongoDB .pptxCopy of MongoDB .pptx
Copy of MongoDB .pptx
nehabsairam
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 
Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
Neo4j
 
Lineas de investigacion. sistemas
Lineas de investigacion. sistemasLineas de investigacion. sistemas
Lineas de investigacion. sistemas
Andres Colcha Nuñez
 
Neo4j Webinar: Graphs in banking
Neo4j Webinar:  Graphs in banking Neo4j Webinar:  Graphs in banking
Neo4j Webinar: Graphs in banking
Neo4j
 
Antecedentes Históricos de las Bases de Datos
Antecedentes Históricos de las Bases de DatosAntecedentes Históricos de las Bases de Datos
Antecedentes Históricos de las Bases de DatosVictor Alvaresz
 
Data warehouse
Data warehouseData warehouse
Data warehouse
Marian C.
 
¿Cómo implementar con éxito una solución de BI?
¿Cómo implementar con éxito una solución de BI?¿Cómo implementar con éxito una solución de BI?
¿Cómo implementar con éxito una solución de BI?
Daniel Chavez Flores
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
Fabio Fumarola
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
MongoDB
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
Lee Theobald
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
ateeq ateeq
 
Caracteristicas del modelo orientado a objetos
Caracteristicas del modelo orientado a objetosCaracteristicas del modelo orientado a objetos
Caracteristicas del modelo orientado a objetos
Jose Diaz Silva
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL Databases
Abiral Gautam
 
Schema migrations in no sql
Schema migrations in no sqlSchema migrations in no sql
Schema migrations in no sql
Dr-Dipali Meher
 
NoSql
NoSqlNoSql

What's hot (20)

Big Data
Big DataBig Data
Big Data
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
SQL & NoSQL
SQL & NoSQLSQL & NoSQL
SQL & NoSQL
 
Copy of MongoDB .pptx
Copy of MongoDB .pptxCopy of MongoDB .pptx
Copy of MongoDB .pptx
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
 
Lineas de investigacion. sistemas
Lineas de investigacion. sistemasLineas de investigacion. sistemas
Lineas de investigacion. sistemas
 
Neo4j Webinar: Graphs in banking
Neo4j Webinar:  Graphs in banking Neo4j Webinar:  Graphs in banking
Neo4j Webinar: Graphs in banking
 
Antecedentes Históricos de las Bases de Datos
Antecedentes Históricos de las Bases de DatosAntecedentes Históricos de las Bases de Datos
Antecedentes Históricos de las Bases de Datos
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
¿Cómo implementar con éxito una solución de BI?
¿Cómo implementar con éxito una solución de BI?¿Cómo implementar con éxito una solución de BI?
¿Cómo implementar con éxito una solución de BI?
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Smbd.
Smbd.Smbd.
Smbd.
 
Caracteristicas del modelo orientado a objetos
Caracteristicas del modelo orientado a objetosCaracteristicas del modelo orientado a objetos
Caracteristicas del modelo orientado a objetos
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL Databases
 
Schema migrations in no sql
Schema migrations in no sqlSchema migrations in no sql
Schema migrations in no sql
 
NoSql
NoSqlNoSql
NoSql
 

Viewers also liked

NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of ViewNoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
Alex Esterkin
 
NoSQL, Growing up at Oracle
NoSQL, Growing up at OracleNoSQL, Growing up at Oracle
NoSQL, Growing up at Oracle
DATAVERSITY
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
NoSQLmatters
 
7 Databases in 70 minutes
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutes
Karen Lopez
 
NoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL ApplicationsNoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL Applications
Michael Mior
 
NoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data ModelersNoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data Modelers
Karen Lopez
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Sid Anand
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
DATAVERSITY
 
Non-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value StoresNon-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value Stores
Joël Perras
 
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Michael Bleigh
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Alexandre Morgaut
 
NoSQL meets Microservices
NoSQL meets MicroservicesNoSQL meets Microservices
NoSQL meets Microservices
ArangoDB Database
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
Fabio Fumarola
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
DataWorks Summit/Hadoop Summit
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
Fabio Fumarola
 
Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesArangoDB Database
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
The 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience ManagementThe 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience Management
rivetlogic
 
NoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons LearnedNoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons Learned
rivetlogic
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 

Viewers also liked (20)

NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of ViewNoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
NoSQL Plus MySQL From MySQL Practitioner\'s Point Of View
 
NoSQL, Growing up at Oracle
NoSQL, Growing up at OracleNoSQL, Growing up at Oracle
NoSQL, Growing up at Oracle
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
 
7 Databases in 70 minutes
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutes
 
NoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL ApplicationsNoSE: Schema Design for NoSQL Applications
NoSE: Schema Design for NoSQL Applications
 
NoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data ModelersNoSQL and Data Modeling for Data Modelers
NoSQL and Data Modeling for Data Modelers
 
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)Software Developer and Architecture @ LinkedIn (QCon SF 2014)
Software Developer and Architecture @ LinkedIn (QCon SF 2014)
 
Operational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data StoresOperational Analytics Using Spark and NoSQL Data Stores
Operational Analytics Using Spark and NoSQL Data Stores
 
Non-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value StoresNon-Relational Databases & Key/Value Stores
Non-Relational Databases & Key/Value Stores
 
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)Persistence  Smoothie: Blending SQL and NoSQL (RubyNation Edition)
Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
 
NoSQL meets Microservices
NoSQL meets MicroservicesNoSQL meets Microservices
NoSQL meets Microservices
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
5 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/25 Data Modeling for NoSQL 1/2
5 Data Modeling for NoSQL 1/2
 
Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databases
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
The 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience ManagementThe 7 Key Ingredients of Web Content and Experience Management
The 7 Key Ingredients of Web Content and Experience Management
 
NoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons LearnedNoSQL Design Considerations and Lessons Learned
NoSQL Design Considerations and Lessons Learned
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 

Similar to Automated Schema Design for NoSQL Databases

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
MongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
MongoDB
 
Building your First MEAN App
Building your First MEAN AppBuilding your First MEAN App
Building your First MEAN AppMongoDB
 
FULL stack -> MEAN stack
FULL stack -> MEAN stackFULL stack -> MEAN stack
FULL stack -> MEAN stack
Ashok Raj
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design Patterns
Matthew Kalan
 
Database.pdf
Database.pdfDatabase.pdf
Database.pdf
stephanedjeukam1
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
Brent Ozar
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
Zaid Shabbir
 
Developers Guide to Cosmos DB
Developers Guide to Cosmos DBDevelopers Guide to Cosmos DB
Developers Guide to Cosmos DB
Seven Peaks Speaks
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
Michael Rys
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By Koneksys
Koneksys
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Ali Hodroj
 
MongoDB.pdf
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
KuldeepKumar778733
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Research
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
s4al_com
 
Physical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data SystemsPhysical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data Systems
Michael Mior
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Garindra Prahandono
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
Sarang Shravagi
 
Presentation
PresentationPresentation
Presentation
Dimitris Stripelis
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
Jose Mº Muñoz
 

Similar to Automated Schema Design for NoSQL Databases (20)

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Building your First MEAN App
Building your First MEAN AppBuilding your First MEAN App
Building your First MEAN App
 
FULL stack -> MEAN stack
FULL stack -> MEAN stackFULL stack -> MEAN stack
FULL stack -> MEAN stack
 
Open Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design Patterns
 
Database.pdf
Database.pdfDatabase.pdf
Database.pdf
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
 
Developers Guide to Cosmos DB
Developers Guide to Cosmos DBDevelopers Guide to Cosmos DB
Developers Guide to Cosmos DB
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Data Integration Solutions Created By Koneksys
Data Integration Solutions Created By KoneksysData Integration Solutions Created By Koneksys
Data Integration Solutions Created By Koneksys
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
MongoDB.pdf
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
 
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Orm and hibernate
Orm and hibernateOrm and hibernate
Orm and hibernate
 
Physical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data SystemsPhysical Design for Non-Relational Data Systems
Physical Design for Non-Relational Data Systems
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
Presentation
PresentationPresentation
Presentation
 
Spark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest CórdobaSpark & Cassandra - DevFest Córdoba
Spark & Cassandra - DevFest Córdoba
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 

Automated Schema Design for NoSQL Databases

  • 1. Automated Schema Design for NoSQL Databases Michael J. Mior Supervised by Kenneth Salem
  • 2. Schema optimization For a given workload, construct the optimal physical schema to answer queries in the workload for a given storage budget
  • 3. What is NoSQL? ● Not SQL? ● NoACID? ● Not only SQL
  • 4. Types of NoSQL Databases ● Document stores ● Key-value stores ● Graph databases ● … ● Wide column stores
  • 6. Cassandra data model Queries specify A set of row keys and either A list of column keys or A prefix of the column keys with a range
  • 7. Cassandra schema design Physical schema POIsWithHotels POI17 Hotel3 “Holiday Inn” … … No single obvious mapping from the conceptual schema Conceptual schema Hotel PointOfInterest Near Hotel8 “Motel 6”
  • 8. Cassandra schema design POIsWithHotels POI17 Hotel3 “Holiday Inn” … … HotelsWithPOIs Hotel3 POI17 “Liberty Bell” … … ● Multiple choices of logical and physical schema are possible for a given conceptual schema ● Different choices are more suitable for different queries ● Harder to add additional structures in the future
  • 9. Cassandra schema design POIs POI17 Name “Liberty Bell” … … Hotels Hotel3 Name “Holiday Inn” … … HotelToPOI Hotel3 POI17 (null) … … POIToHotel POI17 Hotel3 (null) … …
  • 10. Cassandra schema design POIToHotel POI17 Hotel3 “Holiday Inn” … … HotelToPOI Hotel3 POI17 “Liberty Bell” … … HotelToPOI HotelID POIID (null) … … POIToHotel POIID HotelID (null) … …
  • 11. Challenges ● Storage costs ● View maintenance ● Transactions ● Data distribution
  • 12. Relational schema design ● Commercial database vendors have existing tools ○ Microsoft AutoAdmin for SQL Server ○ DB2 Design Advisor ○ Oracle SQL Tuning Advisor ● These tools don’t solve all the problems for NoSQL databases ● Usually an existing schema is assumed, and tools suggest secondary indices and materialized views
  • 13. Relational schema design Conceptual schema Logical schema Physical schema Hotel CREATE TABLE Hotel(…); CREATE TABLE POI(…); CREATE TABLE HotelToPOI(…, FOREIGN KEY (HotelID) REFERENCES Hotel(HotelID) FOREIGN KEY (POOID) REFERENCES POI(POIID) ); CREATE MATERIALIZED VIEW POIsWithHotels AS SELECT POI.*, Hotel.* FROM POI, HotelToPOI, Hotel WHERE HotelToPOI.HotelID = Hotel.HotelID AND HotelToPOI.POIID = POI.POIID; Mapping from conceptual to logical schema is straightforward Near PointOfInterest
  • 14. Cassandra schema design Physical schema POIsWithHotels POI17 Hotel3 “Holiday Inn” … … No single obvious mapping from the conceptual schema Conceptual schema Hotel Near Hotel8 “Motel 6” PointOfInterest
  • 16. Query paths Get all points of interests near hotels a guest has stayed at SELECT Name FROM POI WHERE POI.HotelToPOI .Hotel.Room.Reservation .Guest.GuestID = ? Guest Reservation Room Hotel Point of Interest
  • 17. Index paths ● Queries can “skip” entities on a path using an index ● Indices to the final entity are equivalent to materialized views ● Schema design is now the selection of these indices Guest Reservation Room Hotel Point of Interest
  • 18. Problem definition Input: ● ER diagram with entity counts and cardinalities ● Weighted queries over the diagram ● Storage constraint ● Cost model for target DB Output: ● Suggested index configuration ● Plans for executing each query
  • 19. Proposed solution 1. Enumerate possible useful indices for all queries 2. Determine the benefit of a given index for each query 3. Select the optimal indices for a given space constraint
  • 20. Future Work ● Retargeting to other NoSQL DBs ● Automatic implementation of query plans ● Handling workloads which include updates ● Richer query language ● Exploitation of DBMS-specific features
  • 21. Summary ● Schema design in NoSQL databases presents unique challenges ● A workload-driven approach is necessary ● Conceptual schemata and queries are viable abstractions for NoSQL schema design tools
  • 22. Query language SELECT [attributes] FROM [entity] WHERE [path](=|<|<=|>|>=) [value] AND … ORDER BY [path] LIMIT [count] Conjunctive queries with simple predicates and ordering Higher level queries can be composed by the application
  • 23. Solution Index Enumerator Query Planner Index Selection Index benefit analysis Query Plans Selected IndicesWorkload Input Output
  • 24. Implementation ● Focus on Cassandra as target system ● Simple cost model for read-only in-memory workloads ● DBMS-independent index enumerator and query planner ● Index selection ILP implemented with GLPK
  • 25. Evaluation - TBD ● Re-implement existing workloads in the target DBMS (e. g. RUBiS, RUBBoS) ● Develop new benchmarks ● Comparison with a human DBA ● Random ER diagrams/queries with realistic properties