Lyticsware
technologies
FROM DATABASE OPTIMIZERS TO DATA SCIENCE
Books and DVD store example
Use case 1
Use case 1
 The figures are:
 10,000 articles in total
 50% books
 50% DVDs
 50% of products in English
 50% of products in French
 So what fraction of the rows has language = English and product = DVD?
Use case 1
 select * from test_orders where language='english' and product='books';
 select * from test_orders where language='english' and product='DVD';
 select * from test_orders where language='french' and product='DVD';
 select * from test_orders where language='french' and product='books';
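The four queries can be tried on a small synthetic copy of the table. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Oracle or SQL Server, and the 4900/100/4900/100 split in ASSUMED_COUNTS is a hypothetical joint distribution, chosen so that each language and each product still covers 50% of the 10,000 rows. Which combinations are rare in the original data set is not stated on the slides.

```python
import sqlite3

# Hypothetical joint distribution (an assumption, not taken from the slides).
# Marginals stay at 50% per language and 50% per product.
ASSUMED_COUNTS = {
    ("english", "books"): 4900,
    ("english", "DVD"): 100,
    ("french", "DVD"): 4900,
    ("french", "books"): 100,
}


def build_test_orders(counts):
    """Create an in-memory test_orders table filled according to counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_orders (language TEXT, product TEXT)")
    for (language, product), n in counts.items():
        conn.executemany(
            "INSERT INTO test_orders VALUES (?, ?)",
            [(language, product)] * n,
        )
    return conn


def count_rows(conn, language, product):
    """Return the actual number of rows matching both predicates."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM test_orders WHERE language = ? AND product = ?",
        (language, product),
    )
    return cur.fetchone()[0]


conn = build_test_orders(ASSUMED_COUNTS)
total = conn.execute("SELECT COUNT(*) FROM test_orders").fetchone()[0]
actual = {combo: count_rows(conn, *combo) for combo in ASSUMED_COUNTS}
for combo, n in actual.items():
    print(combo, n, f"{n / total:.1%}")
```

With data like this, every single-column filter still selects half of the table, yet the combined filters select very different numbers of rows.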
Use case 1 (Oracle)
 For the optimizer, estimating how many of the 10,000 rows match a filter on both the language and the product is simple:
 P(language) x P(product) = ?
 50% x 50% = 25%
 Simple! 2,500 rows!
 But ... WRONG! The two columns are dependent, so the real row counts are far from 2,500.
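The default estimate can be written out as a short calculation. This is a minimal sketch of the independence assumption described on the slide, not the code of any real optimizer; the function name independence_estimate is hypothetical, and the "actual" counts of 100 and 4,900 rows are assumed from the small and big fractions reported later in the deck.

```python
def independence_estimate(total_rows, selectivities):
    """Estimate matching rows by multiplying per-column selectivities.

    This treats the predicates as statistically independent.
    """
    estimate = float(total_rows)
    for selectivity in selectivities:
        estimate *= selectivity
    return estimate


TOTAL_ROWS = 10_000
P_LANGUAGE = 0.5  # half of the products are in each language
P_PRODUCT = 0.5   # half of the products are of each type

estimated = independence_estimate(TOTAL_ROWS, [P_LANGUAGE, P_PRODUCT])

# Assumed actual counts for correlated data (illustrative values only).
assumed_actual = {"small fraction": 100, "big fraction": 4900}

for label, actual_rows in assumed_actual.items():
    ratio = estimated / actual_rows
    print(f"{label}: estimated {estimated:.0f}, actual {actual_rows}, "
          f"estimate/actual = {ratio:.2f}")
```

Under these assumptions the same 2,500-row estimate is returned for every combination, so it overshoots the rare combination by a large factor and undershoots the common one by about half.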

Use case 1 (SQL Server)
 For the optimizer, estimating how many of the 10,000 rows match a filter on both the language and the product is just as simple:
 P(language) x P(product) = ?
 50% x 50% = 25%
 Simple! 2,500 rows!

Use case 1 (Oracle)
 With Oracle, we can use extended statistics or dynamic sampling to solve this problem. We used dynamic sampling in our example, and the estimate is much better for the small fraction (about 100 rows)
Use case 1 (Oracle)
 Much better estimate for the big fraction as well (4,900 rows)
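The idea behind sampling-based estimation can be sketched in a few lines: evaluate the combined predicate on a random sample and scale the hit count up to the table size. This is an illustration of the general principle only, not Oracle's dynamic sampling algorithm; the function sampled_estimate, the sample size of 1,000 and the joint distribution of the rows are all assumptions made for the example.

```python
import random


def sampled_estimate(rows, predicate, sample_size, seed=0):
    """Estimate how many rows satisfy predicate by scaling up a random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(rows, sample_size)
    hits = sum(1 for row in sample if predicate(row))
    return hits * len(rows) / sample_size


# Hypothetical correlated data set: 10,000 (language, product) rows.
rows = (
    [("english", "books")] * 4900
    + [("english", "DVD")] * 100
    + [("french", "DVD")] * 4900
    + [("french", "books")] * 100
)

small = sampled_estimate(rows, lambda r: r == ("english", "DVD"), 1000)
big = sampled_estimate(rows, lambda r: r == ("english", "books"), 1000)
print(f"small fraction: about {small:.0f} rows")
print(f"big fraction: about {big:.0f} rows")
```

Because the predicate is evaluated on real rows, the correlation between the columns is captured automatically; the price is the extra work of reading the sample at optimization time.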
Use case 1 (SQL Server)
 Good estimate for the big fraction with a suitable index (one with a WHERE clause, i.e. a filtered index!)
Use case 1 (SQL Server)
 Better estimate for the small fraction with a suitable index (one with a WHERE clause, i.e. a filtered index!)
Use case 1, one NoSQL example: MongoDB
 MongoDB always uses an index if one exists, despite the good estimate
Use case 1, one NoSQL example: MongoDB
 Good estimate, but (most probably) the wrong plan
Use case 1, one NoSQL example: MongoDB
 The solution shown is to use a hint
The way SQL Server did it…
 The histograms and statistics
How might a data scientist think about this?
 P(product), P(language), P(product) x P(language)???
 We have dependent variables, so why not use Bayes' theorem!
 P(A|B) = P(B|A) * P(A) / P(B)
 P(product|language) = P(language|product) * P(product) / P(language)
 P(DVD|french) = P(french|DVD) * P(DVD) / P(french)
How might a data scientist think about this?
 P(DVD|french) = P(french|DVD) * P(DVD) / P(french)
 P(french|DVD) = 10%, P(DVD) = 50%, P(french) = 50%
 P(DVD|french) = 10% x 50% / 50% = 10%
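The calculation can be checked with a few lines of code. This is a minimal sketch using the illustrative probabilities from the slide; the helper name bayes is hypothetical, and the final step, which turns the conditional probability into a row estimate through P(DVD and french) = P(DVD|french) x P(french), is added here for comparison with the 2,500 rows obtained under the independence assumption.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


TOTAL_ROWS = 10_000          # table size from the use case
P_FRENCH_GIVEN_DVD = 0.10    # illustrative value from the slide
P_DVD = 0.50
P_FRENCH = 0.50

p_dvd_given_french = bayes(P_FRENCH_GIVEN_DVD, P_DVD, P_FRENCH)

# Joint probability of both predicates, then the row estimate.
p_dvd_and_french = p_dvd_given_french * P_FRENCH
bayes_rows = TOTAL_ROWS * p_dvd_and_french

# Estimate under the independence assumption, for comparison.
independent_rows = TOTAL_ROWS * P_DVD * P_FRENCH

print(f"P(DVD|french) = {p_dvd_given_french:.0%}")
print(f"rows using the conditional probability: {bayes_rows:.0f}")
print(f"rows assuming independence: {independent_rows:.0f}")
```

The point of the comparison is that once a conditional probability is known, the estimate no longer depends on treating the two columns as independent.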
Database optimizers and machine learning?
 Mostly standard statistics are still used ...
 DB2 intelligent optimizer, Oracle 20c: it's only a beginning.
 So while waiting for optimizers to become more intelligent and make full use of machine learning ...
The classical way of thinking when tuning
 Oracle: adjust SGA, PGA, parallelism, create indexes, create materialized views ...
 SQL Server: adjust parameters with sp_configure, adjust parallelism, create/rebuild indexes
 Etc. Every database has its own parameters to tune memory, disks, I/O, CPUs ...
 Those techniques are of course still needed, but ...
 If you tune with a real understanding of your data, of its a) cardinalities, b) correlations, c) dispersion and even d) causalities, then ...
... you will be able to tune almost every database!
 SQL or NoSQL!
 All of them share similar principles, so once you have learned them, you will be able to tune any of them.
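As a starting point for this kind of data understanding, cardinalities and correlation can be measured directly on a sample of rows. The sketch below is one possible approach under stated assumptions: the functions cardinalities and lift are hypothetical helpers, the data set is synthetic, and "lift" (observed joint frequency divided by the frequency expected under independence) is a measure chosen here for illustration, not one named in the slides. A lift close to 1 suggests independent columns; values far from 1 suggest correlation.

```python
from collections import Counter


def cardinalities(rows):
    """Return the number of distinct values in each column."""
    return [len(set(column)) for column in zip(*rows)]


def lift(rows):
    """Return observed/expected joint frequency for each pair of values."""
    n = len(rows)
    joint = Counter(rows)
    first = Counter(row[0] for row in rows)
    second = Counter(row[1] for row in rows)
    return {
        pair: (count / n) / ((first[pair[0]] / n) * (second[pair[1]] / n))
        for pair, count in joint.items()
    }


# Hypothetical correlated data set: 10,000 (language, product) rows.
rows = (
    [("english", "books")] * 4900
    + [("english", "DVD")] * 100
    + [("french", "DVD")] * 4900
    + [("french", "books")] * 100
)

print("distinct values per column:", cardinalities(rows))
for pair, value in lift(rows).items():
    print(pair, f"lift = {value:.2f}")
```

The same check works on rows exported from any SQL or NoSQL store, which is the sense in which the principle carries over from one database to another.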
Lyticsware
 Lyticsware is a young, innovative company that can help you tune your databases
 We are also an Amazon Web Services partner, and we help our clients migrate their databases and information systems to cloud architectures
