4. Use case 1
The figures are:
10 000 articles in total!
50% are books
50% are DVDs
50% of products are in English
50% of products are in French
So what is the fraction of rows when
the language is English and the product
is a DVD?
5. Use case 1
select * from test_orders where language='english' and product='books';
select * from test_orders where language='english' and product='DVD';
select * from test_orders where language='french' and product='DVD';
select * from test_orders where language='french' and product='books';
6. Use case 1 (Oracle)
So, for the optimizer, the estimation
of the fraction of 10000 rows when
querying both the language and the
product is simply:
P(language) x P(product) = ?
50% x 50% = 25%
Simple !!! 2500 rows !!!
But … WRONG !!!
7. Use case 1 (SQL Server)
So, for the optimizer, the estimation
of the fraction of 10000 rows when
querying both the language and the
product is simply:
P(language) x P(product) = ?
50% x 50% = 25%
Simple !!! 2500 rows !!!
8. Use case 1 (Oracle)
With Oracle, we can use extended
statistics or dynamic sampling to
solve this problem. We used
dynamic sampling in our example,
and the estimation is much better
for the small fraction (about 100
rows)
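As a rough sketch of what those two options can look like in Oracle (the exact statement forms are assumptions; only the test_orders table and its language/product columns come from the earlier queries):

-- Option 1 (assumed sketch): extended statistics on the column group,
-- so the optimizer learns the combined selectivity of (language, product).
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(user, 'TEST_ORDERS', '(LANGUAGE, PRODUCT)') FROM dual;
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'TEST_ORDERS');

-- Option 2 (assumed sketch): let the optimizer sample the table at parse time.
SELECT /*+ dynamic_sampling(t 4) */ *
FROM test_orders t
WHERE language = 'english' AND product = 'DVD';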
9. Use case 1 (Oracle)
Much better estimation for the
big fraction as well (4900 rows)
10. Use case 1 (SQL Server)
Good estimation with a suitable
index (with a where clause!!!) for
the big fraction
11. Use case 1 (SQL Server)
Better estimation with a suitable
index (with a where clause!!!) for
the small fraction
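A minimal sketch of such a filtered index in SQL Server (the index name and filter predicate are assumptions; the slides only say it is an index with a where clause):

-- Assumed filtered index: statistics are built only for the filtered subset,
-- so estimates for that subset no longer rely on the independence assumption.
CREATE NONCLUSTERED INDEX ix_test_orders_english
ON dbo.test_orders (product)
WHERE language = 'english';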
12. Use case 1, one NoSQL example:
MongoDB
MongoDB always uses an index if
the index exists, in spite of the
good estimate
13. Use case 1, one NoSQL
example: MongoDB
Good estimate, but (most
probably) the wrong plan
14. Use case 1, one NoSQL example:
MongoDB
The solution shown is to use a
hint
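A minimal sketch in the mongo shell, assuming a test_orders collection and a compound index on (language, product):

// Assumed index; hint() forces MongoDB to use this specific plan.
db.test_orders.createIndex({ language: 1, product: 1 })
db.test_orders.find({ language: "english", product: "DVD" }).hint({ language: 1, product: 1 })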
15. The way SQL Server did it…
The histograms and statistics
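For example, that histogram and density information can be inspected with DBCC SHOW_STATISTICS (the statistics object name below is an assumption, reusing the filtered index sketched earlier):

-- Show the statistics header, density vector and histogram SQL Server relies on.
DBCC SHOW_STATISTICS ('dbo.test_orders', ix_test_orders_english);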
16. What could be a data scientist way of
thinking about this?
P(product), P(language), P(product) x
P(language) ???
We have dependent variables, so why not use
Bayes' theorem!
P(A|B) = P(B|A) * P(A) / P(B)
P(product|language) = P(language|product) *
P(product) / P(language)
P(DVD|french) = P(french|DVD) * P(DVD) / P(french)
17. What could be a data scientist way of
thinking about this?
P(DVD|french) = P(french|DVD) * P(DVD) / P(french)
P(french|DVD) = 10%, P(DVD) = 50%, P(french) = 50%
P(DVD|french) = 10% x 50% / 50% = 10%
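In practice those conditional fractions do not have to be guessed; a minimal sketch (same test_orders table as in the earlier queries) simply measures the joint distribution:

-- Measure the real joint distribution instead of assuming independence;
-- P(DVD|french) is then n(french, DVD) divided by n(french).
SELECT language, product, COUNT(*) AS n,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_total
FROM test_orders
GROUP BY language, product;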
18. Database optimizers and machine
learning ?
Mostly standard statistics are still used …
The DB2 intelligent optimizer, Oracle 20c: it's only a beginning.
So while waiting for optimizers to become more intelligent and fully use
machine learning ….
19. The classical way of thinking when
tuning
Oracle: adjust SGA, PGA, parallelism, create indexes, create materialized
views …
SQL Server: adjust parameters with sp_configure, adjust parallelism,
create/rebuild indexes
Etc. Every database has its own parameters to tune memory, disks, I/O,
CPUs …
Those techniques are of course still needed but….
If you start tuning by really understanding your data, understanding a)
cardinalities, b) correlation, c) dispersion and even d) causalities inside your
data, then…
20. …you will be able to tune almost every
database !!!
SQL or NoSQL !!!
All of them share similar principles, so once you learn them, you will be able to
tune them…
21. Lyticsware
Lyticsware is a young, innovative
company that can help you
tune your databases.
We are also a partner of Amazon
Web Services, and we are
helping our clients migrate
their databases / information
systems to cloud architectures.