This document discusses optimizing data warehouse star schemas with MySQL. It provides tips for database design including using a star schema approach with dimension and fact tables. It recommends MySQL 5.x, optimizing the MySQL configuration file, designing tables and indexes effectively, using the correct data loading architecture, analyzing tables, writing efficient query styles, and examining explain plans. Optimizing these areas can help ensure fast query performance from large data warehouses built with MySQL.
1. Building and Optimizing Data Warehouse "Star Schemas" with MySQL
Bert Scalzo, Ph.D.
Bert.Scalzo@Quest.com
2. About the Author
• Oracle DBA for 20+ years, versions 4 through 10g
• Been doing MySQL work for past year (4.x and 5.x)
• Worked for Oracle Education & Consulting
• Holds several Oracle Masters (DBA & CASE)
• BS, MS, PhD in Computer Science and also an MBA
• LOMA insurance industry designations: FLMI and ACS
• Books
  - The TOAD Handbook (Feb 2003)
  - Oracle DBA Guide to Data Warehousing and Star Schemas (Jun 2003)
  - TOAD Pocket Reference 2nd Edition (May 2005)
• Articles
  - Oracle Magazine
  - Oracle Technology Network (OTN)
  - Oracle Informant
  - PC Week (now E-Magazine)
  - Linux Journal
  - www.linux.com
  - www.quest-pipelines.com
3.
4. Star Schema Design
The "star schema" approach to dimensional data modeling was pioneered by Ralph Kimball
Dimensions: smaller, de-normalized tables containing business descriptive columns that users use to query
Facts: very large tables with primary keys formed from the concatenation of related dimension table foreign key columns, and also possessing numerically additive, non-key columns used for calculations during user queries
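As a rough illustration of the dimension side of this design, a table could look like the sketch below. The table name ss.product and every column in it are hypothetical, chosen only to show the small, de-normalized, descriptive style; the matching fact table is the ss.pos_day table defined on the table design slides.

```sql
-- Hypothetical dimension table: small, de-normalized, descriptive columns.
-- Brand and category are stored inline rather than in separate lookup tables.
CREATE DATABASE IF NOT EXISTS ss;

CREATE TABLE IF NOT EXISTS ss.product (
  PRODUCT_ID    DECIMAL(10,0) NOT NULL,   -- referenced by the fact table key
  PRODUCT_NAME  VARCHAR(60)   NOT NULL,
  BRAND_NAME    VARCHAR(40)   NOT NULL,
  CATEGORY_NAME VARCHAR(40)   NOT NULL,
  PRIMARY KEY (PRODUCT_ID),
  INDEX PRODUCT_CATEGORY (CATEGORY_NAME)  -- supports selective lookups by users
) ENGINE=MyISAM;
```

The fact table then carries only PRODUCT_ID as part of its key, and users restrict on the descriptive columns here.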
7. The Ad-Hoc Challenge
How much data would a data miner mine,
if a data miner could mine data?
Dimensions: generally queried selectively to find lookup value matches that are then used to query against the fact table
Facts: must be queried selectively, since they generally have hundreds of millions to billions of rows; even parallel full table scans are too big for most systems
Business Intelligence (BI) tools generally let end users perform projections of group operations on columns from facts, using restrictions on columns from dimensions ...
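The query shape described above (group operations on fact columns, restricted through dimension columns) might look like the following sketch. It assumes the ss.pos_day fact table from the table design slides already exists; the ss.period and ss.location dimensions, their columns, and the literal filter values are all hypothetical.

```sql
-- Hypothetical dimensions; names and columns are assumptions for illustration.
CREATE TABLE IF NOT EXISTS ss.period (
  PERIOD_ID    DECIMAL(10,0) NOT NULL PRIMARY KEY,
  PERIOD_YEAR  SMALLINT      NOT NULL,
  PERIOD_MONTH TINYINT       NOT NULL
) ENGINE=MyISAM;

CREATE TABLE IF NOT EXISTS ss.location (
  LOCATION_ID DECIMAL(10,0) NOT NULL PRIMARY KEY,
  REGION      VARCHAR(30)   NOT NULL
) ENGINE=MyISAM;

-- Typical ad-hoc BI request: restrictions on dimension columns,
-- SUM() group operations on the additive fact columns.
SELECT p.PERIOD_MONTH,
       l.REGION,
       SUM(f.SALES_UNIT)   AS UNITS,
       SUM(f.SALES_RETAIL) AS RETAIL
FROM ss.pos_day f
JOIN ss.period   p ON p.PERIOD_ID   = f.PERIOD_ID
JOIN ss.location l ON l.LOCATION_ID = f.LOCATION_ID
WHERE p.PERIOD_YEAR = 1999
  AND l.REGION = 'West'
GROUP BY p.PERIOD_MONTH, l.REGION;
```

Because the restrictions resolve to a small set of PERIOD_ID and LOCATION_ID values, the fact table can be reached through its indexes instead of a full scan.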
8. Hardware Does Not Compensate
People often expect that expensive hardware is the only way to obtain optimal performance for a data warehouse
• CPU: SMP, MPP
• Disk: 15,000 RPM, RAID (EMC)
• OS: UNIX, 64-bit
• MySQL: 4.x / 5.x, 64-bit
9. DB Design Paramount
In reality, the database design is the key factor in optimal query performance for a data warehouse built as a "Star Schema"
Once certain minimum hardware and software requirements are met, they play a very subordinate role to tuning the database
Golden Rule: get the basic database design and query explain plan correct
10. Key Tuning Requirements
1. MySQL 5.x
2. MySQL.ini (MySQL configuration file, my.ini)
3. Table Design
4. Index Design
5. Data Loading Architecture
6. Analyze Table
7. Query Style
8. Explain plan
11. 1. MySQL 5.X (help on the way)
• Index Merge Explain
  - Prior to 5.x, only one index used per referenced table
  - This radically affects both index design and explain plans
• Rename Table for MERGE fixed
  - With 4.x, some scenarios could cause table corruption
• New "greedy search" optimizer that can significantly reduce the time spent on query optimization for some many-table joins
• Views
  - Useful for pre-canning/forcing query style or syntax (i.e. hints)
• Stored Procedures
• Rudimentary Triggers
• InnoDB
  - Compact Record Format
  - Fast Truncate Table
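To make the index merge point concrete, here is a minimal sketch of the kind of explain plan check it implies. It assumes the ss.pos_day fact table from the table design slides, with its single-column PERIOD and LOCATION indexes; the literal key values are made up, and the plan actually chosen depends on data volumes and statistics.

```sql
-- Two equality restrictions, each covered by its own single-column index
-- (PERIOD and LOCATION on ss.pos_day). The literal key values are made up.
EXPLAIN
SELECT SUM(SALES_UNIT), SUM(SALES_RETAIL)
FROM ss.pos_day
WHERE PERIOD_ID = 365
  AND LOCATION_ID = 42;
-- On 5.x the optimizer may report type = index_merge with
-- "Using intersect(PERIOD,LOCATION)" in the Extra column;
-- before 5.x it could use only one of the two indexes.
```

This is why one index per dimension key becomes a reasonable design on 5.x, where earlier versions pushed toward multi-column indexes.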
13. 3. Table Design
SPEED vs. SPACE vs. MANAGEMENT
64 MB / Million Rows (Avg. Fact)
500 Million Rows
===================
32,000 MB (32 GB)
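Spelled out, the sizing estimate above is a single multiplication using the slide's own figures (64 MB per million rows is the slide's stated average for a fact table, not a measured value):

```latex
\[
500\ \text{million rows} \times \frac{64\ \text{MB}}{1\ \text{million rows}}
  = 32{,}000\ \text{MB} \approx 32\ \text{GB}
\]
```

At that size a single fact table is far beyond the 2/4GB file limit noted for some operating systems on the storage engine slides, which is why the engine choice matters.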
Primary storage engine options:
• MyISAM
• MyISAM + RAID_TYPE
• MERGE
• InnoDB
14. ENGINE = MyISAM
-- Fact table on the MyISAM engine
CREATE TABLE ss.pos_day (
  PERIOD_ID    decimal(10,0) NOT NULL default '0',
  LOCATION_ID  decimal(10,0) NOT NULL default '0',
  PRODUCT_ID   decimal(10,0) NOT NULL default '0',
  SALES_UNIT   decimal(10,0) NOT NULL default '0',
  SALES_RETAIL decimal(10,0) NOT NULL default '0',
  GROSS_PROFIT decimal(10,0) NOT NULL default '0',
  -- primary key is the concatenation of the dimension foreign keys
  PRIMARY KEY (PRODUCT_ID, LOCATION_ID, PERIOD_ID),
  -- one single-column index per dimension key
  INDEX PERIOD (PERIOD_ID),
  INDEX LOCATION (LOCATION_ID),
  INDEX PRODUCT (PRODUCT_ID)
) ENGINE=MyISAM
  PACK_KEYS=1
  -- data and index files placed on separate drives
  DATA DIRECTORY='C:mysqldata'
  INDEX DIRECTORY='D:mysqldata';
Pros:
• Non-transactional: faster, lower disk space usage, and less memory
Cons:
• 2/4GB data file limit on operating systems that don't support big files
• 2/4GB index file limit on operating systems that don't support big files
• One big table poses data archival and index maintenance challenges (e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc.)
15. ENGINE = MyISAM + RAID_TYPE
-- Same fact table, with the MyISAM data file striped
CREATE TABLE ss.pos_day (
  PERIOD_ID    decimal(10,0) NOT NULL default '0',
  LOCATION_ID  decimal(10,0) NOT NULL default '0',
  PRODUCT_ID   decimal(10,0) NOT NULL default '0',
  SALES_UNIT   decimal(10,0) NOT NULL default '0',
  SALES_RETAIL decimal(10,0) NOT NULL default '0',
  GROSS_PROFIT decimal(10,0) NOT NULL default '0',
  PRIMARY KEY (PRODUCT_ID, LOCATION_ID, PERIOD_ID),
  INDEX PERIOD (PERIOD_ID),
  INDEX LOCATION (LOCATION_ID),
  INDEX PRODUCT (PRODUCT_ID)
) ENGINE=MyISAM
  PACK_KEYS=1
  -- stripe the data file across multiple subdirectories
  RAID_TYPE=STRIPED;
Pros:
• Non-transactional: faster, with lower disk space and memory usage
• Can help you to exceed the 2GB/4GB limit for the MyISAM data file
• Creates up to 255 subdirectories, each with a file named table_name.MYD
• Distributed IO: put each table subdirectory and file on a different disk
Cons:
• 2/4GB index file limit on operating systems that don't support big files
• One big table poses data archival and index maintenance challenges
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc.)
16. ENGINE = MERGE
CREATE TABLE ss.pos_merge (
PERIOD_ID decimal(10,0) NOT NULL default '0',
LOCATION_ID decimal(10,0) NOT NULL default '0',
PRODUCT_ID decimal(10,0) NOT NULL default '0',
SALES_UNIT decimal(10,0) NOT NULL default '0',
SALES_RETAIL decimal(10,0) NOT NULL default '0',
GROSS_PROFIT decimal(10,0) NOT NULL default '0',
INDEX PK(PRODUCT_ID, LOCATION_ID, PERIOD_ID),
INDEX PERIOD(PERIOD_ID),
INDEX LOCATION(LOCATION_ID),
INDEX PRODUCT(PRODUCT_ID)
) ENGINE=MERGE
UNION=(pos_1998,pos_1999,pos_2000) INSERT_METHOD=LAST;
Pros:
• Non-transactional: faster, with lower disk space and memory usage
• Partitioned tables offer data archival and index maintenance options
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc.)
• Distributed IO: put individual tables and indexes on different disks
Cons:
• MERGE tables use more file descriptors on the database server
• MERGE key lookups are much slower on "eq_ref" searches
• Can use only identical MyISAM tables for a MERGE table
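The MERGE definition above presumes that pos_1998, pos_1999, and pos_2000 already exist as identically structured MyISAM tables. A minimal sketch of how they might be created follows. Placing them in the ss schema and cloning them with CREATE TABLE ... LIKE are assumptions for illustration, not something the slides specify.

```sql
-- One underlying yearly table. Its columns and index order mirror the
-- MERGE definition; the unique key is a PRIMARY KEY here and a plain
-- INDEX in the MERGE table, because MERGE cannot enforce uniqueness
-- across its underlying tables.
CREATE TABLE ss.pos_1998 (
  PERIOD_ID decimal(10,0) NOT NULL default '0',
  LOCATION_ID decimal(10,0) NOT NULL default '0',
  PRODUCT_ID decimal(10,0) NOT NULL default '0',
  SALES_UNIT decimal(10,0) NOT NULL default '0',
  SALES_RETAIL decimal(10,0) NOT NULL default '0',
  GROSS_PROFIT decimal(10,0) NOT NULL default '0',
  PRIMARY KEY (PRODUCT_ID, LOCATION_ID, PERIOD_ID),
  INDEX PERIOD (PERIOD_ID),
  INDEX LOCATION (LOCATION_ID),
  INDEX PRODUCT (PRODUCT_ID)
) ENGINE=MyISAM;

-- Clone the structure (columns and indexes) for the other years.
CREATE TABLE ss.pos_1999 LIKE ss.pos_1998;
CREATE TABLE ss.pos_2000 LIKE ss.pos_1998;
```

Cloning from a single template table is one way to keep the underlying tables identical, which the MERGE engine requires.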
17. ENGINE = InnoDB
CREATE TABLE ss.pos_day (
PERIOD_ID decimal(10,0) NOT NULL default '0',
LOCATION_ID decimal(10,0) NOT NULL default '0',
PRODUCT_ID decimal(10,0) NOT NULL default '0',
SALES_UNIT decimal(10,0) NOT NULL default '0',
SALES_RETAIL decimal(10,0) NOT NULL default '0',
GROSS_PROFIT decimal(10,0) NOT NULL default '0',
PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID),
INDEX PERIOD(PERIOD_ID),
INDEX LOCATION(LOCATION_ID),
INDEX PRODUCT(PRODUCT_ID)
) ENGINE=InnoDB
PACK_KEYS=1;
Pros:
• Simple yet flexible tablespace datafile configuration
innodb_data_file_path=ibdata1:1G;ibdata2:1G:autoextend:max:2G
Cons:
• Uses much more disk space: typically 2.5 times as much as MyISAM
• Transaction safe: not needed, consumes resources (SET AUTOCOMMIT=0)
• Foreign keys: not needed, consumes resources (SET FOREIGN_KEY_CHECKS=0)
• Prior to MySQL 4.1.1: no "multiple tablespaces" feature (i.e. one table per tablespace)
• One big table poses data archival and index maintenance challenges
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc.)
18. Space Usage 21 Million Records
MERGE   3 GB
MyISAM  3 GB
InnoDB  8 GB
19. 4. Index Design
Index design must be driven by the nature of DW users: you don't know what
they'll query on, and the more successful they are at data mining, the
more they'll try (which is actually a really good thing) ...
Therefore you don't know which dimension tables they'll reference or
which dimension columns they'll restrict on, so:
• Fact tables should have primary keys, for data load integrity
• Fact table dimension reference (i.e. foreign key) columns should each
be individually indexed, for variable fact/dimension joins
• Dimension tables should have primary keys
• Dimension tables should be fully indexed
• MySQL 4.x: only one index per dimension will be used
• If you know that one column will always be used in
conjunction with others, create concatenated indexes
• MySQL 5.x: new index merge will use multiple indexes
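As a rough illustration of the dimension table guidelines above, the following sketch defines a fully indexed dimension table. The table ss.product_dim and its BRAND, CATEGORY, and DEPARTMENT columns are hypothetical names chosen for this example; they do not come from the slides.

```sql
-- Hypothetical dimension table; all names other than PRODUCT_ID are
-- illustrative assumptions.
CREATE TABLE ss.product_dim (
  PRODUCT_ID decimal(10,0) NOT NULL,
  BRAND varchar(40) NOT NULL,
  CATEGORY varchar(40) NOT NULL,
  DEPARTMENT varchar(40) NOT NULL,
  -- Dimension tables should have primary keys.
  PRIMARY KEY (PRODUCT_ID),
  -- Single-column indexes on attributes users may restrict on.
  INDEX BRAND_IDX (BRAND),
  INDEX CATEGORY_IDX (CATEGORY),
  -- Concatenated index for columns assumed to be restricted together.
  -- It also serves restrictions on DEPARTMENT alone (leftmost prefix),
  -- so no separate single-column index on DEPARTMENT is defined.
  INDEX DEPT_CAT_IDX (DEPARTMENT, CATEGORY)
) ENGINE=MyISAM;
```

On MySQL 4.x only one of these indexes would be used per table in a given query, which is why the concatenated index matters there. On 5.x, index merge may combine BRAND_IDX and CATEGORY_IDX for a query that restricts on both columns.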
20. Note: Make sure to build indexes based
on cardinality (i.e. leading portion most
selective); in this case the index was
built backwards
21. MyISAM Key Cache Magic
1. Utilize two key caches:
• Default Key Cache: for fact table indexes
• Hot Key Cache: for dimension key indexes
command-line option:
shell> mysqld --hot_cache.key_buffer_size=16M
option file:
[mysqld]
hot_cache.key_buffer_size=16M
SQL statement to assign the dimension table indexes to the hot cache:
CACHE INDEX t1, t2, t3 IN hot_cache;
2. Pre-Load Dimension Indexes:
LOAD INDEX INTO CACHE t1, t2, t3 IGNORE LEAVES;
22. 5. Data Loading Architecture
Archive:
• ALTER TABLE fact_table UNION=(mt2, mt3, mt4)
• DROP TABLE mt1
Load:
• TRUNCATE TABLE staging_table
• Run nightly/weekly data load into staging_table
• ALTER TABLE merge_table_4 DROP PRIMARY KEY
• ALTER TABLE merge_table_4 DROP INDEX ...
• INSERT INTO merge_table_4 SELECT * FROM staging_table
• ALTER TABLE merge_table_4 ADD PRIMARY KEY(...)
• ALTER TABLE merge_table_4 ADD INDEX(...)
• ANALYZE TABLE merge_table_4
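The load steps above might be scripted roughly as follows. This is a sketch that assumes merge_table_4 has the same columns and index names (PERIOD, LOCATION, PRODUCT) as the pos_day examples shown earlier. The slides elide the actual key and index definitions, so the ones used here are illustrative only.

```sql
-- Empty the staging table before the nightly/weekly load.
TRUNCATE TABLE staging_table;

-- (Run the external load into staging_table here.)

-- Drop the keys so the bulk insert does not maintain them row by row.
-- Index names are assumed to match the pos_day examples.
ALTER TABLE merge_table_4
  DROP PRIMARY KEY,
  DROP INDEX PERIOD,
  DROP INDEX LOCATION,
  DROP INDEX PRODUCT;

INSERT INTO merge_table_4 SELECT * FROM staging_table;

-- Rebuild the keys in a single pass over the loaded data.
ALTER TABLE merge_table_4
  ADD PRIMARY KEY (PRODUCT_ID, LOCATION_ID, PERIOD_ID),
  ADD INDEX PERIOD (PERIOD_ID),
  ADD INDEX LOCATION (LOCATION_ID),
  ADD INDEX PRODUCT (PRODUCT_ID);

-- Refresh the key distribution statistics for the optimizer.
ANALYZE TABLE merge_table_4;
```

Combining the drops into one ALTER TABLE, and the adds into another, avoids rebuilding the table once per index.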
23. 6. Analyze Table
ANALYZE TABLE ss.pos_merge;
• The ANALYZE TABLE statement analyzes and stores the
key distribution for a table
• MySQL uses the stored key distribution to decide
the order in which tables should be joined
• If you have a problem with incorrect index usage,
you should run ANALYZE TABLE to update table
statistics such as cardinality of keys, which can
affect the choices the optimizer makes
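One way to check the effect of ANALYZE TABLE is to look at the Cardinality column reported by SHOW INDEX. The sketch below runs against pos_2000, one of the underlying MyISAM tables named in the MERGE example; that the table lives in the ss schema is an assumption.

```sql
-- Update the stored key distribution for one underlying table.
ANALYZE TABLE ss.pos_2000;

-- Inspect the result: the Cardinality column shows the estimated number
-- of distinct values per index part, which the optimizer uses when
-- choosing indexes and join order.
SHOW INDEX FROM ss.pos_2000;
```

The same pair of statements can be repeated for each underlying table after a load.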
24. 7. Query Style
Many Business Intelligence (BI) and report writing
tools offer initialization parameters or global settings
which control the style of SQL code they generate.
Options often include:
• Simple N-Way Join
• Sub-Selects
• Derived Tables
• Reporting engine does join operations
For now, only the first option makes sense ...
(see following section regarding explain plans)
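To make the first three styles concrete, here is the same question (total retail sales for one year and one department) written each way. The dimension tables ss.period_dim and ss.product_dim and their FISCAL_YEAR and DEPARTMENT columns are hypothetical names, assumed only for this sketch.

```sql
-- Style 1: simple N-way join.
SELECT SUM(f.SALES_RETAIL) AS total_retail
FROM ss.pos_day f
JOIN ss.period_dim d ON d.PERIOD_ID = f.PERIOD_ID
JOIN ss.product_dim p ON p.PRODUCT_ID = f.PRODUCT_ID
WHERE d.FISCAL_YEAR = 2000
  AND p.DEPARTMENT = 'GROCERY';

-- Style 2: sub-selects.
SELECT SUM(f.SALES_RETAIL) AS total_retail
FROM ss.pos_day f
WHERE f.PERIOD_ID IN (SELECT d.PERIOD_ID
                      FROM ss.period_dim d
                      WHERE d.FISCAL_YEAR = 2000)
  AND f.PRODUCT_ID IN (SELECT p.PRODUCT_ID
                       FROM ss.product_dim p
                       WHERE p.DEPARTMENT = 'GROCERY');

-- Style 3: derived tables.
SELECT SUM(f.SALES_RETAIL) AS total_retail
FROM ss.pos_day f
JOIN (SELECT PERIOD_ID
      FROM ss.period_dim
      WHERE FISCAL_YEAR = 2000) d ON d.PERIOD_ID = f.PERIOD_ID
JOIN (SELECT PRODUCT_ID
      FROM ss.product_dim
      WHERE DEPARTMENT = 'GROCERY') p ON p.PRODUCT_ID = f.PRODUCT_ID;
```

The three statements return the same total provided PERIOD_ID and PRODUCT_ID are the primary keys of their dimension tables. What differs is how the optimizer executes them.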
25. 8. Explain Plan
The EXPLAIN statement can be used either as a synonym for
DESCRIBE or as a way to obtain information about how the
MySQL query optimizer will execute a SELECT statement.
You can get a good indication of how good a join is by taking
the product of the values in the rows column of the
EXPLAIN output. This should tell you roughly how many
rows MySQL must examine to execute the query.
We'll refer to this calculation as the explain plan cost and
use it as our primary comparative measure (along with, of
course, the actual run time) ...
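As a sketch of how the explain plan cost might be obtained, the statement below runs EXPLAIN on a simple star join. The dimension tables ss.period_dim and ss.product_dim and their FISCAL_YEAR and DEPARTMENT columns are hypothetical, and the numbers in the closing comment are made up to show the arithmetic; they are not measured results.

```sql
-- Hypothetical dimension tables and columns, assumed for illustration.
EXPLAIN
SELECT SUM(f.SALES_RETAIL) AS total_retail
FROM ss.pos_day f
JOIN ss.period_dim d ON d.PERIOD_ID = f.PERIOD_ID
JOIN ss.product_dim p ON p.PRODUCT_ID = f.PRODUCT_ID
WHERE d.FISCAL_YEAR = 2000
  AND p.DEPARTMENT = 'GROCERY';

-- EXPLAIN returns one row per table in the join. Multiply the values in
-- the rows column to get the explain plan cost. With made-up values of
-- 12, 40, and 1500, the cost would be 12 * 40 * 1500 = 720,000.
```

Comparing this product across alternative query styles or index designs gives a quick relative measure before timing the actual runs.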
28. Method: Simple Joins and Single-Column Indexes
Join better than Sub-Select and much better than Derived Table
Cost = 7.1 x 10^6
29. Method: Merge Table and Single-Column Indexes
Same Explain Cost, but statement ran 2X faster
Cost = 7.1 x 10^6
30. Method: Merge Table and Multi-Column Indexes
Concatenated Index yielded best run time
Cost = 3.2 x 10^5
31. Method: Merge Table and Merge Indexes
MySQL 5.0 new Index Merge = best run time
Cost = 5.2 x 10^5
32. Conclusion ...
• MySQL can easily be used to build large and
effective "Star Schema" Data Warehouses
• MySQL Version 5.x will offer even more useful index
and join query optimizations
• MySQL can be better configured for DW use through
effective mysql.ini option settings
• Table and index designs are paramount to success
• Query style and resulting explain plans are critical to
achieving the fastest query run times