DATAWAREHOUSING WITH MySQL
CHAPTER 1-BASIC CONCEPTS
Chapter 1 of the book Dimensional Data Warehousing with MySQL primarily covers four tasks:
• Creating a database user
• Creating the data warehouse database and the source database
• Creating data warehouse tables
• Generating surrogate keys
For practice purposes, the first task, creating a database user, has been skipped and the root account is used instead.
All other tasks are covered, each with a screenshot of the outcome.
CREATING A DATABASE
In this task, two databases, dw and source, are created using the following commands:
CREATE DATABASE dw;
CREATE DATABASE source;
CREATING DATAWAREHOUSING TABLES
Creating customer_dim table:
Creating product_dim table:
Creating order_dim table:
Creating date_dim table:
Creating sales_order_fact table:
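The table definitions survive only as slide titles here. As a rough sketch of what one dimension table and the fact table could look like, consider the statements below. The column types, the order_date_sk name and the exact column set are assumptions made for illustration; the book's actual definitions may differ.

```sql
USE dw;

-- Dimension table sketch: the surrogate key is an AUTO_INCREMENT column.
-- Column types and the selection of columns are assumptions.
CREATE TABLE customer_dim (
  customer_sk     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_number INT,
  customer_name   VARCHAR(50),
  customer_city   VARCHAR(30),
  effective_date  DATE,
  expiry_date     DATE
);

-- Fact table sketch: the measure plus one surrogate key per dimension.
-- The column order follows the SELECT list of the push_sales_order
-- procedure shown in Chapter 5; order_date_sk is an assumed name.
CREATE TABLE sales_order_fact (
  order_amount  DECIMAL(10,2),
  order_sk      INT,
  customer_sk   INT,
  product_sk    INT,
  order_date_sk INT
);
```

The remaining dimension tables (product_dim, order_dim, date_dim) would follow the same pattern, each with its own surrogate key column.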
GENERATING SURROGATE KEYS
In the customer_dim table, three rows are inserted with a NULL value for the customer_sk column. The surrogate keys are
then created automatically
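A minimal sketch of this step, assuming customer_sk is declared as an AUTO_INCREMENT column and that customer_dim has customer_number and customer_name columns. The inserted values are made-up sample data, not the rows used in the book.

```sql
USE dw;

-- Passing NULL for an AUTO_INCREMENT column makes MySQL assign the next key.
-- The customer numbers and names below are made-up sample values.
INSERT INTO customer_dim (customer_sk, customer_number, customer_name)
VALUES
  (NULL, 1, 'Sample Customer One'),
  (NULL, 2, 'Sample Customer Two'),
  (NULL, 3, 'Sample Customer Three');

-- Inspect the generated surrogate keys.
SELECT customer_sk, customer_number, customer_name
FROM customer_dim
ORDER BY customer_sk;
```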
CHAPTER 2-DIMENSIONAL HISTORY
• Slowly Changing Dimension (SCD) is the technique for implementing dimension history in
a dimensional data warehouse
• For practice, the SCD1 and SCD2 techniques are used
• Slowly Changing Dimension Type 1 (SCD1) involves updating and inserting into the customer_dim
table
• Before updating and inserting data, a customer_stg staging table has to be created
• The UPDATE statement copies the value of the customer_name column in the staging table to
the customer_name column in the customer_dim table
• The INSERT statement inserts the records in the staging table that are not yet present in the
customer_dim table
• Running the script updates the name of the first customer and inserts the seventh customer
from the staging table into the customer_dim table
Creating and Loading Customer Staging Table
Applying SCD1: Updating Existing Customers and inserting
into customer_dim
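The UPDATE and INSERT steps of SCD1 could be sketched as follows. This is an illustration rather than the book's script: it assumes that customer_stg and customer_dim share the customer_number business key and that customer_sk is an AUTO_INCREMENT column.

```sql
USE dw;

-- SCD1 update: overwrite the name in the dimension with the staged name.
UPDATE customer_dim a
JOIN customer_stg b ON a.customer_number = b.customer_number
SET a.customer_name = b.customer_name
WHERE a.customer_name <> b.customer_name;

-- SCD1 insert: add staged customers that are not yet in the dimension.
INSERT INTO customer_dim (customer_sk, customer_number, customer_name)
SELECT NULL, b.customer_number, b.customer_name
FROM customer_stg b
LEFT JOIN customer_dim a ON a.customer_number = b.customer_number
WHERE a.customer_sk IS NULL;
```

Because SCD1 overwrites the old value in place, the previous customer name is not kept anywhere in the dimension.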
Slowly Changing Dimension Type 2 (SCD2)
• SCD2 has been applied to the product_dim table
• Whenever there is a change in the product_name or product_category
column, SCD2 expires the existing row and adds a new row that
describes the same product.
Creating the product_stg table
Applying SCD2 to the product_name and product_category in the
product_dim table
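A sketch of how SCD2 could be applied in two steps: expire the current row, then add a new one. It assumes that product_stg and product_dim share the product_code key, that current rows carry the expiry date 9999-12-31, and that product_sk is an AUTO_INCREMENT column. The @load_date variable is a hypothetical helper holding the processing date.

```sql
USE dw;

-- Assumed processing date; the slides use 5 February 2007 in their example.
SET @load_date = '2007-02-05';

-- Step 1: expire the current row when the staged name or category differs.
UPDATE product_dim a
JOIN product_stg b ON a.product_code = b.product_code
SET a.expiry_date = SUBDATE(@load_date, 1)
WHERE a.expiry_date = '9999-12-31'
  AND (a.product_name <> b.product_name
       OR a.product_category <> b.product_category);

-- Step 2: add a new current row for every staged product without one.
-- This covers both changed products and brand-new products.
INSERT INTO product_dim
  (product_sk, product_code, product_name, product_category,
   effective_date, expiry_date)
SELECT NULL, b.product_code, b.product_name, b.product_category,
       @load_date, '9999-12-31'
FROM product_stg b
LEFT JOIN product_dim a
  ON a.product_code = b.product_code
 AND a.expiry_date = '9999-12-31'
WHERE a.product_sk IS NULL;
```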
• The next output shows that SCD2 has been applied successfully
• Product 1 has two rows
• One of the rows, with product_sk 1, has expired with expiry date 4 February 2007
• This is one day before the effective date of the new row
• The other row is newly created with product_sk 3 and carries the new name
• Its effective date is 5 February 2007 and its expiry date is 9999-12-31
• This means that it has not yet expired
Chapter 3: Measure Additivity
Testing Full Additivity
Inserting data into order_dim table
Inserting data into table date_dim
Inserting data into sales_order_fact:
Generating the sum of the total order amounts by querying across all dimensions:
Generating the sum by querying across date, customer and order
Generating the sum of total orders by querying across date and order:
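The three sums listed above could be produced with queries along the following lines. A fully additive measure can be summed across any combination of dimensions, so all three queries are expected to return the same total as long as every fact row matches exactly one row in each dimension. The order_date_sk column name is an assumption.

```sql
USE dw;

-- Sum across all four dimensions.
SELECT SUM(f.order_amount) AS sum_of_order_amount
FROM sales_order_fact f
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
JOIN product_dim  p ON f.product_sk    = p.product_sk
JOIN order_dim    o ON f.order_sk      = o.order_sk
JOIN date_dim     d ON f.order_date_sk = d.date_sk;

-- Sum across date, customer and order.
SELECT SUM(f.order_amount) AS sum_of_order_amount
FROM sales_order_fact f
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
JOIN order_dim    o ON f.order_sk      = o.order_sk
JOIN date_dim     d ON f.order_date_sk = d.date_sk;

-- Sum across date and order.
SELECT SUM(f.order_amount) AS sum_of_order_amount
FROM sales_order_fact f
JOIN order_dim o ON f.order_sk      = o.order_sk
JOIN date_dim  d ON f.order_date_sk = d.date_sk;
```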
Chapter 4: Dimensional Queries
Aggregate Queries
• Aggregate queries aggregate individual facts
• The values are either summed or counted
• Two examples of aggregate queries are run: aggregation of daily sales and of annual
sales
• In all cases, joins between tables are done using surrogate keys.
Daily Sales Aggregation:
The aggregation of the order amounts and number of orders is
done by date
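A daily aggregation of this kind might look like the sketch below. It assumes that the fact table references date_dim through a column named order_date_sk and that each fact row represents one order, so COUNT(*) gives the number of orders.

```sql
USE dw;

-- Total order amount and number of orders per calendar date.
-- The join uses the surrogate key, not the date value itself.
SELECT d.`date`            AS order_date,
       SUM(f.order_amount) AS total_order_amount,
       COUNT(*)            AS number_of_orders
FROM sales_order_fact f
JOIN date_dim d ON f.order_date_sk = d.date_sk
GROUP BY d.`date`
ORDER BY d.`date`;
```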
Annual Sales Aggregation:
The order amounts and the number of orders are not only aggregated by date, but also
by product and customer city
Specific Queries:
Monthly Storage Product Sales:
The following query aggregates sales amount and the number of orders per month.
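The original query is not preserved in this transcript. A sketch of a monthly aggregation restricted to storage products could look as follows, assuming that product_dim has a product_category column holding the value 'Storage' and that date_dim has year, month and month_name columns.

```sql
USE dw;

-- Monthly order amount and order count for storage products only.
-- The category value 'Storage' and the date_dim column names are assumptions.
SELECT d.`year`,
       d.`month`,
       d.month_name,
       SUM(f.order_amount) AS total_order_amount,
       COUNT(*)            AS number_of_orders
FROM sales_order_fact f
JOIN product_dim p ON f.product_sk    = p.product_sk
JOIN date_dim    d ON f.order_date_sk = d.date_sk
WHERE p.product_category = 'Storage'
GROUP BY d.`year`, d.`month`, d.month_name
ORDER BY d.`year`, d.`month`;
```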
Quarterly Sales in Mechanicsburg:
The following query produces the quarterly aggregation of the order amounts in
Mechanicsburg
Inside-Out Queries:
Product Performer:
The following query gives you the sales orders of products that have a monthly sales
amount of 7,500 or more.
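The original query is not preserved in this transcript. One way to sketch it is to filter on the aggregated fact value with a HAVING clause instead of filtering on a dimension attribute. The date_dim column names and order_date_sk are assumptions.

```sql
USE dw;

-- Products whose order amount in a given month reaches 7,500 or more.
-- HAVING filters on the aggregated measure, after grouping.
SELECT p.product_name,
       d.`year`,
       d.month_name,
       SUM(f.order_amount) AS monthly_order_amount
FROM sales_order_fact f
JOIN product_dim p ON f.product_sk    = p.product_sk
JOIN date_dim    d ON f.order_date_sk = d.date_sk
GROUP BY p.product_name, d.`year`, d.`month`, d.month_name
HAVING SUM(f.order_amount) >= 7500
ORDER BY d.`year`, d.`month`, p.product_name;
```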
Loyal Customer
The following query shows customers who have placed more than five orders annually in
the past 18 months
Chapter 5: Source Extraction
• Push-by-source CDC (change data capture) means that the source system extracts only the changes made since the last
extraction
• Push-by-source CDC is demonstrated on the sales order source data
• It is done using a stored procedure that extracts sales order data from the sales_order
table in the source database
Creating a sales_order table in a separate database called source and inserting
values into the table
Order_dim:
Date_dim
Inserting values in the order_dim and date_dim tables in the dw
database
Running the following stored procedure:
USE source;
DELIMITER //
DROP PROCEDURE IF EXISTS push_sales_order //
CREATE PROCEDURE push_sales_order()
BEGIN
  -- Load the sales orders entered today into the fact table.
  -- The SELECT list must match the column order of dw.sales_order_fact.
  INSERT INTO dw.sales_order_fact
  SELECT a.order_amount, b.order_sk, c.customer_sk, d.product_sk, e.date_sk
  FROM sales_order a, dw.order_dim b, dw.customer_dim c, dw.product_dim d, dw.date_dim e
  WHERE a.entry_date = CURRENT_DATE
    AND a.order_number = b.order_number
    AND a.customer_number = c.customer_number
    AND a.product_code = d.product_code
    -- Pick the product_dim row that was in effect on the order date (SCD2).
    AND a.order_date >= d.effective_date
    AND a.order_date <= d.expiry_date
    AND a.order_date = e.date
  ;
END
//
DELIMITER ;
When called, the stored procedure above inserts the sales orders entered on the current date into the
sales_order_fact table in the dw database.
Chapter 6 : Populating the Date Dimension
Pre-population: This is the simplest of the three techniques; dates are
inserted in advance for a whole period of time.
For example, dates could be inserted for the 5 years between 2009 and 2014.
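Pre-population could be sketched as a stored procedure that loops over a date range. The procedure name pre_populate_date, its parameters and the date_dim column list are assumptions made for this illustration, and date_sk is assumed to be an AUTO_INCREMENT column. DELIMITER is a command of the mysql command-line client, so the script is meant to be run from that client.

```sql
USE dw;

DELIMITER //

DROP PROCEDURE IF EXISTS pre_populate_date //

CREATE PROCEDURE pre_populate_date (IN start_dt DATE, IN end_dt DATE)
BEGIN
  DECLARE d DATE DEFAULT start_dt;
  -- Insert one row per calendar day in the requested range.
  WHILE d <= end_dt DO
    INSERT INTO date_dim (date_sk, `date`, month_name, `month`, `quarter`, `year`)
    VALUES (NULL, d, MONTHNAME(d), MONTH(d), QUARTER(d), YEAR(d));
    SET d = ADDDATE(d, 1);
  END WHILE;
END
//

DELIMITER ;

-- Example call: five calendar years, 2009 through 2013.
CALL pre_populate_date('2009-01-01', '2013-12-31');
```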
Truncating the date_dim table
One Date Every Day:
This technique is similar to the pre-population technique, but only one
date is loaded each day
Daily Date population:
Loading dates from the source: The query loads the sales order dates from the
sales_order table of the source database into the date_dim table of the DW
database.
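The query itself is not preserved in this transcript. A sketch of such a load is shown below; it adds only those order dates that are not yet in date_dim. The date_dim column list is an assumption, and date_sk is assumed to be an AUTO_INCREMENT column.

```sql
-- Load each distinct order date from the source into the date dimension.
-- The LEFT JOIN with the IS NULL test skips dates that are already present.
INSERT INTO dw.date_dim (date_sk, `date`, month_name, `month`, `quarter`, `year`)
SELECT DISTINCT NULL,
       s.order_date,
       MONTHNAME(s.order_date),
       MONTH(s.order_date),
       QUARTER(s.order_date),
       YEAR(s.order_date)
FROM source.sales_order s
LEFT JOIN dw.date_dim d ON d.`date` = s.order_date
WHERE d.date_sk IS NULL
  AND s.order_date IS NOT NULL;
```

Running the same statement again after additional sales orders arrive would add only the new dates.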
Adding more dates from the additional sales order
Chapter 7: Initial Population
Initial Population: After identifying the source data, a script
is written for initial population
Order_dim table:
Sales_order_fact table
Running the Initial Population Scheme
Truncating the sales_order table:
Preparing the sales_order table
Query to confirm that the sales orders were loaded correctly
Chapter 8: Regular Population
Regular Population Script:
In this script, customer_dim and product_dim are reloaded with data. SCD2 is
applied to customer addresses, product names, and product groups. SCD1 is applied to
customer names.
Order_dim table:
Product_dim:
Product_stg:
Sales_order_fact:
Customer_dim:
Testing Data:
The data is tested by running the select query on the sales_order
table and the sales_order_fact table
Chapter 10: Adding Columns
Adding New Columns to the customer dimension:
When the two new columns are added, NULL values are
displayed for them in the existing rows:
Customer_dim table:
Customer_stg table:
• Adding the order_quantity column in the sales_order
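Adding the order_quantity column could be sketched as follows, assuming the sales_order table lives in the source database and that an integer type suits the quantity.

```sql
USE source;

-- Existing rows get NULL in the new column because no default is given.
-- The INT type is an assumption.
ALTER TABLE sales_order
  ADD COLUMN order_quantity INT;

-- Confirm that the new column is present.
DESCRIBE sales_order;
```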