The Ultimate Guide to SQL CUT
Masters in Data Analytics: Business Intelligence Exam Prep, Exam Date: 27 May 2021
McNamara Chiwaye
25/05/2021
Contents
INTRODUCTION TO SQL
SELECT
BUSINESS INTELLIGENCE
FILTER
ORDER BY
JOINING TABLES
MAY 2021 EXAM
WHAT'S NEXT
INTRODUCTION TO SQL
SQL stands for Structured Query Language. It is a language used to query databases.
In this tutorial I will focus on querying a database. Before we start, let's look at the data sets that we are going to be working with. For a quick start, I encourage you to create an account with DataCamp. If you want to run the code in this tutorial, use the DataCamp course Introduction to SQL.
For this tutorial I will be using a database containing information on almost 5000 films.
“Data Science is the art of storytelling with information gathered by a business. Business information can be transformed into useful insights and can drive business growth, reduce risk and make profits for shareholders. Data Science is our only hope to remain in business.”
— Macnamara Chiwaye
We shall be working with the following data sets:
1. ‘eurovision’
2. ‘films dataset’
3. ‘people dataset’
4. ‘reviews dataset’
5. ‘grid’
SELECT
Let's try something that you will appreciate later. Write this code and submit the query in the editor:
SELECT 'SQL'
AS result;
Now change ‘SQL’ to a longer string, such as ‘SQL with Macnamara is cool’, and click Submit!
SELECT 'SQL with Macnamara is cool'
AS result;
SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a ‘relational database’. You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both. Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table.
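To make rows and columns concrete, here is a minimal sketch that builds a tiny table and reads it back. It uses Python's built-in sqlite3 module as a stand-in for a real database server. The employees table, its columns and its three rows are invented for illustration and are not part of the tutorial's data sets.

```python
import sqlite3

# Hypothetical employees table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Tendai", "Finance"), (2, "Rudo", "Sales"), (3, "Farai", "Sales")],
)

# Each row (record) describes exactly one employee.
rows = conn.execute("SELECT * FROM employees ORDER BY id").fetchall()

# Each column (field) holds one attribute for every row.
names = [row[0] for row in conn.execute("SELECT name FROM employees ORDER BY id")]

# COUNT(*) reports how many records the table contains.
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(rows)
print(names)
print(row_count)
```

The same SELECT statements would work against any relational database. Only the connection step is specific to SQLite.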
We will start with Exploratory Data Analysis on our data sets so we can get a feel for our data.
• We will use the COUNT function to check the number of records in our table.
SELECT COUNT(*)
FROM films;
• We shall inspect our table by viewing all the records in our table.
SELECT *
FROM films;
• We can count the records in the eurovision table in the same way.
-- Count all the rows in the eurovision table
SELECT COUNT(*) AS 'Number of rows in eurovision'
FROM eurovision;
SELECTing single columns
Simple selections
It’s time to begin answering business questions! In this first coding exercise, you will use SELECT statements to retrieve columns from a database table. This tutorial uses the eurovision table, which contains data relating to individual country performance at the Eurovision Song Contest from 1998 to 2012, and the films data sets. While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.
At this point it is worth mentioning that SQL keywords are not case sensitive. In the next code I am going to use mixed case just for you to see. However, by convention most people write keywords in upper case.
1. SELECT the title column FROM the films table.
SeLEct title
FrOm films;
2. SELECT the country column FROM the eurovision table.
-- SELECT the country column FROM the eurovision table
SELECT country
FROM eurovision;
3. Amend your query to return the points column instead of the country column.
-- Select the points column
SELECT
points
FROM
eurovision;
4. The SELECT statement can also limit the number of rows returned. Here I use TOP to change the existing query so that only the first 50 rows are returned.
SELECT
TOP (50) points
FROM
eurovision;
Figure 1: SELECT statement helps you to select columns from your Table
5. Which countries participated in the Eurovision competition?
-- Return unique countries and use an alias
SELECT
DISTINCT country AS unique_country
FROM
eurovision;
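If you do not have SQL Server at hand, DISTINCT and row limiting can be tried with Python's built-in sqlite3 module. This is only a sketch under some assumptions. The five rows and their points values are invented. SQLite has no TOP, so LIMIT stands in for it here.

```python
import sqlite3

# Hypothetical sample of the eurovision table; the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eurovision (country TEXT, event_year INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO eurovision VALUES (?, ?, ?)",
    [
        ("Israel", 2009, 53),
        ("France", 2009, 107),
        ("Israel", 2010, 71),
        ("Sweden", 2010, 62),
        ("France", 2010, 82),
    ],
)

# DISTINCT collapses repeated countries into one row each.
unique_countries = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT country AS unique_country FROM eurovision ORDER BY unique_country"
    )
]

# SQLite has no TOP; LIMIT 2 plays the role of T-SQL's TOP (2).
first_two_points = [
    row[0] for row in conn.execute("SELECT points FROM eurovision ORDER BY rowid LIMIT 2")
]

print(unique_countries)
print(first_two_points)
```

Five rows go in, but only three distinct countries come out, which is the behaviour the DISTINCT query above relies on.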
SELECTing multiple columns
Let's suppose you receive a request for information from the Business Unit.
1. SELECT the country and event_year columns from the eurovision table.
-- Select country and event_year from eurovision
SELECT
country,
event_year
FROM
eurovision;
2. Use a shortcut to amend the current query, returning ALL rows from ALL columns in the table.
-- Shortcut to select all columns in a table
SELECT
*
FROM
eurovision;
3. Return all columns, but only include the top half of the table - in other words, return 50 percent of the rows.
-- Return the top 50 percent of the rows
SELECT
TOP (50) PERCENT *
FROM
eurovision;
Learning to Count
-- Count the number of rows in the reviews table
SELECT COUNT(*)
FROM reviews;
BUSINESS INTELLIGENCE
SQL Server & Transact-SQL
• SQL Server - relational database system developed by Microsoft
• Transact-SQL (T-SQL) - Microsoft’s implementation of SQL, with additional functionality
• In this tutorial you will master the fundamentals of T-SQL.
• You will learn how to write queries.
Here is what SQL Server Management Studio looks like.
Figure 2: The SQL Server object explorer
Figure 3: The SQL Server Database pane
Figure 4: The SQL Server MENU
Let’s take a look at the data we will be working with. The digitv database has four tables. Let’s take a look at the payments table.
Figure 5: The Payment Table
We see our dim_customer table.
Figure 6: The Customer Table
Our final table is dim_plan_type.
Figure 7: The Plan type Table
Exploratory Data Analysis in SQL
Uniqueness Constraints
Write queries to list the distinct plan descriptions and payment results, and to count the distinct plan codes and payment durations.
/*** Uniqueness Constraints ***/
SELECT DISTINCT plan_description AS plan_description FROM dim_plan_type;
SELECT DISTINCT result AS "Transaction Status" FROM dim_payments;
SELECT COUNT(DISTINCT plan_code) AS 'Count of unique plan codes' FROM dim_payments;
SELECT COUNT(DISTINCT duration) AS 'Count of unique payment durations' FROM dim_payments;
Figure 8: Uniqueness
Figure 9: Uniqueness helps model relationships and data types
We calculate the maximum, minimum and average to see the best, poorest and average performance. Let's see how to calculate variability and measures of central tendency. These calculations also form the basis of risk management and fraud detection. Outliers are a good start for fraud detection.
/*** Variability and Measures of Central Tendency ***/
SELECT MAX(duration) as 'Longest payment duration'
FROM dim_payments
SELECT MIN(duration) as 'Minimum payment duration'
FROM dim_payments
SELECT AVG(CAST(duration AS FLOAT)) as 'Average payment duration'
FROM dim_payments
Figure 10: Average payment duration
Figure 11: Longest payment Duration
Figure 12: Minimum Payment
FILTER
--- Average duration of approved transactions
SELECT AVG(CAST(duration AS FLOAT)) as 'Average payment duration'
FROM dim_payments
WHERE result = 'APPROVED';
--- What is the max, min and average amount for base station AN and plan code 11
SELECT
MAX(CAST(amount AS float)) AS 'Maximum payment amount for base station AN and plan code 11',
MIN(CAST(amount AS float)) AS 'Minimum payment amount',
AVG(CAST(amount AS float)) AS 'Average payment amount'
FROM dim_payments
WHERE base_sation_code = 'AN'
AND plan_code = 11;
--- What is the max, min and average amount for results that start with TRAN
SELECT
MAX(CAST(amount AS float)) AS 'Maximum payment amount for results starting with TRAN',
MIN(CAST(amount AS float)) AS 'Minimum payment amount',
AVG(CAST(amount AS float)) AS 'Average payment amount'
FROM dim_payments
WHERE result LIKE 'TRAN%';
Results of executing the Queries
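To try the WHERE filters above without a SQL Server instance, the sketch below rebuilds a small dim_payments table in SQLite through Python's sqlite3 module. The four rows, the result values such as TRAN_FAILED, and the helper amount_stats are all hypothetical. The amount column is stored as text here on the assumption that this is why the original queries cast it before aggregating. SQLite spells the floating-point type REAL rather than FLOAT.

```python
import sqlite3

# Hypothetical sample of dim_payments; amount is kept as text on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_payments "
    "(result TEXT, base_sation_code TEXT, plan_code INTEGER, amount TEXT)"
)
conn.executemany(
    "INSERT INTO dim_payments VALUES (?, ?, ?, ?)",
    [
        ("APPROVED", "AN", 11, "10"),
        ("APPROVED", "AN", 11, "30"),
        ("TRAN_FAILED", "AN", 12, "50"),
        ("TRAN_TIMEOUT", "BX", 11, "70"),
    ],
)


def amount_stats(where_clause, params=()):
    """Return (max, min, avg) of amount for the rows matching where_clause."""
    sql = (
        "SELECT MAX(CAST(amount AS REAL)), MIN(CAST(amount AS REAL)), "
        "AVG(CAST(amount AS REAL)) FROM dim_payments WHERE " + where_clause
    )
    return conn.execute(sql, params).fetchone()


# Both conditions joined by AND must hold for a row to be included.
by_station_and_plan = amount_stats("base_sation_code = ? AND plan_code = ?", ("AN", 11))

# LIKE 'TRAN%' matches any result that begins with TRAN.
starts_with_tran = amount_stats("result LIKE ?", ("TRAN%",))

print(by_station_and_plan)
print(starts_with_tran)
```

The values are passed as parameters rather than pasted into the SQL string, which is a good habit once filter values start coming from users.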
ORDER BY
Ordering is a common data task you will encounter, because ordered information is easier to process and to draw insights from. Let's see how you can order your data based on a criterion.
--Order by the amount
SELECT TOP (5) plan_code, duration
FROM dim_payments
ORDER BY amount
--Order in descending order
SELECT TOP (20) plan_code, duration, result, base_sation_code
FROM dim_payments
ORDER BY duration desc;
Arrange by amount
Show biggest duration to minimum
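The two orderings can be checked on a handful of rows. The sketch below uses Python's sqlite3 module with four invented payments. LIMIT replaces TOP because SQLite does not support TOP.

```python
import sqlite3

# Hypothetical payments: (plan_code, duration, amount).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_payments (plan_code INTEGER, duration INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO dim_payments VALUES (?, ?, ?)",
    [(11, 40, 25.0), (12, 15, 5.0), (11, 90, 12.5), (13, 60, 50.0)],
)

# Ascending is the default, so the smallest amounts come first.
cheapest_two = conn.execute(
    "SELECT plan_code, duration FROM dim_payments ORDER BY amount LIMIT 2"
).fetchall()

# DESC reverses the order, so the longest durations come first.
longest_two = conn.execute(
    "SELECT plan_code, duration FROM dim_payments ORDER BY duration DESC LIMIT 2"
).fetchall()

print(cheapest_two)
print(longest_two)
```

Note that the first query sorts by amount even though amount is not in the SELECT list, just like the T-SQL query above.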
Where Clause
I am about to show you a beginner's error. In T-SQL, date is a keyword (it names a data type), so using it unescaped as a column name, as in select date from dim_payments, can cause errors or confusion. Don’t do that. Use square brackets to escape the keyword: select [date] from dim_payments.
SELECT TOP (20) plan_code, duration, result, base_sation_code
FROM dim_payments
WHERE
[date] = '20201008';
SELECT TOP (20) plan_code, duration, result, base_sation_code
FROM dim_payments
WHERE
[date] = '20201008'
ORDER BY amount DESC;
-- BETWEEN
SELECT TOP (20) plan_code, duration, result, base_sation_code
FROM dim_payments
WHERE
[date] BETWEEN '20201008' AND '20201017'
ORDER BY amount DESC;
Working with NULL values
-- Try these queries on your own
SELECT * FROM dim_customers WHERE middle_name IS NULL;
SELECT * FROM dim_customers WHERE middle_name IS NOT NULL;
Figure 13: Filtering MSCDA613 data: filtering on numeric and character data
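A common question is why NULL needs IS NULL rather than an equals sign. The sketch below illustrates this with Python's sqlite3 module. The three customers, the first_name column and the helper count_where are made up for the example.

```python
import sqlite3

# Hypothetical customers; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customers (first_name TEXT, middle_name TEXT)")
conn.executemany(
    "INSERT INTO dim_customers VALUES (?, ?)",
    [("Rudo", None), ("Tendai", "Blessing"), ("Farai", None)],
)


def count_where(condition):
    """Count the customers that satisfy the given WHERE condition."""
    sql = "SELECT COUNT(*) FROM dim_customers WHERE " + condition
    return conn.execute(sql).fetchone()[0]


missing = count_where("middle_name IS NULL")
present = count_where("middle_name IS NOT NULL")

# A comparison with NULL evaluates to unknown, so = NULL matches no rows.
equals_null = count_where("middle_name = NULL")

print(missing, present, equals_null)
```

SQL Server behaves the same way under its default ANSI_NULLS setting, so the IS NULL and IS NOT NULL forms above are the ones to use.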
JOINING TABLES
The trick with developing is that we start with the easy and slowly build. After all, that’s why we are called developers, right?
-- inner join
select *
from dim_payments as dp inner join dim_customers as dc
on dp.account_number =dc.account_number
-- selecting column after join
select result, duration, date_of_birth, amount, gender
from dim_payments as dp inner join dim_customers as dc
on dp.account_number =dc.account_number
Figure 14: Filtering MSCDA613 data: Reserved words
-- Aggregate: average amount by gender
select gender , AVG(cast(amount as float)) as 'Average amount by Gender'
from dim_payments as dp inner join dim_customers as dc
on dp.account_number =dc.account_number
group by gender
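The join-then-aggregate pattern above can be traced by hand on a few rows. This sketch uses Python's sqlite3 module with invented customers and payments. Account 4 deliberately has no matching customer, to show that an inner join drops unmatched rows. As before, amount is stored as text only on the assumption that this is why the original casts it.

```python
import sqlite3

# Hypothetical customers and payments sharing the account_number key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customers (account_number INTEGER, gender TEXT)")
conn.execute("CREATE TABLE dim_payments (account_number INTEGER, amount TEXT)")
conn.executemany("INSERT INTO dim_customers VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "F")])
conn.executemany(
    "INSERT INTO dim_payments VALUES (?, ?)",
    [(1, "10"), (1, "30"), (2, "40"), (3, "20"), (4, "99")],
)

# Join each payment to its customer, then average the amounts per gender.
avg_by_gender = conn.execute(
    """
    SELECT dc.gender, AVG(CAST(dp.amount AS REAL)) AS average_amount
    FROM dim_payments AS dp
    INNER JOIN dim_customers AS dc
        ON dp.account_number = dc.account_number
    GROUP BY dc.gender
    ORDER BY dc.gender
    """
).fetchall()

print(avg_by_gender)
```

The payment of 99 on account 4 does not affect either average, because the inner join only keeps payments whose account number exists in dim_customers.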
MAY 2021 EXAM
Question 1
1. How many customers were born in the month of May?
select COUNT(*) as 'May Babies' from dim_Customer
where MONTH(BirthDate) = 5;
Figure 15: Business Intelligence Final Exam: May Babies
2. Which sales order (using SalesOrderNumber) generated the highest sales (using SalesAmount) in the year 2011?
-- TOP (1) keeps only the row with the highest SalesAmount
select TOP (1) SalesOrderNumber from dim_InternetSales
where YEAR(OrderDate) = 2011
ORDER BY SalesAmount desc;
Figure 16: Business Intelligence Final Exam: Highest Sales Order
3. What is the total amount (using the Freight column) for all the products with a promotion of DiscountPct = 0.1?
select SUM(Freight) as 'Total Freight'
from dim_InternetSales
where UnitPriceDiscountPct = 0.1;
Figure 17: Business Intelligence Final Exam: Total Freight
4. What is the total sales (using SalesAmount) for all the products bought in France by all male customers who were born in 1971?
select sum(SalesAmount) as Total
from dim_InternetSales
INNER JOIN dim_SalesTerritory
ON dim_InternetSales.SalesTerritoryKey = dim_SalesTerritory.SalesTerritoryKey
INNER JOIN dim_Customer
ON dim_InternetSales.CustomerKey = dim_Customer.CustomerKey
where SalesTerritoryCountry = 'France'
and year(BirthDate) = 1971
and Gender = 'M';
Figure 18: Business Intelligence Final Exam: Total Sales from France
WHAT'S NEXT
Congratulations on making it to the end of this tutorial. I wish you success. You can attempt questions at rstatszim and Pyzim. Connect with me on LinkedIn: Mcnamara Chiwaye. I post data science tips and tricks weekly.
