This final project presents my analysis of sales and payment methods for electronics, fashion, entertainment, and other products offered by a superstore. I used many SQL commands, including CREATE TABLE, SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, LEFT JOIN, and EXTRACT.
The data analyzed in this task were collected from Tokopedia (not the original data). The dataset consists of the following tables:
order_detail:
1. id → the unique number of an order (id_order)
2. customer_id → the unique number of a customer
3. order_date → the date when the transaction was carried out
4. sku_id → the unique number of a product (SKU stands for stock keeping unit)
5. price → the amount of money paid for a product
6. qty_ordered → the number of items purchased by the customer
7. before_discount → the total price of the products before the discount
8. discount_amount → the discount applied to the total price
9. after_discount → the total price after the discount is applied
10. is_gross → shows that the customer has not yet paid for the order
11. is_valid → shows that the customer has paid for the order
12. is_net → shows that the transaction is finished
13. payment_id → the unique number of a payment method
sku_detail:
1. id → the unique number of a product (it can be used as a key for joining)
2. sku_name → the name of the product
3. base_price → the price shown on the price tag
4. cogs → the cost of goods sold for one product
5. category → the product category
customer_detail:
1. id → the unique number of a customer
2. registered_date → the date when a customer signed up as a member
payment_detail:
1. id → the unique number of a payment
2. payment_method → the method of payment applied during the transaction
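The table descriptions above can be turned into a schema sketch. The block below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real database, and every column type and REFERENCES clause is an assumption inferred from the descriptions, not taken from the original dataset.

```python
import sqlite3

# Hypothetical schema: column names follow the descriptions above,
# but the types and foreign keys are assumptions.
SCHEMA = """
CREATE TABLE sku_detail (
    id TEXT PRIMARY KEY, sku_name TEXT, base_price REAL, cogs REAL, category TEXT
);
CREATE TABLE customer_detail (id TEXT PRIMARY KEY, registered_date TEXT);
CREATE TABLE payment_detail (id INTEGER PRIMARY KEY, payment_method TEXT);
CREATE TABLE order_detail (
    id TEXT,                                   -- not unique: one order may span several rows
    customer_id TEXT REFERENCES customer_detail(id),
    order_date TEXT,
    sku_id TEXT REFERENCES sku_detail(id),
    price REAL, qty_ordered INTEGER,
    before_discount REAL, discount_amount REAL, after_discount REAL,
    is_gross INTEGER, is_valid INTEGER, is_net INTEGER,
    payment_id INTEGER REFERENCES payment_detail(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The three join keys used later in the analysis are order_detail.sku_id, order_detail.customer_id, and order_detail.payment_id.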
Business Questions
1. Among the transactions carried out in 2021, in which month does the total transaction value (after_discount) reach its peak? Use is_valid = 1 to filter the data. Source table: order_detail.
2. Among the transactions that occurred in 2022, which category generated the largest transaction value? Use is_valid = 1 to filter the data. Source table: order_detail, sku_detail.
3. Compare the transaction value for each category in 2021 with that in 2022. Identify which categories showed an increase and which experienced a decrease in transaction value. Use is_valid = 1 to filter the data. Source table: order_detail, sku_detail.
4. Identify the top 5 most popular payment methods used in 2022 (based on total unique orders). Use is_valid = 1 to filter the data. Source table: order_detail, payment_detail.
5. Sort these 5 products by their transaction value: Samsung, Apple, Sony, Huawei, Lenovo. Use is_valid = 1 to filter the data. Source table: order_detail, sku_detail.
1. Among the transactions carried out in 2021, in which month does the total transaction value (after_discount) reach its peak? Use is_valid = 1 to filter the data. Source table: order_detail.
-- SELECT retrieves data from the database and presents the result.
SELECT
    to_char(order_date, 'Month') AS month_transaction,
    -- to_char formats order_date as a month name. AS gives the column an alias.
    SUM(after_discount) AS total_transaction
    -- Calculate the sum of after_discount. AS gives the result an alias.
-- FROM specifies the table from which the data is taken.
FROM
    order_detail
-- WHERE filters the data.
WHERE
    is_valid = 1
    AND EXTRACT(year FROM order_date) = 2021
    -- Extract the year from order_date and keep only 2021.
GROUP BY
    1
    -- Group by the first selected column (the month), so that
    -- there is one total per month.
ORDER BY
    2 DESC
    -- Sort in descending order, from highest to lowest. 2 is the
    -- second selected column, SUM(after_discount).
LIMIT 1
-- Limit the output to the first row, which is the month with the
-- highest total transaction value.
Result
August is the month with the highest total transaction value in 2021. Its total transaction value is 227,862,744.
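To make the filter, group, sort, and limit steps concrete, here is a minimal sketch of the same pattern on made-up rows. It runs on SQLite through Python's sqlite3 module, so strftime replaces the to_char and EXTRACT calls used in the query above, and the month comes back as a number rather than a name. The sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_detail (id TEXT, order_date TEXT, after_discount REAL, is_valid INTEGER)"
)
# Made-up rows, not from the original dataset.
rows = [
    ("o1", "2021-07-15", 100.0, 1),
    ("o2", "2021-08-03", 250.0, 1),
    ("o3", "2021-08-20", 150.0, 1),
    ("o4", "2021-08-25", 900.0, 0),  # excluded: is_valid = 0
    ("o5", "2022-08-01", 999.0, 1),  # excluded: wrong year
]
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?, ?)", rows)

peak = conn.execute(
    """
    SELECT strftime('%m', order_date) AS month_number,
           SUM(after_discount)        AS total_transaction
    FROM order_detail
    WHERE is_valid = 1
      AND strftime('%Y', order_date) = '2021'
    GROUP BY month_number
    ORDER BY total_transaction DESC
    LIMIT 1
    """
).fetchone()
print(peak)
```

Only the two valid August 2021 rows contribute to the winning total; the unpaid order and the 2022 order are filtered out before grouping.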
2. Among the transactions that occurred in 2022, which category generated the largest transaction value? Use is_valid = 1 to filter the data. Source table: order_detail, sku_detail.
-- SELECT retrieves data from the database and presents the result.
SELECT
    category,
    -- The product category from sku_detail.
    SUM(after_discount) AS total_transaction
    -- Calculate the sum of after_discount. AS gives the result an alias.
-- FROM specifies the tables from which the data is taken.
FROM
    sku_detail AS sd
    -- sd is the alias of sku_detail.
LEFT JOIN order_detail AS od
    -- od is the alias of order_detail.
    ON sd.id = od.sku_id
    -- LEFT JOIN returns all records from sku_detail and the matched
    -- records from order_detail. Because the WHERE clause below filters
    -- on order_detail columns, products without a matching order are
    -- dropped from the result.
-- WHERE filters the data.
WHERE
    is_valid = 1
    AND EXTRACT(YEAR FROM order_date) = 2022
    -- Extract the year from order_date and keep only 2022.
GROUP BY
    1
    -- Group by the first selected column (category), so that
    -- there is one total per category.
ORDER BY
    2 DESC
    -- Sort in descending order, from highest to lowest. 2 is the
    -- second selected column, SUM(after_discount).
LIMIT 1
-- Limit the output to the first row, which is the category with the
-- highest total transaction value.
Result
Mobiles and Tablets is the category with the highest total transaction value in 2022. Its total transaction value is 918,451,576.
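One detail of the query above is worth illustrating: a LEFT JOIN keeps unmatched rows from the left table only until the WHERE clause filters on a column of the right table. The sketch below, on made-up rows in SQLite via Python's sqlite3 module, compares a filter placed in WHERE with the same filter placed in the ON clause. The category names and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sku_detail (id TEXT, category TEXT);
    CREATE TABLE order_detail (id TEXT, sku_id TEXT, after_discount REAL, is_valid INTEGER);
    -- Made-up rows: product s2 has never been ordered.
    INSERT INTO sku_detail VALUES ('s1', 'Mobiles & Tablets'), ('s2', 'Books');
    INSERT INTO order_detail VALUES ('o1', 's1', 500.0, 1);
    """
)

# Filter in WHERE: the unmatched 'Books' row has od.is_valid = NULL and is removed.
filter_in_where = conn.execute(
    """
    SELECT sd.category, SUM(od.after_discount)
    FROM sku_detail AS sd
    LEFT JOIN order_detail AS od ON sd.id = od.sku_id
    WHERE od.is_valid = 1
    GROUP BY sd.category
    ORDER BY sd.category
    """
).fetchall()

# Filter in ON: the unmatched 'Books' row survives with a NULL total.
filter_in_on = conn.execute(
    """
    SELECT sd.category, SUM(od.after_discount)
    FROM sku_detail AS sd
    LEFT JOIN order_detail AS od ON sd.id = od.sku_id AND od.is_valid = 1
    GROUP BY sd.category
    ORDER BY sd.category
    """
).fetchall()

print(filter_in_where)
print(filter_in_on)
```

For the business question this makes no difference, since a category without valid orders cannot be the largest one, but it explains why the LEFT JOIN here behaves like an inner join.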
3. Compare the transaction value for each category in 2021 with that in 2022. Identify which categories showed an increase and which experienced a decrease in transaction value. Use is_valid = 1 to filter the data. Source table: order_detail, sku_detail.
-- The comparison of transactions from each category in 2021 and 2022
-- SELECT retrieves data from the database and presents the result.
SELECT
    category,
    -- The product category from sku_detail.
    SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd') BETWEEN '2021-01-01'
        AND '2021-12-31' THEN od.after_discount END) AS total_sales_2021,
    -- CASE WHEN works like IF-ELSE. If order_date falls within the range
    -- of dates above, the expression returns after_discount. END closes
    -- the CASE expression. Because there is no ELSE branch, rows outside
    -- the range return NULL, which SUM ignores.
    SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd') BETWEEN '2022-01-01'
        AND '2022-12-31' THEN od.after_discount END) AS total_sales_2022
-- FROM specifies the tables from which the data is taken.
FROM
    order_detail AS od
    -- od is the alias of order_detail.
LEFT JOIN
    sku_detail AS sd
    ON sd.id = od.sku_id
    -- LEFT JOIN returns all records from order_detail and the matched
    -- records from sku_detail.
-- WHERE filters the data.
WHERE
    is_valid = 1
GROUP BY 1
-- Group by category.
ORDER BY 2 DESC
-- Sort in descending order, from highest to lowest. 2 is the second
-- selected column, total_sales_2021.
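The CASE WHEN inside SUM is a conditional aggregation: each row contributes to the 2021 total, the 2022 total, or neither. A minimal sketch on made-up rows (SQLite via Python's sqlite3, with strftime standing in for to_char) shows what happens when a category has no sales in one of the years. The table name sales and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (category TEXT, order_date TEXT, after_discount REAL);
    -- Made-up rows: category B has no sales in 2021.
    INSERT INTO sales VALUES
        ('A', '2021-03-01', 100.0),
        ('A', '2022-05-01', 300.0),
        ('A', '2022-06-01', 50.0),
        ('B', '2022-01-10', 80.0);
    """
)

totals = conn.execute(
    """
    SELECT category,
           -- No ELSE branch: rows from other years yield NULL, which SUM skips.
           SUM(CASE WHEN strftime('%Y', order_date) = '2021' THEN after_discount END)
               AS total_sales_2021,
           SUM(CASE WHEN strftime('%Y', order_date) = '2022' THEN after_discount END)
               AS total_sales_2022
    FROM sales
    GROUP BY category
    ORDER BY category
    """
).fetchall()
print(totals)
```

Category B gets a NULL (None in Python) for 2021 rather than 0, because SUM over only NULL values returns NULL.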
-- Categories that show growth and categories that show a decline
WITH
full_transaction AS (
    -- WITH defines a common table expression (CTE): a named, temporary
    -- result set. Here it calculates the yearly totals first, so that
    -- the main query below can read from them.
    SELECT
        category,
        SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd') BETWEEN '2021-01-01'
            AND '2021-12-31' THEN od.after_discount END) AS total_sales_2021,
        SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd') BETWEEN '2022-01-01'
            AND '2022-12-31' THEN od.after_discount END) AS total_sales_2022
        -- CASE WHEN works like IF-ELSE. If order_date falls within the
        -- range of dates above, the expression returns after_discount.
        -- END closes the CASE expression. Because there is no ELSE
        -- branch, rows outside the range return NULL, which SUM ignores.
    -- FROM specifies the tables from which the data is taken.
    FROM
        order_detail AS od
        -- od is the alias of order_detail.
    LEFT JOIN
        sku_detail AS sd
        ON sd.id = od.sku_id
        -- LEFT JOIN returns all records from order_detail and the
        -- matched records from sku_detail.
    -- WHERE filters the data.
    WHERE
        is_valid = 1
    GROUP BY 1
    ORDER BY 2 DESC
    -- Sort in descending order, from highest to lowest. 2 is the
    -- second selected column, total_sales_2021.
)
SELECT
    full_transaction.*,
    total_sales_2022 - total_sales_2021 AS growth_value
    -- The star symbol retrieves all columns in full_transaction. One
    -- more column is then created by subtracting total_sales_2021 from
    -- total_sales_2022. The result is named growth_value.
-- FROM specifies the CTE from which the data is taken.
FROM
    full_transaction
ORDER BY
    4 DESC
    -- 4 is the fourth column, growth_value.
Result
The output shows which categories increased and which decreased. A decrease appears as a negative growth_value.
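One edge case in the growth_value calculation: if a category has no sales in one of the two years, its yearly total is NULL, and any subtraction involving NULL is also NULL. The sketch below uses an inline CTE with made-up totals (SQLite via Python's sqlite3) to compare the plain subtraction with a COALESCE variant. Treating a missing year as 0 is an assumption that may or may not suit the analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

growth = conn.execute(
    """
    WITH full_transaction AS (
        -- Made-up yearly totals; category B has no 2021 sales.
        SELECT 'A' AS category, 100.0 AS total_sales_2021, 350.0 AS total_sales_2022
        UNION ALL SELECT 'B', NULL, 80.0
        UNION ALL SELECT 'C', 200.0, 120.0
    )
    SELECT category,
           total_sales_2022 - total_sales_2021 AS growth_value,
           -- Assumption: a year without sales counts as 0.
           COALESCE(total_sales_2022, 0) - COALESCE(total_sales_2021, 0)
               AS growth_value_filled
    FROM full_transaction
    ORDER BY category
    """
).fetchall()
print(growth)
```

Category C has a negative growth_value, which marks a decrease, while category B only gets a number once the missing year is filled with 0.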
4. Identify the top 5 most popular payment methods used in 2022 (based on total unique orders). Use is_valid = 1 to filter the data. Source table: order_detail, payment_detail.
SELECT
    payment_method,
    COUNT(DISTINCT od.id) AS total_payment
    -- COUNT is used because we want the number of unique orders.
    -- DISTINCT makes each order id count only once: even if one order
    -- appears in 5 records, it is counted as one transaction.
-- FROM specifies the tables from which the data is taken.
FROM
    order_detail AS od
LEFT JOIN
    payment_detail AS pd
    ON pd.id = od.payment_id
    -- LEFT JOIN returns all records from order_detail and the matched
    -- records from payment_detail.
-- WHERE filters the data.
WHERE
    EXTRACT(YEAR FROM order_date) = 2022
    AND is_valid = 1
GROUP BY
    1
    -- Group by payment method.
ORDER BY
    2 DESC
LIMIT 5
-- Limit the output to the first 5 rows, which are the 5 most used
-- payment methods.
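The difference between COUNT(*) and COUNT(DISTINCT od.id) is easiest to see when one order occupies several rows. A minimal sketch on made-up rows (SQLite via Python's sqlite3) follows. The payment method names cod and card are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payment_detail (id INTEGER, payment_method TEXT);
    CREATE TABLE order_detail (id TEXT, payment_id INTEGER);
    -- Made-up rows: order o1 appears in three records.
    INSERT INTO payment_detail VALUES (1, 'cod'), (2, 'card');
    INSERT INTO order_detail VALUES
        ('o1', 1), ('o1', 1), ('o1', 1), ('o2', 1), ('o3', 2);
    """
)

counts = conn.execute(
    """
    SELECT pd.payment_method,
           COUNT(*)              AS row_count,      -- counts every record
           COUNT(DISTINCT od.id) AS total_payment   -- counts each order once
    FROM order_detail AS od
    LEFT JOIN payment_detail AS pd ON pd.id = od.payment_id
    GROUP BY pd.payment_method
    ORDER BY total_payment DESC
    """
).fetchall()
print(counts)
```

Without DISTINCT, the method used for order o1 would be credited with three extra transactions.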
5. Sort these 5 products by their transaction value: Samsung, Apple, Sony, Huawei, Lenovo. Use is_valid = 1 to filter the data. Source table: order_detail, sku_detail.
WITH full_transaction AS (
    SELECT
        -- LOWER makes the match case-insensitive, because LIKE is
        -- case-sensitive in PostgreSQL.
        CASE
            WHEN LOWER(sku_name) LIKE '%samsung%' THEN 'Samsung'
            WHEN LOWER(sku_name) LIKE '%apple%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%iphone%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%imac%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%macbook%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%sony%' THEN 'Sony'
            WHEN LOWER(sku_name) LIKE '%huawei%' THEN 'Huawei'
            WHEN LOWER(sku_name) LIKE '%lenovo%' THEN 'Lenovo'
        END AS product_name,
        SUM(after_discount) AS total_sales
    FROM
        order_detail AS od
    LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
    WHERE
        to_char(order_date, 'yyyy-mm-dd') BETWEEN '2022-01-01' AND '2022-12-31'
        AND is_valid = 1
    GROUP BY 1
)
SELECT
    full_transaction.*
FROM full_transaction
WHERE
    product_name IS NOT NULL
ORDER BY
    2 DESC
Syntax Explanation
WITH full_transaction AS (
    SELECT
        CASE
            WHEN LOWER(sku_name) LIKE '%samsung%' THEN 'Samsung'
            WHEN LOWER(sku_name) LIKE '%apple%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%iphone%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%imac%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%macbook%' THEN 'Apple'
            WHEN LOWER(sku_name) LIKE '%sony%' THEN 'Sony'
            WHEN LOWER(sku_name) LIKE '%huawei%' THEN 'Huawei'
            WHEN LOWER(sku_name) LIKE '%lenovo%' THEN 'Lenovo'
            -- LIKE performs pattern matching on text (it is not a regular
            -- expression). The % symbol matches any sequence of characters,
            -- so '%samsung%' matches any string that contains "samsung".
            -- LIKE is case-sensitive in PostgreSQL, so sku_name is
            -- lowercased with LOWER before it is compared.
        END AS product_name,
        SUM(after_discount) AS total_sales
    -- FROM specifies the tables from which the data is taken.
    FROM
        order_detail AS od
    LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
    -- WHERE filters the data.
    WHERE
        to_char(order_date, 'yyyy-mm-dd') BETWEEN '2022-01-01' AND '2022-12-31'
        AND is_valid = 1
    GROUP BY 1
)
SELECT
    full_transaction.*
    -- The star symbol retrieves all columns in full_transaction.
FROM full_transaction
WHERE
    product_name IS NOT NULL
    -- Keep only rows whose product_name is not NULL. This drops the
    -- products that match none of the five brands.
ORDER BY
    2 DESC
    -- 2 is the second column, total_sales. DESC sorts from the highest
    -- to the lowest.
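The brand classification can be tried out on a handful of made-up product names. The sketch below runs on SQLite via Python's sqlite3 and is only an illustration: SQLite's LIKE already ignores ASCII case, so LOWER is redundant there, but it is kept to mirror what a case-sensitive LIKE in PostgreSQL would need. The table name sales and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (sku_name TEXT, after_discount REAL);
    -- Made-up rows with mixed letter case.
    INSERT INTO sales VALUES
        ('Samsung Galaxy A12', 300.0),
        ('apple iPhone 12', 900.0),
        ('MacBook Air', 1200.0),
        ('SONY headphones', 150.0),
        ('Generic cable', 10.0);
    """
)

ranking = conn.execute(
    """
    WITH full_transaction AS (
        SELECT CASE
                   WHEN LOWER(sku_name) LIKE '%samsung%' THEN 'Samsung'
                   WHEN LOWER(sku_name) LIKE '%apple%'
                     OR LOWER(sku_name) LIKE '%iphone%'
                     OR LOWER(sku_name) LIKE '%macbook%' THEN 'Apple'
                   WHEN LOWER(sku_name) LIKE '%sony%' THEN 'Sony'
               END AS product_name,   -- NULL when no pattern matches
               SUM(after_discount) AS total_sales
        FROM sales
        GROUP BY product_name
    )
    SELECT product_name, total_sales
    FROM full_transaction
    WHERE product_name IS NOT NULL    -- drop the unmatched products
    ORDER BY total_sales DESC
    """
).fetchall()
print(ranking)
```

The unmatched cable falls into a NULL group inside the CTE and is removed by the IS NOT NULL filter in the outer query.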