This final project presents my analysis of sales and payment methods for electronics, fashion, entertainment, and other products offered by a superstore. I used many SQL commands, including CREATE TABLE, SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, LEFT JOIN, and EXTRACT.
This project focused on creating data frames, filtering, grouping, merging, and displaying data. It also includes creating new columns to which specific conditions can be applied. The data is used to solve business problems within a superstore.
The first problem statement is determining the prices of the top 5 products in the Mobiles & Tablets category. Second, the data is processed to check whether there was a decrease in sales of the Others category in 2022; the task also requires displaying the top 20 products with the largest decrease. Third, I use the data to retrieve the Customer ID and Registered Date of consumers who have checked out but have not yet made a payment. Fourth, the data is sorted and analyzed to compare average daily sales on weekends with those on weekdays over a three-month period.
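As an illustration, the first and fourth problem statements can be sketched with queries like the following. This is a minimal sketch run against SQLite via Python; the table name `order_details` and the column names (`product_name`, `category`, `price`, `sale_amount`, `order_date`) are invented for illustration and may differ from the project's actual schema.

```python
import sqlite3

# Hypothetical schema: the real superstore table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_details (
        product_name TEXT,
        category     TEXT,
        price        REAL,
        sale_amount  REAL,
        order_date   TEXT  -- ISO date, e.g. '2022-10-01'
    )
""")
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?, ?, ?)",
    [
        ("Phone A", "Mobiles & Tablets", 500.0, 500.0, "2022-10-01"),  # a Saturday
        ("Phone B", "Mobiles & Tablets", 300.0, 300.0, "2022-10-03"),  # a Monday
        ("Desk",    "Furniture",         120.0, 120.0, "2022-10-03"),
    ],
)

# Problem 1: prices of the top 5 products in the Mobiles & Tablets category.
top5 = conn.execute("""
    SELECT product_name, MAX(price) AS price
    FROM order_details
    WHERE category = 'Mobiles & Tablets'
    GROUP BY product_name
    ORDER BY price DESC
    LIMIT 5
""").fetchall()

# Problem 4: average daily sales on weekends vs. weekdays over three months.
# In SQLite, strftime('%w', ...) returns '0' for Sunday and '6' for Saturday.
by_day_type = dict(conn.execute("""
    SELECT CASE WHEN strftime('%w', order_date) IN ('0', '6')
                THEN 'weekend' ELSE 'weekday' END AS day_type,
           AVG(daily_total) AS avg_daily_sales
    FROM (
        SELECT order_date, SUM(sale_amount) AS daily_total
        FROM order_details
        WHERE order_date BETWEEN '2022-10-01' AND '2022-12-31'
        GROUP BY order_date
    )
    GROUP BY day_type
""").fetchall())

print(top5)
print(by_day_type)
```

The inner query totals sales per day first, so the outer AVG compares average daily totals rather than averaging individual line items.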
Quick iteration and reusability of metric calculations for powerful data exploration.
At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are starving for data, but are forced to make important decisions without access to data that could guide them in the right direction. Looker addresses both of these problems with a YAML-based modeling language called LookML.
This paper walks through a number of data modeling examples, demonstrating how to use LookML to generate, alter, and update reports—without the need to rewrite any SQL. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model—allowing rapid iteration of data exploration, while also ensuring the accuracy of the SQL that’s generated. Small updates are quick and can be made immediately available to business users to manipulate, iterate, and transform in any way they see fit.
This document outlines an inventory management system project. It includes sections on the disadvantages of the old manual system, advantages of the new computerized system, hardware and software requirements, data flow diagrams, entity relationship diagrams, data dictionaries, form designs, and data reports. Key entities in the system include items, suppliers, purchase orders, and customer bills. The new system aims to automate the inventory management process and make it more efficient by reducing time and paperwork compared to the old manual system.
Queries allow users to extract specific information from one or more database tables. There are different ways to create queries, including using design view, a wizard, or SQL view. Queries can include calculations, formatting, parameters, and summaries to provide flexible reporting of essential data.
This document provides steps for updating ADS (Analytical Data Store) and KXEN models. It involves checking data availability, running various ADS projects to populate tables, performing sanity checks on table counts, and applying KXEN models to score different customer segments. The key steps are: 1) Check source data and run preliminary ADS, 2) Populate base tables and run additional ADS in sequence, 3) Perform sanity checks on table counts, 4) Apply KXEN models to score segments, changing settings for each segment.
Oracle provides several analytical functions that allow for powerful data analysis using SQL. These include group functions that aggregate data over groups or windows, as well as window functions like ROW_NUMBER, RANK, and LAG that analyze data relative to the current row. ROLLUP and CUBE extensions to the GROUP BY clause enable calculation of subtotals across multiple dimensions of data with a single query.
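The window functions mentioned above are standard SQL and can be sketched outside Oracle as well. Below is a minimal sketch using SQLite (3.25+) from Python, with an invented `sales` table; note that SQLite supports ROW_NUMBER, RANK, and LAG but not the ROLLUP and CUBE extensions, which remain Oracle/standard-SQL features.

```python
import sqlite3

# A toy table to demonstrate the window functions named above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('East', 100), ('East', 300), ('West', 200);
""")

# ROW_NUMBER and RANK number rows within each region by amount;
# LAG fetches the previous row's amount (NULL for the first row of a partition).
rows = conn.execute("""
    SELECT region, amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount) AS rn,
           RANK()       OVER (PARTITION BY region ORDER BY amount) AS rnk,
           LAG(amount)  OVER (PARTITION BY region ORDER BY amount) AS prev_amount
    FROM sales
    ORDER BY region, amount
""").fetchall()

for row in rows:
    print(row)
```

Unlike group functions, these leave one output row per input row, which is what makes row-relative analysis such as period-over-period deltas possible.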
This document describes the design and implementation of a data mart for an airline company. It begins by introducing the source databases - Agent Detail (AD) and Ticket Order Detail (TOD) - and describing the tables and attributes within each. It then covers normalization of the data model into third normal form. Finally, it presents the star schema design for the data mart, with dimensions for Customer, Agent, and Time, and facts that connect to each dimension. The data mart is intended to enable business intelligence and reporting to help management with strategic decision making.
Company segmentation - an approach with R (Casper Crause)
We classify companies based on how their stocks trade using their daily stock returns (percentage movement from one day to the next). This analysis will help your organization determine which companies are related to each other (competitors that have similar attributes).
paytm_mall_epurchase_data data analysis (ankita222345)
Completed an internship given by psyliq, where I needed to find insights for Paytm e-purchase data. I used MySQL Workbench as well as Excel to showcase my project insights.
June 10, 2010 BDPA Charlotte Program Meeting Presentation.
Presenter:
Markus Beamer, BDPA Charlotte President Elect
Topic:
Intelligent Data Strategies - Intro to Data Marts and Data Warehouses
This document provides an in-depth reference on set analysis in QlikView. It begins by acknowledging that set analysis is a difficult subject, even for experienced users. The document then covers key aspects of set analysis syntax, including identifiers, operators, modifiers, and element lists. It provides many examples of how to use set analysis to filter charts and calculations based on specific field selections. The goal is to serve as a complete cheat sheet for set analysis in QlikView, with definitions, examples, and tips for effectively using this complex topic.
Cover PageComplete and copy the following to Word for your cover p.docx (faithxdunce63732)
Cover Page: Complete and copy the following to Word for your cover page. Be sure that the document is stapled properly. Do not use a plastic cover or folder. In the footer of the Word documents, add the Now() function to show what day and time the documents were printed. Submit the Excel file to CANVAS as lastname_firstname.xls. Hand in the Word document immediately prior to Exam 1. Although students are encouraged to ask questions for clarification, this exercise is intended to be well within the capability of students at the 3000 level, and students should be able to complete the project with minimal assistance. Instructions are included on each worksheet, but feel free to request clarification. ACG 3401 Accounting Information Systems, Excel Assignment. Submitted By (Name, Last: First:) <-- Only use this for the cover page. Spring 2015. By submitting this document, I affirm that the work is the product of my own efforts without the assistance of another person and that I have not given assistance to another student. <-- You must sign for the submission to be graded. Signature of student.
Instructions: This is an .xls file and should not be changed to another file type, in order to preserve macros. Follow the instructions on each worksheet. Copy results to MS Word and include page numbers. The page numbers for each exercise are given below (at the bottom of this worksheet). Appearance counts: be sure that results are presented professionally and are readable. Three worksheets are data files and are referenced in the instructions; these are named Product Data, Industry Data, and Data Worksheet. Create range names for the following. Remember, ranges should not include the headers (field names); be careful to ensure you have selected the entire range for that field. (Press F3 to view the range names - click these to insert into a formula, or type them in directly.) You may need to create range names other than these. From the Industry Data worksheet, create range names for: 1) Employees, 2) Sales, 3) Address, 4) Name, 5) State, 6) ZIP. From the Data Worksheet, create range names for: 1) Cash, 2) Company, 3) EBIT, 4) Eff_Tax_Rate, 5) Exchange, 6) SIC. Create a range name for the entire Product Data table, but include the headers; I used the name 'Product'. Tab colors: green - databases to be used; yellow - examples; blue - instructions to perform graded exercises. When copying portions of the worksheet to your MS Word document, you will find the "Snipping Tool" very helpful. Checklist for submitted documents (page number, item, results required, formulas required): na - Cover page with name and section number (stapled); 1 - Horizontal and Vertical Analysis (results and formulas); 2 - Financial Ratio Analysis (results and formulas); 3 - Vlookup and HLookup (results and formulas); 4 - DataTable (results and formulas) and DropDown Box (result only); 5 - Dfunctions (results and formulas); 6 - Functions1; 7 - Functions2; 8 - Annual Income Statement; 9 - Macro.
Ground Breakers Romania: Explain the explain_plan (Maria Colgan)
This session was delivered as part of the EMEA Ground Breakers tour in Romania, Oct. 2019. The execution plan for a SQL statement can often seem complicated and hard to understand. Determining if the execution plan you are looking at is the best plan you could get, or attempting to improve a poorly performing execution plan, can be a daunting task even for the most experienced DBA or developer. This session examines the different aspects of an execution plan, from selectivity to parallel execution, and explains what information you should be gleaning from the plan and how it affects the execution. It offers insight into what caused the Optimizer to make the decision it did, as well as a set of corrective measures that can be used to improve each aspect of the plan.
The document summarizes the results of analyzing sales data from Sam's Club stores. Key findings include:
- Total sales across all stores were $64.6 million, with the top-selling store bringing in $5.8 million.
- Sales were highest on Sundays ($13.7 million) and Saturdays ($9.3 million).
- The most common member type, "V", accounted for $6 million in sales.
- On average, members visited stores 2.1 times and spent $81.81 per visit, purchasing 8 items.
- Peak shopping hours were between 3-6pm on weekdays and earlier on weekends.
Order Management provides tools to manage sales orders and streamline the order fulfillment process from order entry to shipment. It includes functions like order promising, order capture, transportation management, and integration with EDI, XML, and web storefronts. This can help businesses reduce costs, improve order accuracy, and increase on-time delivery rates. Order and line information is stored in tables like OE_ORDER_HEADERS_ALL, OE_ORDER_LINES_ALL, and MTL_ONHAND_QUANTITIES to track items, pricing, statuses and fulfillment progress.
The document provides step-by-step instructions for customizing the check printing report in Oracle R12. It discusses developing customized templates, modifying code to include additional data, and setting up payment profiles and formats to display data using the customized templates. Key steps include: 1) Developing customized templates; 2) Adding code to retrieve additional data; 3) Creating template definitions, payment formats, documents, and profiles linked to the customized templates. This allows payments to be generated using the customized templates and layouts while retaining the option to use the standard templates.
Second presentation on domain-driven design. In this presentation, tactical designs are presented, describing what value objects, entities, aggregates, domain events, and domain services are (and how they can be implemented).
The document provides explanations of various SQL concepts including cross join, order by, distinct, union and union all, truncate and delete, compute clause, data warehousing, data marts, fact and dimension tables, snowflake schema, ETL processing, BCP, DTS, multidimensional analysis, and bulk insert. It also discusses the three primary ways of storing information in OLAP: MOLAP, ROLAP, and HOLAP.
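One of the concepts listed above, the difference between UNION and UNION ALL (duplicate removal versus keeping every row), can be seen with a toy example; this sketch runs the queries in SQLite from Python with two invented single-column tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicate rows; UNION ALL keeps every row from both inputs.
union_rows     = conn.execute("SELECT x FROM a UNION     SELECT x FROM b ORDER BY x").fetchall()
union_all_rows = conn.execute("SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x").fetchall()

print(union_rows)       # the value 2 appears once
print(union_all_rows)   # the value 2 appears twice
```

Because UNION must deduplicate, it typically costs more than UNION ALL; prefer UNION ALL when the inputs are known to be disjoint.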
This document provides an overview of online analytical processing (OLAP). It defines OLAP as a process for analyzing multidimensional data to help decision makers. OLAP uses data warehouses to store historical data in a structured format. It allows for analytical queries and operations like aggregation, roll-up, drill-down and slicing and dicing of data. SQL extensions and OLAP functions further aid analysis. OLAP systems can be MOLAP, ROLAP or HOLAP based on their architecture and data storage methods. Commercial OLAP systems include IBM, Oracle and Microsoft products.
This presentation includes the steps to create a database in SQL Server, importing data into tables, populating data into dimension and fact tables using SSIS, report generation using SSRS, and data presentation using Tableau, and also uses the AdventureWorks dataset to differentiate between graph databases and DBMSs with an Airbnb database.
This document contains examples of SQL queries with explanations and formatting guidelines. The queries select data from tables to return item details, revenues, and ratios. Comments provide best practices for formatting, commenting code, and modeling data. The document also discusses data structures, common table expressions, views, functions and technologies.
Part 3 of the SQL Tuning workshop examines the different aspects of an execution plan, from cardinality estimates to parallel execution, and explains what information you should be gleaning from the plan and how it affects the execution. It offers insight into what caused the Optimizer to make the decision it did, as well as a set of corrective measures that can be used to improve each aspect of the plan.
This document describes data flow diagrams and Jackson Structured Programming. It provides details on how to construct DFDs, including leveled DFDs for large systems. It explains how DFDs differ from flowcharts by focusing on data flow rather than control flow. The document also provides an example DFD for a payroll system. It then describes Jackson Structured Programming and how to develop the data structure diagram, program structure diagram, and list operations and conditions. An example JSP is provided for an accounting processing system.
The random forest model generated 182 decision trees from the training data to classify whether users will continue their session or not, with an out-of-bag error rate of 34.17%. Important features were identified using the Gini index. The random forest model was able to successfully build a rule-based classification model with over 70% accuracy on the test data to identify if a user will continue or leave a session based on their behavior metrics.
This document describes a data warehouse and business intelligence project for analyzing Starbucks store data. It discusses extracting data from various structured, semi-structured, and unstructured sources, transforming the data using SQL and R, and loading it into a star schema data warehouse with fact and dimension tables. The data warehouse is then used for business queries and analysis in Tableau, with case studies examining city revenue, visitor and beverage sales by city, and city ratings based on food and beverage counts. The analysis finds that New York City generally has the highest revenue, visitor counts, and ratings.
1 – Implementing the Decorator Design Pattern (with Strategy Pattern and Factory Class) (honey725342)
PROBLEM
You are to design and implement code based on the Decorator pattern for generating an appropriate receipt for a customer buying items at a particular Best Buy store. The general format of a receipt is as follows:
Basic Receipt
Store Header (store street address, state abbreviation, phone number, store number)
Date of Sale
Itemized Purchases
Total Sale (without sales tax)
Amount Due (with added sales tax)
Dynamically-Added Items
Tax Computation object (based on state residing in)
Optional Secondary Headers (“Greetings”) to be printed at the very top of the receipt, e.g.,
- “Happy Holidays from Best Buy”
- “Summer Sales are Hot at Best Buy”
Relevant Rebate Forms (to be printed at the end of the receipt)
Promotional Coupons (to be printed at the end of the receipt)
e.g., “10% off next purchase of $100 or more” coupon
APPROACH
We will assume that the code is written as part of the software used by all Best Buy stores around the country. Therefore, the information in the Store Header will vary depending on the particular store's location. In addition, the amount of sales tax (if any) is determined by the state that the store resides in, and included in a receipt by use of the Strategy pattern. The added items to be displayed on each receipt (i.e., greeting secondary header, rebate form, or coupon) will be handled by use of the Decorator pattern. Finally, the receipt will be constructed by use of the Factory class pattern.
Basic Receipt
The information for the basic receipt should be stored in a BasicReceipt object (see below). The instance variables of a BasicReceipt should contain the date of sale, a PurchasedItems object, the total sale (without tax), and the amount due (with added tax), each of type float. In addition, following the Strategy design pattern, there should be an instance variable of (interface) type TaxComputation that can be assigned the appropriate tax computation object (e.g., NJTaxComputation) for the state that the store resides in.
Determining Sales Tax
We will implement tax computation classes so that receipts can be generated for one of five possible states: Alabama (4%), Delaware (no sales tax), Georgia (4%), Maryland (6%), and Missouri (4.225%). Note that a number of states have "sales tax holidays" in which certain items are not taxed (e.g., clothes, computers) in preparation for the new school year. These holidays normally last for 2-4 days over a weekend at the end of summer (e.g., the first weekend in August). Use the information provided in Wikipedia about states with a tax holiday to implement a state computation object that would properly compute the sales tax of a given state, for a certain set of purchased items, for the current or any future year. (Note that we will assume that anything purchased in Best Buy is a computer-related item.)
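The combination of Strategy (tax computation) and Decorator (dynamically added receipt items) described above can be sketched as follows. The class and method names here (`render`, `GreetingDecorator`, `CouponDecorator`, and the concrete tax classes) are invented for illustration and are not the assignment's required API; the tax-holiday logic is omitted.

```python
# Strategy: the tax computation varies by state (names invented for this sketch).
class TaxComputation:
    def tax(self, subtotal):
        raise NotImplementedError

class DETaxComputation(TaxComputation):  # Delaware: no sales tax
    def tax(self, subtotal):
        return 0.0

class MDTaxComputation(TaxComputation):  # Maryland: 6%
    def tax(self, subtotal):
        return subtotal * 0.06

# Component: the basic receipt holds the total and a tax strategy.
class BasicReceipt:
    def __init__(self, total, tax_strategy):
        self.total = total
        self.tax_strategy = tax_strategy

    def render(self):
        due = self.total + self.tax_strategy.tax(self.total)
        return f"Total: {self.total:.2f}\nAmount due: {due:.2f}"

# Decorators: each wraps any receipt and adds a line before or after it.
class GreetingDecorator:
    def __init__(self, receipt, greeting):
        self.receipt = receipt
        self.greeting = greeting

    def render(self):
        return f"{self.greeting}\n{self.receipt.render()}"

class CouponDecorator:
    def __init__(self, receipt, coupon):
        self.receipt = receipt
        self.coupon = coupon

    def render(self):
        return f"{self.receipt.render()}\n{self.coupon}"

# Stacking decorators adds items dynamically without changing BasicReceipt.
receipt = CouponDecorator(
    GreetingDecorator(BasicReceipt(100.0, MDTaxComputation()),
                      "Happy Holidays from Best Buy"),
    "10% off next purchase of $100 or more",
)
print(receipt.render())
```

A factory would then decide, per store and per promotion, which decorators to stack around the BasicReceipt.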
Adding ...
This document provides tips for managing spreadsheets and extracting information from data. It recommends using Google Sheets to collaborate on spreadsheets with others. It also outlines various spreadsheet functions for summarizing data, extracting text, concatenating strings, and looking up values. Conditional formatting is suggested to highlight important information. Pivot tables are presented as a way to summarize tables with filters and aggregations.
This dashboard aims to evaluate the monthly sales achievement for mobiles and tablets, computing, and appliances in a superstore. The tool used in this project is Looker Studio.
This task presents SQL basic commands which I used to create a new table with its data. I queried the sales data of furniture, office supplies, and technology.
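Such basic commands can be sketched as below, run here in SQLite from Python; the table name `superstore_sales` and its columns are invented for illustration and may differ from the task's actual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE TABLE and INSERT, then a grouped SELECT over the three categories.
conn.executescript("""
    CREATE TABLE superstore_sales (
        category TEXT,
        sales    REAL
    );
    INSERT INTO superstore_sales VALUES
        ('Furniture', 120.0),
        ('Office Supplies', 80.0),
        ('Technology', 200.0),
        ('Technology', 50.0);
""")

by_category = conn.execute("""
    SELECT category, SUM(sales) AS total_sales
    FROM superstore_sales
    GROUP BY category
    ORDER BY total_sales DESC
""").fetchall()
print(by_category)
```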
This document describes data flow diagrams and Jackson Structured Programming. It provides details on how to construct DFDs, including leveled DFDs for large systems. It explains how DFDs differ from flowcharts by focusing on data flow rather than control flow. The document also provides an example DFD for a payroll system. It then describes Jackson Structured Programming and how to develop the data structure diagram, program structure diagram, and list operations and conditions. An example JSP is provided for an accounting processing system.
The random forest model generated 182 decision trees from the training data to classify whether users will continue their session or not, with an out-of-bag error rate of 34.17%. Important features were identified using the Gini index. The random forest model was able to successfully build a rule-based classification model with over 70% accuracy on the test data to identify if a user will continue or leave a session based on their behavior metrics.
This document describes a data warehouse and business intelligence project for analyzing Starbucks store data. It discusses extracting data from various structured, semi-structured, and unstructured sources, transforming the data using SQL and R, and loading it into a star schema data warehouse with fact and dimension tables. The data warehouse is then used for business queries and analysis in Tableau, with case studies examining city revenue, visitor and beverage sales by city, and city ratings based on food and beverage counts. The analysis finds that New York City generally has the highest revenue, visitor counts, and ratings.
1 – Implementing the Decorator Design Pattern (with St.docxhoney725342
1
– Implementing the Decorator Design Pattern
(with Strategy Pattern and Factory Class)
)
PROBLEM
You are to design and implement code based on the Decorator pattern for generating an
appropriate receipt for a customer buying items at a particular Best Buy store. The general format of a
receipt is as follows:
Basic Receipt
Store Header (store street address, state abbreviation, phone number, store number)
Date of Sale
Itemized Purchases
Total Sale (without sales tax)
Amount Due (with added sales tax)
Dynamically-Added Items
Tax Computation object (based on state residing in)
Optional Secondary Headers (“Greetings”) to be printed at the very top of the receipt,
e.g.,
- “Happy Holidays from Best Buy”
- “Summer Sales are Hot at Best Buy”
Relevant Rebate Forms (to be printed at the end of the receipt)
Promotional Coupons (to be printed at the end of the receipt)
e.g., “10% off next purchase of $100 or more” coupon
APPROACH
We will assume that the code is written as part of the software used by all Best Buy stores around the
country. Therefore, the information in the Store Header will vary depending on the particular store's
location. In addition, the amount of sales tax (if any) is determined by the state that the store resides in,
and included in a receipt by use of the Strategy pattern. The added items to be displayed on each receipt
(i.e.,. greeting secondary header, rebate form or coupon) will be handled by use of the Decorator pattern.
Finally, the receipt will be constructed by use of the Factory class pattern.
Basic Receipt
The information for the basic receipt should be stored in a BasicReceipt object (see below). The instance
variables of a BasicReceipt should contain the date of sale, a PurchasedItems object, the total sale
(without tax) and amount due (with added tax), each of type float. In addition, following the Strategy
design pattern, there should be an instance variable of (interface) type TaxComputation that can
be assigned the appropriate tax computation object (e.g., NJTaxComputation) for the state that the
store resides in.
2
Determining Sales Tax
We will implement tax computation classes so that receipts can be generated for one of five possible
states: Alabama (4%), Delaware (no sales tax), Georgia (4%), Maryland (6%) and Missouri (4.225%). Note
that a number of states have a “sales tax holidays” in which certain items are not taxed (e.g., clothers,
computers) in preparation of the new school year. These holidays normally last for 2-4 days over at
weekend at the end of summer (e.g., the first weekend in August). Use the information provided in
Wikipedia about states with a tax holiday (here) to implement a state computation object that would
properly compute the sales tax of a given state, for a certain set of purchased items, for the current or
any future year. (Note that we will assume that anything purchased in Best Buy is a computer-related
item.)
Adding ...
This document provides tips for managing spreadsheets and extracting information from data. It recommends using Google Sheets to collaborate on spreadsheets with others. It also outlines various spreadsheet functions for summarizing data, extracting text, concatenating strings, and looking up values. Conditional formatting is suggested to highlight important information. Pivot tables are presented as a way to summarize tables with filters and aggregations.
Similar to Final Project SQL - Elyada Wigati Pramaresti.pptx (20)
Data analyzed in this task is collected from Tokopedia (it is not the original data). The dataset consists of four tables:

order_detail:
1. id → the unique number of an order (id_order)
2. customer_id → the unique number of a customer
3. order_date → the date when the transaction was carried out
4. sku_id → the unique number of a product (SKU stands for stock keeping unit)
5. price → the amount of money paid for an item
6. qty_ordered → the number of items purchased by the customer
7. before_discount → the total price of the products before the discount
8. discount_amount → the discount applied to the total
9. after_discount → the total price after the discount is applied
10. is_gross → indicates that the customer has not yet paid the order
11. is_valid → indicates that the customer has paid the order
12. is_net → indicates that the transaction is finished
13. payment_id → the unique number of the payment method

sku_detail:
1. id → the unique number of a product (it can be used as a join key)
2. sku_name → the name of the product
3. base_price → the price shown on the price tag
4. cogs → the cost of selling one product
5. category → the product category

customer_detail:
1. id → the unique number of a customer
2. registered_date → the date when the customer signed up as a member

payment_detail:
1. id → the unique number of a payment method
2. payment_method → the method of payment used in the transaction
Business Questions
1. Among the transactions carried out in 2021, in which month did the total transaction value (after_discount) reach its peak? Use is_valid = 1 to filter the data. Source table: order_detail.
2. Among the transactions that occurred in 2022, which category generated the largest transaction value? Use is_valid = 1 to filter the data. Source tables: order_detail, sku_detail.
3. Compare the transaction value of each category in 2021 with that in 2022. Identify which categories increased and which decreased in transaction value. Use is_valid = 1 to filter the data. Source tables: order_detail, sku_detail.
4. Identify the top 5 most popular payment methods used in 2022, based on the number of unique orders. Use is_valid = 1 to filter the data. Source tables: order_detail, payment_detail.
5. Rank these five brands by transaction value: Samsung, Apple, Sony, Huawei, Lenovo. Use is_valid = 1 to filter the data. Source tables: order_detail, sku_detail.
1. Among the transactions carried out in 2021, in which month did the total transaction value (after_discount) reach its peak? Use is_valid = 1 to filter the data. Source table: order_detail.

SELECT
    to_char(order_date, 'Month') AS month_transaction,
    -- to_char formats the date as a month name; AS gives the column an alias
    SUM(after_discount) AS total_transaction
    -- the total of after_discount, aliased as total_transaction
FROM
    order_detail
    -- the table from which the data is taken
WHERE
    is_valid = 1
    -- keep only paid orders
    AND EXTRACT(YEAR FROM order_date) = 2021
    -- keep only orders placed in 2021
GROUP BY 1
    -- group by the first column in the SELECT list (month_transaction)
ORDER BY 2 DESC
    -- sort by the second column (total_transaction) from highest to lowest
LIMIT 1
    -- keep only the first row, which is the peak month
Result: August was the month with the highest total transaction value in 2021, with a total of 227,862,744.
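The peak-month query can be tried on a toy dataset with Python's sqlite3. SQLite has no to_char or EXTRACT, so strftime stands in for both, and the sample rows are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_detail (order_date TEXT, after_discount REAL, is_valid INTEGER)")
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?)", [
    ("2021-07-10", 100.0, 1),
    ("2021-08-05", 300.0, 1),
    ("2021-08-20", 250.0, 1),
    ("2021-08-25", 400.0, 0),   # unpaid order, removed by is_valid = 1
    ("2022-08-01", 900.0, 1),   # wrong year, removed by the year filter
])
# strftime('%m', ...) plays the role of to_char(..., 'Month') here
row = conn.execute("""
    SELECT strftime('%m', order_date) AS month_transaction,
           SUM(after_discount)        AS total_transaction
    FROM order_detail
    WHERE is_valid = 1
      AND strftime('%Y', order_date) = '2021'
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 1
""").fetchone()
print(row)  # the peak month ('08') and its total (550.0)
```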
2. Among the transactions that occurred in 2022, which category generated the largest transaction value? Use is_valid = 1 to filter the data. Source tables: order_detail, sku_detail.

SELECT
    category,
    -- the product category from sku_detail
    SUM(after_discount) AS total_transaction
    -- the total of after_discount, aliased as total_transaction
FROM
    sku_detail AS sd
    -- AS defines sd as the alias of sku_detail
    LEFT JOIN order_detail AS od ON sd.id = od.sku_id
    -- LEFT JOIN returns all records from sku_detail and the matching records
    -- from order_detail; od is the alias of order_detail
WHERE
    is_valid = 1
    AND EXTRACT(YEAR FROM order_date) = 2022
    -- keep only paid orders placed in 2022
GROUP BY 1
    -- group by category
ORDER BY 2 DESC
    -- sort by total_transaction from highest to lowest
LIMIT 1
    -- keep only the first row, which is the top category
Result: Mobiles & Tablets was the category with the highest total transaction value in 2022, with a total of 918,451,576.
3. Compare the transaction value of each category in 2021 with that in 2022. Identify which categories increased and which decreased in transaction value. Use is_valid = 1 to filter the data. Source tables: order_detail, sku_detail.

-- The comparison of transactions from each category in 2021 and 2022
SELECT
    category,
    SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd')
             BETWEEN '2021-01-01' AND '2021-12-31'
             THEN od.after_discount END) AS total_sales_2021,
    -- CASE WHEN works like IF-ELSE: rows that fall within the date range
    -- contribute their after_discount value; rows outside it contribute NULL,
    -- which SUM ignores. END closes the CASE expression.
    SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd')
             BETWEEN '2022-01-01' AND '2022-12-31'
             THEN od.after_discount END) AS total_sales_2022
FROM
    order_detail AS od
    LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
    -- LEFT JOIN returns all records from order_detail and the matching records
    -- from sku_detail
WHERE
    is_valid = 1
    -- keep only paid orders
GROUP BY 1
    -- group by category
ORDER BY 2 DESC
    -- sort by total_sales_2021 from highest to lowest
-- Categories that show growth and categories that show a slump
WITH full_transaction AS (
    -- WITH defines a named subquery (a common table expression) up front,
    -- so the main query below can read from it by name
    SELECT
        category,
        SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd')
                 BETWEEN '2021-01-01' AND '2021-12-31'
                 THEN od.after_discount END) AS total_sales_2021,
        SUM(CASE WHEN to_char(order_date, 'yyyy-mm-dd')
                 BETWEEN '2022-01-01' AND '2022-12-31'
                 THEN od.after_discount END) AS total_sales_2022
        -- CASE WHEN works like IF-ELSE: rows within each date range contribute
        -- their after_discount value; rows outside it contribute NULL
    FROM
        order_detail AS od
        LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
        -- LEFT JOIN returns all records from order_detail and the matching
        -- records from sku_detail
    WHERE
        is_valid = 1
        -- keep only paid orders
    GROUP BY 1
    ORDER BY 2 DESC
)
SELECT
    full_transaction.*,
    -- the star symbol retrieves every column of full_transaction
    total_sales_2022 - total_sales_2021 AS growth_value
    -- a new column: the 2022 total minus the 2021 total
FROM
    full_transaction
ORDER BY 4 DESC
    -- sort by the fourth column (growth_value) from highest to lowest
Result: categories with a positive growth_value increased from 2021 to 2022, while a negative growth_value indicates a decrease.
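The conditional-aggregation pattern behind total_sales_2021 and total_sales_2022 can be exercised on invented data with Python's sqlite3; ISO date strings compare correctly as text, so BETWEEN behaves the same way as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (order_date TEXT, after_discount REAL,
                           is_valid INTEGER, sku_id INTEGER);
CREATE TABLE sku_detail (id INTEGER, category TEXT);
""")
conn.executemany("INSERT INTO sku_detail VALUES (?, ?)",
                 [(1, "Mobiles & Tablets"), (2, "Others")])
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?, ?)", [
    ("2021-03-01", 100.0, 1, 1),
    ("2022-03-01", 180.0, 1, 1),   # Mobiles & Tablets grew year over year
    ("2021-06-01", 200.0, 1, 2),
    ("2022-06-01", 150.0, 1, 2),   # Others shrank year over year
])
rows = conn.execute("""
    WITH full_transaction AS (
        SELECT sd.category,
               SUM(CASE WHEN od.order_date BETWEEN '2021-01-01' AND '2021-12-31'
                        THEN od.after_discount END) AS total_sales_2021,
               SUM(CASE WHEN od.order_date BETWEEN '2022-01-01' AND '2022-12-31'
                        THEN od.after_discount END) AS total_sales_2022
        FROM order_detail AS od
        LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
        WHERE od.is_valid = 1
        GROUP BY 1
    )
    SELECT full_transaction.*,
           total_sales_2022 - total_sales_2021 AS growth_value
    FROM full_transaction
    ORDER BY 4 DESC
""").fetchall()
print(rows)  # growth_value is positive for growth, negative for a slump
```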
4. Identify the top 5 most popular payment methods used in 2022, based on the number of unique orders. Use is_valid = 1 to filter the data. Source tables: order_detail, payment_detail.

SELECT
    payment_method,
    COUNT(DISTINCT od.id) AS total_payment
    -- COUNT(DISTINCT ...) counts unique order ids: even if one order appears
    -- in several records, it is counted as a single transaction
FROM
    order_detail AS od
    LEFT JOIN payment_detail AS pd ON pd.id = od.payment_id
    -- LEFT JOIN returns all records from order_detail and the matching records
    -- from payment_detail
WHERE
    EXTRACT(YEAR FROM order_date) = 2022
    AND is_valid = 1
    -- keep only paid orders placed in 2022
GROUP BY 1
    -- group by payment method
ORDER BY 2 DESC
    -- sort by total_payment from highest to lowest
LIMIT 5
    -- keep the five most used payment methods
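The effect of COUNT(DISTINCT ...) can be seen on a toy dataset with Python's sqlite3 (strftime stands in for EXTRACT; the rows are invented, with one order deliberately split across two records):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (id INTEGER, order_date TEXT,
                           is_valid INTEGER, payment_id INTEGER);
CREATE TABLE payment_detail (id INTEGER, payment_method TEXT);
""")
conn.executemany("INSERT INTO payment_detail VALUES (?, ?)",
                 [(1, "cod"), (2, "ewallet")])
# order id 10 appears twice (two lines of one order); DISTINCT counts it once
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?, ?)", [
    (10, "2022-01-05", 1, 1),
    (10, "2022-01-05", 1, 1),
    (11, "2022-02-09", 1, 1),
    (12, "2022-03-02", 1, 2),
])
rows = conn.execute("""
    SELECT pd.payment_method,
           COUNT(DISTINCT od.id) AS total_payment
    FROM order_detail AS od
    LEFT JOIN payment_detail AS pd ON pd.id = od.payment_id
    WHERE strftime('%Y', od.order_date) = '2022'
      AND od.is_valid = 1
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 5
""").fetchall()
print(rows)  # cod counts 2 unique orders, not 3 records
```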
5. Rank these five brands by transaction value: Samsung, Apple, Sony, Huawei, Lenovo. Use is_valid = 1 to filter the data. Source tables: order_detail, sku_detail.

WITH full_transaction AS (
    SELECT
        CASE
            WHEN sku_name LIKE '%samsung%' THEN 'Samsung'
            WHEN sku_name LIKE '%apple%' THEN 'Apple'
            WHEN sku_name LIKE '%iphone%' THEN 'Apple'
            WHEN sku_name LIKE '%imac%' THEN 'Apple'
            WHEN sku_name LIKE '%macbook%' THEN 'Apple'
            WHEN sku_name LIKE '%sony%' THEN 'Sony'
            WHEN sku_name LIKE '%huawei%' THEN 'Huawei'
            WHEN sku_name LIKE '%lenovo%' THEN 'Lenovo'
        END AS product_name,
        SUM(after_discount) AS total_sales
    FROM
        order_detail AS od
        LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
    WHERE
        to_char(order_date, 'yyyy-mm-dd') BETWEEN '2022-01-01' AND '2022-12-31'
        AND is_valid = 1
    GROUP BY 1
)
SELECT
    full_transaction.*
FROM
    full_transaction
WHERE
    product_name IS NOT NULL
ORDER BY 2 DESC
Syntax Explanation

In the CTE, the CASE expression classifies each row by brand. LIKE means "matches the pattern": the % wildcard on both sides of samsung matches any sku_name that contains the word "samsung", and those rows are labelled 'Samsung'. Products that match none of the patterns receive a NULL product_name. SUM(after_discount) then totals the sales per brand, and the date filter keeps only the paid transactions of 2022.

The main query retrieves every column of full_transaction (the star symbol), drops the rows whose product_name is NULL, and sorts by the second column (total_sales) from highest to lowest.
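The brand-bucketing CASE ... LIKE pattern can be tried on invented rows with Python's sqlite3. One caveat: SQLite's LIKE is case-insensitive for ASCII by default, so '%samsung%' matches "Samsung ..." here, whereas a database with case-sensitive LIKE (e.g. PostgreSQL) would need ILIKE or lower(sku_name) for the same effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (sku_id INTEGER, after_discount REAL, is_valid INTEGER);
CREATE TABLE sku_detail (id INTEGER, sku_name TEXT);
""")
conn.executemany("INSERT INTO sku_detail VALUES (?, ?)", [
    (1, "Samsung Galaxy S21"),
    (2, "Apple iPhone 13"),
    (3, "MacBook Air M1"),
    (4, "Sony WH-1000XM4"),
    (5, "Generic USB cable"),   # matches no brand, so product_name stays NULL
])
conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?)", [
    (1, 500.0, 1), (2, 800.0, 1), (3, 900.0, 1), (4, 300.0, 1), (5, 5.0, 1),
])
rows = conn.execute("""
    WITH full_transaction AS (
        SELECT CASE
                 WHEN sku_name LIKE '%samsung%' THEN 'Samsung'
                 WHEN sku_name LIKE '%apple%'   THEN 'Apple'
                 WHEN sku_name LIKE '%iphone%'  THEN 'Apple'
                 WHEN sku_name LIKE '%imac%'    THEN 'Apple'
                 WHEN sku_name LIKE '%macbook%' THEN 'Apple'
                 WHEN sku_name LIKE '%sony%'    THEN 'Sony'
                 WHEN sku_name LIKE '%huawei%'  THEN 'Huawei'
                 WHEN sku_name LIKE '%lenovo%'  THEN 'Lenovo'
               END AS product_name,
               SUM(after_discount) AS total_sales
        FROM order_detail AS od
        LEFT JOIN sku_detail AS sd ON sd.id = od.sku_id
        WHERE od.is_valid = 1
        GROUP BY 1
    )
    SELECT * FROM full_transaction
    WHERE product_name IS NOT NULL
    ORDER BY 2 DESC
""").fetchall()
print(rows)  # iPhone and MacBook sales are folded into the Apple bucket
```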