Python Final Project
TOOL
About the Dataset
The data analyzed in this task were collected from Tokopedia (dummy data, not the original). The dataset consists of the following tables:
order_detail:
Variable | Data Type | Description
id | object | the unique order number (id_order)
customer_id | object | the unique customer number
order_date | object | the date the transaction was carried out
sku_id | object | the unique product number (SKU = stock keeping unit)
price | int64 | the amount of money paid for the item
qty_ordered | int64 | the number of items purchased by the customer
before_discount | float64 | the total price of the products before discount
discount_amount | float64 | the discount applied to the total
after_discount | float64 | the total price after the discount is applied
is_gross | int64 | flag showing the customer has not yet paid the order
is_valid | int64 | flag showing the customer has paid the order
is_net | int64 | flag showing the transaction is finished
payment_id | int64 | the unique payment-method number

sku_detail:
Variable | Data Type | Description
id | object | the unique product number (can be used as a join key)
sku_name | object | the name of the product
base_price | float64 | the price shown on the tag
cogs | int64 | the cost of goods sold for one product
category | object | the product category

customer_detail:
Variable | Data Type | Description
id | object | the unique customer number
registered_date | object | the date the customer signed up as a member

payment_detail:
Variable | Data Type | Description
id | int64 | the unique payment number
payment_method | object | the payment method used in the transaction
Importing Libraries
Reviewing order_detail Table
Reviewing sku_detail Table
Reviewing customer_detail Table
Reviewing payment_detail Table
Pre-Processing the Data
We have four separate tables, so before any further analysis we join them by running SQL inside Colab; in this case a LEFT JOIN is used.
1. Set up SQL in Google Colab 2. Write the SQL query that combines the four tables
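The join described above can be sketched as follows. This is a minimal illustration, not the deck's actual notebook code: an in-memory SQLite database stands in for whatever SQL setup the notebook used, and the sample rows are made up; only the table and column names come from the dataset description.

```python
import sqlite3
import pandas as pd

# Tiny stand-ins for the four tables (column names from the dataset description)
order_detail = pd.DataFrame({
    "id": ["o1", "o2"], "customer_id": ["c1", "c2"],
    "sku_id": ["s1", "s2"], "payment_id": [1, 1], "qty_ordered": [2, 1],
})
sku_detail = pd.DataFrame({"id": ["s1", "s2"], "sku_name": ["A", "B"],
                           "category": ["Mobiles & Tablets", "Others"]})
customer_detail = pd.DataFrame({"id": ["c1", "c2"],
                                "registered_date": ["2021-01-01", "2021-02-01"]})
payment_detail = pd.DataFrame({"id": [1], "payment_method": ["cod"]})

# Load the DataFrames into an in-memory SQL engine
con = sqlite3.connect(":memory:")
for name, frame in [("order_detail", order_detail), ("sku_detail", sku_detail),
                    ("customer_detail", customer_detail), ("payment_detail", payment_detail)]:
    frame.to_sql(name, con, index=False)

# LEFT JOIN keeps every order even when a lookup row is missing
query = """
SELECT od.*, sd.sku_name, sd.category, cd.registered_date, pd.payment_method
FROM order_detail AS od
LEFT JOIN sku_detail      AS sd ON od.sku_id = sd.id
LEFT JOIN customer_detail AS cd ON od.customer_id = cd.id
LEFT JOIN payment_detail  AS pd ON od.payment_id = pd.id
"""
df = pd.read_sql_query(query, con)
```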
Pre-Processing the Data
Reviewing the joined dataset by checking the first 10 rows
Pre-Processing the Data
Checking the data type of each column, to verify that every type is correct.
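In pandas this check is typically one line; the columns below are illustrative stand-ins for the joined table.

```python
import pandas as pd

# After the join, numeric and date columns often load as plain strings (object)
df = pd.DataFrame({"qty_ordered": ["2", "1"],
                   "order_date": ["2022-01-05", "2022-03-10"]})
print(df.dtypes)  # both columns show as object, so they need conversion
# df.info() gives the same overview plus non-null counts and memory usage
```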
Pre-Processing the Data
The data types of some columns need to be changed to make them easier to process: the numeric columns are converted to integers, and the date column is given a new format so that dates sort chronologically.
Queries
Result
Pre-Processing the Data
Change the date column's type to datetime
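The two conversions above can be sketched like this; the column names and the day-month-year input format are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"before_discount": [100000.0, 250000.0],
                   "order_date": ["05-01-2022", "10-03-2022"]})

# Numeric column: float -> integer
df["before_discount"] = df["before_discount"].astype("int64")

# Date column: parse with an explicit format so the dates order chronologically
df["order_date"] = pd.to_datetime(df["order_date"], format="%d-%m-%Y")
```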
Problem Statements
1. Dear Data Analyst,
At the end of this year, the company will provide prizes for the consumers who win
the Year-End Festival Competition. The Marketing Team needs help to determine the
prizes which will be given to the winners. The prizes will be taken from the Top 5
products from Mobiles and Tablets Category in 2022, with the highest order quantity
(valid = 1).
Please help send the data before the end of this month to the Marketing Team. We
appreciate your help. Thank you.
Kind regards,
Marketing Team
Explanation:
We need to filter the data. First, filter the category column, which contains several categories, to keep only Mobiles and Tablets. Next, filter the order date to transactions in 2022 by writing >= '2022-01-01' (the first possible transaction date in 2022) and <= '2022-12-31' (the last). After that, add the condition is_valid == 1 to keep only paid orders.
Second, aggregate the data by product and total quantity. Use groupby on sku_name for the product name and sum qty_ordered to get each product's order quantity. Then reset the header with reset_index and sort with sort_values by "qty_ordered", passing ascending=False so the data runs from highest to lowest.
Third, display the top 5 products by writing df_filter.head(5)
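The steps above can be sketched as one pandas chain. The sample rows are invented; only the column names and filter conditions follow the explanation.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Mobiles & Tablets"] * 3 + ["Others"],
    "order_date": pd.to_datetime(["2022-02-01", "2022-05-01", "2021-06-01", "2022-03-01"]),
    "is_valid": [1, 1, 1, 1],
    "sku_name": ["IDROID_X", "IDROID_X", "IDROID_X", "Snack"],
    "qty_ordered": [3, 2, 5, 1],
})

df_filter = (
    df[(df["category"] == "Mobiles & Tablets")      # keep the target category
       & (df["order_date"] >= "2022-01-01")         # transactions in 2022 only
       & (df["order_date"] <= "2022-12-31")
       & (df["is_valid"] == 1)]                     # paid orders only
    .groupby("sku_name")["qty_ordered"].sum()       # total quantity per product
    .reset_index()
    .sort_values(by="qty_ordered", ascending=False) # highest to lowest
)
top5 = df_filter.head(5)
```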
Result
Analysis:
The 2022 sales data for Mobiles & Tablets shows that IDROID_BALRX7-Gold is the most popular product, with an order quantity reaching 2000. It is followed by RS_Coconut Bites, with an order quantity of 300. There is a large gap in order quantity between IDROID_BALRX7-Gold and RS_Coconut Bites, which may be explained by the better features of IDROID_BALRX7-Gold and the different marketing strategies for the two products.
Problem Statements
2. Dear Data Analyst,
Following up on the meeting with the Warehouse Team and Marketing Team, we found
that there was a large amount of unsold stock in the Others Category at the end of 2022.
1. We kindly ask for your help to check the sales data for this category in 2021 based on
sales quantity. Our working assumption is that there was a decrease in the sales
quantity in 2022 compared to 2021. Please also display the data of the 15 products
from the category.
2. If there is a decrease in the sales quantity of the Others Category, we ask for your help to
present the data of the TOP 20 products with the highest decrease in 2022
compared with 2021. Please send the result no later than 4 days from today. We will
use the result for the next meeting. We appreciate your help. Thank you.
Kind regards,
Warehouse Team
2.1
Explanation:
First of all, we need to filter the transaction data for 2021. Add the condition is_valid == 1 to keep only valid (paid) orders, and specify the date range by writing >= '2021-01-01' (the first possible transaction date in 2021) and <= '2021-12-31' (the last).
Second, group the 2021 data with groupby: use category for the product category and sum qty_ordered to get the order quantity per category. Then reset the header with reset_index and sort with sort_values by "qty_ordered", passing ascending=False so the data runs from highest to lowest.
Apply the same query to filter and group the 2022 data.
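A compact sketch of the per-year grouping; the helper name `qty_per_category` and the sample rows are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-03-01", "2021-04-01", "2022-03-01"]),
    "is_valid": [1, 1, 1],
    "category": ["Others", "Others", "Others"],
    "qty_ordered": [4, 3, 5],
})

def qty_per_category(frame, start, end):
    """Total valid order quantity per category within a date range."""
    mask = ((frame["is_valid"] == 1)
            & (frame["order_date"] >= start)
            & (frame["order_date"] <= end))
    return (frame[mask]
            .groupby("category")["qty_ordered"].sum()
            .reset_index()
            .sort_values(by="qty_ordered", ascending=False))

qty_2021 = qty_per_category(df, "2021-01-01", "2021-12-31")
qty_2022 = qty_per_category(df, "2022-01-01", "2022-12-31")
```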
Explanation:
2.1
After grouping and sorting the data for 2021 and 2022, we merge the two tables with the merge function on 'category'. We then rename the column 'qty_ordered_x' to 'Quantity 2021' and 'qty_ordered_y' to 'Quantity 2022'. Finally, we add a new column calculated as 'Quantity 2022' minus 'Quantity 2021'.
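The merge-rename-subtract step looks like this; the numbers are placeholders and the 'Difference' column name is an assumption.

```python
import pandas as pd

qty_2021 = pd.DataFrame({"category": ["Others", "Soghaat"], "qty_ordered": [10, 8]})
qty_2022 = pd.DataFrame({"category": ["Others", "Soghaat"], "qty_ordered": [3, 9]})

# merge adds _x / _y suffixes to the clashing qty_ordered columns
df_merged = qty_2021.merge(qty_2022, on="category")
df_merged = df_merged.rename(columns={"qty_ordered_x": "Quantity 2021",
                                      "qty_ordered_y": "Quantity 2022"})
# Negative values mean sales fell from 2021 to 2022
df_merged["Difference"] = df_merged["Quantity 2022"] - df_merged["Quantity 2021"]
```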
2.1
Explanation:
We also need a rule for labelling each row as an increase or a decrease: any value greater than 0 counts as an increase; otherwise it falls under the decrease category.
We then create a new column that applies this rule, using df_merged[] = df_merged[].apply(condition). Finally, display the merged data sorted from highest to lowest and, as only 15 rows are needed, add head(15).
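The labelling rule and its application can be sketched as below; the 'Status' column name and sample values are illustrative.

```python
import pandas as pd

df_merged = pd.DataFrame({"category": ["A", "B", "C"],
                          "Difference": [5, -7, 0]})

def condition(value):
    # Positive difference -> sales increased; zero or negative -> decreased
    return "Increase" if value > 0 else "Decrease"

df_merged["Status"] = df_merged["Difference"].apply(condition)
result = df_merged.sort_values(by="Difference", ascending=False).head(15)
```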
Result
Analysis:
The categories that show a sales decrease in 2022 are Others, Soghaat, Men Fashion, and Beauty & Grooming. Others experienced the biggest downturn: its sales quantity fell by 147 from 2021 to 2022.
2.2
Explanation:
The query for 2.2 is quite similar to 2.1. However, when filtering the 2021 and 2022 data, we add a condition on the category column, keeping only the Others Category. The \ symbol indicates that the statement continues on the next line.
2.2
Explanation:
After grouping and sorting the data for 2021 and 2022, we merge the two tables with the merge function on 'sku_name'. We then rename the column 'qty_ordered_x' to 'Quantity 2021' and 'qty_ordered_y' to 'Quantity 2022'. Finally, we add a new column calculated as 'Quantity 2022' minus 'Quantity 2021'.
2.2
We again need the rule that labels each row as an increase or a decrease: any value greater than 0 counts as an increase; otherwise it falls under the decrease category.
Create a new column that applies this rule, using df_merged[] = df_merged[].apply(condition). Then display the merged data sorted by the size of the decrease and, as 20 rows are needed, add head(20).
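Putting the whole 2.2 pipeline together might look like this sketch; the helper name, sample rows, and 'Difference' column are assumptions, and the most negative differences (biggest decreases) are listed first.

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-05-01", "2021-06-01", "2022-05-01"]),
    "is_valid": [1, 1, 1],
    "category": ["Others", "Others", "Others"],
    "sku_name": ["Dettol Kit", "Dettol Kit", "Dettol Kit"],
    "qty_ordered": [100, 80, 25],
})

def qty_per_product(frame, start, end):
    """Total valid order quantity per product, Others category only."""
    mask = ((frame["is_valid"] == 1)
            & (frame["category"] == "Others")
            & (frame["order_date"] >= start)
            & (frame["order_date"] <= end))
    return frame[mask].groupby("sku_name")["qty_ordered"].sum().reset_index()

merged = (qty_per_product(df, "2021-01-01", "2021-12-31")
          .merge(qty_per_product(df, "2022-01-01", "2022-12-31"), on="sku_name")
          .rename(columns={"qty_ordered_x": "Quantity 2021",
                           "qty_ordered_y": "Quantity 2022"}))
merged["Difference"] = merged["Quantity 2022"] - merged["Quantity 2021"]

# Ascending sort puts the largest decreases (most negative) on top
top20_decrease = merged.sort_values(by="Difference").head(20)
```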
Result
Analysis:
The product in the Others Category with the biggest decrease is RB-Dettol Germ Busting Kit_bf, whose sales quantity fell by as much as 155, followed by Telemail_MM-DR-HB-L, whose sales quantity dropped by 21. The decline may be caused by poor marketing, so the company should pay attention to these two products to reverse the sales decrease.
Problem Statements
3. Dear Data Analyst,
With the company's anniversary coming up in 2 months, the Digital Marketing Team will
send promotional information to customers at the end of this month. The customer
criteria we need are those who have checked out but have not yet made a payment
(is_gross = 1) in 2022. The data we need are the Customer ID and Registered Date.
Please send the data before the end of this month to the Digital Marketing Team. Thank you
for your help.
Kind regards,
Digital Marketing Team
Explanation:
We need to filter the data to find customers who have checked
out but have not yet made a payment. To do so, we filter on
is_gross, is_net, and order_date, adding the is_valid condition to
ensure the records are consistent. Then display the resulting
DataFrame.
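A sketch of the filter follows. The exact flag combination is an assumption (is_gross == 1 with is_valid and is_net both 0, i.e. checked out, unpaid, unfinished); the deck's actual conditions may differ, and the sample rows are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "registered_date": pd.to_datetime(["2021-01-10", "2021-02-10"]),
    "order_date": pd.to_datetime(["2022-03-01", "2022-04-01"]),
    "is_gross": [1, 1],
    "is_valid": [0, 1],
    "is_net": [0, 1],
})

# Checked out (is_gross) but not paid (is_valid) or finished (is_net), in 2022
checked_out_unpaid = df[(df["is_gross"] == 1)
                        & (df["is_valid"] == 0)
                        & (df["is_net"] == 0)
                        & (df["order_date"] >= "2022-01-01")
                        & (df["order_date"] <= "2022-12-31")]
# Keep only the two columns the Digital Marketing Team asked for
checked_out_unpaid = checked_out_unpaid[["customer_id", "registered_date"]]
```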
Explanation:
The Digital Marketing Team needs the data
sent to them as a file. To download it,
convert the DataFrame to CSV format, then
import the files module from Google Colab
and call files.download.
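The export step can be sketched as below; the file name is an invented placeholder, and the Colab-only download call is left commented out since it only works inside a Colab session.

```python
import pandas as pd

checked_out_unpaid = pd.DataFrame({"customer_id": ["c1"],
                                   "registered_date": ["2021-01-10"]})
# Write the result to CSV without the index column
checked_out_unpaid.to_csv("unpaid_customers_2022.csv", index=False)

# In Google Colab, the file can then be pulled to the local machine:
# from google.colab import files
# files.download("unpaid_customers_2022.csv")
```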
Result
Result
Problem Statements
4. Dear Data Analyst,
From October to December 2022, we will carry out campaigns every Saturday and Sunday.
We will assess whether the campaign has an impact on the sales increase (before_discount).
We kindly ask for your help to display the data:
1. The average daily weekend sales (Saturday and Sunday) vs the average daily weekday
sales (Monday-Friday) per month. Is there an increase in sales in each month?
2. The average daily weekend sales (Saturday and Sunday) vs average daily weekday
sales (Monday-Friday) for 3 months.
Please send the data no later than next week. We appreciate your help. Thank you.
Kind regards,
Digital Marketing Team
4.1
Explanation:
Create new columns for day, month, month_num, and year.
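These columns come straight from the datetime accessor; the sample dates are made up.

```python
import pandas as pd

df = pd.DataFrame({"order_date": pd.to_datetime(["2022-10-01", "2022-11-06"])})
df["day"] = df["order_date"].dt.day_name()       # e.g. "Saturday"
df["month"] = df["order_date"].dt.month_name()   # e.g. "October"
df["month_num"] = df["order_date"].dt.month      # e.g. 10, for ordering months
df["year"] = df["order_date"].dt.year            # e.g. 2022
```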
4.1
Explanation:
We need to create two tables: one for the weekend data and one for the weekday data. First, filter the weekend
data on is_valid == 1, month in October, November, and December, day in Saturday and Sunday, and year == 2022. The isin
function checks whether each value in the DataFrame is in the given list and returns a boolean mask.
Second, group the data by month and aggregate the sales. Use 'month' for the months and 'before_discount' for
the sales value; since we need the average daily sales, aggregate with the mean. Then reset the header with
reset_index and sort with sort_values by 'before_discount', passing ascending=False so the data runs from
the highest to the lowest.
Use the same syntax for the weekday data.
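The weekend/weekday split can be sketched as follows; the sample rows are invented, and the mean aggregation reflects the "average daily sales" requested in the problem statement.

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": pd.to_datetime(["2022-10-01", "2022-10-03", "2022-11-05", "2022-11-07"]),
    "is_valid": [1, 1, 1, 1],
    "before_discount": [100.0, 300.0, 200.0, 400.0],
})
df["day"] = df["order_date"].dt.day_name()
df["month"] = df["order_date"].dt.month_name()
df["year"] = df["order_date"].dt.year

weekend_days = ["Saturday", "Sunday"]
# Valid Q4-2022 transactions only
base = df[(df["is_valid"] == 1)
          & (df["year"] == 2022)
          & (df["month"].isin(["October", "November", "December"]))]

# Average sales per month, weekends vs weekdays
weekend = (base[base["day"].isin(weekend_days)]
           .groupby("month")["before_discount"].mean().reset_index())
weekday = (base[~base["day"].isin(weekend_days)]
           .groupby("month")["before_discount"].mean().reset_index())
```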
4.1
Explanation:
Merge the two tables with the merge function so the weekend and weekday figures can be compared side by side.
4.1
Explanation:
We also need the rule that labels each value as an increase or a decrease: any value greater
than 0 counts as an increase; otherwise it falls under the decrease category.
Create a new column that applies this rule, using df_merged[] = df_merged[].apply(condition),
then display the merged data sorted from highest to lowest.
4.2
Explanation:
Create a bar chart to make it easy to compare weekend and weekday sales.
Use the plot function to draw the bar chart.
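A minimal plotting sketch follows; the summary numbers are made-up placeholders, not the deck's results, and the Agg backend is set so the script also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder monthly averages; in the notebook this comes from the merged table
summary = pd.DataFrame({
    "month": ["October", "November", "December"],
    "Weekday avg": [870.0, 640.0, 800.0],
    "Weekend avg": [820.0, 760.0, 700.0],
}).set_index("month")

ax = summary.plot(kind="bar", figsize=(8, 4),
                  title="Average daily sales: weekday vs weekend")
ax.set_ylabel("Average before_discount")
plt.tight_layout()
plt.savefig("weekend_vs_weekday.png")
```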
Result
Analysis:
From the processed data, it is apparent that average sales were highest in October 2022. Across the three months, average weekday sales were higher than weekend sales. Average weekday sales dropped sharply in November but rebounded in December; weekend sales, on the other hand, declined steadily from October to December. The company therefore needs to pay more attention to weekend sales, and may need to change its campaign strategy to improve the weekend average.
Follow me!
Instagram: elyadawigatip
Twitter: @EliNoBishamon
LinkedIn: https://www.linkedin.com/in/elyada-wigati-pramaresti-1a2387170/
Bootcamp Data Analysis
by @myskill.id
More Related Content

Similar to Final Project Python - Elyada Wigati Pramaresti.pptx

Data warehousing
Data warehousingData warehousing
Data warehousing
Ashish Kumar Jena
 
Data Modeling in Looker
Data Modeling in LookerData Modeling in Looker
Data Modeling in Looker
Looker
 
Check printing in_r12
Check printing in_r12Check printing in_r12
Check printing in_r12
Rajesh Khatri
 
Step By Step Analyzing Price Elasticit1.pdf
Step By Step Analyzing Price Elasticit1.pdfStep By Step Analyzing Price Elasticit1.pdf
Step By Step Analyzing Price Elasticit1.pdf
Rahmat Taufiq Sigit
 
SAP SD Copy Controls
SAP SD Copy ControlsSAP SD Copy Controls
SAP SD Copy Controls
Srinivasulu Algaskhanpet
 
How SAP SD is integrated with SAP Finance?
How SAP SD is integrated with SAP Finance?How SAP SD is integrated with SAP Finance?
How SAP SD is integrated with SAP Finance?
Intelligroup, Inc.
 
Sap 20 overview_203_1_
Sap 20 overview_203_1_Sap 20 overview_203_1_
Sap 20 overview_203_1_
Rodolfo Ocampo
 
Da 100-questions
Da 100-questionsDa 100-questions
Da 100-questions
Sandeep Kumar Chavan
 
Set Analyse OK.pdf
Set Analyse OK.pdfSet Analyse OK.pdf
Set Analyse OK.pdf
qlik2learn2024
 
Kpi handbook implementation on bizforce one
Kpi handbook implementation on bizforce oneKpi handbook implementation on bizforce one
Kpi handbook implementation on bizforce one
Hieutanda Nguyen Khac Hieu
 
Dynamics gp insights to distribution - inventory
Dynamics gp insights to distribution - inventoryDynamics gp insights to distribution - inventory
Dynamics gp insights to distribution - inventory
Steve Chapman
 
Troubleshoot experience
Troubleshoot experienceTroubleshoot experience
Troubleshoot experience
Krishna Rao
 
Customer master Data
Customer master DataCustomer master Data
Customer master Data
Gopal Volluri
 
Material determination
Material determinationMaterial determination
Material determination
madhu jetty
 
Sap sd important interview concepts
Sap sd important interview concepts Sap sd important interview concepts
Sap sd important interview concepts
Mohit Amitabh
 
Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...
MD Owes Quruny Shubho
 
Level of-detail-expressions
Level of-detail-expressionsLevel of-detail-expressions
Level of-detail-expressions
Yogeeswar Reddy
 
Answer
AnswerAnswer
Answer
Yella Man
 
ggg
 ggg ggg
ggg
salehe123
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
Harshavardhan851231
 

Similar to Final Project Python - Elyada Wigati Pramaresti.pptx (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data Modeling in Looker
Data Modeling in LookerData Modeling in Looker
Data Modeling in Looker
 
Check printing in_r12
Check printing in_r12Check printing in_r12
Check printing in_r12
 
Step By Step Analyzing Price Elasticit1.pdf
Step By Step Analyzing Price Elasticit1.pdfStep By Step Analyzing Price Elasticit1.pdf
Step By Step Analyzing Price Elasticit1.pdf
 
SAP SD Copy Controls
SAP SD Copy ControlsSAP SD Copy Controls
SAP SD Copy Controls
 
How SAP SD is integrated with SAP Finance?
How SAP SD is integrated with SAP Finance?How SAP SD is integrated with SAP Finance?
How SAP SD is integrated with SAP Finance?
 
Sap 20 overview_203_1_
Sap 20 overview_203_1_Sap 20 overview_203_1_
Sap 20 overview_203_1_
 
Da 100-questions
Da 100-questionsDa 100-questions
Da 100-questions
 
Set Analyse OK.pdf
Set Analyse OK.pdfSet Analyse OK.pdf
Set Analyse OK.pdf
 
Kpi handbook implementation on bizforce one
Kpi handbook implementation on bizforce oneKpi handbook implementation on bizforce one
Kpi handbook implementation on bizforce one
 
Dynamics gp insights to distribution - inventory
Dynamics gp insights to distribution - inventoryDynamics gp insights to distribution - inventory
Dynamics gp insights to distribution - inventory
 
Troubleshoot experience
Troubleshoot experienceTroubleshoot experience
Troubleshoot experience
 
Customer master Data
Customer master DataCustomer master Data
Customer master Data
 
Material determination
Material determinationMaterial determination
Material determination
 
Sap sd important interview concepts
Sap sd important interview concepts Sap sd important interview concepts
Sap sd important interview concepts
 
Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...Real -time data visualization using business intelligence techniques. and mak...
Real -time data visualization using business intelligence techniques. and mak...
 
Level of-detail-expressions
Level of-detail-expressionsLevel of-detail-expressions
Level of-detail-expressions
 
Answer
AnswerAnswer
Answer
 
ggg
 ggg ggg
ggg
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
 

More from Elyada Wigati Pramaresti

Final Project Data Visualization - Elyada Wigati
Final Project Data Visualization - Elyada WigatiFinal Project Data Visualization - Elyada Wigati
Final Project Data Visualization - Elyada Wigati
Elyada Wigati Pramaresti
 
SQL Basic Clause - Portfolio.pptx
SQL Basic Clause - Portfolio.pptxSQL Basic Clause - Portfolio.pptx
SQL Basic Clause - Portfolio.pptx
Elyada Wigati Pramaresti
 
Working with Google Sheet - Portfolio.pptx
Working with Google Sheet - Portfolio.pptxWorking with Google Sheet - Portfolio.pptx
Working with Google Sheet - Portfolio.pptx
Elyada Wigati Pramaresti
 
Intro to Statistics.pptx
Intro to Statistics.pptxIntro to Statistics.pptx
Intro to Statistics.pptx
Elyada Wigati Pramaresti
 
Improvement as Data Analyst.pptx
Improvement as Data Analyst.pptxImprovement as Data Analyst.pptx
Improvement as Data Analyst.pptx
Elyada Wigati Pramaresti
 
Kickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptx
Kickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptxKickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptx
Kickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptx
Elyada Wigati Pramaresti
 
Microsoft Excel Basic to Advance - Sales Analysis
Microsoft Excel Basic to Advance - Sales AnalysisMicrosoft Excel Basic to Advance - Sales Analysis
Microsoft Excel Basic to Advance - Sales Analysis
Elyada Wigati Pramaresti
 

More from Elyada Wigati Pramaresti (7)

Final Project Data Visualization - Elyada Wigati
Final Project Data Visualization - Elyada WigatiFinal Project Data Visualization - Elyada Wigati
Final Project Data Visualization - Elyada Wigati
 
SQL Basic Clause - Portfolio.pptx
SQL Basic Clause - Portfolio.pptxSQL Basic Clause - Portfolio.pptx
SQL Basic Clause - Portfolio.pptx
 
Working with Google Sheet - Portfolio.pptx
Working with Google Sheet - Portfolio.pptxWorking with Google Sheet - Portfolio.pptx
Working with Google Sheet - Portfolio.pptx
 
Intro to Statistics.pptx
Intro to Statistics.pptxIntro to Statistics.pptx
Intro to Statistics.pptx
 
Improvement as Data Analyst.pptx
Improvement as Data Analyst.pptxImprovement as Data Analyst.pptx
Improvement as Data Analyst.pptx
 
Kickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptx
Kickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptxKickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptx
Kickstart Career as Data Analyst - Elyada Wigati Pramaresti.pptx
 
Microsoft Excel Basic to Advance - Sales Analysis
Microsoft Excel Basic to Advance - Sales AnalysisMicrosoft Excel Basic to Advance - Sales Analysis
Microsoft Excel Basic to Advance - Sales Analysis
 

Recently uploaded

一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 

Recently uploaded (20)

一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 

Final Project Python - Elyada Wigati Pramaresti.pptx

  • 3. About the Dataset Data analyzed in this task are collected from Tokopedia (not the original data). The dataset is: Variable Data Type Description id object the unique number of order/id_order customer_id object the unique number of customer order_date object date when the transaction is carried out sku_id object the unique number of a product (sku is stock keeping unit) price int64 the amount of money given in payment for something qty_ordered int64 the number of items purchased by customers before_discount float64 the total price value of products discount_amount float64 the discount value of the total product after_discount float64 the value of the total price after aggregated by the discount is_gross int64 shows that customers have not yet paid the orders is_valid int64 shows that customers have paid the orders is_net int64 shows that the transaction is finished payment_id int64 the unique number of payment method Variable Data Type Description id object the unique number of a product (it can be used as a key for joining) sku_name object the name of the product base_price float64 the price that is shown in the tagging cogs int64 cost of selling one product category object product category order_detail: sku_detail: Variable Data Type Description id object the unique number of a customer registered_date object the date when a customer sign up as a member customer_detail: payment_detail: Variable Data Type Description id int64 the unique number of a payment payment_method object the method of payment applied during transaction
  • 9. Pre-Processing the Data We have 4 sets of tables. However, we need to join them by implementing SQL in Colab so that we can carry out further analysis. In this case, the LEFT JOIN function is used. 1. Implement SQL in Google Colab 2. Write the SQL queries for combining 4 tables
  • 10. Pre-Processing the Data Reviewing the joined datasets by checking the first 10 rows
  • 11. Pre-Processing the Data Checking the data type from each column. This is carried out to check whether the data type is correct or not.
  • 12. Pre-Processing the Data The types of data in some columns need to be changed to make them easier to process. The data in the number column is changed into integers. The data in the format column is transformed by giving the new format so that the date is ordered. Queries Result
  • 13. Pre-Processing the Data Change the date type to date time
  • 14. Problem Statements 1. Dear Data Analyst, At the end of this year, the company will provide prizes for the consumers who win the Year-End Festival Competition. The Marketing Team needs help to determine the prizes which will be given to the winners. The prizes will be taken from the Top 5 products from Mobiles and Tablets Category in 2022, with the highest order quantity (valid = 1). Please help send the data before the end of this month to the Marketing Team. We appreciate your help. Thank you. Kindly regards, Marketing Team
  • 15. Explanation: We need to filter the data. First, filter the category column, which contains several categories, keeping only Mobiles & Tablets. Next, filter order_date for transactions in 2022: >= ‘2022-01-01’ (the first possible transaction date in 2022) and <= ‘2022-12-31’ (the last). After obtaining the transaction data, add the condition is_valid = 1 to keep only paid orders. Second, aggregate by product and total quantity: use the groupby function on sku_name (the product name) and sum qty_ordered to get the total quantity per product. Then reset the header with reset_index and sort with sort_values by “qty_ordered”, passing ascending=False to order the data from the highest to the lowest. Third, display the top 5 products by writing df_filter.head(5).
  • 16. Result Analysis: The 2022 sales data for Mobiles & Tablets shows that IDROID_BALRX7-Gold is the most popular product, with an order quantity reaching 2000, followed by RS_Coconut Bites with an order quantity of 300. The large gap between the two may be caused by the better features of IDROID_BALRX7-Gold and the different marketing strategies for these two products.
  • 17. Problem Statements 2. Dear Data Analyst, Following up on the meeting with the Warehouse Team and Marketing Team, we found that there was plenty of product stock in the Others category at the end of 2022. 1. We kindly ask for your help to check the sales data for this category in 2021 and 2022 based on sales quantity. Our working assumption is that sales quantity decreased in 2022 compared to 2021. Please also display the data of the 15 products from the category. 2. If there is a decrease in the sales quantity of the Others category, please present the data of the TOP 20 products with the highest decrease in 2022 compared with 2021. Please send the result no later than 4 days from today. We will use it for the next meeting. We appreciate your help. Thank you. Kind regards, Warehouse Team
  • 18. 2.1 Explanation: First, filter the transaction data for 2021, using is_valid = 1 to keep only valid rows. Specify the date range by writing >= ‘2021-01-01’ (the first transaction date in 2021) and <= ‘2021-12-31’ (the last). Second, use the groupby function to group the 2021 data: group on category (the product category) and sum qty_ordered to get the total quantity order. Reset the header with reset_index and sort with sort_values by “qty_ordered”, passing ascending=False to order the data from the highest to the lowest. Apply the same query with 2022 dates to filter and group the 2022 data.
  • 19. 2.1 Explanation: After sorting and grouping the data for 2021 and 2022, merge the two tables with the merge function on ‘category’. Then rename the column ‘qty_ordered_x’ to ‘Quantity 2021’ and ‘qty_ordered_y’ to ‘Quantity 2022’. Next, add a new column that holds ‘Quantity 2022’ minus ‘Quantity 2021’.
  • 20. 2.1 Explanation: Furthermore, we need a new column to which a condition can be applied, using the df_merged[] = df_merged[].apply(condition) pattern. Define the condition as the parameter for whether the sales change counts as an increase or a decrease: any value larger than 0 is labelled an increase, otherwise a decrease. Then display the merged data from the highest to the lowest with ascending=False, and add head(15) since only 15 rows are needed.
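Slides 19-20 combined can be sketched as below. The two per-year frames are hypothetical stand-ins for the grouped 2021 and 2022 results; the values are chosen only so the Others row reproduces the -147 difference mentioned in the analysis.

```python
import pandas as pd

# Hypothetical per-category totals, already grouped for each year.
qty_2021 = pd.DataFrame({"category": ["Others", "Soghaat"], "qty_ordered": [500, 200]})
qty_2022 = pd.DataFrame({"category": ["Others", "Soghaat"], "qty_ordered": [353, 220]})

# Merge the two years side by side on the category key.
df_merged = qty_2021.merge(qty_2022, on="category")
df_merged = df_merged.rename(columns={"qty_ordered_x": "Quantity 2021",
                                      "qty_ordered_y": "Quantity 2022"})
df_merged["Difference"] = df_merged["Quantity 2022"] - df_merged["Quantity 2021"]

# Label each row as an increase or a decrease.
def condition(diff):
    return "Increase" if diff > 0 else "Decrease"

df_merged["Status"] = df_merged["Difference"].apply(condition)
df_merged = df_merged.sort_values("Difference", ascending=False).head(15)
```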
  • 21. Result Analysis: The categories that show a decrease in sales in 2022 are Others, Soghaat, Men Fashion, and Beauty & Grooming. Others is the category that experienced the biggest downturn, as its sales quantity fell by 147 from 2021 to 2022.
  • 22. 2.2 Explanation: The query for 2.2 is quite similar to 2.1. However, when filtering the 2021 and 2022 data, add a new column, category, which is used to filter the new table down to the Others category. The continuation symbol (a backslash in Python) shows that the query is still one logical line.
  • 23. 2.2 Explanation: After sorting and grouping the data for 2021 and 2022, merge the two tables with the merge function on ‘sku_name’. Then rename the column ‘qty_ordered_x’ to ‘Quantity 2021’ and ‘qty_ordered_y’ to ‘Quantity 2022’. Next, add a new column that holds ‘Quantity 2022’ minus ‘Quantity 2021’.
  • 24. 2.2 Explanation: We also need a parameter for whether the sales change counts as an increase or a decrease: any value larger than 0 is labelled an increase, otherwise a decrease. Create a new column to which this condition is applied, using the df_merged[] = df_merged[].apply(condition) pattern. Then display the merged data from the highest to the lowest with ascending=False; as 20 rows are needed, add head(20).
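The only part of 2.2 that differs from 2.1 is the extra category filter before grouping, sketched here with hypothetical rows (the product names are the ones mentioned in the result slide):

```python
import pandas as pd

# Hypothetical one-year slice; the extra 'category' column lets us keep
# only the Others products before grouping.
df = pd.DataFrame({
    "category": ["Others", "Others", "Men Fashion"],
    "sku_name": ["RB-Dettol Germ Busting Kit_bf", "Telemail_MM-DR-HB-L", "Belt"],
    "qty_ordered": [10, 4, 7],
})

others = (
    df[df["category"] == "Others"]          # keep the Others category only
    .groupby("sku_name")["qty_ordered"].sum()
    .reset_index()
)
# Repeating this for both years and merging on 'sku_name' (instead of
# 'category') gives the per-product comparison; head(20) then lists the
# top-20 decreases.
```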
  • 25. Result Analysis: The product from the Others category with the biggest decrease is RB-Dettol Germ Busting Kit_bf, whose sales quantity fell by as much as 155, followed by Telemail_MM-DR-HB-L, whose sales quantity dropped by 21. The drop may be caused by poor marketing, so the company should give attention to these two products to overcome the sales decrease.
  • 26. Problem Statements 3. Dear Data Analyst, With the company’s anniversary coming up in 2 months, the Digital Marketing Team will provide promotional information to customers at the end of this month. The customer criteria we need are those who have checked out but have not yet made a payment (is_gross = 1) in 2022. The data we need are the Customer ID and Registered Date. Please send the data before the end of this month to the Digital Marketing Team. Thank you for your help. Kind regards, Digital Marketing Team
  • 27. Explanation: We need to filter the data for customers who have checked out but have not yet made a payment, so we filter on is_gross = 1 together with is_net = 0, and restrict order_date to 2022; is_valid is also included in the filter to make sure the unpaid status is consistent. Then display the filtered data frame. Explanation: The Digital Marketing Team needs the data analyst to send the data. To download it, import files from google.colab, convert the result to CSV format, and call files.download.
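A sketch of this filter and export, with hypothetical rows; it assumes that for "checked out but not yet paid" orders is_gross is 1 while is_valid and is_net are 0, matching the flag descriptions in the dataset table. The `files.download` call is Colab-only, so it is shown commented out.

```python
import pandas as pd

# Hypothetical slice of the joined table.
df = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "registered_date": ["2021-05-01", "2022-02-14", "2022-08-30"],
    "is_gross": [1, 1, 0],
    "is_valid": [0, 0, 1],
    "is_net": [0, 0, 1],
    "order_date": pd.to_datetime(["2022-04-01", "2022-06-10", "2022-09-05"]),
})

# Checked out but not yet paid, restricted to 2022 orders.
target = df[
    (df["is_gross"] == 1) & (df["is_valid"] == 0) & (df["is_net"] == 0)
    & (df["order_date"].dt.year == 2022)
][["customer_id", "registered_date"]]

target.to_csv("unpaid_customers_2022.csv", index=False)

# In Google Colab the file can then be pushed to the browser:
# from google.colab import files
# files.download("unpaid_customers_2022.csv")
```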
  • 30. Problem Statements 4. Dear Data Analyst, From October to December 2022, we will carry out campaigns every Saturday and Sunday. We will assess whether the campaign has an impact on the sales increase (before_discount). We kindly ask for your help to display the data: 1. The average daily weekend sales (Saturday and Sunday) vs the average daily weekday sales (Monday-Friday) per month. Is there an increase in sales in each month? 2. The average daily weekend sales (Saturday and Sunday) vs the average daily weekday sales (Monday-Friday) over the 3 months. Please send the data no later than next week. We appreciate your help. Thank you. Kind regards, Digital Marketing Team
  • 31. 4.1 Explanation: Create new columns that consist of day, month, month_num, and year.
  • 32. 4.1 Explanation: We need to create two tables: one with the weekend data and another with the weekday data. First, filter the weekend data on is_valid = 1, month in October, November, and December, day in Saturday and Sunday, and year = 2022, using the isin function to test whether each value is in the listed set. Second, group the data by month and aggregate “before_discount”; since the request is for average daily sales, take the mean per group. Reset the header with reset_index and sort with sort_values by “before_discount”, passing ascending=False to order the data from the highest to the lowest. Use the same syntax for the weekday data (Monday-Friday).
  • 33. 4.1 Explanation: Merge the weekend and weekday tables with the merge function, so both can be compared side by side.
  • 34. 4.1 Explanation: We also need a parameter for whether the change counts as an increase or a decrease: any value larger than 0 is labelled an increase, otherwise a decrease. Create a new column to which this condition is applied, using the df_merged[] = df_merged[].apply(condition) pattern. Then display the merged data from the highest to the lowest with ascending=False.
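Slides 31-33 can be sketched together as below. The rows are hypothetical; the helper columns (`day`, `month`) and the weekend/weekday split follow the slides, and the mean is used because the request asks for average daily sales.

```python
import pandas as pd

# Hypothetical valid 2022 orders inside the campaign window.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2022-10-01", "2022-10-03", "2022-11-05", "2022-11-07",
    ]),
    "before_discount": [100.0, 80.0, 60.0, 90.0],
    "is_valid": [1, 1, 1, 1],
})

# Helper columns for day name and month, as on slide 31.
df["day"] = df["order_date"].dt.day_name()
df["month"] = df["order_date"].dt.month_name()

valid = df["is_valid"] == 1
weekend_mask = df["day"].isin(["Saturday", "Sunday"])

# Average daily sales per month for each group.
weekend = (df[valid & weekend_mask]
           .groupby("month")["before_discount"].mean().reset_index())
weekday = (df[valid & ~weekend_mask]
           .groupby("month")["before_discount"].mean().reset_index())

# One row per month, weekend vs weekday side by side.
df_merged = weekend.merge(weekday, on="month",
                          suffixes=("_weekend", "_weekday"))
```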
  • 35. 4.2 Explanation: Create a bar chart so users can easily compare sales on the weekends and the weekdays, using the plot function.
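The bar chart can be produced directly from the merged frame with pandas' plot wrapper around matplotlib. The monthly averages below are hypothetical placeholders shaped like the October-December comparison.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly averages, weekend vs weekday.
df_merged = pd.DataFrame({
    "month": ["October", "November", "December"],
    "weekend_avg": [120.0, 95.0, 80.0],
    "weekday_avg": [110.0, 70.0, 130.0],
})

# Grouped bar chart: one pair of bars per month.
ax = df_merged.plot(x="month", y=["weekend_avg", "weekday_avg"],
                    kind="bar", rot=0)
ax.set_ylabel("Average daily sales (before_discount)")
ax.set_title("Weekend vs weekday average sales, Oct-Dec 2022")
plt.savefig("weekend_vs_weekday.png")
```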
  • 36. Result Analysis: The processed data shows that average sales were highest in October 2022. Across the three months, average weekday sales were higher than weekend sales. Average weekday sales dropped sharply in November but rocketed again in December; weekend sales, on the other hand, decreased steadily from October to December. The company therefore needs to give more attention to weekend sales, and may need to change the campaign strategy to improve the average weekend sales.
  • 38. Follow me! Instagram : elyadawigatip Twitter : @EliNoBishamon LinkedIn : https://www.linkedin.com/in/elyada-wigati-pramaresti-1a2387170/ Bootcamp Data Analysis by @myskill.id