3. Business Understanding
Why Do We Need Business Understanding?
GOALS
• Choose the key variables that will be the targets of the model we build; the metrics on those variables are the indicators of our project's success
• Identify the relevant data sources that are already accessible or for which access still has to be obtained
4. Intro
Steps
1. Define Business Problem: understand the problem that is occurring
2. Problem Discovery: choose the metrics that can address the problem
3. Get Insight: draw conclusions from the information obtained
4. Suggest Recommendation: give recommendations based on the results of the analysis
5. Review Case
Brazilian E-Commerce Olist
● Business Problem:
○ What is the current state of Olist's e-commerce business?
● Problem Discovery:
○ GMV: Gross Merchandise Value
○ Total Orders
○ Total Buyers
○ Retention Rate, etc.
● Insight?
● Recommendation?
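As a rough illustration of the Problem Discovery metrics listed above, the sketch below computes GMV, total orders, total buyers, and a simple retention rate from a handful of made-up orders. The sample rows, the function names, and the definition of retention used here (the share of one month's buyers who buy again in the next month) are assumptions for illustration, not definitions taken from the slides.

```python
# Hypothetical sample orders: (order_id, customer_id, order_month, price)
orders = [
    ("o1", "c1", "2018-01", 100.0),
    ("o2", "c2", "2018-01", 50.0),
    ("o3", "c1", "2018-02", 70.0),
    ("o4", "c3", "2018-02", 30.0),
]


def summarize(rows):
    """Return GMV, number of distinct orders, and number of distinct buyers."""
    return {
        "gmv": sum(price for _, _, _, price in rows),
        "total_orders": len({order_id for order_id, _, _, _ in rows}),
        "total_buyers": len({customer_id for _, customer_id, _, _ in rows}),
    }


def retention_rate(rows, first_month, next_month):
    """Share of first_month buyers who also bought in next_month (assumed definition)."""
    first = {cid for _, cid, month, _ in rows if month == first_month}
    following = {cid for _, cid, month, _ in rows if month == next_month}
    if not first:
        return 0.0
    return len(first & following) / len(first)


summary = summarize(orders)
retention = retention_rate(orders, "2018-01", "2018-02")
print(summary, retention)
```

In a real analysis these figures would come from the data mart rather than an in-memory list, and the retention definition should be agreed with the business first.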
6. Review Case
Design Data Mart
Source tables:
Table Order: order_id, order_date, customer_id, order_status
Table Order Items: order_id, product_id, price, freight_value
Table Customer: customer_id, customer_city
Transform (Table Order Items is aggregated per order):
Metrics Transaction: order_id, total_product, total_price, total_logistic_cost
Join (Table Order is joined with Metrics Transaction and with Table Customer):
Data Mart Transaction: order_id, order_date, customer_id, order_status, customer_city, total_product, total_price, total_logistic_cost
Why Data Mart?
● Gathers the information in one place
● Efficient
● Supports data exploration
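The transform-then-join design above can be tried out locally. The sketch below is only an approximation under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for BigQuery, the table is named `orders` instead of `order` to avoid the reserved word, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, order_date TEXT, customer_id TEXT, order_status TEXT);
CREATE TABLE order_items (order_id TEXT, product_id TEXT, price REAL, freight_value REAL);
CREATE TABLE customer (customer_id TEXT, customer_city TEXT);
-- Hypothetical sample data: order o2 has no items on purpose.
INSERT INTO orders VALUES ('o1', '2018-01-05', 'c1', 'delivered'),
                          ('o2', '2018-01-06', 'c2', 'canceled');
INSERT INTO order_items VALUES ('o1', 'p1', 100.0, 10.0),
                               ('o1', 'p2', 50.0, 5.0);
INSERT INTO customer VALUES ('c1', 'sao paulo'), ('c2', 'rio de janeiro');
""")

rows = conn.execute("""
WITH metrics_transaction AS (        -- Transform: one row per order
  SELECT order_id,
         COUNT(DISTINCT product_id) AS total_product,
         SUM(price)                 AS total_price,
         SUM(freight_value)         AS total_logistic_cost
  FROM order_items
  GROUP BY order_id
)
SELECT o.order_id, o.order_status, c.customer_city,
       COALESCE(m.total_product, 0),
       COALESCE(m.total_price, 0),
       COALESCE(m.total_logistic_cost, 0)
FROM orders o
LEFT JOIN metrics_transaction m ON o.order_id = m.order_id      -- Join 1
LEFT JOIN customer c            ON o.customer_id = c.customer_id -- Join 2
ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps orders that have no items, and COALESCE turns their missing metrics into zeros, which is why the order table drives the join.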
7. Review Case
Query Data Mart
WITH
  -- CTE: aggregate order items to one row per order
  metrics_transaction AS (
    SELECT
      order_id,
      COUNT(DISTINCT product_id) AS total_product,
      SUM(price) AS total_product_price,
      SUM(freight_value) AS total_logistic_cost
    FROM
      `database-385606.Latihan1.order_items` -- replace with your dataset
    GROUP BY
      order_id )
SELECT
  a.*,
  c.customer_city,
  c.customer_unique_id,
  COALESCE(b.total_product, 0) AS total_product,
  COALESCE(b.total_product_price, 0) AS total_product_price,
  COALESCE(b.total_logistic_cost, 0) AS total_logistic_cost
FROM
  `database-385606.Latihan1.order` a -- replace with your dataset
LEFT JOIN
  metrics_transaction b
ON
  a.order_id = b.order_id
LEFT JOIN (
  -- Subquery: deduplicated customer attributes
  SELECT DISTINCT
    customer_id,
    customer_unique_id,
    customer_city
  FROM
    `database-385606.Latihan1.cust` ) c -- replace with your dataset
ON
  a.customer_id = c.customer_id
8. Review Case
Create Data Mart
● View Table:
○ A table defined by a query; the query runs every time the view is read, so displaying the result takes query time
● Fact Table:
○ A table built from a query whose result is stored as a physical table
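The difference between the two options can be seen in a small local experiment. This is a sketch under assumptions: it uses Python's built-in `sqlite3` module rather than BigQuery, and the table and view names (`order_items`, `v_total`, `t_total`) are hypothetical. The behavior it shows (a view is recomputed on read, a physical table is a stored snapshot) is the general idea behind the two definitions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_id TEXT, price REAL);
INSERT INTO order_items VALUES ('o1', 100.0), ('o1', 50.0);

-- View: only the query is stored; it is re-run on every read.
CREATE VIEW v_total AS
  SELECT order_id, SUM(price) AS total_price FROM order_items GROUP BY order_id;

-- Physical table: the query result is stored at creation time.
CREATE TABLE t_total AS
  SELECT order_id, SUM(price) AS total_price FROM order_items GROUP BY order_id;

-- New source data arrives after both objects were created.
INSERT INTO order_items VALUES ('o1', 25.0);
""")

view_total = conn.execute("SELECT total_price FROM v_total").fetchone()[0]
table_total = conn.execute("SELECT total_price FROM t_total").fetchone()[0]
print(view_total, table_total)
```

The view reflects the new row while the physical table does not, so a physical data mart has to be rebuilt (for example with a scheduled CREATE OR REPLACE TABLE) whenever fresh data is needed.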
9. Review Case
View Table
1. Click Save > Save view
2. Select the project and dataset, then enter the table name
3. Save
10. Review Case
Fact Table
1. Add the following in front of the data mart query:
CREATE OR REPLACE TABLE `loyal-surfer-321414.Testing.data_mart`
AS
2. Run
11. Review Case
Hashing
The process of producing a fixed-size output from a variable-sized input, using a mathematical formula known as a hash function. Different crypto assets use different hashing algorithms to create different kinds of hash codes; these algorithms are responsible for producing seemingly random alphanumeric strings.
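A minimal sketch of the fixed-size property described above, using SHA-256 from Python's standard `hashlib` module. SHA-256 is chosen here only as a familiar example; the slides do not say which algorithm was used, and the helper name `sha256_hex` is hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")            # 1-character input
long_digest = sha256_hex("a" * 10_000)    # 10,000-character input

# Both digests have the same length even though the inputs differ greatly.
print(len(short_digest), len(long_digest))
# The same input always produces the same digest.
print(sha256_hex("a") == short_digest)
```

The output looks random but is fully deterministic, which is what makes a hash usable as a stable, anonymized stand-in for an identifier.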
12. Review Case
Step 1 - Check Definition
(No Metadata Yet)
SELECT
  order_status,
  MAX(order_purchase_timestamp) purchase_date,
  MAX(order_approved_at) approve_date,
  MAX(order_delivered_carrier_date) pickup_date_by_courir,
  MAX(order_delivered_customer_date) delivered_date
FROM
  `loyal-surfer-321414.Testing.data_mart_purchase` -- replace with your dataset
GROUP BY
  1
Metadata:
Data that provides information about other data, but not the content of that data (such as the text of a message or the image itself). There are many distinct types of metadata, including descriptive metadata: descriptive information about a resource, used for discovery and identification.
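When no metadata is available, the Step 1 check infers what each order status means from which date columns are filled in for it. The sketch below mirrors that idea in plain Python: for every status it lists the date fields that have at least one non-empty value, which corresponds to a non-NULL MAX in the query. The sample rows and the function name are invented for illustration; only the column names come from the query.

```python
# Date columns taken from the Step 1 query.
DATE_FIELDS = [
    "order_purchase_timestamp",
    "order_approved_at",
    "order_delivered_carrier_date",
    "order_delivered_customer_date",
]

# Hypothetical sample rows; None stands for NULL.
orders = [
    {"order_status": "delivered", "order_purchase_timestamp": "2018-01-01",
     "order_approved_at": "2018-01-01", "order_delivered_carrier_date": "2018-01-02",
     "order_delivered_customer_date": "2018-01-05"},
    {"order_status": "shipped", "order_purchase_timestamp": "2018-01-03",
     "order_approved_at": "2018-01-03", "order_delivered_carrier_date": "2018-01-04",
     "order_delivered_customer_date": None},
    {"order_status": "canceled", "order_purchase_timestamp": "2018-01-04",
     "order_approved_at": None, "order_delivered_carrier_date": None,
     "order_delivered_customer_date": None},
]


def populated_fields_by_status(rows):
    """Map each status to the set of date fields that are non-empty for it."""
    result = {}
    for row in rows:
        seen = result.setdefault(row["order_status"], set())
        for field in DATE_FIELDS:
            if row.get(field) is not None:
                seen.add(field)
    return result


fields = populated_fields_by_status(orders)
for status, names in fields.items():
    print(status, sorted(names))
```

A status whose rows never carry a customer delivery date, for example, most likely describes an order that has not yet arrived, and that reading can then be written down as the working definition.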
16. Review Case
Hands On
Step 3 - Deep Dive with our assumptions
WITH
  summary_boxplot AS (
    SELECT
      APPROX_QUANTILES(total_product_price, 4)[OFFSET(2)] q2_gmv,
      APPROX_QUANTILES(total_product_price, 4)[OFFSET(3)] - APPROX_QUANTILES(total_product_price, 4)[OFFSET(1)] iqr
    FROM
      `loyal-surfer-321414.Testing.data_mart_purchase_remake`
    WHERE
      order_status = 'delivered' ),
  summary_batas AS (
    SELECT
      q2_gmv + 1.5 * iqr batas_atas,
      q2_gmv - 1.5 * iqr batas_bawah
    FROM
      summary_boxplot ),
  fact_tagging_anomaly AS (
    SELECT
      *,
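For comparison, here is a small sketch of boxplot-based tagging using the conventional Tukey fences, which are measured from Q1 and Q3: values within 1.5 × IQR of the quartiles are normal, values between 1.5 × IQR and 3 × IQR are outliers, and values beyond 3 × IQR are extreme. The query above measures its limits from the median (q2_gmv) instead, which gives a narrower band. The function name, the tag labels, and the sample values are assumptions; the quartiles come from `statistics.quantiles` with the inclusive method, which will not match APPROX_QUANTILES exactly on real data.

```python
from statistics import quantiles


def boxplot_tags(values):
    """Tag each value as 'normal', 'outlier', or 'extreme' using Tukey fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whisker limits
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # extreme limits
    tags = []
    for value in values:
        if inner_low <= value <= inner_high:
            tags.append("normal")
        elif outer_low <= value <= outer_high:
            tags.append("outlier")
        else:
            tags.append("extreme")
    return tags


# Hypothetical order totals (GMV per transaction).
gmv = [10, 20, 30, 40, 50, 60, 70, 150, 1000]
tags = boxplot_tags(gmv)
print(list(zip(gmv, tags)))
```

With these sample values the quartiles are 30 and 70, so the whisker limit is 130 and the extreme limit is 190; 150 is tagged as an outlier and 1000 as extreme.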
20. Review Case
Hands On
Try It
1. Complete slide 17 to create 3 groups based on the boxplot approach (normal spending within the whiskers, outlier spending, and extreme spending)
2. Build an interval-estimate summary for the average GMV and tag the transactions that fall outside the LCL and UCL
3. Compare the tagging from the boxplot approach with the tagging from the normal-distribution approach, and draw a conclusion
4. Complete slide 20 with the summary that can be drawn from the Olist data
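One possible starting point for exercise 2 is sketched below. It assumes a normal-approximation interval for the mean, LCL and UCL = mean ± z × s / √n with z = 1.96 (roughly 95% confidence), and then tags each transaction by whether it falls inside those limits. The function names, the choice of z, and the sample values are assumptions; the exercise may intend a different interval.

```python
from math import sqrt
from statistics import mean, stdev


def mean_interval(values, z=1.96):
    """Return (LCL, UCL) for the mean using a normal approximation."""
    standard_error = stdev(values) / sqrt(len(values))
    center = mean(values)
    return center - z * standard_error, center + z * standard_error


def tag_outside(values, lcl, ucl):
    """Tag each value as 'inside' or 'outside' the [LCL, UCL] interval."""
    return ["inside" if lcl <= v <= ucl else "outside" for v in values]


# Hypothetical GMV values per transaction.
gmv = [10, 20, 30, 40, 50]
lcl, ucl = mean_interval(gmv)
interval_tags = tag_outside(gmv, lcl, ucl)
print(round(lcl, 2), round(ucl, 2), interval_tags)
```

Because the interval describes the mean rather than individual transactions, it narrows as the sample grows, so on a large dataset it will flag far more transactions than the boxplot fences do; that contrast is useful material for exercise 3.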