1_MySQL_20220307_0328.pptx

National Taipei University of Nursing and Health Sciences (NTUNHS)
ML (ETL) on Database
Orozco Hsu
2022-03-07~2022-03-28
1

About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
2

Tutorial
Content
3
MLOps
ETL using Python
Homework
Environment setup and SQL practice

Code
• Download code
• https://github.com/orozcohsu/ntunhs_2022_01.git
• Folder
• 20220307_0328_mis_master
4

Install the development environment and packages
• Miniconda on Windows
• https://docs.conda.io/en/latest/miniconda.html
• Create a Python (3.8) env for practice
• Install packages
• pip install jupyterlab
• pip install mysql-connector-python
• pip install SQLAlchemy
• pip install pymysql
• pip install pandas
5
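After running the pip commands above, a quick check from Python can confirm that the packages are visible in the active env. This is a minimal sketch: the mapping from pip names to import names is an assumption based on how these packages are normally imported (for example, mysql-connector-python is imported as `mysql.connector`, so the top-level name checked here is `mysql`).

```python
import importlib.util

# pip package name -> top-level import name (assumed mapping, see note above)
REQUIRED_MODULES = {
    "jupyterlab": "jupyterlab",
    "mysql-connector-python": "mysql",
    "SQLAlchemy": "sqlalchemy",
    "pymysql": "pymysql",
    "pandas": "pandas",
}


def is_installed(module_name):
    """Return True if module_name can be located in the active environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def check_environment(modules=REQUIRED_MODULES):
    """Map each pip package name to True (found) or False (missing)."""
    return {pip_name: is_installed(mod) for pip_name, mod in modules.items()}


if __name__ == "__main__":
    for pip_name, ok in check_environment().items():
        print(f"{pip_name:25s} {'OK' if ok else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the matching pip command from the list.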

Database connection
• Download the C++ 2019 Redistributable
• https://docs.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170
• Download MySQL Workbench
• https://dev.mysql.com/downloads/workbench/
• Reference
• https://drive.google.com/file/d/1OmVhZSeTenC0p6kXlmebUbjMttMO0Iqb/view?usp=sharing
6

Database connection
• Open MySQL Workbench and add a new database connection
7

Database connection
8
Connection name
Connection details
Database name

Database connection
9
• Use the temporary MySQL database
• 2022-03-06 ~ 2022-04-05
• Host
• 172.105.215.62
• Account
• db
• Password
• 20220306!
Connection name
Connection details
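The same connection details can be used from Python. The sketch below builds a SQLAlchemy-style connection URL for the pymysql driver and URL-encodes the password, because it contains a "!" character. The database name `northwind` and port 3306 (the MySQL default) are assumptions, and `build_mysql_url` and `make_engine` are hypothetical helper names.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database, port=3306, driver="pymysql"):
    """Build a SQLAlchemy connection URL, escaping special characters."""
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def make_engine(url):
    """Create a SQLAlchemy engine (requires SQLAlchemy and pymysql)."""
    # Imported lazily so build_mysql_url can be used without SQLAlchemy.
    from sqlalchemy import create_engine

    return create_engine(url)


# "northwind" is an assumed database name; adjust it to the imported schema.
url = build_mysql_url("db", "20220306!", "172.105.215.62", "northwind")
print(url)  # mysql+pymysql://db:20220306%21@172.105.215.62:3306/northwind
```

Calling `make_engine(url)` only prepares the engine; a network connection is opened when the first query runs.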

Create the database
• Import the Northwind database
10

Northwind database schema
11
Reference: https://docs.yugabyte.com/latest/sample-data/northwind/

SQL practice
• SQL syntax practice
• https://www.w3schools.com/sql/
12

SQL practice
• How many orders were shipped to each country between 1995-10-11 and 1996-07-01?
• Across all sales records, who are the top three salespeople by total sales?
13
select shipcountry, count(*) as order_count
from orders
where orderdate between date('1995-10-11') and date('1996-07-01')
group by shipcountry;
select firstname, sum(sales_price) as total_sales from (
  select c.firstname, b.sales_price from orders a
  join (select orderid, unitprice*quantity as sales_price from `order details`) b
    on a.orderid=b.orderid
  join employees c
    on a.employeeid=c.employeeid
) d
group by firstname
order by total_sales desc
limit 3;
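The logic of the two queries above can also be expressed in plain Python. This is a minimal sketch on a few hypothetical sample rows, not the real Northwind data; the column names follow the queries, and `count_orders_by_country` and `top_salespeople` are hypothetical helper names. With pandas, the same steps map to `merge` and `groupby`.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample rows standing in for the Northwind tables.
orders = [
    {"orderid": 1, "employeeid": 1, "shipcountry": "France", "orderdate": date(1995, 10, 11)},
    {"orderid": 2, "employeeid": 2, "shipcountry": "Germany", "orderdate": date(1996, 1, 15)},
    {"orderid": 3, "employeeid": 1, "shipcountry": "France", "orderdate": date(1996, 7, 1)},
    {"orderid": 4, "employeeid": 3, "shipcountry": "Brazil", "orderdate": date(1996, 8, 2)},
    {"orderid": 5, "employeeid": 4, "shipcountry": "Brazil", "orderdate": date(1996, 9, 9)},
]
order_details = [
    {"orderid": 1, "unitprice": 10.0, "quantity": 5},
    {"orderid": 2, "unitprice": 20.0, "quantity": 4},
    {"orderid": 3, "unitprice": 5.0, "quantity": 2},
    {"orderid": 4, "unitprice": 8.0, "quantity": 5},
    {"orderid": 4, "unitprice": 1.0, "quantity": 5},
    {"orderid": 5, "unitprice": 3.0, "quantity": 1},
]
employees = [
    {"employeeid": 1, "firstname": "Nancy"},
    {"employeeid": 2, "firstname": "Andrew"},
    {"employeeid": 3, "firstname": "Janet"},
    {"employeeid": 4, "firstname": "Margaret"},
]


def count_orders_by_country(orders, start, end):
    """Count orders per ship country with start <= orderdate <= end."""
    return Counter(o["shipcountry"] for o in orders if start <= o["orderdate"] <= end)


def top_salespeople(orders, order_details, employees, n=3):
    """Return the n (firstname, total_sales) pairs with the highest totals."""
    employee_of_order = {o["orderid"]: o["employeeid"] for o in orders}
    name_of_employee = {e["employeeid"]: e["firstname"] for e in employees}
    totals = defaultdict(float)
    for d in order_details:
        employee_id = employee_of_order.get(d["orderid"])
        if employee_id not in name_of_employee:
            continue  # inner-join behaviour: skip rows without a match
        totals[name_of_employee[employee_id]] += d["unitprice"] * d["quantity"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


country_counts = count_orders_by_country(orders, date(1995, 10, 11), date(1996, 7, 1))
top3 = top_salespeople(orders, order_details, employees)
print(dict(country_counts))  # {'France': 2, 'Germany': 1}
print(top3)  # [('Andrew', 80.0), ('Nancy', 60.0), ('Janet', 45.0)]
```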

SQL practice
• For all orders placed in 1996, compute the sales amount by category (CategoryName)
14
select categoryname, sum(sales_price) as total_sales from (
  select a.orderid, c.productid, b.unitprice*quantity as sales_price, categoryname from (
    select orderid from orders
    where year(orderdate)=1996
  ) a
  join `order details` b
    on a.orderid=b.orderid
  join products c
    on b.productid=c.productid
  join categories d
    on c.categoryid=d.categoryid
) e
group by 1
order by 2;
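To try the category query without the course server, the sketch below runs an equivalent query from Python against an in-memory SQLite database. SQLite is only a stand-in for MySQL here: it has no `year()` function, so the filter uses `strftime('%Y', ...)` instead. The sample rows are hypothetical and `sales_by_category` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (orderid integer, orderdate text);
create table `order details` (orderid integer, productid integer,
                              unitprice real, quantity integer);
create table products (productid integer, categoryid integer);
create table categories (categoryid integer, categoryname text);

-- Hypothetical sample rows, not the real Northwind data.
insert into orders values (1, '1996-03-01'), (2, '1996-11-20'), (3, '1997-01-05');
insert into `order details` values
    (1, 10, 2.0, 10), (1, 20, 5.0, 4), (2, 10, 2.0, 5), (3, 20, 5.0, 100);
insert into products values (10, 1), (20, 2);
insert into categories values (1, 'Beverages'), (2, 'Seafood');
""")


def sales_by_category(conn, year):
    """Total sales per category for orders placed in the given year."""
    query = """
        select d.categoryname, sum(b.unitprice * b.quantity) as total_sales
        from orders a
        join `order details` b on a.orderid = b.orderid
        join products c on b.productid = c.productid
        join categories d on c.categoryid = d.categoryid
        where strftime('%Y', a.orderdate) = ?
        group by d.categoryname
        order by total_sales
    """
    return conn.execute(query, (str(year),)).fetchall()


result_1996 = sales_by_category(conn, 1996)
print(result_1996)  # [('Seafood', 20.0), ('Beverages', 30.0)]
```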

A classic data pipeline
15
Reference: https://www.semanticscholar.org/paper/Role-of-Machine-Learning-in-ETL-Automation-Mondal-Biswas/de85876afc9a2cf3051ac3c47d027325b269d8ee

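MLOps
• Common problems when doing machine learning
• How can data scientists, software engineers, and operations engineers collaborate more effectively?
• How can machine learning model services be developed and delivered reliably?
• How can the performance of machine learning model services be sustained over time?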
16

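MLOps
• An extension of the DevOps methodology that brings machine learning and data science assets into the DevOps ecosystem
• Across the development, deployment, and operation of ML products and services, it applies continuous integration during development and continuous deployment at release. ML pipelines also reduce communication costs: data scientists can automate their work and gain valuable insights from production systems, while operations teams provide reproducibility, visibility, managed services, and compute support
• Makes data handling more systematic: instead of treating data processing as dirty, ad hoc, experience-driven work, the whole workflow is approached systematically, in line with data-centric AI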
17

MLOps
18
Reference: https://www.cpht.pro/blog/blog-post-12/

MLOps
• Use DevOps best practices to raise the quality of model service delivery
19
Reference: https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

Further reading
• Andrew Ng's view on data versus models: the Data-centric AI methodology
• https://datacentricai.org/
20

Further reading
• Dagster
• An ETL tool
• The data orchestration platform built for productivity
• https://dagster.io/
• https://github.com/dagster-io/dagster
21
Apache-2.0 License
Supports the major cloud environments
Actively updated
Latest version 0.14.1 (updated 2022-02-20)

Python basics
• Introduction to JupyterLab
22

Python basics
• Open a new notebook
• Import packages
• Assign variables
• Print variables
23
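The three steps listed on this slide (import, assign, print) could look like this in a first notebook cell. The variable names are only illustrative.

```python
import math  # import a package (math ships with Python)

radius = 2                      # assign a variable
area = math.pi * radius ** 2    # assign the result of an expression

print("radius:", radius)        # print variables
print(f"area: {area:.2f}")      # area: 12.57
```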

ETL using Python
• Common data-wrangling operations in ETL
• drop table if exists
• create temp table
• drop temp table
• select * from table into table
• create table as select
• table join
• select * from table where column in
24
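The operations listed on this slide can be tried from Python as follows. This sketch uses an in-memory SQLite database as a stand-in, so it runs without the course server. The tables and rows are hypothetical. Copying rows into an existing table is written here as `insert into ... select`, which is the form MySQL accepts for that step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source tables.
cur.execute("create table customers (customerid integer, country text)")
cur.executemany("insert into customers values (?, ?)",
                [(1, "Taiwan"), (2, "Japan"), (3, "France")])
cur.execute("create table orders (orderid integer, customerid integer, amount real)")
cur.executemany("insert into orders values (?, ?, ?)",
                [(100, 1, 50.0), (101, 2, 20.0), (102, 1, 30.0), (103, 3, 10.0)])

# drop table if exists: keeps the script safe to re-run.
cur.execute("drop table if exists customer_sales")

# create temp table, filtered with "where column in (...)".
cur.execute("""
    create temporary table tmp_asia as
    select customerid, country from customers
    where country in ('Taiwan', 'Japan')
""")

# create table as select, combined with a table join.
cur.execute("""
    create table customer_sales as
    select t.country, sum(o.amount) as total_amount
    from tmp_asia t
    join orders o on t.customerid = o.customerid
    group by t.country
""")

# Copy the result into another table with insert into ... select.
cur.execute("create table customer_sales_archive (country text, total_amount real)")
cur.execute("insert into customer_sales_archive select * from customer_sales")

# drop temp table once it is no longer needed.
cur.execute("drop table tmp_asia")
conn.commit()

rows = cur.execute(
    "select country, total_amount from customer_sales order by country"
).fetchall()
print(rows)  # [('Japan', 20.0), ('Taiwan', 80.0)]
```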

ETL using Python
• Read a local csv file, clean the data with Python, and write it to the database
• Read a table from MySQL, clean and transform the data, then write it back to the database
25
etl.ipynb
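The first flow on this slide (csv file, clean in Python, write to a database) can be sketched with the standard library alone. This is not the content of etl.ipynb: the CSV text, the `sales` table, and the `extract`, `transform`, and `load` helpers are hypothetical, and an in-memory SQLite database stands in for MySQL.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real run would read a local file instead.
CSV_TEXT = """name,city,amount
 Alice ,taipei,100
Bob,Kaohsiung,
alice,Taipei,50
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise case, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with a missing amount
        cleaned.append((
            row["name"].strip().title(),
            row["city"].strip().title(),
            float(row["amount"]),
        ))
    return cleaned


def load(conn, rows):
    """Load: recreate the target table and insert the cleaned rows."""
    conn.execute("drop table if exists sales")
    conn.execute("create table sales (name text, city text, amount real)")
    conn.executemany("insert into sales values (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(CSV_TEXT)))
summary = conn.execute(
    "select name, city, sum(amount) from sales group by name, city"
).fetchall()
print(summary)  # [('Alice', 'Taipei', 150.0)]
```

The second flow has the same shape, except that the extract step reads rows from a database table instead of a CSV file.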

Homework
• Practice basic SQL syntax
• Practice data-processing methods in Python
• Advanced: use Python to find the answers to the questions on pages 13 and 14
26
