Proprietary + Confidential
Is Knowing SQL Enough to Do Machine Learning?
An Introduction to BigQuery ML
Aaron Lee
aaronlee@mitac.com.tw
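李東霖 (Aaron Lee)
Current positions
● Google solutions consultant, MiTAC Information Technology (神通資訊科技)
● Product manager for Qlik and Sophos
Experience
● Google Apps certification
● Google Cloud Platform architect
● PMP (Project Management Professional)
● SAP MM consultant
● Oracle OCP certification
Speaking and teaching experience
專案管理師協會, Providence University (靜宜大學), 前川科技, 毅太科技, Water Resources Agency (水利署), National Defense University (國防大學), E.SUN Bank (玉山銀行), MiTAC Information Technology (神通資訊科技), Toastmasters International (國際演講協會), 桃園巿稅務局, TAITRA (外貿協會), 亞東氣體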
Why BigQuery ML?
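With SQL alone you can create and run ML models and make predictions. SQL users can speed up
development with the tools they already have: there is no need to move data and no time spent
building TensorFlow models, which makes machine learning accessible to more people.
BigQuery ML is now generally available (GA)!
And the result...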
Objectives
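● Build a model from sample data that predicts whether an e-commerce visitor will place an order
● Use the CREATE MODEL statement to create a binary logistic regression model (yes/no)
● Use the ML.EVALUATE function to evaluate the ML model
● Use the ML.PREDICT function to make predictions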
Always free usage limits
Resource: Storage
  Monthly free usage limit: The first 10 GB per month is free.
  Details: BigQuery ML models and training data stored in BigQuery are included in the storage free tier.
Resource: Queries (analysis)
  Monthly free usage limit: The first 1 TB of query data processed per month is free.
  Details: Queries that use BigQuery ML prediction, inspection, and evaluation functions are included in the analysis free tier. BigQuery ML queries that contain CREATE MODEL statements are not. Flat-rate pricing is also available for high-volume customers that prefer a stable, monthly cost.
Resource: BigQuery ML CREATE MODEL queries
  Monthly free usage limit: The first 10 GB of data processed by queries that contain CREATE MODEL statements per month is free.
  Details: BigQuery ML CREATE MODEL queries are independent of the BigQuery analysis free tier.
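US pricing, but...
Taiwan pricing
Source data: e-commerce users and whether they placed an order
1. Create the dataset "bqml_tutorial" (using the new UI)
For the location, choose United States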
On the Create dataset page:
● For Dataset ID, enter bqml_tutorial.
● For Data location, choose United States (US). Currently, the public datasets are stored in the
US multi-region location. For simplicity, you should place your dataset in the same location.
● Leave all of the other default settings in place and click Create dataset.
2. Create the model
#standardSQL
CREATE MODEL `bqml_tutorial.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
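This takes a long time...
Model types available in BigQuery ML
● Linear regression: linear_reg
● Binary logistic regression: logistic_reg
● Multiclass logistic regression: logistic_reg
● K-means clustering: kmeans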
https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create
3. Get the training results
4. Evaluate your model
#standardSQL
SELECT * FROM
ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
Data set
http://www.cs.nthu.edu.tw/~shwu/courses/ml/labs/08_CV_Ensembling/fig-holdout.png
Evaluation results
When the query is complete, click the Results tab below the query text area. The results should look like the following:
Column descriptions
Because you performed a logistic regression, the results include the following columns:
● precision: A metric for classification models. Precision identifies the frequency with which a model was correct
when predicting the positive class.
● recall: A metric for classification models that answers the following question: out of all the possible positive
labels, how many did the model correctly identify?
● accuracy: The fraction of predictions that a classification model got right.
● f1_score: A measure of the accuracy of the model. The F1 score is the harmonic mean of precision and
recall. Its best value is 1 and its worst value is 0.
● log_loss: The loss function used in a logistic regression. It measures how far the model's predictions
are from the correct labels.
● roc_auc: The area under the ROC curve. This is the probability that a classifier is more confident that a randomly
chosen positive example is actually positive than that a randomly chosen negative example is positive. For more
information, see Classification in the Machine Learning Crash Course.
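Column descriptions: formulas (1)
Because you performed a logistic regression, the results include the following
columns:
● precision: TP / (TP + FP). Of all the cases predicted to be positive, the fraction
that are actually positive.
● recall: TP / (TP + FN). Of all the cases that are actually positive, the fraction
correctly predicted to be positive; for example, of the visitors who placed an order,
the fraction the model predicted would order.
● accuracy: (TP + TN) / (TP + TN + FP + FN). Of all cases, the fraction predicted
correctly. By contrast, TN / (TN + FP), the fraction of actual negatives correctly
predicted to be negative, is the specificity (true negative rate).
● f1_score: The harmonic mean of precision and recall.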
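As a rough sketch of these formulas, the four metrics can be computed directly from the confusion-matrix counts, with accuracy taken as the fraction of all predictions that are correct. The function name `classification_metrics` and the example counts are hypothetical, chosen only for illustration; this is not a description of how BigQuery ML computes the values internally.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts.

    A ratio with a zero denominator is reported as 0.0. That is a simplifying
    assumption made for this sketch; strictly speaking the ratio is undefined.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    # F1 is the harmonic mean of precision and recall.
    f1_score = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1_score": f1_score,
    }


# Hypothetical counts: 40 TP, 10 FP, 20 FN, 930 TN (1,000 sessions in total).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy comes out far higher than recall, which is the typical pattern when the positive class (visitors who order) is rare.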
Confusion matrix
https://www.ycc.idv.tw/confusion-matrix.html
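Confusion matrix example: AIDS prediction
                         True condition: has AIDS          True condition: no AIDS
Predicted outcome: Yes   True Positive (TP):               False Positive (FP):
                         has AIDS, test detects it         no AIDS, test reports AIDS
Predicted outcome: No    False Negative (FN):              True Negative (TN):
                         has AIDS, test misses it          no AIDS, test reports none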
Confusion matrix example: AIDS prediction
Suppose 10,000 people are tested and the model predicts that nobody has AIDS.
                         True condition: has AIDS (100 people)   True condition: no AIDS (9,900 people)
Predicted outcome: Yes   True Positive (TP): 0 people            False Positive (FP): 0 people
Predicted outcome: No    False Negative (FN): 100 people         True Negative (TN): 9,900 people
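Calculation results
● accuracy: (TP + TN) / total = (0 + 9900) / 10000 = 99%
● precision: TP / (TP + FP) = 0 / (0 + 0), which is undefined and is usually reported as 0 ⇒ the accuracy paradox
● recall: TP / (TP + FN) = 0 / (0 + 100) = 0
● High accuracy is of no use here; what matters is detecting the people who actually have AIDS.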
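The 10,000-person example can be reproduced as a small sketch that builds the labels, counts the four confusion-matrix cells, and derives accuracy, precision, and recall from them. The variable names are hypothetical, and reporting the 0/0 precision as 0.0 is a convention assumed here, not a computed value.

```python
# Ground truth for 10,000 people: 1 = has the disease, 0 = does not.
y_true = [1] * 100 + [0] * 9900
# A "model" that always predicts 0 (nobody has the disease).
y_pred = [0] * len(y_true)

# Count the four cells of the confusion matrix.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)

accuracy = (tp + tn) / len(y_true)
# Precision is 0/0 for this model; 0.0 is used as a conventional fallback.
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

The model is right about 99% of the people while finding none of the 100 who are actually ill, which is why recall is the number to watch in this scenario.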
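Applying the confusion matrix to this example: does the user place an order?
                            True condition: placed an order     True condition: did not place an order
Model predicts an order     True Positive (TP):                 False Positive (FP):
                            orders, and predicted to order      does not order, but predicted to order
Model predicts no order     False Negative (FN):                True Negative (TN):
                            orders, but predicted not to        does not order, and predicted not to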
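Column descriptions: formulas (1)
Because you performed a logistic regression, the results include the following
columns:
● precision: TP / (TP + FP). Of the visitors the model predicted would order, the
fraction that actually ordered.
● recall: TP / (TP + FN). Of the visitors who actually ordered, the fraction the
model correctly predicted would order.
● accuracy: (TP + TN) / (TP + TN + FP + FN). Of all visitors, the fraction whose
outcome the model predicted correctly.
● f1_score: The harmonic mean of precision and recall.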
Column descriptions: formulas (2)
● log_loss: The loss function used in a logistic regression. It measures how far
the model's predictions are from the correct labels, that is, how close the
predicted results are to the real data. Lower is better.
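To make log_loss more concrete, here is a minimal sketch using the standard binary cross-entropy definition, averaged over examples. It is an assumption, not something the slides confirm, that this matches the value ML.EVALUATE reports; the function name, the clipping constant `eps`, and the sample probabilities are all hypothetical.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy between labels (0/1) and predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clip probabilities so that log(0) can never occur.
        prob = min(max(prob, eps), 1 - eps)
        total += -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    return total / len(y_true)


labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]  # high probability assigned to the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]  # high probability assigned to the wrong class

loss_right = log_loss(labels, mostly_right)
loss_wrong = log_loss(labels, mostly_wrong)
print(f"loss when mostly right: {loss_right:.4f}")
print(f"loss when mostly wrong: {loss_wrong:.4f}")
```

Confident predictions that agree with the labels give a loss near 0, while confident predictions that disagree are penalized heavily.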
Column descriptions: formulas (3)
● roc_auc: The area under the ROC curve. This is the probability that a
classifier is more confident that a randomly chosen positive example is
actually positive than that a randomly chosen negative example is positive.
For more information, see Classification in the Machine Learning Crash
Course.
AUC = 0.5 (no discrimination)
0.7 ≤ AUC ≤ 0.8 (acceptable discrimination)
0.8 ≤ AUC ≤ 0.9 (excellent discrimination)
0.9 ≤ AUC ≤ 1.0 (outstanding discrimination)
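The probability interpretation of roc_auc can be sketched by comparing every positive example with every negative example and counting how often the positive one receives the higher score (ties count as half). This pairwise version is meant only to illustrate the definition; the function name and the sample scores are hypothetical, and production libraries typically use faster rank-based methods.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # positive example scored higher: correct ranking
            elif pos == neg:
                wins += 0.5   # tie: counts as half
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Here 8 of the 9 positive/negative pairs are ranked correctly; a model that gives every example the same score would land at 0.5, the "no discrimination" level.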
5. Use the model to predict results by country
#standardSQL
SELECT
country, SUM(predicted_label) as total_predicted_purchases
FROM
ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
SELECT
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC LIMIT 10
Query results
6. Predict purchases for each user
#standardSQL
SELECT fullVisitorId, SUM(predicted_label) as total_predicted_purchases
FROM
ML.PREDICT(MODEL `bqml_tutorial.sample_model`, ( SELECT
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country,
fullVisitorId
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10
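Prediction results
Conclusion
● All you need is SQL
● The syntax is simple, so you can try it right away
● The data is stored in the United States
References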
BigQuery Start
https://cloud.google.com/bigquery/docs/bigqueryml-analyst-start
Machine Learning Crash Course
https://developers.google.com/machine-learning/crash-course/
Thank you
Aaron Lee
aaronlee@mitac.com.tw

20190424 Is Knowing SQL Enough to Do Machine Learning? An Introduction to BigQuery ML
