2. About me
• Engineering Team Lead @ Nilvana - Computer Vision & ML Applications
• BS, MS, PhD in Computer Science
• PhD thesis in Machine Learning
• Previously
  • Tech Lead @ ADLINK IoT Team
  • Software Product Manager @ Delta Research Center
  • Engineering Manager @ Networking Company
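3. Agenda
1. Moravec’s Paradox: how hard an AI problem is often runs counter to our common sense
2. Hype Cycle: what is the technology hype cycle?
3. Critical Thinking: a look at AI-related events and the daily life of a data scientist
4. ML & NN: a quick introduction to machine learning and neural networks
5. NLP: the natural language processing pipeline and its pain points
6. Case Study: the AI-assisted child custody ruling prediction system developed at National Tsing Hua University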
4. Moravec’s Paradox
In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it. [1]
28. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
30. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
1. Start with a Question
31. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
32. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
2. Get & Clean Data
“If I study more, will I get a higher grade?”
Student   Hours Studied   Grade
Alice     20              90
Bob       5               70
Charlie   10              96
David     15              82
Eve       two             62
Frank     16              87
Grace     22              998
33. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
2. Get & Clean Data
“If I study more, will I get a higher grade?”
Student   Hours Studied   Grade
Alice     20              90
Bob       5               70
Charlie   10              96
David     15              82
Eve       two → 2         62
Frank     16              87
Grace     22              998 → 98
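The two corrections on this slide can be sketched in a few lines of Python. This is only an illustration: the `WORD_TO_NUMBER` lookup, the `GRADE_CORRECTIONS` table, and the assumption that valid grades lie between 0 and 100 are not from the slides, and deciding that 998 should be 98 remains a human judgment.

```python
# Raw rows as they appear on the slide: (student, hours studied, grade).
RAW_ROWS = [
    ("Alice", "20", "90"),
    ("Bob", "5", "70"),
    ("Charlie", "10", "96"),
    ("David", "15", "82"),
    ("Eve", "two", "62"),
    ("Frank", "16", "87"),
    ("Grace", "22", "998"),
]

# Hypothetical lookup for hours that were typed as words.
WORD_TO_NUMBER = {"two": 2}

# Hypothetical manual fixes decided after inspecting the data:
# 998 is treated as a typo for 98.
GRADE_CORRECTIONS = {"Grace": 98}


def clean_rows(rows):
    """Convert text fields to integers and apply the manual corrections."""
    cleaned = []
    for student, hours_text, grade_text in rows:
        hours_key = hours_text.strip().lower()
        if hours_key in WORD_TO_NUMBER:
            hours = WORD_TO_NUMBER[hours_key]
        else:
            hours = int(hours_key)
        grade = int(grade_text)
        # Assumption: grades are on a 0 to 100 scale.
        if not 0 <= grade <= 100:
            if student not in GRADE_CORRECTIONS:
                raise ValueError(f"unresolved grade for {student}: {grade}")
            grade = GRADE_CORRECTIONS[student]
        cleaned.append((student, hours, grade))
    return cleaned


CLEANED = clean_rows(RAW_ROWS)
for row in CLEANED:
    print(row)
```

Raising an error for any out-of-range grade without a recorded correction keeps suspicious values from slipping silently into the analysis.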
34. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
35. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
3. Perform EDA
[Plot: Grade (0 to 100) vs. Hours Studied (0 to 25)]
36. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
3. Perform EDA
[Plot: Grade (0 to 100) vs. Hours Studied (0 to 25)]
Finding #1. The more you study, the higher the grade you will get.
37. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
3. Perform EDA
[Plot: Grade (0 to 100) vs. Hours Studied (0 to 25)]
Finding #1. The more you study, the higher the grade you will get.
Finding #2. Also, Charlie is a smarty pants.
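One way to put a number on Finding #1 is the Pearson correlation coefficient. The sketch below assumes the cleaned table from the earlier slide (Eve's hours read as 2, Grace's grade as 98); the function name `pearson_r` is just an illustrative choice.

```python
import math

# Cleaned data, in the order Alice, Bob, Charlie, David, Eve, Frank, Grace.
HOURS = [20, 5, 10, 15, 2, 16, 22]
GRADES = [90, 70, 96, 82, 62, 87, 98]


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


R = pearson_r(HOURS, GRADES)
print(f"Pearson r = {R:.2f}")
```

On these seven students the coefficient comes out at roughly 0.8, a fairly strong positive relationship for such a small sample.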
38. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
[Plot: Grade (0 to 100) vs. Hours Studied (0 to 25)]
39. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
[Plot: Grade (0 to 100) vs. Hours Studied (0 to 25), with a fitted line]
Linear Regression
Grade = 1.5 * Hours + 65
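The rounded coefficients above can be checked with an ordinary least-squares fit. This is a minimal sketch in plain Python, assuming the cleaned table (Eve's hours as 2, Grace's grade as 98); `fit_line` is an illustrative helper, not something named in the slides.

```python
# Cleaned data, in the order Alice, Bob, Charlie, David, Eve, Frank, Grace.
HOURS = [20, 5, 10, 15, 2, 16, 22]
GRADES = [90, 70, 96, 82, 62, 87, 98]


def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


SLOPE, INTERCEPT = fit_line(HOURS, GRADES)
print(f"Grade = {SLOPE:.2f} * Hours + {INTERCEPT:.2f}")
```

The fitted slope and intercept land close to 1.5 and 65, which is consistent with the rounded equation on the slide.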
40. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
41. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
Yes, there is a positive correlation between the number of hours you study and the grade you will get.
42. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
Yes, there is a positive correlation between the number of hours you study and the grade you will get.
Specifically, the relationship is: Grade = 1.5 * Hours + 65
So if you study 10 hours, you can expect to get an 80.
43. Data Science Workflow
1. Start with a Question
2. Get & Clean Data
3. Perform EDA
4. Apply Techniques
5. Share Insights
“If I study more, will I get a higher grade?”
Yes, there is a positive correlation between the number of hours you study and the grade you will get.
Specifically, the relationship is: Grade = 1.5 * Hours + 65
So if you study 10 hours, you can expect to get an 80.
However, Charlie is a smarty pants and is inflating the grade estimate. You’ll probably get slightly less than 80.
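Charlie's effect on the estimate can be illustrated by fitting the line twice, once with and once without him, and comparing the predictions for 10 hours. The sketch assumes the cleaned table (Eve's hours as 2, Grace's grade as 98); dropping a single student is only one simple way to probe sensitivity, not necessarily what the speaker did.

```python
# Cleaned data: student -> (hours studied, grade).
DATA = {
    "Alice": (20, 90),
    "Bob": (5, 70),
    "Charlie": (10, 96),
    "David": (15, 82),
    "Eve": (2, 62),
    "Frank": (16, 87),
    "Grace": (22, 98),
}


def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(points, hours):
    """Fit the line on `points` and evaluate it at `hours`."""
    slope, intercept = fit_line(points)
    return slope * hours + intercept


ALL_POINTS = list(DATA.values())
WITHOUT_CHARLIE = [v for name, v in DATA.items() if name != "Charlie"]

PRED_ALL = predict(ALL_POINTS, 10)
PRED_WITHOUT = predict(WITHOUT_CHARLIE, 10)
print(f"With Charlie:    {PRED_ALL:.1f}")
print(f"Without Charlie: {PRED_WITHOUT:.1f}")
```

With all seven students the prediction for 10 hours is roughly 79; without Charlie it drops to roughly 76, which matches the "slightly less than 80" caveat.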
91. Text Processing
Coronavirus diseases
Coronavirus diseases are caused by viruses in
the coronavirus subfamily, a group of related
RNA viruses that cause diseases in mammals
and birds. In humans and birds, the group of
viruses cause respiratory tract infections that
can range from mild to lethal.
92. Text Processing
Coronavirus diseases
Coronavirus diseases are caused by viruses in
the coronavirus subfamily, a group of related
RNA viruses that cause diseases in mammals
and birds. In humans and birds, the group of
viruses cause respiratory tract infections that
can range from mild to lethal.
<html>
<head>
<title>Coronavirus diseases</title>
</head>
<body>
<h1>Coronavirus diseases</h1>
<div style="text-align: center;">
<img src="Coronavirus.jpg" width="400" alt="Coronavirus
diseases">
</div>
<p><b>Coronavirus diseases</b> are caused by <a href="/wiki/Virus"
title="Virus">viruses</a> in the <a href="/wiki/Coronavirus"
title="Coronavirus">coronavirus</a> subfamily, a group of related <a
href="/wiki/Orthornavirae" title="Orthornavirae">RNA viruses</a> that
cause diseases in <a href="/wiki/Mammal"
title="Mammal">mammals</a> and <a href="/wiki/Bird"
title="Bird">birds</a>. In humans and birds, the group of viruses cause
<a href="/wiki/Respiratory_tract_infection" title="Respiratory tract
infection">respiratory tract infections</a> that can range from mild to
lethal. </p>
</body>
</html>
96. Text Processing
Coronavirus diseases
Coronavirus diseases are caused by viruses in
the coronavirus subfamily, a group of related
RNA viruses that cause diseases in mammals
and birds. In humans and birds, the group of
viruses cause respiratory tract infections that
can range from mild to lethal.
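Going from the raw page markup to plain text like the above can be sketched with Python's standard `html.parser` module. The `TextExtractor` class and the shortened `SAMPLE_HTML` are illustrative assumptions; real pages usually need more care (scripts, styles, navigation) than this minimal version handles.

```python
from html.parser import HTMLParser

# Shortened version of the page shown on the earlier slide.
SAMPLE_HTML = """<html>
<head><title>Coronavirus diseases</title></head>
<body>
<h1>Coronavirus diseases</h1>
<p><b>Coronavirus diseases</b> are caused by <a href="/wiki/Virus"
title="Virus">viruses</a> in the <a href="/wiki/Coronavirus"
title="Coronavirus">coronavirus</a> subfamily.</p>
</body>
</html>"""


class TextExtractor(HTMLParser):
    """Collects text content, skipping everything inside <head>."""

    def __init__(self):
        super().__init__()
        self._in_head = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head += 1

    def handle_endtag(self, tag):
        if tag == "head" and self._in_head:
            self._in_head -= 1

    def handle_data(self, data):
        if not self._in_head:
            self._chunks.append(data)

    def text(self):
        # Join the raw chunks, then collapse all runs of whitespace.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


TEXT = html_to_text(SAMPLE_HTML)
print(TEXT)
```

Tags and attributes are discarded and only the text remains, with the heading followed directly by the paragraph.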
108. Common Text Processing Steps [24]
Text → Sentence Tokenization → Sentences
For each sentence:
• Lowercasing
• Removal of Punctuation
• Removal of Stopwords
• Stemming & Lemmatization
111. “@Jamie went back to University[http://cdn.thu.edu.tw].”
Cleaning: jamie went back to university
112. “@Jamie went back to University[http://cdn.thu.edu.tw].”
Cleaning: jamie went back to university
Tokenize: [jamie, went, back, to, university]
113. “@Jamie went back to University[http://cdn.thu.edu.tw].”
Cleaning: jamie went back to university
Tokenize: [jamie, went, back, to, university]
Remove Stop Words: [jamie, went, university]
114. “@Jamie went back to University[http://cdn.thu.edu.tw].”
Cleaning: jamie went back to university
Tokenize: [jamie, went, back, to, university]
Remove Stop Words: [jamie, went, university]
Stem / Lemmatize: [jamie, go, univers]
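The first three steps can be sketched with the standard library alone. The regular expressions and the tiny `STOP_WORDS` set below are assumptions chosen to reproduce the slide's example, not a general-purpose configuration. Stemming and lemmatizing are left out because they normally rely on an NLP library and its language resources.

```python
import re

# Hypothetical, deliberately tiny stop-word list for this example only.
STOP_WORDS = {"a", "an", "the", "to", "back", "on", "of"}


def clean(text):
    """Drop URLs, lowercase, and replace everything but letters with spaces."""
    text = re.sub(r"https?://\S+", " ", text)
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def tokenize(text):
    """Whitespace tokenization, which is enough after cleaning."""
    return text.split()


def remove_stop_words(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


RAW = "@Jamie went back to University[http://cdn.thu.edu.tw]."
CLEANED = clean(RAW)
TOKENS = tokenize(CLEANED)
CONTENT_TOKENS = remove_stop_words(TOKENS)

print(CLEANED)
print(TOKENS)
print(CONTENT_TOKENS)
```

Removing the URL before stripping punctuation matters here: otherwise the URL's letters would survive as stray tokens.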
127. Bag of Words
Little House on the Prairie → littl hous prairi
Mary had a Little Lamb → mari littl lamb
128. Bag of Words
Little House on the Prairie → littl hous prairi
Mary had a Little Lamb → mari littl lamb
The Silence of the Lambs → silenc lamb
129. Bag of Words
Little House on the Prairie → littl hous prairi
Mary had a Little Lamb → mari littl lamb
The Silence of the Lambs → silenc lamb
Twinkle Twinkle Little Star → twinkl littl star
130. Bag of Words
Corpus (D):
Little House on the Prairie
Mary had a Little Lamb
The Silence of the Lambs
Twinkle Twinkle Little Star
Vocabulary (V): littl hous prairi mari lamb silenc twinkl star
131. Bag of Words
Little House on the Prairie
Mary had a Little Lamb
The Silence of the Lambs
Twinkle Twinkle Little Star
littl hous prairi mari lamb silenc twinkl star
132. Bag of Words
Document-Term Matrix
                              littl  hous  prairi  mari  lamb  silenc  twinkl  star
Little House on the Prairie   1      1     1       0     0     0       0       0
Mary had a Little Lamb        1      0     0       1     1     0       0       0
The Silence of the Lambs      0      0     0       0     1     1       0       0
Twinkle Twinkle Little Star   1      0     0       0     0     0       2       1
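Building such a matrix from already-stemmed titles takes only a few lines. In this sketch the stemmed tokens are copied from the slides, except that the last title keeps both occurrences of "twinkl" so that its count comes out as 2; the helper names are illustrative.

```python
from collections import Counter

# Stemmed tokens per title, in the same order as the matrix rows.
STEMMED_DOCS = {
    "Little House on the Prairie": ["littl", "hous", "prairi"],
    "Mary had a Little Lamb": ["mari", "littl", "lamb"],
    "The Silence of the Lambs": ["silenc", "lamb"],
    "Twinkle Twinkle Little Star": ["twinkl", "twinkl", "littl", "star"],
}


def build_vocabulary(docs):
    """Unique terms in order of first appearance across the corpus."""
    vocabulary = []
    for tokens in docs.values():
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def document_term_matrix(docs, vocabulary):
    """One row per document, one column per vocabulary term, cells are counts."""
    rows = []
    for tokens in docs.values():
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return rows


VOCABULARY = build_vocabulary(STEMMED_DOCS)
MATRIX = document_term_matrix(STEMMED_DOCS, VOCABULARY)

print(VOCABULARY)
for title, row in zip(STEMMED_DOCS, MATRIX):
    print(row, title)
```

Word order inside each title is discarded; only the counts survive, which is what makes it a "bag" of words.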
179. Latent Dirichlet Allocation
Every document consists of a mix of topics.
Every topic consists of a mix of words.
[Diagram: documents doc1, doc2, doc3; topics topic1, topic2; words bananas, kitten, ohio, kale, puppy, frog, cute]
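The two statements can be made concrete with a toy generator: to produce each word of a document, first pick a topic from the document's topic mix, then pick a word from that topic's word mix. Every probability below is invented for illustration, and this sketch only shows the generative idea behind LDA. It does not fit a model: real LDA infers these mixes from a corpus and places Dirichlet priors on them.

```python
import random

# Hypothetical word mix for each topic (probabilities are made up).
TOPIC_WORDS = {
    "topic1": {"bananas": 0.4, "kale": 0.4, "ohio": 0.2},
    "topic2": {"kitten": 0.3, "puppy": 0.3, "frog": 0.2, "cute": 0.2},
}

# Hypothetical topic mix for each document (probabilities are made up).
DOC_TOPICS = {
    "doc1": {"topic1": 1.0, "topic2": 0.0},
    "doc2": {"topic1": 0.5, "topic2": 0.5},
    "doc3": {"topic1": 0.0, "topic2": 1.0},
}


def sample(distribution, rng):
    """Draw one key from a {key: probability} mapping."""
    keys = list(distribution)
    weights = [distribution[key] for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def generate_document(doc_id, length, rng):
    """For each word slot: pick a topic for the document, then a word for the topic."""
    words = []
    for _ in range(length):
        topic = sample(DOC_TOPICS[doc_id], rng)
        words.append(sample(TOPIC_WORDS[topic], rng))
    return words


RNG = random.Random(0)  # fixed seed so runs are repeatable
GENERATED = {doc_id: generate_document(doc_id, 8, RNG) for doc_id in DOC_TOPICS}
for doc_id, words in GENERATED.items():
    print(doc_id, words)
```

In this toy setup doc1 draws only topic1 words, doc3 only topic2 words, and doc2 a blend of both.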
183. Autoencoder - Deepfake [30, 31]
Original Face A → Encoder → Latent Face A → Decoder A → Reconstructed Face A
Original Face B → Encoder → Latent Face B → Decoder B → Reconstructed Face B
184. Autoencoder - Deepfake [30, 31]
Original Face A → Encoder → Latent Face A → Decoder A → Reconstructed Face A
Original Face B → Encoder → Latent Face B → Decoder B → Reconstructed Face B
Original Face A → Encoder → Latent Face A → Decoder B → Reconstructed Face B from A
215. References
[1] Ending with humour
[2] Dartmouth Workshop: The Birthplace Of AI
[3] 想投資科技股,先了解技術炒作週期 (Want to invest in tech stocks? First understand the technology hype cycle)
[4] 5 Trends Emerge in the Gartner Hype Cycle for Emerging Technologies, 2018
[5] 2 Megatrends Dominate the Gartner Hype Cycle for Artificial Intelligence, 2020
[6] The relentless threat of artificial intelligence taking our jobs away
[7] 繼IBM後,「Google健康」也開始裁員⋯醫療AI為什麼讓大廠一一敗退? (After IBM, Google Health starts layoffs too: why is medical AI defeating the big players one by one?)
[8] 為什麼AI醫療容易失敗?李建璋:最大原因可能在資料 (Why does medical AI fail so easily? 李建璋: the biggest reason may be the data)
[9] 法律AI公司Ross Intelligence倒闭:头顶是星空,脚下是薄冰 (Legal AI company Ross Intelligence shuts down: starry sky overhead, thin ice underfoot)
[10] This is what the evolution of self-driving cars looks like
[11] What is Deep Learning and Why you need it?
[12] How Scale is Enabling Deep Learning
[13] Scaling to Very Very Large Corpora for Natural Language Disambiguation
[14] The unreasonable effectiveness of data
[15] Artificial Intelligence vs. Machine Learning vs. Deep Learning: Essentials
[16] 人工智慧新革命--超級電腦「華生」 (The new AI revolution: the supercomputer "Watson")
[17] What is Artificial Intelligence?
216. References
[18] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
[19] Natural Language Processing with Classification and Vector Spaces
[20] Use the People album in Photos on your iPhone, iPad, or iPod touch
[21] Neural network playground
[22] Neural Network 3D Simulation
[23] 機器學習基石 (Machine Learning Foundations)
[24] Practical Natural Language Processing
[25] 最火熱的AI應用:NLP在含金量最高的2個產業率先發光 (The hottest AI application: NLP shines first in the two most lucrative industries)
[27] Machine Learning - Convolution with color images
[28] NLP的基本執行步驟(I) (Basic steps for carrying out NLP (I))
[29] Every Machine Learning Algorithm Can Be Represented as a Neural Network
[30] Deep Learning for Deepfakes Creation and Detection: A Survey
[31] Deepfake大解密!「換臉」技術更簡單,到底怎麼辦到的? (Deepfake decoded: face swapping is getting easier, so how is it actually done?)
[32] 維基百科 - 决策树 (Wikipedia - Decision tree)
[33] General Ensemble Method
[34] The First Rule of Machine Learning: Start without Machine Learning