
The Role of Data Science in the Whoscall Product Ecosystem


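Jeff Kuo (郭建甫)
Founder and CEO, Gogolook (走著瞧)

Dr. Kuo co-founded Gogolook (走著瞧) with 鄭勝丰 and 宋政桓 and currently serves as CEO of the WhosCall development team. He studied industrial design at National Cheng Kung University and graduated from the Institute of Industrial Engineering at National Tsing Hua University. His specialties are product design and user experience research. His professional experience includes researcher at the Heinz Nixdorf Institute in Germany, co-founder of 先構技研 Co., Ltd., and director of new business development at 安通國際 Co., Ltd.

---

Yimin Kao (高義銘)
Data Scientist, Gogolook (走著瞧)

Currently a data analysis scientist at Gogolook (走著瞧), he graduated from the Department of Statistics at North Carolina State University. His research areas include statistical classification and clustering methods, Bayesian models, and spatial statistics, with applications to computer virus packet detection, genetic association testing, and hurricane track prediction. His ambition in life is to spread music and joy.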

Published in: Technology

Transcript of "The Role of Data Science in the Whoscall Product Ecosystem"

  1. Gogolook Confidential
  2-4. How Started?
  5. The Best App for identifying and blocking calls: LINE whoscall
  7. Key Features
  8. ★ Instant Caller Identification: LINE whoscall identifies background information of incoming unknown calls in seconds through tags reported by other users, Internet search results, and our comprehensive global database.
  9. ★ Database with over 600 Million Phone Numbers (Database & Number Details): LINE whoscall boasts an online database with over 600 million phone numbers. The database covers yellow pages, spammers, telemarketers, customer services, etc., with numerous community tags contributed by users and comments based on real users' experiences.
  10. Incoming Call Dialogue: Fraud Call, Business, Corporation, Restaurant
  11. ★ Community Tag ★ Block unwanted calls & SMSs (Tag & Block): Contributions from the global user community have always been the pillar of LINE whoscall's service. A LINE whoscall user can tag a phone number and share it with others, which creates an integrated phone number database and a reliable communication network for everyone. Block calls and SMSs intelligently to ensure a harassment-free calling experience.
  12. ★ World's Largest Yellow Page Database ★ Offline Database Available for Free (Database Usage): LINE whoscall owns one of the world's largest online phone number databases, which covers most numbers of businesses and service providers essential to your daily life. The database is available not only online but also offline, and it is completely free. Unlimited use of a database with over 600 million phone numbers is available only on LINE whoscall.
  13. Number Identification (2014.07): 3 of every 5 strangers' calls can be identified by LINE whoscall. Over 400 million phone calls are identified by LINE whoscall every month. 3000 spammer numbers are reported by LINE whoscall users every day.
  14. Market
  15. Honors
  16. What we will be…
  17. Vision
  18. Applications of Data Science in whoscall. Yimin Kao (高義銘), Data Scientist, GOGOLOOK
  19. ★ A problem we often run into in everyday life
  20. ★ When people face the unknown, they get a feeling of… "I have a bad feeling about this!"
  21-22. ★ There are many apps out there that address this problem: 小熊來電通知
  23. ★ Why whoscall? Because… even Google's CEO gave it a thumbs-up! "Ooh, nice!"
  24. How does whoscall solve the problem of unknown incoming calls?
  25. ★ Technologies adopted: 1. Yellow pages: HiPage, Yelp, Zenrin… 2. Google search 3. Other sources
  26-27. ★ Technologies adopted: 4. User reports and tags
  28-29. ★ whoscall, I have a problem… If we cannot get any information about an unknown number from these sources, is it GG (game over)? Yes, GG, so wash up and go to bed…
  30. Of course we can't just go to bed. Otherwise, why would I be standing here?
  31-36. ★ Problem we want to solve: For an unknown phone number: • No Google result • No user tag / report • Not a whoscall user. Can we determine if it's a spam number? A telemarketing call? A fraud call? A harassment call? A wrong number? (I'm not a god!!)
  37. ★ Scenario: OO Telemarketing, Xiao Ming, Xiao Ming's younger sister, Xiao Ming's older brother, ?
  38. ★ We think it should work because… whoscall user base (= potential sensors): • > 10 million installations • > 10 thousand tags (daily) • > 30 million phone calls (daily)
  39. Analysis procedures: 1. Collect call logs 2. Compare with user tags 3. Explore call behaviors 4. Extract features 5. Classify unknown numbers using machine learning techniques
  40. ★ Collect call logs: • Recruit a group of volunteer whoscall users as our sensors. • Collect phone call logs from these sensors for a month.
  41. ★ User privacy: • User privacy is given the highest priority. • Phone numbers are stored as one-way hash codes (and therefore cannot be reversed).
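The privacy slide only says that phone numbers are stored as one-way hash codes. A minimal sketch of that idea is shown here, assuming SHA-256 wrapped in HMAC; the hash function, the key, and the helper names `normalize` and `hash_number` are all illustrative assumptions, not details from the talk.

```python
import hashlib
import hmac

# Hypothetical key for illustration; the slides do not mention a key or salt.
SECRET_KEY = b"replace-with-a-real-secret"


def normalize(number: str) -> str:
    """Drop separators so '0982-415-123' and '0982415123' hash identically."""
    return "".join(ch for ch in number if ch in "0123456789")


def hash_number(number: str) -> str:
    """Return a hex digest that stands in for the raw phone number."""
    digits = normalize(number).encode("ascii")
    return hmac.new(SECRET_KEY, digits, hashlib.sha256).hexdigest()


print(hash_number("0982-415-123") == hash_number("0982415123"))
```

A keyed hash is used in this sketch because the space of valid phone numbers is small enough that an unkeyed digest could be reversed by enumerating every number.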
  42. Analysis procedures: 1. Collect call logs 2. Compare with user tags 3. Explore call behaviors 4. Extract features 5. Classify unknown numbers using machine learning techniques
  43. ★ List of user tags: hangs up as soon as I answer; calls and hangs up right away; caller hangs up the moment I answer; hangs up on answer; hangs up once I pick up; says "wrong number" as soon as I pick up; keeps sending ad SMS; keeps dialing the wrong number; keep receiving ones with nothing displayed (APP); keeps frantically dialing the wrong number; one ring; hangs up without a sound, suspicious; hangs up after one ring; one ring and hang-up; hangs up once answered; serious harassment; unexplained call from abroad; international call disguised as a Taipei number???; loan shark; loan shark solicitation; illegal underground futures company; real estate; junk; junk SMS; junk message; Keelung hair salon; life insurance; migrant worker; calls strangers in the middle of the night to cause trouble; adult dating; adult dating call; adult "flesh market"; adult spam SMS; adult outcall service; adult "girl" call; adult harassment; adult ad SMS; adult solicitor; adult massage; adult solicitation; adult solicitation call; adult escort outcall; adult-service scum; Mormon; dials and hangs up immediately; nuisance call; ratings survey (misspelled in the original tag); TV ratings survey; loan SMS; government announcement; just one ring; prank call; 新光保全 (Shin Kong Security); 星展 (DBS) loans; 星展 (DBS) telemarketing; 星展銀行 (DBS Bank); procurer agency
  44. ★ Compare with user tags: • Compare these phone numbers with user reports from the whoscall database (block records). Normal numbers: 0987-991-XXX, 0986-225-XXX, 02-2675-XXXX, 03-862-XXXX, ... Telemarketing calls: 02-2543-XXXX, 03-556-XXXX, 886-XXXX, … Malicious calls: 02-2783-XXXX, 886-903-XXXX, 0800-000-XXX, …
  45. ★ Data summary: # Samples: 7854 (Normal: 4000, Spam: 3854). [pie chart of spam categories, labels in slide order: telemarketing calls, polling centers, harassment calls, fraud calls; shares in slide order: 70%, 1%, 5%, 24%]
  46. Analysis procedures: 1. Collect call logs 2. Compare with user tags 3. Explore call behaviors 4. Extract features 5. Classify unknown numbers using machine learning techniques
  47. ★ Normal numbers: Calls = 195 (in 66, out 129); Opponents = 72 (in 21, out 58) [chart, axis 0 to 20]
  48. ★ Spam numbers: Calls = 471 (in 15, out 456); Opponents = 186 (in 11, out 183). User tags: XX credit card marketing (7); OOO, XXXX marketing (6); telemarketing (3) [chart, axis 0 to 30]
  49. Analysis procedures: 1. Collect call logs 2. Compare with user tags 3. Explore call behaviors 4. Extract features 5. Classify unknown numbers using machine learning techniques
  50. ★ What is a feature? A "feature" is a measurable property of a phenomenon being observed.
  51-55. ★ Example: If we want to analyze a company, we might look at features such as: the number of employees; the number of engineers; the proportion of Python engineers in the company; company cohesion; how handsome the CEO is.
  56. Features for call patterns: Ratio of out calls [chart: Fraud, Marketing, Normal; axis 0.0 to 0.8]
  57. Features for call patterns: Ratio of recurring opponents [chart: Fraud, Marketing, Normal; axis 0 to 0.7]
  58. Features for call patterns: Ratio of missed out calls [chart: Fraud, Marketing, Normal; axis 0 to 0.6]
  59. Features for call patterns: Ratio of working time calls [chart: Fraud, Marketing, Normal; axis 0 to 0.7]
  60. Features for call patterns: Median of call durations [chart: Fraud, Marketing, Normal; axis 0 to 60 seconds]
  61. Features for call patterns: Ratio of out calls in contact book [chart: Fraud, Marketing, Normal; axis 0 to 0.35]
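The feature slides name the quantities but do not define them, so the following is only a sketch of how such features could be computed from one number's call log. The `Call` record, its fields, the 09:00 to 18:00 working-time window, and every formula here are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Call:
    opponent: str    # hashed number of the other party (hypothetical field)
    outgoing: bool   # True if the analysed number placed the call
    answered: bool   # True if the call was picked up
    duration: float  # seconds of conversation
    hour: int        # local hour of day, 0-23


def extract_features(calls: list[Call]) -> dict[str, float]:
    """Turn one number's call log into a small feature dictionary."""
    if not calls:
        raise ValueError("need at least one call")
    n = len(calls)
    out_calls = [c for c in calls if c.outgoing]
    per_opponent = Counter(c.opponent for c in calls)
    recurring = sum(1 for k in per_opponent.values() if k > 1)
    missed_out = sum(1 for c in out_calls if not c.answered)
    return {
        "ratio_out_calls": len(out_calls) / n,
        "ratio_recurring_opponents": recurring / len(per_opponent),
        "ratio_missed_out_calls": missed_out / len(out_calls) if out_calls else 0.0,
        "ratio_working_time_calls": sum(1 for c in calls if 9 <= c.hour < 18) / n,
        "median_call_duration": median(c.duration for c in calls),
    }


log = [
    Call("a", True, True, 10.0, 10),
    Call("a", True, False, 0.0, 11),
    Call("b", True, False, 0.0, 20),
    Call("c", False, True, 30.0, 14),
]
features = extract_features(log)
```

Each value is a ratio or a median rather than a raw count, so numbers with very different call volumes stay comparable.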
  62. Analysis procedures: 1. Collect call logs 2. Compare with user tags 3. Explore call behaviors 4. Extract features 5. Classify unknown numbers using machine learning techniques
  63. ★ Naïve method: Intuitively, we can classify an unknown number by rules such as: if the ratio of recurring opponents is less than 40%, the ratio of out calls is more than 60%, and the ratio of in calls is less than 20%, then we claim the number is a spam number.
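The hand-written rule from slide 63 can be written down directly. In this sketch the in/out ratios are computed from the call counts shown on slides 47 and 48, while the recurring-opponent ratios (0.50 and 0.10) and the dictionary keys are hypothetical values chosen for illustration.

```python
def is_spam_by_rule(features: dict) -> bool:
    """Apply the three fixed thresholds from the naive rule."""
    return (
        features["ratio_recurring_opponents"] < 0.40
        and features["ratio_out_calls"] > 0.60
        and features["ratio_in_calls"] < 0.20
    )


# Call counts from the slides: normal = 195 calls (66 in, 129 out),
# spam = 471 calls (15 in, 456 out). Recurring ratios are made up.
normal = {
    "ratio_recurring_opponents": 0.50,
    "ratio_out_calls": 129 / 195,
    "ratio_in_calls": 66 / 195,
}
spam = {
    "ratio_recurring_opponents": 0.10,
    "ratio_out_calls": 456 / 471,
    "ratio_in_calls": 15 / 471,
}

print(is_spam_by_rule(normal), is_spam_by_rule(spam))
```

Every threshold is fixed by hand, which is the weakness the next slides point out: the rule does not scale to many features and gives no guidance on where the cut-offs should sit.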
  64. ★ Problem 1: Too many features…
  65. ★ Problem 2: How to determine the rule?
  66-67. ★ Solution: Machine learning. Let the machine learn from the data.
  68. ★ What is machine learning? Machine learning builds a model from past data or experience. Learning means running that model as a program; once it has learned enough, it can make predictions (guesses), and these guesses are well-founded and have a high hit rate.
  69. ★ Machine learning techniques for classification: Support vector machine; Logistic regression; Decision tree; Neural networks; Naïve Bayes; Nonparametric Bayesian method
  70-77. ★ Support vector machine for binary classification
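The SVM slides are figures only, so the talk's actual solver and kernel are not known. As a rough sketch of what a support vector machine for binary classification does, here is a linear SVM without a bias term, trained by Pegasos-style stochastic subgradient descent on the regularised hinge loss. The toy data are hypothetical feature vectors assumed to be centred already, and `lam`, `epochs`, and `seed` are arbitrary choices.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Return weights w minimising lam/2*|w|^2 + mean hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regulariser step
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Label +1 (spam) or -1 (normal) by the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical centred features; +1 = spam, -1 = normal.
X = [[2, 0], [3, 1], [2, 1], [-2, 0], [-3, -1], [-2, -1]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
train_predictions = [predict(w, x) for x in X]
```

In practice a library implementation with a bias term and feature scaling would normally be used; the point of the sketch is only that the decision rule is learned from labelled numbers instead of being fixed by hand.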
  78. Is that enough?
  79. ★ Real-life scenario: When will we require a spam number prediction? Ans: The moment a phone call reaches a whoscall user. We want to predict whether a number is spam as EARLY as possible in order to prevent further victims…
  80. ★ Real-life scenario: [timeline diagram: XX Telemarketing, Victim 1, Victim 2, Victim 3; axes: Time, # recent calls; label: telemarketing call]
  81. Let's look at the performance of SVM under different numbers of recent calls
  82. ★ SVM for binary classification [chart: accuracy (0.8 to 1.0) vs. # recent calls (3 to 10)]
  83. Hmm… it performs well, but… can it be a bit faster?
  84-86. ★ Reduce the number of features: Feature computation is time-consuming, so we want to reduce the number of features before we do classification. Of course, we don't pick them by hand… Feature selection methods: Regularization methods; Backward, forward, and stepwise methods; Bayesian feature selection; Random forest method
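The slides list several feature selection methods without saying which one was used. As one illustration, here is a sketch of greedy forward selection, which adds one feature at a time and keeps the one that most improves a score. The scoring function (training accuracy of a nearest-centroid classifier), the function names, and the toy data are assumptions for this example only.

```python
def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier restricted to `cols`."""
    labels = sorted(set(y))
    cent = {}
    for lab in labels:
        rows = [x for x, t in zip(X, y) if t == lab]
        cent[lab] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(x):
        # Pick the label whose centroid is closest in squared distance.
        return min(
            labels,
            key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, cent[lab])),
        )

    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def forward_select(X, y, k, score=centroid_accuracy):
    """Greedily add the feature that gives the best score until k are chosen."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: score(X, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical data: column 0 separates the classes, column 1 does not,
# column 2 separates them imperfectly. 1 = spam, 0 = normal.
X = [
    [0.90, 0.5, 0.1], [0.95, 0.4, 0.2], [0.92, 0.6, 0.5],
    [0.50, 0.5, 0.6], [0.60, 0.4, 0.5], [0.55, 0.6, 0.7],
]
y = [1, 1, 1, 0, 0, 0]
first = forward_select(X, y, k=1)
```

A real pipeline would score candidates on held-out data, not on training accuracy, so that the selection does not reward overfitting.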
  87-89. ★ Feature selection results [chart: accuracy (0.8 to 1.0) vs. # features (10 to 30), with curves for 3, 5, and 10 recent calls]
  90. ★ Selected features: Ratio of out calls; Rate of out calls; Ratio of out calls in contact book; Ratio of reciprocal opponents; Ratio of recurring opponents; Median call duration of in calls; Ring duration of answered calls; Ratio of missed calls; Rate of new opponents; Ratio of in calls in contact book; and more…
  91. ★ Comparison of w/ and w/o feature selection [chart: accuracy (0.8 to 1.0) vs. # recent calls (3 to 10)]
  92. Done? Well, aren't we just great?
  93-95. ★ What is power? Power of class A: the probability of correctly classifying a class A sample as class A. Gender classifier: 97.5%, "this is a male".
  96. ★ Power of our classifier [chart: power (0.8 to 1.0) vs. # recent calls (3 to 10)]
  97. Yimin, step it up, will you?
  98-99. ★ Data summary: # Samples: 7854 (Normal: 4000, Spam: 3854). [pie chart of spam categories, labels in slide order: telemarketing calls, polling centers, harassment calls, fraud calls; shares in slide order: 70%, 1%, 5%, 24%]
  100. ★ Marketing numbers vs. normal numbers [chart: accuracy (0.8 to 1.0) vs. # recent calls (3 to 10)]
  101. ★ Fraud numbers vs. normal numbers [chart: accuracy (0.8 to 1.0) vs. # recent calls (3 to 10)]
  102. The idea of mixing everything together to make "pissing beef balls"…
  103. ★ Power of SVM for multi-classification [chart: power (0.8 to 1.0) vs. # recent calls (3 to 10)]
  104. ★ Power of SVM for binary classification [chart: power (0.8 to 1.0) vs. # recent calls (3 to 10)]
  105-106. ★ What is type I error rate? Type I error: the probability of misclassifying a class B sample as class A. Gender classifier: 5%, "this is a male".
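Power and type I error rate, as defined on these slides, can both be computed from true and predicted labels. The sketch below treats every sample outside the target class as "class B"; the function names and the eight example labels are made up for illustration.

```python
def power(y_true, y_pred, cls):
    """Share of true `cls` samples that were predicted as `cls`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in preds) / len(preds)


def type_i_error(y_true, y_pred, cls):
    """Share of samples outside `cls` that were wrongly predicted as `cls`."""
    preds = [p for t, p in zip(y_true, y_pred) if t != cls]
    return sum(p == cls for p in preds) / len(preds)


y_true = ["spam"] * 4 + ["normal"] * 4
y_pred = ["spam", "spam", "spam", "normal", "normal", "normal", "normal", "spam"]

spam_power = power(y_true, y_pred, "spam")        # 3 of 4 spam numbers caught
spam_type_i = type_i_error(y_true, y_pred, "spam")  # 1 of 4 normal numbers flagged
```

Reporting both matters because overall accuracy can look good while one class, such as the smaller fraud category, is mostly missed.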
  107. ★ Type I error comparison [chart: type I error (0 to 0.3) vs. # recent calls (3 to 10)]
  108. With this small success I relaxed a little and went shopping. Suddenly the phone rang once, and I happily picked it up…
  109. …and the caller had hung up.
  110. ★ Malicious one-ring calls: • A "one-ring call" (響一聲掛斷) is a malicious call that lures the recipient into calling back, usually to a high-priced premium-rate number. • So we first observe the call patterns of this type of phone number.
  111. ★ Call patterns of one-ring calls:
       Number          Mean ringing duration (s)   Mean out-call duration (s)
       0982-415-XXX    1.6                         0
       0982-420-XXX    3.6                         0
       0982-495-XXX    5.2                         1.25
       04-3-704-XXXX   0.9                         0
       0923-931-XXX    6.7                         2.6
  112. Feature comparison: Ratio of new opponents [chart: Fraud, Marketing, Normal, One-ring; axis 0 to 0.8]
  113. Feature comparison: Ratio of in calls [chart: Fraud, Marketing, Normal, One-ring; axis 0 to 0.5]
  114. Feature comparison: Ratio of missed calls [chart: Fraud, Marketing, Normal, One-ring; axis 0 to 0.8]
  115-116. ★ Naïve method: Similarly, without machine learning we can design rules such as: Rule 1: the mean ringing duration is less than 7 seconds, and Rule 2: the mean out-call duration is less than 3 seconds. Then we claim that it is a one-ring spam call.
  117. ★ Problems: 1. Too many features… 2. How to determine the rule? 3. New observations.
  118-120. ★ Problem 3: New observation
       Number          Mean ringing duration (s)   Mean out-call duration (s)
       0982-415-XXX    1.6                         0
       0982-420-XXX    3.6                         0
       0982-495-XXX    5.2                         1.25
       04-3-704-XXXX   0.9                         0
       0923-931-XXX    6.7                         2.6
       04-2-676-XXXX   15.7 (S.D. = 10.7)          1.4
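Applying the two one-ring rules to the values in the table shows why the new observation is a problem. The numbers below are taken from the slides; the function name and the data layout are illustrative.

```python
# (number, mean ringing duration in s, mean out-call duration in s), from the slides
OBSERVED = [
    ("0982-415-XXX", 1.6, 0.0),
    ("0982-420-XXX", 3.6, 0.0),
    ("0982-495-XXX", 5.2, 1.25),
    ("04-3-704-XXXX", 0.9, 0.0),
    ("0923-931-XXX", 6.7, 2.6),
]
NEW_OBSERVATION = ("04-2-676-XXXX", 15.7, 1.4)


def is_one_ring(mean_ring: float, mean_out: float) -> bool:
    """Rule 1 (ringing < 7 s) and Rule 2 (out-call duration < 3 s)."""
    return mean_ring < 7.0 and mean_out < 3.0


flags = [is_one_ring(ring, out) for _, ring, out in OBSERVED]
new_flag = is_one_ring(NEW_OBSERVATION[1], NEW_OBSERVATION[2])
print(flags, new_flag)
```

All five original numbers satisfy the rule, but the new row fails Rule 1 because its mean ringing duration is 15.7 seconds, so the hand-tuned threshold would have to be revised each time such a case appears.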
  121. Machine learning can efficiently "learn" from new data and create rules for us.
  122. ★ Power of SVM for multi-classification [chart: power (0.8 to 1.0) vs. # recent calls (3 to 10)]
  123. ★ Accuracy comparison [chart: type I error (0 to 0.3) vs. # recent calls (3 to 10)]
  124. Deployment: All the algorithms have been implemented in the whoscall app, so how does it work?
  125. [diagram: OO Telemarketing (0984-003-XXX), Xiao Ming, Data center] Classifier calculating… Returns: "This number may be a telemarketing call." Time required: 50-100 milliseconds
  126. What's next?
  127. ★ Improvements of the classification model: 1. Fraud numbers analysis 2. Fuzzy classification algorithm 3. Spam-category scores 4. Cooperate with more solid outside sources 5. Generalize to other countries. Much more…
  128. ★ Future perspectives: 1. User's tag correction mechanisms 2. Personalized penalty setting 3. Anti-countermeasures 4. Extend to SMS spam detection 5. Clustering vs. user tags 6. Spam detection / Scam detection
  129. Creating a contact network of trust. Thank you all for your valuable time.