Business Opportunities, Challenges, Strategies and Execution in the Big Data Era
--- A Case of Mobile Ads Big Data
Vpon Inc. (Vpon 行動科技)
Data Scientist Craig Chao (趙國仁)
craig.chao@vpon.com, chaocraig@gmail.com
Prologue: Myths of Big Data
• Big Data, big hype?
• Machine learning and statistics are already used in many places; is there nothing new in Big Data?
• Is Big Data just Hadoop / open source?
Agenda
• Innovative Cases of BIG DATA
• What is BIG DATA, ultimately?
• A Case of Big Data in Mobile Ads
• Yes! We have lots of DATA?!
• Big Data is not only about Technology
but also Org.+Culture+Eco-system
• Summary
Innovative Cases of
BIG DATA
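• The world's most advanced tracker: activity tracking, sleep tracking, Smart Coach, and heart-health recording
• iPaaS that helps companies connect enterprise applications in the cloud and on-premises
• Cancer analysis visualization
• Thermostat smart-home company founded by Tony Fadell, the father of the iPod
• Integration and analysis of medical data
• Open platform for government spending
• More fuel-efficient, safer driving
• Online financial advisory and management platform serving people in the technology sector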
Big Data - Google Now
Big Data App
What is BIG DATA,
ultimately?
Outlook of Big Data
• Hard to handle with traditional RDB/SQL databases
• Sources
  Intranet: machine logs
  Extranet: Internet users & machines
• Difficult to exploit through statistical sampling alone
• "If you have people in the loop, it's not real time."
  Joe Hellerstein, Chancellor's Professor of Computer Science at UC Berkeley
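Challenges of Big Data - 4V
• Volume: large amounts of data
• Variety: diverse kinds of data
• Velocity: fast data input and processing
• Veracity: truthfulness of the data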
The Revolution of Big Data
DATA
• Hypotheses: human-explainable
• Statistical Analysis
• Sampling, multivariate, …
BIG DATA
• Hypotheses: machine-generated
• Machine Learning, Data Mining
• All data, hyperspace, …
• Volume, Velocity, Variety, Veracity
Top Truth of Big Data
Source: HP(2013)
A Case of Big Data
in Mobile Ads
Mobile Big Data in Vpon
• Profile
• Classification
• Recommendation
Retargeting
2B+ in China
6M+ in HK
17M+ in TW
User Behavior Data Mine
20GB/day
20TB/year
MLDM to mine the value of the data
In-database Processing (MPP)
Exploratory Architecture
• Spark/Hadoop Cluster
• RRE in TD
• Multi-core RRE
• RRE in Spark
• Aggregate, Export
• Tableau
Pricing Engine Framework
Layers: Data Injection, Speed Layer, Batch Layer, Serving Layer, Data Streaming
Components: Kafka, Avro, HDFS, Apache Spark, Jenkins, Realtime processors (Spark Streaming), Couchbase, Akka/Scala Actors, Docker Container
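The speed, batch, and serving layers named on this slide match what is commonly called the Lambda architecture. The sketch below is only a minimal illustration of that idea, not Vpon's implementation: plain in-memory Python objects stand in for the Kafka, Spark, and Couchbase components, and the event fields (`ad_id`, `type`) and all function names are hypothetical.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute click counts per ad from the full historical log."""
    return Counter(e["ad_id"] for e in events if e["type"] == "click")


class SpeedView:
    """Speed layer: incrementally count clicks that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        if event["type"] == "click":
            self.counts[event["ad_id"]] += 1


def serve(batch, speed, ad_id):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[ad_id] + speed.counts[ad_id]


# Hypothetical events: the historical log feeds the batch layer,
# the recent stream feeds the speed layer.
historical = [
    {"ad_id": "a1", "type": "click"},
    {"ad_id": "a1", "type": "impression"},
    {"ad_id": "a2", "type": "click"},
]
recent = [{"ad_id": "a1", "type": "click"}]

batch = batch_view(historical)
speed = SpeedView()
for event in recent:
    speed.update(event)

print(serve(batch, speed, "a1"))
```

The point of the split is that the batch view can be rebuilt from scratch at any time, while the speed view only has to cover the gap since the last batch run.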
Data Science Performance
Problem-solving Thinking
Performance (CTR, CVR, CPI) of AdNet and DSP
Data, Algorithms, Tools
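As a rough illustration of the performance metrics named above, the sketch below computes them from raw counts. It assumes the usual mobile-ads definitions (CTR = clicks / impressions, CVR = conversions / clicks, CPI = spend / installs); the slide itself does not spell these out, and the function names and sample numbers are hypothetical.

```python
def ctr(clicks, impressions):
    """Click-through rate: share of impressions that were clicked."""
    return clicks / impressions if impressions else 0.0


def cvr(conversions, clicks):
    """Conversion rate: share of clicks that led to a conversion."""
    return conversions / clicks if clicks else 0.0


def cpi(spend, installs):
    """Cost per install; None when there are no installs to divide by."""
    return spend / installs if installs else None


# Hypothetical campaign counts, for illustration only.
impressions, clicks, installs, spend = 200_000, 3_000, 150, 450.0

print(ctr(clicks, impressions), cvr(installs, clicks), cpi(spend, installs))
```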
Yes!
We have lots of DATA?!
3R: Reach, Richness, Range
Each axis runs from Low to High.
• Reach: user reach (DAU)
• Richness: data richness (behavioral data)
• Range: user context (the audience's affiliation with the whole context)
Data Economy
Traditional -> Internet Economy
Axes: REACH (quantity) and RICHNESS (quality), each from Low to High
• Traditional Economy
• Internet Economy
Reach: The Value Funnel
CPM campaign: Revenue = N/1000 ⋅ CPM
CPC campaign: Revenue = N ⋅ CTR ⋅ CPC
CPA campaign: Revenue = N ⋅ CTR ⋅ CVR ⋅ CPA
UU Reach (DAU)
ARPU = Life-time Value
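The three revenue formulas of the value funnel can be sketched as small functions. This is only an illustration: it assumes N is the number of ad impressions, which the slide does not state explicitly, and the prices and rates below are made-up sample values.

```python
def cpm_revenue(n, cpm):
    """CPM campaign: paid per thousand impressions."""
    return n / 1000 * cpm


def cpc_revenue(n, ctr, cpc):
    """CPC campaign: paid per click, so impressions are scaled by CTR."""
    return n * ctr * cpc


def cpa_revenue(n, ctr, cvr, cpa):
    """CPA campaign: paid per action, so clicks are further scaled by CVR."""
    return n * ctr * cvr * cpa


# Hypothetical inputs: N is assumed to be the impression count.
n = 1_000_000
print(cpm_revenue(n, cpm=2.0))
print(cpc_revenue(n, ctr=0.01, cpc=0.25))
print(cpa_revenue(n, ctr=0.01, cvr=0.05, cpa=4.0))
```

Each step down the funnel multiplies in another rate, which is why revenue under CPC and CPA pricing depends on how well CTR and CVR can be predicted.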
Richness
Data Quality -> Predictive Power
Richness: Predictive Power
Behavioral Data
• App category preference
• Device used
• Usage time
• Location area
• Ad behavior preference
Attribution Data
• Conversion logs
Richness
• Data Quality Richness
• Data Utilization Richness
  Download times vs. Activation days
• Data Model Richness
Range
Businesses that value only quantitative models will lose the ability to find potential growth opportunities: "The biggest problem with the pursuit of quantification is that it ignores the context in which people's behavior arises, detaches events from their setting, and overlooks the effect of variables that were not included in the model."
- Roger Martin, Rotman School of Management, Toronto
Range
Campaign funnels (widest stage first):
• TV campaign: Reach, Rating, View, Brand Awareness
• Mobile campaign: Request, Impression, Click, Conversions
• Offline campaign: Reach, Buzz, Traffic, Actions
Axes: Reach, Richness, Range
Cross-screen Effect
Success case: mastering the 3Rs delivers better performance!
Cross-screen synergy
• Big data synergy with the cross-screen effect.
[Chart: App Download Rate and Optimized Conversion Rate by day (Thu, Fri, Sat, Sun, Mon, Tue, Wed, Thu, Fri, Sat, Sun), with a "+TV" annotation; axis scales 0 to 40000 and 0.00% to 16.00%]
World, Model & Theory
Credit: John F. Sowa
Big Data is not only about
Technology but also
Org. + Culture + Eco-system
Challenges of Big Data Company
• Tools
  Commercial Big Data tools are expensive
  Open-source tools need highly skilled talent
• Organization
  Performance metrics for developers
  Most people do not understand the 3Rs of data
  Data BD, Campaign Manager, Data Engineer, Data Scientist
• Time
  Accumulating behavioral data, tuning models, org & culture changes
Challenges of Big Data Company
BDSales + AS
Sales + CM
Data BD
Data Engineer + Data Scientist
Conversions + 3rd Tracking
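After Disintermediation / Reintermediation
First-party Data: brand member database
• Name
• Age
• Phone
• Email
• Address
• Products purchased
• Occupation…
The member database is hard to keep updated, and some of its data has little reference value.
Third-party Data: AdN database
• Phone used
• Frequently visited locations
• Apps used
• Usage time
• Ad preferences…
Continuously collects users' mobile behavior and preferences to uncover the latent preferences and needs of the TA (target audience).
Link the databases -> M-CRM activation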
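Micro-Targeting
• Targeted delivery
• Targeted exclusion
• Impression frequency
• App-detection delivery
• Context-based delivery
• Ad preference
• Specified brand fans
• Product users
• Interest preference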
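To make the delivery, exclusion, and impression-frequency controls listed above concrete, here is a minimal eligibility check. It is a sketch under stated assumptions, not Vpon's targeting engine: the `Campaign` class, the segment labels, and the rule that an empty include set means "no restriction" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Campaign:
    include: set = field(default_factory=set)  # segments to deliver to
    exclude: set = field(default_factory=set)  # segments to skip
    frequency_cap: int = 3                     # max impressions per user


def eligible(campaign, user_segments, impressions_shown):
    """Return True if this user may be shown the campaign's ad."""
    if impressions_shown >= campaign.frequency_cap:
        return False  # impression-frequency limit reached
    if user_segments & campaign.exclude:
        return False  # user belongs to an excluded segment
    # Assumption: an empty include set means "no restriction".
    return not campaign.include or bool(user_segments & campaign.include)


# Hypothetical campaign and segment names, for illustration only.
campaign = Campaign(
    include={"brand_fan", "sports_apps"},
    exclude={"existing_customer"},
    frequency_cap=3,
)

print(eligible(campaign, {"sports_apps"}, impressions_shown=1))
```

Checking the frequency cap and exclusions before inclusion keeps the "never show" rules from being overridden by a matching interest segment.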
Micro-Targeting
1. Collect user behavior data
2. Use big data analysis to find potential audiences
3. Optimize delivery
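TA in a Closed Loop
Data analysis -> Ad delivery -> Consumer profile
Reach your consumer segments more effectively
• App category preference
• Device used
• Usage time
• Location area
• Ad behavior preference
Be fully prepared for the next campaign and optimize performance!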
Summary
Data-driven Performance
Problem-solving Thinking
Performance (CTR, CVR, CPI) of AdNet and DSP
Data, Algorithms, Tools
Each axis runs from Low to High.
• Reach: user reach (DAU)
• Richness: data richness (behavioral data)
• Range: user context (the audience's affiliation with the whole context)
BIG DATA
Humility
Humanity
Data is always for humanity
Use data; don't be used by it.
Thank you, everyone!
craig.chao@vpon.com, chaocraig@gmail.com


Editor's Notes

  • #6 Jawbone:
  • #18 Reference: iClick. Spark drawbacks: in-memory capacity not large enough? Java garbage collection
  • #35 Weather: conversion rates are especially good on typhoon days!