Robin Li, Director of Data Engineering, and Yohan Chin, VP Data Science, at Tapjoy share how to architect the best application experience for mobile users using technologies including Apache Kafka, Apache Spark, and MemSQL.
Speakers: Robin Li - Director of Data Engineering, Tapjoy and Yohan Chin - VP Data Science, Tapjoy
2. Mobile Advertising? - Social & Game
Authentic to Consumers
Authentic to Entertainment
Authentic to Engagement
Mobile Games
3. People Spend A Lot of Time Gaming
Over 55 minutes a day on average is spent playing mobile games
Minutes Spent in Mobile
eMarketer, “US Mobile Phone Content Usage Metrics, 2013-2019.” February, 2015.
4. Innovate Advertising as Reward Ads
● Free-to-Play (Freemium) App
● Only 2~5% of users make in-app purchases
● Publishers can give a “reward” to users who engage with ads
● Video + Game Economics + Reward
5. Mobile Video App Advertising
Advertiser
Pay on Video-View
Pub Paid
Tapjoy Profit
User Earn Reward
7. Video to Install to Event
Video Install
reward
No reward
Event
- Level N
- Registration
- In-app-purchase
- First Booking
8. Mobile Video App Advertising - Data Science
Video Views
Installs
Early Retention
Life Time Value
“Event”
Look-alike Model
Real-time Bidding Engine
Advertiser’s Return “Investment”
9. Building a Data Science Platform
Bigger in Scale
Faster Serving
Smart and Smarter!
Data Product
10. Tapjoy’s Data Platform
Algo Serving
● 300,000 RPM throughput
● Bidding & Targeting & Personalization
● <10 ms response time
Data warehousing
● 20 TB daily addition
● 2.3 PB DUM
Infrastructure
● Cloud & On-Premise
● In-house & SaaS
● Batch-based & real-time
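The headline figures on this slide can be put on a per-second footing with some quick arithmetic. The sketch below is only a back-of-the-envelope conversion; it assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over the day, neither of which the slide states, and the function names are illustrative.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes


def requests_per_second(rpm: float) -> float:
    """Convert requests per minute to requests per second."""
    return rpm / SECONDS_PER_MINUTE


def ingest_bytes_per_second(daily_bytes: float) -> float:
    """Average ingest rate if the daily volume arrived evenly."""
    return daily_bytes / SECONDS_PER_DAY


def days_of_ingest(total_bytes: float, daily_bytes: float) -> float:
    """How many days of ingest at the current rate equal the total volume."""
    return total_bytes / daily_bytes


if __name__ == "__main__":
    print(requests_per_second(300_000))             # requests per second
    print(ingest_bytes_per_second(20 * TB) / 1e6)   # MB per second
    print(days_of_ingest(2300 * TB, 20 * TB))       # days (2.3 PB vs 20 TB/day)
```

Under those assumptions, 300,000 RPM is 5,000 requests per second, and 20 TB a day averages out to roughly 231 MB per second. The last figure is only a scale comparison between the two numbers on the slide, not a statement about retention.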
11. The Logic Stack
Data warehousing
HDFS / S3 / GS
Reporting
MPPs (BigQuery)
Algo Service
Batch + Streaming
Hadoop / Spark
• Collect data, set rules
• Reduce data friction
• Improve signal-to-noise ratio
• Model training & iteration
• Deliver business insights
• Drive data awareness
• Apply ideas to product (online)
• Serve model output
• Drive revenue
DataViz
A/B Testing
Data Viz
13. Tapjoy’s Algo Service Engine (SOA)
● SOA (algo service) in Netty
● 320,000 lines of Java
● 99% of responses in < 20 ms @ 200k-400k RPM
Ad Request
A/B test classification
Main Algo & pre-filters
Apply Logic Pipe
Response (offer list)
Video Bidding
Targeting
Persona
Lookalike
...
Biz logic filters
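The request flow on this slide (ad request, A/B test classification, main algo and pre-filters, logic pipe, offer list) can be sketched as a chain of small stages. This is a minimal illustration in Python, not Tapjoy's Java service: every class, function, and experiment name is hypothetical, and hash-based bucketing is one common way to do A/B classification rather than the method the deck describes.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Offer:
    offer_id: str
    kind: str            # e.g. "video" or "install" (hypothetical)
    bid: float           # hypothetical advertiser bid
    blocked: bool = False


@dataclass(frozen=True)
class AdRequest:
    device_id: str
    app_id: str


Stage = Callable[[AdRequest, List[Offer]], List[Offer]]


def ab_bucket(device_id: str, experiment: str, buckets: int = 2) -> int:
    """Deterministically assign a device to an A/B bucket."""
    digest = hashlib.sha256(f"{experiment}:{device_id}".encode()).hexdigest()
    return int(digest, 16) % buckets


def pre_filter(req: AdRequest, offers: List[Offer]) -> List[Offer]:
    """Drop offers that must never be shown."""
    return [o for o in offers if not o.blocked]


def rank_by_bid(req: AdRequest, offers: List[Offer]) -> List[Offer]:
    """Stand-in for the main algo: highest bid first."""
    return sorted(offers, key=lambda o: o.bid, reverse=True)


def video_first(req: AdRequest, offers: List[Offer]) -> List[Offer]:
    """Example logic-pipe step: move video offers ahead of the rest."""
    videos = [o for o in offers if o.kind == "video"]
    others = [o for o in offers if o.kind != "video"]
    return videos + others


def cap_two(req: AdRequest, offers: List[Offer]) -> List[Offer]:
    """Example business-logic filter: return at most two offers."""
    return offers[:2]


# Each A/B bucket gets its own ordered list of stages.
PIPES: Dict[int, List[Stage]] = {
    0: [pre_filter, rank_by_bid, cap_two],
    1: [pre_filter, rank_by_bid, video_first, cap_two],
}


def run_pipe(stages: List[Stage], req: AdRequest, offers: List[Offer]) -> List[Offer]:
    """Apply the stages in order to produce the offer list."""
    for stage in stages:
        offers = stage(req, offers)
    return offers


def serve(req: AdRequest, candidates: List[Offer]) -> List[Offer]:
    """Classify the request into a bucket, then run that bucket's pipe."""
    bucket = ab_bucket(req.device_id, "video-first-test")
    return run_pipe(PIPES[bucket], req, candidates)
```

Keeping each step as a plain function with the same signature makes it cheap to add or reorder steps per experiment, which is one plausible reading of "Apply Logic Pipe".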
14. Algo Service’s Data Components
Component | What’s in there | Purpose
Kafka | Raw activity logs | Everything starts here
Spark Streaming | ETL | ETL & Algo feature updates
Aerospike | User Big Table (User DNA) | Real-time k-v lookups, i.e. LookALike
MemSQL | Stripped-down raw user activity data!! | Device-level real-time aggregations; hot data sink; real-time reporting
Elasticsearch | Aggregates or unstructured logs | Cube aggregates or full-text search
15. Mobile Video App Advertising - Data Science
Video Views
Installs
Early Retention
Life Time Value
“Event”
Look-alike Model
Real-time Bidding Engine
Advertiser’s Return “Investment”
16. Use Case 1 - Ad-Request Level Decision
Big Table / MemCache
Video Bid
# CVR
Spending History
max(views) > T(n)
...
User app usages
Kafka + Spark Streaming
S - App 1
S - App 2
S - App 3
S - App ..
S - App N
Lambda Batch
17. Use Case 1 - Ad-Request Level Decision
Video Bid
Kafka OR Spark Streaming
S - App
RAW DATA
18. Use Case 1 - Ad-Request Level Decision
High-throughput, low-latency queries run against 30 days of device-level data that is streamed into MemSQL.
Calculations are done on the fly and served as decision features.
Reference Join Subquery
Reference Join
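To make the idea of computing decision features on the fly more concrete, here is a small sketch of a 30-day, device-level aggregation joined against a small reference table. It uses Python's built-in sqlite3 as a stand-in so that it runs anywhere; it is not MemSQL, and the table names, columns, and event names are all assumptions made for illustration.

```python
import sqlite3

DAY = 86_400  # seconds


def build_demo_db(now: int) -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE activity (device_id TEXT, app_id TEXT, event TEXT, ts INTEGER);
        CREATE TABLE apps (app_id TEXT PRIMARY KEY, category TEXT);  -- reference table
        """
    )
    db.executemany(
        "INSERT INTO apps VALUES (?, ?)",
        [("app1", "game"), ("app2", "social")],
    )
    db.executemany(
        "INSERT INTO activity VALUES (?, ?, ?, ?)",
        [
            ("dev1", "app1", "video_view", now - 1 * DAY),
            ("dev1", "app1", "video_view", now - 2 * DAY),
            ("dev1", "app1", "install", now - 2 * DAY),
            ("dev1", "app2", "video_view", now - 5 * DAY),
            ("dev1", "app1", "video_view", now - 45 * DAY),  # outside the window
            ("dev2", "app1", "video_view", now - 1 * DAY),
        ],
    )
    return db


# Aggregate one device's recent activity per app category.
FEATURE_SQL = """
SELECT r.category,
       SUM(a.event = 'video_view') AS views,
       SUM(a.event = 'install')    AS installs
FROM activity AS a
JOIN apps AS r ON r.app_id = a.app_id
WHERE a.device_id = ? AND a.ts >= ?
GROUP BY r.category
ORDER BY r.category
"""


def device_features(db: sqlite3.Connection, device_id: str, now: int,
                    window_days: int = 30) -> dict:
    """Compute per-category views, installs, and CVR at request time."""
    cutoff = now - window_days * DAY
    features = {}
    for category, views, installs in db.execute(FEATURE_SQL, (device_id, cutoff)):
        cvr = installs / views if views else 0.0
        features[category] = {"views": views, "installs": installs, "cvr": cvr}
    return features
```

Because the raw rows are kept and the window is applied at query time, the same table can answer differently shaped feature questions without a separate streaming job per feature.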
19. In Fact - One Fits All
Algo Serving
Kafka OR Spark Streaming
Real-Time Dashboard
Data Warehouse Hot Batch
Data Sink
Hot Batch
Realtime Query
Realtime Query
20. Conclusion
❖ Mobile Advertising is all about knowing your audience
❖ Fast & accurate data is key to Data Science as a Service
❖ But, “Realtime” is a relative word
❖ Try to simplify moving parts when it comes to streaming
➢ Difficult to debug
➢ Hard to backfill
❖ Use a generalized hot-data sink for stability and multi-purpose data storage
yohan.chin@tapjoy.com
robin.li@tapjoy.com