Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising

Robin Li, Director of Data Engineering and Yohan Chin, VP Data Science at Tapjoy share how to architect the best application experience for mobile users using technologies including Apache Kafka, Apache Spark, and MemSQL.

Speakers: Robin Li (Director of Data Engineering, Tapjoy) and Yohan Chin (VP Data Science, Tapjoy)

Published in: Technology

  1. Building a Real-time Analytics Engine with Kafka and Spark for Mobile Advertising
  2. Mobile Advertising? Social & Game: authentic to consumers, authentic to entertainment, authentic to engagement. Mobile games.
  3. People Spend a Lot of Time Gaming: over 55 minutes a day on average is spent playing mobile games (chart: minutes spent in mobile). Source: eMarketer, “US Mobile Phone Content Usage Metrics, 2013-2019,” February 2015.
  4. Innovate Advertising as Reward Ads
      ● Free-to-play (freemium) apps
      ● Only 2~5% of users make in-app purchases
      ● Publishers can give a “reward” to users who engage with ads
      ● Video + Game Economics + Reward
  5. Mobile Video App Advertising: the advertiser pays per video view; the publisher is paid; Tapjoy takes a profit; the user earns a reward.
  6. Video to Install. Diagram labels: Video; Install; reward; no reward.
  7. Video to Install to Event. Diagram labels: Video; Install; reward; no reward. Events: Level N, Registration, In-app purchase, First booking.
  8. Mobile Video App Advertising - Data Science. Funnel: Video Views, Installs, Early Retention, Life Time Value, “Event”. Components: Look-alike Model; Real-time Bidding Engine; Advertiser’s Return “Investment”.
  9. Building a Data Science Platform: Bigger in Scale; Faster Serving; Smart and Smarter! Data Product.
  10. Tapjoy’s Data Platform
      ● Algo Serving: 300,000 RPM throughput; bidding, targeting & personalization; <10 ms response time
      ● Data warehousing: 20 TB daily addition; 2.3 PB DUM
      ● Infrastructure: cloud & on-premise; in-house & SaaS; batch-based & real-time
  11. The Logic Stack
      Layers: Data warehousing (HDFS / S3 / GS); Reporting (MPPs, BigQuery); Algo Service (Batch + Streaming, Hadoop / Spark). Side labels: Data Viz; A/B Testing.
      • Collect data, set rules
      • Reduce data friction
      • Improve signal-to-noise ratio
      • Model training & iteration
      • Deliver business insights
      • Drive data awareness
      • Apply ideas to product (online)
      • Serve model output
      • Drive revenue
  12. The Data Flow
  13. Tapjoy’s Algo Service Engine (SOA)
      ● SOA (algo service) in Netty
      ● 320,000 lines of Java
      ● 99% of responses in < 20 ms at 200k - 400k RPM
      Request flow: Ad Request → A/B test classification → Main Algo & pre-filters → Apply Logic Pipe → Response (offer list)
      Algos: Video Bidding, Targeting, Persona, Lookalike, ...; followed by biz logic filters
  14. Algo Service’s Data Components (component: what’s in there; purpose)
      ● Kafka: raw activity logs; everything starts here
      ● Spark Streaming: ETL; ETL & algo feature updates
      ● Aerospike: User Big Table (User DNA); real-time k-v lookups, i.e. LookALike
      ● MemSQL: stripped-down raw user activity data; device-level real-time aggregations, hot data sink, real-time reporting
      ● Elasticsearch: aggregates or unstructured logs; cube aggregates or full-text search
  15. Mobile Video App Advertising - Data Science. Funnel: Video Views, Installs, Early Retention, Life Time Value, “Event”. Components: Look-alike Model; Real-time Bidding Engine; Advertiser’s Return “Investment”.
  16. Use Case 1 - Ad-Request Level Decision (Big Table / MemCache). Video Bid inputs: # CVR, Spending History, max(views) > T(n), ..., user app usages. Pipeline: Kafka + Spark Streaming feeding S - App 1, S - App 2, S - App 3, ..., S - App N; Lambda Batch.
  17. Use Case 1 - Ad-Request Level Decision. Diagram labels: Video Bid; Kafka OR Spark Streaming; S - App; RAW DATA.
  18. Use Case 1 - Ad-Request Level Decision. High-throughput, low-latency queries run over 30 days of device-level data that is streamed into MemSQL. The calculations are done on the fly and served as decision features. Query diagram labels: Reference Join; Subquery; Reference Join.
  19. In Fact - One Fits All. Diagram labels: Algo Serving; Kafka OR Spark Streaming; Real-Time Dashboard; Data Warehouse; Hot Batch Data Sink; Hot Batch; Realtime Query; Realtime Query.
  20. Conclusion
      ❖ Mobile advertising is all about knowing your audience
      ❖ Fast & accurate data is key to Data Science as a Service
      ❖ But “real-time” is a relative word
      ❖ Try to simplify moving parts when it comes to streaming
         ➢ Difficult to debug
         ➢ Hard to backfill
      ❖ Generalized hot-data sink for stability and multi-purpose data storage
      Contact: yohan.chin@tapjoy.com, robin.li@tapjoy.com
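
The request flow on slide 13 (ad request, A/B test classification, main algo and pre-filters, logic pipe, offer-list response) can be pictured with a small sketch. This is not Tapjoy's code: the real service is described as Java, and every name here (`AdRequest`, `Offer`, `ab_bucket`, `prefilter`, `rank_by_bid`, `biz_logic_filter`, `PIPES`, `serve`) is a hypothetical stand-in chosen only to illustrate how a per-bucket chain of stages might be wired together.

```python
from dataclasses import dataclass, field
import zlib


@dataclass
class AdRequest:
    device_id: str
    features: dict = field(default_factory=dict)  # hypothetical per-device features


@dataclass
class Offer:
    offer_id: str
    bid: float
    min_views: int = 0  # hypothetical eligibility threshold


def ab_bucket(device_id, n_buckets=2):
    """Deterministically assign a device to an A/B bucket via a stable hash."""
    return zlib.crc32(device_id.encode("utf-8")) % n_buckets


def prefilter(request, offers):
    """Pre-filter: drop offers whose view threshold the device has not reached."""
    views = request.features.get("views", 0)
    return [o for o in offers if views >= o.min_views]


def rank_by_bid(request, offers):
    """Main algo stand-in: order the remaining offers by bid, highest first."""
    return sorted(offers, key=lambda o: o.bid, reverse=True)


def biz_logic_filter(request, offers, max_offers=2):
    """Business-logic stand-in: cap the length of the offer list."""
    return offers[:max_offers]


# Each A/B bucket maps to its own chain of stages (the "logic pipe").
# Bucket 1 is an assumed control variant that skips ranking.
PIPES = {
    0: [prefilter, rank_by_bid, biz_logic_filter],
    1: [prefilter, biz_logic_filter],
}


def serve(request, offers):
    """Classify the request, run its pipe, and return (bucket, offer ids)."""
    bucket = ab_bucket(request.device_id, len(PIPES))
    for stage in PIPES[bucket]:
        offers = stage(request, offers)
    return bucket, [o.offer_id for o in offers]
```

Calling `serve(AdRequest("dev-1", {"views": 1}), offers)` returns the bucket and the ids of the surviving offers. Because the bucket comes from a stable hash rather than a random draw, the same device always sees the same variant, which is one common way to keep A/B assignments consistent across requests.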

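
Slide 18 describes computing decision features on the fly from 30 days of device-level data held in a hot-data sink. The sketch below illustrates that idea under loud assumptions: Python's built-in `sqlite3` stands in for MemSQL, the `device_events` table and its columns are invented for the example, and defining CVR as installs divided by video views is an assumption rather than something the slides state.

```python
import sqlite3

WINDOW_DAYS = 30  # the slides mention 30 days of device-level data


def build_demo_db():
    """Create an in-memory table standing in for the streamed hot-data sink."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: `day` is an integer day counter, not a real timestamp.
    conn.execute(
        "CREATE TABLE device_events ("
        " device_id TEXT, event_type TEXT, day INTEGER)"
    )
    rows = [
        ("dev-1", "video_view", 95),
        ("dev-1", "video_view", 98),
        ("dev-1", "install", 98),
        ("dev-1", "video_view", 40),  # falls outside the 30-day window
        ("dev-2", "video_view", 99),
    ]
    conn.executemany("INSERT INTO device_events VALUES (?, ?, ?)", rows)
    return conn


def device_features(conn, device_id, today):
    """Aggregate the last WINDOW_DAYS of events for one device at request time."""
    views, installs = conn.execute(
        "SELECT"
        " COALESCE(SUM(event_type = 'video_view'), 0),"
        " COALESCE(SUM(event_type = 'install'), 0)"
        " FROM device_events"
        " WHERE device_id = ? AND day > ?",
        (device_id, today - WINDOW_DAYS),
    ).fetchone()
    # Assumed definition: conversion rate = installs per video view.
    cvr = installs / views if views else 0.0
    return {"views": views, "installs": installs, "cvr": cvr}
```

With the demo rows and `today=100`, `device_features(conn, "dev-1", 100)` counts two views and one install inside the window. Aggregating at query time keeps the features fresh without a separate precomputation job, at the cost of needing a store fast enough to answer within the request budget.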