Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Flink Forward Berlin 2018: Henri Heiskanen - "How to keep our flock happy with Apache Flink on AWS"

210 views

Published on

Data is in the very core how Rovio builds and operates its games. What does data mean for Rovio: how its processed and how we gain value from it? In this talk we take a deep dive into Rovio analytics pipeline and its use cases. We will give you a brief history lesson on how a purely batch based system has evolved into hybrid streaming and batch system, and share how we operate our production pipeline in AWS.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Flink Forward Berlin 2018: Henri Heiskanen - "How to keep our flock happy with Apache Flink on AWS"

  1. 1. Rovio © 2018Rovio © 2017 Confidential HOWTOKEEPOURFLOCKHAPPY WITHAPACHEFLINKONAWS
  2. 2. Rovio © 2018 AGENDA ANALYTICS @ ROVIO WHAT IS ROVIO AND HOW DOES ROVIO UTILIZE DATA? DO WE NEED STREAMING? WHY WOULD A COMPANY WITH A WORKING BATCH PROCESSING PIPELINE INVEST IN STREAMING? WHAT USE CASES ROVIO CURRENTLY RUNS IN STREAMING? RUNNING FLINK IN PRODUCTION ON AWS HOW DOES ROVIO OPERATE ITS PRODUCTION PIPELINE IN AWS?
  3. 3. Rovio © 2018 This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 688191.
  4. 4. Rovio © 2018 WE ARE HIRING! Rovio © 2017 Confidential SENIOR DATA ANALYST (STOCKHOLM) BLOCKCHAIN RESEARCH DEVELOPER … AND ALWAYS LOOKING FOR NEW TALENT Photo: Lauri Rotko
  5. 5. Rovio © 2018 POSITIONING
  6. 6. Rovio © 2018 OURMISSION
  7. 7. Rovio © 2018 INANEGGSHELL 2003 FOUNDED IN CREATOR OF BUSINESS UNITS GAMES 3 STUDIOS BRAND LICENSING < 400ROVIANS ESPOO, STOCKHOLM, LONDON, SHANGHAI, LA PEOPLE 4BSINCE 2009 DOWNLOADS $350M BOX OFFICE SEQUEL SEPTEMBER 2019
  8. 8. Rovio © 2018 HOWDOWEUSEDATA? MARKET RESEARCH PERFORMANCE MARKETING GAME OPTIMIZATION
  9. 9. Rovio © 2018 PIPELINE
  10. 10. Rovio © 2018 BEACON ANALYTICS A/BTESTING PERSONALIZATION PAYMENT ADS PUSH
  11. 11. Rovio © 2018 BEACON-MANAGE Show one forced interstitial and only show xpromo reward videos for churning players
  12. 12. Rovio © 2018 27% more xpromo users 0% retention impact
  13. 13. Rovio © 2018 DO WE NEED STREAMING?
  14. 14. Rovio © 2018 TIME TO DATA
  15. 15. Rovio © 2018 ANOMALYDETECTION
  16. 16. Rovio © 2018
  17. 17. Rovio © 2018 PERSONALIZATION
  18. 18. Rovio © 2018 INTEGRATIONWITH3RD PARTYSYSTEMS
  19. 19. Rovio © 2018 STREAMINGDEDUPLICATION
  20. 20. Rovio © 2018 INPROGRESS:STREAMING4ALL?
  21. 21. Rovio © 2018 WHAT WE NEED TO RUN STREAMING PRODUCTION PIPELINE?
  22. 22. Rovio © 2018 WENEED… DATA Backfillingthe historicaldata SLA Systemmustoperate 24/7 CI Implementand deployfeaturescare free
  23. 23. Rovio © 2018 HOW DO WE OPERATE AND MONITOR FLINK STREAMS?
  24. 24. GITFLOW INSPIRED BRANCHING STRATEGY IN GITHUB. TEAMCITY PROJECTS TO BUILD AND DEPLOY FLINK STREAM UBER JARS TO DEVELOPMENT, STAGING AND PRODUCTION S3 BUCKETS.
  25. 25. Rovio © 2018 AZKABAN WORKFLOW MANAGER WITH FLINK PLUGIN USED TO ORCHESTRATE ENABLED EMR CLUSTERS ELASTIC MAPREDUCE WITH YARN BOOTSTRAP ACTIONS TO SETUP FLINK, LOGGING AND MONITORING CUSTOM TOOLS AND LIBRARIES MOSTLY FOR STATE HANDLING MONITORS OFFSET LAG, ACCUMULATORS, BACKPRESSURE AND ANOMALIES LOGS IN KIBANA REAL-TIME TIME SERIES DATA IN GRAFANA KAFKA OFFSET MONITOR TO MONITOR CONSUMER LAG
  26. 26. Rovio © 2018
  27. 27. Rovio © 2018 HOW TO BOOTSTRAP FLINK TO EMR CLUSTER?
  28. 28. Rovio © 2018
  29. 29. Rovio © 2018 HOW TO START FROM THE LATEST RESTORE POINT?
  30. 30. Rovio © 2018 FIND THE LATEST CHECKPOINT OR SAVEPOINT… … AND PASS IT AS ARGUMENT!
  31. 31. Rovio © 2018 HOW TO HANDLE IDLE PARTITIONS?
  32. 32. Rovio © 2018 Source: https://kafka.apache.org/intro PRODUCER SENDS EMPTY MESSAGE WITH EVENT TIMESTAMP TO EVERY PARTITION
  33. 33. Rovio © 2018 HOW TO INITIALIZE THE STATE?
  34. 34. Rovio © 2018 1. Use same uid and state name in the init and main job steps 2. Backfill history from files using streaming and PROCESS_CONTINUOSLY mode 3. Monitor Flink job and save the state when no data is being processed anymore 4. Start your streaming job from the save point and start processing Kafka events from specific point of time
  35. 35. Rovio © 2018 WITHFLINK1.5,KAFKA0.10ANDABOVE 1. Add timestamp to Kafka message 2. Use FlinkKafkaConsumer setStartFromTimestamp(…) How did we do it for older versions?
  36. 36. Rovio © 2018 ANATOMYOFKAFKATOPIC ARRAYS!!! How does one search data from arrays? Source: https://kafka.apache.org/intro
  37. 37. Rovio © 2018 BINARYSEARCH 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 Search offset having data starting from 2018-08-01 12:00:03 2018-08-01 12:00:40 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 2018-08-01 12:00:40 Read message from the middle offset. 12:00:20 > 12:00:03, search from 1st half. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 12:00:05 is close enough, start consuming from this offset. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 Search offset having data starting from 2018-08-01 12:00:03 2018-08-01 12:00:40 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 2018-08-01 12:00:40 Read message from the middle offset. 12:00:20 > 12:00:03, search from 1st half. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 12:00:05 is close enough, start consuming from this offset. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 Search offset having data starting from 2018-08-01 12:00:03 2018-08-01 12:00:40 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 2018-08-01 12:00:40 Read message from the middle offset. 12:00:20 > 12:00:03, search from 1st half. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 12:00:05 is close enough, start consuming from this offset. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 Search offset having data starting from 2018-08-01 12:00:03 2018-08-01 12:00:40 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 2018-08-01 12:00:40 Read message from the middle offset. 12:00:20 > 12:00:03, search from 1st half. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 12:00:05 is close enough, start consuming from this offset. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 Search offset having data starting from 2018-08-01 12:00:03 2018-08-01 12:00:40 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 2018-08-01 12:00:20 2018-08-01 12:00:25 2018-08-01 12:00:30 2018-08-01 12:00:35 2018-08-01 12:00:40 Read message from the middle offset. 12:00:20 > 12:00:03, search from 1st half. 2018-08-01 12:00:00 2018-08-01 12:00:05 2018-08-01 12:00:10 2018-08-01 12:00:15 12:00:05 is close enough, start consuming from this offset.
  38. 38. Rovio © 2018 SUMMARY ROVIO MAKES GREAT GAMES AND DATA IS HEAVILY USED IN EVERY STEP OF THE GAMES LIFE-CYCLE STREAMING IS USED TO IMPROVE TIME TO DATA ESPECIALLLY IN CASES WHERE AUTOMATED DECISION MAKING IS BEING DONE FOR PRODUCTION USE ONE NEEDS ADDITIONAL FEATURES ON TOP OF THE VANILLA FLINK PLATFORM ESPECIALLY AROUND OPERATIONS AND STATE MANAGEMENT
  39. 39. Rovio © 2018 THANKS! Rovio © 2017 Confidential

×