
(GAM406) Glu Mobile: Real-time Analytics Processing on 10 MM+ Devices

Glu uses Amazon Kinesis, Apache Storm, S3, and Hadoop to collect billions of data points from millions of user devices in real time every single day. This session describes how Glu built and configured an array of producers to submit real-time gaming events into Amazon Kinesis, using temporary tokens from Amazon Cognito and removing the need for an intermediate store-and-forward fleet. We then discuss how we've been able to easily integrate Amazon Kinesis with powerful open-source technologies such as Apache Storm and the Hadoop ecosystem. Finally, we discuss KCL optimizations and tradeoffs to manage a scalable, flexible, mission-critical streaming data platform.


  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jordan Young – Manager, Analytics Engineering, Glu Mobile October 2015 GAM406 Glu Mobile An Amazon Kinesis-centric data platform to process real-time gaming events for 10+ million user devices
  2. 2. What to expect from the session • Glu Mobile: Data requirements and challenges • Architecture overview and decisions • Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library • Real Time: Storm and Amazon Kinesis Storm Spout • Other challenges and insights
  3. 3. Glu Mobile basics • A mobile gaming leader across genres • 4 titles in top 100 grossing (US) (9/24/15) • 4–6 million daily active users (DAU) (typical, 2015) • 1 billion+ global installs (2010–)
  4. 4. What we collect High, variable volume • 700 million to 2+ billion events per day • 600 bytes per event • Up to 1.2 TB per day • Could scale up further with a successful game launch Multiple sources • Client-side SDKs • Game servers, central services servers • Attribution partners • Ad networks • Third parties
  5. 5. Basic requirements • Near zero data loss • High levels of uptime • Flexible data format — JSON with arbitrary fields • Real-time aggregations • Reasonably low latency for ad-hoc queries (hourly batching OK)
  6. 6. Other requirements • Not expensive • Can be implemented with minimal engineering effort • Requires minimal changes to existing games
  7. 7. Architecture: Past, Present, and Why
  8. 8. First redesign
  9. 9. First redesign
  10. 10. Next Step: Bring data collection in house • Build our own analytics SDK • Need a framework for collecting data from SDK • Options: • Build our own streaming and collection (Apache Kafka) • Use a hosted service (Amazon Kinesis)
  11. 11. Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library
  12. 12. What is Amazon Kinesis?
  13. 13. Why Amazon Kinesis? • Minimal setup time • Prebuilt applications (Amazon Kinesis Connector Library [KCL], Amazon Kinesis Storm Spout) • Extremely minimal maintenance • Minimal hardware • No significant price advantages either way (vs Kafka)
  14. 14. Producers • Custom-built client SDKs • Native Android (Java), native iOS (Obj-C) plug-ins • Unity wrapper for Unity titles • Built on top of AWS SDKs (for each platform) • Implements our internal analytical schema / standards • Additional server-side implementations
  15. 15. Producers (continued) • Vanilla KinesisRecord.submitAllRecords() • No record batching • No compression • Records flushed every 30 seconds, or on certain events • Client authentication using Amazon Cognito • Server authentication using AWS Identity and Access Management (IAM) profiles
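The producer flush policy above (no batching, a flush every 30 seconds or on certain events) can be sketched as a small time-based buffer. This is an illustrative sketch, not Glu's actual SDK code; all names are hypothetical, and the clock is passed in explicitly so the flush policy can be exercised deterministically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side event buffer: flushes every 30 seconds,
// or whenever the caller forces a flush on a notable game event.
class EventBuffer {
    private final List<String> pending = new ArrayList<>();
    private final long flushIntervalMillis;
    private long lastFlushMillis;

    EventBuffer(long flushIntervalMillis, long nowMillis) {
        this.flushIntervalMillis = flushIntervalMillis;
        this.lastFlushMillis = nowMillis;
    }

    void add(String eventJson) {
        pending.add(eventJson);
    }

    // True once the flush interval has elapsed since the last flush.
    boolean shouldFlush(long nowMillis) {
        return nowMillis - lastFlushMillis >= flushIntervalMillis;
    }

    // Drains the buffer; the caller would submit the batch to the stream.
    List<String> flush(long nowMillis) {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        lastFlushMillis = nowMillis;
        return batch;
    }
}
```

In the real SDKs the drained batch would be handed to the platform AWS SDK for submission, authenticated via Amazon Cognito on clients or IAM profiles on servers.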
  16. 16. How many shards? • Shard limits: • 1,000 records per second • 1 MB per sec writes • 2 MB per sec read • Our situation: 20,000 RPS, 600 bytes per message • Need at least 20 shards to handle message count • Only need 12 MB per sec write capacity • 20 shards = 40 MB per sec read capacity • Up to 3 apps OK (36 MB < 40 MB) • Other considerations (peak load, excess capacity)
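The shard arithmetic on this slide can be written out directly: take the worst case across the record-rate limit, the write-throughput limit, and the aggregate read throughput needed by all consumer applications. A minimal sketch, assuming the per-shard limits quoted above (1,000 records/s, 1 MB/s write, 2 MB/s read); class and method names are hypothetical:

```java
// Back-of-envelope shard sizing using the per-shard limits from the slide.
class ShardSizer {
    static final long RECORDS_PER_SHARD = 1_000;          // records/sec in
    static final long WRITE_BYTES_PER_SHARD = 1_000_000;  // 1 MB/sec in
    static final long READ_BYTES_PER_SHARD = 2_000_000;   // 2 MB/sec out

    // Minimum shards covering record rate, write bytes, and reads by all consumer apps.
    static long requiredShards(long recordsPerSec, long bytesPerRecord, int consumerApps) {
        long byRecords = ceilDiv(recordsPerSec, RECORDS_PER_SHARD);
        long byWrites = ceilDiv(recordsPerSec * bytesPerRecord, WRITE_BYTES_PER_SHARD);
        long byReads = ceilDiv(consumerApps * recordsPerSec * bytesPerRecord, READ_BYTES_PER_SHARD);
        return Math.max(byRecords, Math.max(byWrites, byReads));
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

For the slide's numbers (20,000 RPS at 600 bytes, up to 3 consumer apps), the record-count limit dominates and yields 20 shards, matching the slide; at higher record sizes or more consumers, the read limit would dominate instead.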
  17. 17. Consumers: Amazon Kinesis Connector Library
  18. 18. Consumers: KCL pipeline

      public class S3Pipeline implements IKinesisConnectorPipeline<String, byte[]> {

          @Override
          public ITransformer<String, byte[]> getTransformer(KinesisConnectorConfiguration configuration) {
              return new GluJsonTransformer();
          }

          @Override
          public IFilter<String> getFilter(KinesisConnectorConfiguration configuration) {
              return new GluMessageFilter<String>();
          }

          @Override
          public IBuffer<String> getBuffer(KinesisConnectorConfiguration configuration) {
              return new BasicMemoryBuffer<String>(configuration);
          }

          @Override
          public IEmitter<byte[]> getEmitter(KinesisConnectorConfiguration configuration) {
              return new GluS3Emitter(configuration);
          }
      }
  19. 19. Consumers: Transformer implementation ex.

      @Override
      public String toClass(Record record) {
          String json_str = "";
          try {
              json_str = new String(record.getData().array());
          } catch (Exception e) {
              return null;
          }
          if (json_str != null && !json_str.isEmpty()) {
              if (json_str.startsWith("{") && json_str.endsWith("}")) {
                  json_str = json_str.substring(0, json_str.length() - 1);
                  if (json_str.length() > 3) {
                      json_str += ",";
                      json_str = json_str + "\"kin_seq_num\":\"" + record.getSequenceNumber() + "\",";
                      json_str = json_str + "\"server_ts\":" + System.currentTimeMillis() + "}";
                  }
              }
          }
          return json_str;
      }

      @Override
      public byte[] fromClass(String record) {
          return record.getBytes();
      }
  20. 20. Real Time: Storm and the Amazon Kinesis Storm Spout
  21. 21. Storm and real-time data • Distributed, fault-tolerant • Processes real-time data • Views records as “tuples” which are passed through an arbitrary DAG of nodes (a topology) • Spouts: Emit tuples into the topology • Bolts: Process tuples in the topology • Read from anywhere, write to anywhere
  22. 22. Storm cluster architecture
  23. 23. Storm: Real-time aggregation
  24. 24. Implementing the Amazon Kinesis Storm Spout

      // Define configuration parameters
      final KinesisSpoutConfig config = new KinesisSpoutConfig(streamName, zookeeperEndpoint)
              .withZookeeperPrefix(zookeeperPrefix)
              .withKinesisRecordScheme(new DefaultKinesisRecordScheme())
              .withRecordRetryLimit(recordRetryLimit)
              .withInitialPositionInStream(initialPositionInStream) // LATEST or TRIM_HORIZON
              .withEmptyRecordListBackoffMillis(emptyRecordListBackoffMillis);

      // Create spout
      final KinesisSpout spout = new KinesisSpout(config,
              new CustomCredentialsProviderChain(awsAccessKey, awsSecretKey),
              new ClientConfiguration());

      // Set spout in topology and define parallelism
      builder.setSpout("kinesis_spout", spout, num_spout_executors);
  25. 25. Storm: Lessons • Only extract necessary fields when deserializing • Big instances with few workers (JVMs) • Too many workers can reduce speeds • Balance flexibility vs. speed • Final state (throughput and hardware) • Can handle up to ~42K RPS on (4) c4.2xlarge instances • Using (2) m3.large for ZooKeeper, m3.xlarge for Nimbus
  26. 26. Glu’s new architecture
  27. 27. Challenges and Insights
  28. 28. Challenge: Shards, buffers, and file size • Each shard is handled by its own buffer • A KCL instance needs a buffer for each shard • More shards per machine = more memory • Avoid memory bottleneck by reducing buffer size → creates more, smaller files • Hadoop does not like this!
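The small-files tradeoff can be made concrete: with a fixed daily volume, halving the per-shard buffer size doubles the number of files emitted to S3. A back-of-envelope sketch with hypothetical names; only the 1.2 TB/day and 20-shard figures come from the slides, and an even split across shards is assumed:

```java
// Rough estimate of daily S3 file count when each shard's buffer
// flushes to its own file once full.
class FileCountEstimate {
    static long filesPerDay(long dailyBytes, int shards, long bufferBytes) {
        long perShardBytes = dailyBytes / shards;                            // even split assumed
        long filesPerShard = (perShardBytes + bufferBytes - 1) / bufferBytes; // ceiling division
        return (long) shards * filesPerShard;
    }
}
```

At 1.2 TB/day over 20 shards, a 60 MB buffer yields roughly 20,000 files per day; shrinking the buffer to 30 MB to save heap doubles that to 40,000, which is exactly the Hadoop small-files problem the next slides address with CombineFileInputFormat.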
  29. 29. Solution: CombineFileInputFormat
  30. 30. Solution: CombineFileInputFormat (continued)
  31. 31. Challenge: No IP address on record • Record sent to stream without IP • Device doesn't know its own IP • Amazon Kinesis does not provide the client IP • But we rely on the IP address for geo lookup • No geographic splits • Big problem
  32. 32. Solution: Geo lookup service (v1)
  33. 33. Solution: Geo lookup service (v2)
  34. 34. Challenge: Scaling • Can our system scale with minimal effort and impact? • Stream → can scale up / down with Amazon Kinesis Scaling Utils • Consumers → can add more machines, pegged to shard count rather than records if memory bottlenecked • Hadoop → add more nodes, but keep enough extra room • Storm?
  35. 35. Kinesis Scaling Utils

      $ java -cp KinesisScalingUtils.jar-complete.jar \
          -Dstream-name=<stream name> \
          -Dscaling-action=resize \
          -Dcount=<new shard count> \
          ScalingClient

      GitHub: Kinesis Scaling Utils
  36. 36. Scaling and Amazon Kinesis Storm Spout • Assigning tasks to shards • Required a topology restart so that ZooKeeper could refresh the shard list • Solved in 1.1.1; now only requires "storm rebalance" • Need to be sure the withEmptyRecordListBackoffMillis setting is adequately low (defaults to 5 ms after 1.1.0) • Loss of state • Restarting / rebalancing causes tasks to lose their state • Breaks topology operations that require state, such as unique counts and joins
  37. 37. Redis to the rescue • Redis is a scalable, in-memory key-value store • Solution: Store long running state to Redis • Count unique values using “sets” • Perform joins using key-value hashes • Easy deployment / management using Amazon ElastiCache
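The set-based counting works because Redis SADD is idempotent: re-adding an ID seen by any Storm worker is a no-op, so SCARD returns an exact unique count across the whole topology. A local analogue using a plain HashSet (illustrative only; the real bolts on the next slides talk to Redis via Jedis, and the names here are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the Redis pattern: a HashSet plays the role of
// the day-keyed Redis set holding user IDs.
class UniqueCounter {
    private final Set<String> daySet = new HashSet<>();

    // Analogue of SADD dau:day:<date> <id> — duplicate adds are no-ops.
    void add(String userId) {
        daySet.add(userId);
    }

    // Analogue of SCARD dau:day:<date> — exact unique count.
    long cardinality() {
        return daySet.size();
    }
}
```

Redis provides the same semantics as a shared, network-accessible store, which is what lets multiple Storm workers contribute to one daily-active-users count without double-counting.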
  38. 38. Redis unique counter: Local aggregator

      private final HashSet<String> listOfIds = new HashSet<String>();

      @Override
      public void execute(Tuple tuple) {
          if (!(TupleHelpers.isTickTuple(tuple))) {
              addItemToSet(listOfIds, tuple.getString(0)); // user ID field
          } else {
              emit(listOfIds);
          }
          collector.ack(tuple);
      }

      private void addItemToSet(HashSet<String> listOfIds, String id) {
          listOfIds.add(id);
      }

      private void emit(HashSet<String> listOfIds) {
          Jedis jedis = redisPool.getResource();
          Pipeline redisPipeline = jedis.pipelined();
          String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
          int ttlSec = 60 * 60 * 24;
          for (String id : listOfIds) {
              redisPipeline.sadd(redisKeyName, id);
          }
          redisPipeline.sync();
          if (jedis.ttl(redisKeyName) == -1) {
              jedis.expire(redisKeyName, ttlSec);
          }
          jedis.close();
          listOfIds.clear();
      }
  39. 39. Redis unique counter: Global aggregator

      @Override
      public void execute(Tuple tuple) {
          Jedis jedis = redisPool.getResource();
          String redisKeyName = "dau:day:" + FormatDateTime.getCurrentFormattedTimestamp("yyyy-MM-dd");
          Double unique_count = jedis.scard(redisKeyName).doubleValue();
          jedis.close();
          emitToWherever(redisKeyName, unique_count);
          collector.ack(tuple);
      }

      @Override
      public Map<String, Object> getComponentConfiguration() {
          Map<String, Object> conf = new HashMap<String, Object>();
          conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
          return conf;
      }
  40. 40. In closing • Amazon Kinesis Connector Library makes basic consumer applications simple • Amazon Kinesis Storm Spout enables real-time processing • Optimize Hadoop file size with CombineFileInputFormat • Geo Lookup service in lieu of Amazon Kinesis API • Scale with Amazon Kinesis scaling utils and Storm Spout 1.1.1
  41. 41. Thank you! GAM406
  42. 42. Remember to complete your evaluations!
