Glu uses Amazon Kinesis, Apache Storm, S3, and Hadoop to collect billions of data points from millions of user devices in real time, every single day. This session describes how Glu built and configured an array of producers to submit real-time gaming events into Amazon Kinesis, using temporary tokens from Amazon Cognito and removing the need for an intermediate store-and-forward fleet. We then discuss how we've been able to easily integrate Amazon Kinesis with powerful open-source technologies such as Apache Storm and the Hadoop ecosystem. Finally, we discuss KCL optimizations and tradeoffs to manage a scalable, flexible, mission-critical streaming data platform.
2. What to expect from the session
• Glu Mobile: Data requirements and challenges
• Architecture overview and decisions
• Amazon Kinesis: Producers, Streams, and the Amazon Kinesis Connector Library
• Real Time: Storm and Amazon Kinesis Storm Spout
• Other challenges and insights
3. Glu mobile basics
• A mobile gaming leader across genres
• 4 titles in top 100 grossing (US) (9/24/15)
• 4–6 million daily active users (DAU) (typical, 2015)
• 1 billion+ global installs (2010-)
4. What we collect
High, variable volume:
• 700 million to 2+ billion events per day
• 600 bytes per event
• Up to 1.2 TB per day
• Could scale up further with a successful game launch
Multiple sources:
• Client-side SDKs
• Game servers, central services servers
• Attribution partners
• Ad networks
• Third parties
5. Basic requirements
• Near zero data loss
• High levels of uptime
• Flexible data format — JSON with arbitrary fields
• Real-time aggregations
• Reasonably low latency for ad-hoc queries (hourly batching OK)
6. Other requirements
• Not expensive
• Can be implemented with minimal engineering effort
• Requires minimal changes to existing games
10. Next step: Bring data collection in-house
• Build our own analytics SDK
• Need a framework for collecting data from SDK
• Options:
• Build our own streaming and collection (Apache Kafka)
• Use a hosted service (Amazon Kinesis)
13. Why Amazon Kinesis?
• Minimal setup time
• Prebuilt applications (Amazon Kinesis Client Library [KCL], Amazon Kinesis Connector Library, Amazon Kinesis Storm Spout)
• Extremely minimal maintenance
• Minimal hardware
• No significant price advantages either way (vs Kafka)
14. Producers
• Custom built client SDKs
• Native Android (Java) and native iOS (Obj-C) plug-ins
• Unity wrapper for Unity titles
• Built on top of AWS SDKs (for each platform)
• Implements our internal analytical schema / standards
• Additional server-side implementations
15. Producers (continued)
• Vanilla KinesisRecorder.submitAllRecords()
• No record batching
• No compression
• Records flushed every 30 seconds, or on certain events
• Client authentication using Amazon Cognito
• Server authentication using AWS Identity and Access Management (IAM) profiles
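The flush policy above (no batching or compression; records submitted every 30 seconds or on certain game events) can be sketched as a small buffer. This is an illustrative model, not the AWS Mobile SDK: the `Sink` interface stands in for `KinesisRecorder.submitAllRecords()`, and all class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the client-side flush policy: events accumulate
// locally and are handed to the recorder either when the 30-second window
// elapses or when a significant game event forces a flush.
public class EventBuffer {
    interface Sink { void submitAll(List<String> records); } // stand-in for the Kinesis recorder

    static final long FLUSH_INTERVAL_MS = 30_000;

    private final Sink sink;
    private final List<String> pending = new ArrayList<>();
    private long lastFlushMs;

    EventBuffer(Sink sink, long nowMs) {
        this.sink = sink;
        this.lastFlushMs = nowMs;
    }

    /** Queue one JSON event; flush if the 30-second window has elapsed. */
    void add(String jsonEvent, long nowMs) {
        pending.add(jsonEvent);
        if (nowMs - lastFlushMs >= FLUSH_INTERVAL_MS) {
            flush(nowMs);
        }
    }

    /** Force a flush, e.g. on app background or a purchase event. */
    void flush(long nowMs) {
        if (!pending.isEmpty()) {
            // Records go out as-is: no batching into larger blobs, no compression.
            sink.submitAll(new ArrayList<>(pending));
            pending.clear();
        }
        lastFlushMs = nowMs;
    }

    int pendingCount() { return pending.size(); }
}
```

In production the sink would be backed by the platform AWS SDK, authenticated with Cognito tokens on clients or IAM profiles on servers.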
16. How many shards?
• Shard limits:
• 1,000 records per second (writes)
• 1 MB per second write
• 2 MB per second read
• Our situation: 20,000 RPS, 600 bytes per message
• Need at least 20 shards to handle message count
• Only need 12 MB per sec write capacity
• 20 shards = 40 MB per sec read capacity
• Up to 3 apps OK (36 MB < 40 MB)
• Other considerations (peak load, excess capacity)
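The sizing arithmetic on this slide can be written out directly. The figures (20,000 RPS, 600-byte records, per-shard limits of 1,000 records/s and 1 MB/s write, 2 MB/s read) come from the slides; the class and method names are illustrative, not any AWS SDK.

```java
// Back-of-the-envelope shard sizing using the per-shard limits quoted above.
public class ShardSizing {
    static final int RECORDS_PER_SHARD_PER_SEC = 1_000;
    static final long WRITE_BYTES_PER_SHARD_PER_SEC = 1_000_000; // 1 MB/s write
    static final long READ_BYTES_PER_SHARD_PER_SEC = 2_000_000;  // 2 MB/s read

    /** Shards needed to absorb the record count alone. */
    static int shardsForRecordRate(int recordsPerSec) {
        return (int) Math.ceil((double) recordsPerSec / RECORDS_PER_SHARD_PER_SEC);
    }

    /** Shards needed to absorb the write bandwidth alone. */
    static int shardsForWriteRate(int recordsPerSec, int bytesPerRecord) {
        long bytesPerSec = (long) recordsPerSec * bytesPerRecord;
        return (int) Math.ceil((double) bytesPerSec / WRITE_BYTES_PER_SHARD_PER_SEC);
    }

    /** How many consuming applications the aggregate read limit supports. */
    static int maxReaderApps(int shards, int recordsPerSec, int bytesPerRecord) {
        long readCapacity = (long) shards * READ_BYTES_PER_SHARD_PER_SEC;
        long bytesPerSec = (long) recordsPerSec * bytesPerRecord;
        return (int) (readCapacity / bytesPerSec);
    }

    public static void main(String[] args) {
        int rps = 20_000, recordBytes = 600;
        // Record rate dominates: 20 shards for RPS vs. only 12 for write bandwidth.
        int shards = Math.max(shardsForRecordRate(rps), shardsForWriteRate(rps, recordBytes));
        System.out.println("shards needed: " + shards);
        System.out.println("reader apps supported: " + maxReaderApps(shards, rps, recordBytes));
    }
}
```

This reproduces the slide's conclusion: 20 shards, with 40 MB/s of read capacity supporting up to 3 consuming applications at 12 MB/s each.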
21. Storm and real-time data
• Distributed, fault-tolerant
• Processes real-time data
• Views records as “tuples” which are passed through an arbitrary DAG of nodes (a topology)
• Spouts: Emit tuples into the topology
• Bolts: Process tuples in the topology
• Read from anywhere, write to anywhere
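The spout/bolt vocabulary above can be modeled in a few lines without the Storm runtime: a spout emits tuples, a bolt consumes them, and the topology is just the wiring between the two. The interfaces here are illustrative stand-ins, not Storm APIs.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Minimal model of a Storm-style topology: spout -> bolt.
public class MiniTopology {
    interface Spout { boolean nextTuple(Consumer<String> emit); } // emits tuples; false when drained
    interface Bolt { void execute(String tuple); }                // processes one tuple

    /** "Topology": pump every tuple the spout emits through the bolt. */
    static void run(Spout spout, Bolt bolt) {
        while (spout.nextTuple(bolt::execute)) { /* keep pumping */ }
    }

    public static void main(String[] args) {
        // A spout backed by a fixed list of game events (in Storm, this would read Kinesis).
        Iterator<String> source = List.of("login", "purchase", "logout").iterator();
        Spout spout = emit -> {
            if (!source.hasNext()) return false;
            emit.accept(source.next());
            return true;
        };
        Bolt printer = tuple -> System.out.println("processed: " + tuple);
        run(spout, printer);
    }
}
```

In real Storm the DAG can fan out to many bolts with configurable parallelism; this sketch only shows the emit/execute contract.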
24. Implementing the Amazon Kinesis Storm Spout
// Define configuration parameters
final KinesisSpoutConfig config = new KinesisSpoutConfig(streamName, zookeeperEndpoint)
    .withZookeeperPrefix(zookeeperPrefix)
    .withKinesisRecordScheme(new DefaultKinesisRecordScheme())
    .withRecordRetryLimit(recordRetryLimit)
    .withInitialPositionInStream(initialPositionInStream) // LATEST or TRIM_HORIZON
    .withEmptyRecordListBackoffMillis(emptyRecordListBackoffMillis);
// Create the spout
final KinesisSpout spout = new KinesisSpout(config,
    new CustomCredentialsProviderChain(awsAccessKey, awsSecretKey),
    new ClientConfiguration());
// Set the spout in the topology and define parallelism
builder.setSpout("kinesis_spout", spout, num_spout_executors);
25. Storm: Lessons
• Only extract necessary fields when deserializing
• Big instances with few workers (JVMs)
• Too many workers can reduce throughput
• Balance flexibility vs. speed
• Final state (throughput and hardware)
• Can handle up to ~42K RPS on (4) c4.2xlarge instances
• Using (2) m3.large for ZooKeeper, m3.xlarge for Nimbus
28. Challenge: Shards, buffers, and file size
• Each shard is handled by its own buffer
• A KCL instance needs a buffer for each shard
• More shards per machine = more memory
• Avoid the memory bottleneck by reducing buffer size
• Smaller buffers create more, smaller files
• Hadoop does not like this!
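The tradeoff above is a simple back-of-the-envelope: heap use grows with shards-per-host times buffer size, while the average output file shrinks as the buffer shrinks, which is what produces Hadoop's small-file problem. The figures and method names below are illustrative assumptions, not Glu's actual configuration.

```java
// Sketch of the KCL buffer tradeoff: memory per host vs. output file size.
public class BufferTradeoff {
    /** Worst-case heap tied up in per-shard buffers on one consumer host. */
    static long heapBytesNeeded(int shardsPerHost, long bufferBytes) {
        return (long) shardsPerHost * bufferBytes;
    }

    /** Files emitted per hour per shard if each buffer flushes when full. */
    static long filesPerHourPerShard(long bytesPerShardPerSec, long bufferBytes) {
        return bytesPerShardPerSec * 3_600 / bufferBytes;
    }

    public static void main(String[] args) {
        // Hypothetical host handling 10 shards at ~600 KB/s each.
        long perShardRate = 600_000;
        System.out.println(heapBytesNeeded(10, 64L << 20));            // 64 MB buffers: heavy heap use
        System.out.println(filesPerHourPerShard(perShardRate, 64L << 20)); // but few, large files
        System.out.println(filesPerHourPerShard(perShardRate, 8L << 20));  // 8 MB buffers: ~8x more files
    }
}
```

Shrinking buffers eightfold cuts memory eightfold but multiplies file count by the same factor, which is why the closing slide recommends CombineFileInputFormat to glue small files back together at read time.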
31. Challenge: No IP address on record
• Record sent to stream without IP
• Device doesn’t know its own IP
• Amazon Kinesis does not provide client IP
• But we rely on the IP address for geo lookup
• No geographic splits
• Big problem
34. Challenge: Scaling
• Can our system scale with minimal effort and impact?
• Stream
• Can scale up / down with Amazon Kinesis Scaling Utils
• Consumers
• Can add more machines, pegged to shard count rather than records if memory bottlenecked
• Hadoop
• Add more nodes, but keep enough extra room
• Storm?
36. Scaling and Amazon Kinesis Storm Spout
• Assigning tasks to shards
• Required a topology restart so that ZooKeeper could refresh the shard list
• Solved in 1.1.1; now only requires “storm rebalance”
• Need to be sure that the withEmptyRecordListBackoffMillis setting is adequately low (defaults to 5 ms post-1.1.0)
• Loss of state
• Restarting / rebalancing causes tasks to lose their state
• This breaks topology operations that require state, such as unique counts and joins
37. Redis to the rescue
• Redis is a scalable, in-memory key-value store
• Solution: Store long running state to Redis
• Count unique values using “sets”
• Perform joins using key-value hashes
• Easy deployment / management using Amazon ElastiCache
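The two patterns above map onto Redis sets (SADD/SCARD for unique counts) and hashes (HSET/HGET for joins). As an in-process stand-in for Redis, the sketch below mirrors those commands with Java collections; in production the same calls would go through a Redis client against ElastiCache, and all names here are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Local model of the Redis-backed state: a set per key for unique counts
// and a hash per key for joins, surviving outside the Storm tasks.
public class StreamState {
    private final Map<String, Set<String>> sets = new HashMap<>();
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    /** SADD: record a member; returns true if it was new. */
    boolean sadd(String key, String member) {
        return sets.computeIfAbsent(key, k -> new HashSet<>()).add(member);
    }

    /** SCARD: number of unique members seen for this key. */
    int scard(String key) {
        return sets.getOrDefault(key, Set.of()).size();
    }

    /** HSET: stash one side of a join, keyed by a shared id. */
    void hset(String key, String field, String value) {
        hashes.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    /** HGET: look up the stashed side when the other side arrives. */
    String hget(String key, String field) {
        return hashes.getOrDefault(key, Map.of()).get(field);
    }
}
```

Because the state lives outside the topology, a Storm restart or rebalance no longer wipes unique counts or pending join rows.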
40. In closing
• Amazon Kinesis Connector Library makes basic consumer applications simple
• Amazon Kinesis Storm Spout enables real-time processing
• Optimize Hadoop file size with CombineFileInputFormat
• Geo-lookup service, since the Amazon Kinesis API does not provide client IPs
• Scale with Amazon Kinesis Scaling Utils and Storm Spout 1.1.1