Big Data Real-Time Analytics in Erlang
Upcoming SlideShare
Loading in...5
×
 

Big Data Real-Time Analytics in Erlang

on

  • 2,018 views

Game Analytics is building the next-generation analytics platform for the gaming industry. We want to make advanced machine learning available to even the small studios with just three guys in a ...

Game Analytics is building the next-generation analytics platform for the gaming industry. We want to make advanced machine learning available to even the small studios with just three guys in a garage. We want to have all the analytics in real-time.

We're building a real-time stream processing analytical engine in Erlang. As data comes in from our customers games, we process it and make it immediately available to our customers. In this presentation we will present the challenges we're facing, the architecture we have come up with and some really clever implementation details. We're preparing for massive volumes of data and need to be ready when big customers start using our product. Come see the challenges we're facing and how we're using Erlang to build the next-generation analytics platform.

Statistics

Views

Total Views
2,018
Views on SlideShare
1,897
Embed Views
121

Actions

Likes
1
Downloads
20
Comments
0

2 Embeds 121

https://twitter.com 114
http://blog.gameanalytics.com 7

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Big Data Real-Time Analytics in Erlang Big Data Real-Time Analytics in Erlang Presentation Transcript

  • Big Data Real-Time Analytics Knut Nesheim · @knutin GameAnalytics
  • Topics ‣ What we do ‣ What we want ‣ Our solution
  • Me ‣ Klarna ‣ Wooga ‣ Game Analytics ‣ github.com/knutin View slide
  • Analytics SaaS for games View slide
  • Game Analytics ‣ Startup, venture funded, ~2 years old ‣ HQ in Copenhagen, engineering in Berlin ‣ Need to move fast ‣ Willing to take on technical debt ‣ Be ready for traffic growth ‣ Big games are big, millions of DAU
  • GA.Event.Business(“sheep”, “gold”, 200);
  • Metrics ‣ Daily Active Users, Monthly Active Users ‣ Revenue ‣ Histogram of event values
  • Idea: some real-time metrics Suitable subset
  • Scope creep: real-time everything GA v2.0
  • MapReduce architecture, v1.0
  • iOS SDK Android SDK Unity SDK HTTP Online Collectors S3 MapReduce MongoDB Website Work queue Special stuff Offline
  • Streaming architecture, v2.0
  • iOS SDK Android SDK Unity SDK HTTP Collectors S3 or H ist Real-time DynamoDB y Stream Query API Website Customers
  • Today 2013-10-14 2013-10-15 DynamoDB RAM
  • Streaming ‣ Partition on game ‣ One process per game, autonomous ‣ Prototype very promising
  • Implementation
  • Game process ‣ Process files sequentially ‣ Keep running results in RAM ‣ Flush to DB when window closes ‣ Answer real-time queries ‣ Put all in one node
  • “Metric DSL”
  • HyperLogLog ‣ Estimate cardinality ‣ Clever hash tricks ‣ Millions unique with <1% error in ~40k words ‣ Unions ‣ “hyper”: Erlang HLL++ from Google paper[0] ‣ Will open source Soon (TM) [0]: “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”
  • Recordinality[0] ‣ Estimate frequency of values ‣ User session count ‣ Keeps a reservoir, clever hash tricks ‣ Will open source Soon (TM) [0]: “Data Streams as Random Permutations: the Distinct Element Problem”
  • Next step: More parallelization
  • Game #123 loop(State) -> NewState = process(next_file(), State), loop(NewState).
  • Worker #1 Part = process(next_file(), empty()), game_123 ! Part. Game #123 loop(State) -> receive Part -> NewState = merge(Part, State)), loop(NewState) end.
  • Worker #1 Game #123 Worker #2 Worker #N Manager
  • Mapper #1 Reducer #123 Mapper #2 Mapper #N Scheduler
  • Scheduler ‣ Take ideas from Hadoop, Riak Pipe ‣ Manage limited resources ‣ Distribute work across nodes ‣ Colocate processing with state storage
  • Implementation ‣ DIY? ‣ riak_core for state processes? ‣ riak_pipe for managing work?
  • Conclusion
  • Erlang: The bad parts ‣ Big process state not great ‣ Lots of updates, lots of garbage
  • Erlang: The good parts ‣ Total allocated RAM >30GB ‣ Per process heap is gold ‣ Easy parallelization, distribution ‣ Looking forward to maps!
  • Questions? Knut Nesheim · @knutin GameAnalytics
  • www.gameanalytics.com