0
Big Data Real-Time Analytics

Knut Nesheim · @knutin
GameAnalytics
Topics
‣

What we do

‣

What we want

‣

Our solution
Me
‣

Klarna

‣

Wooga

‣

Game Analytics

‣

github.com/knutin
Analytics SaaS for games
Game Analytics
‣

Startup, venture funded, ~2 years old

‣

HQ in Copenhagen, engineering in Berlin

‣

Need to move fast
...
GA.Event.Business(“sheep”, “gold”, 200);
Metrics

‣

Daily Active Users, Monthly Active Users

‣

Revenue

‣

Histogram of event values
Idea: some real-time metrics
Suitable subset
Scope creep: real-time everything

GA v2.0
MapReduce architecture, v1.0
iOS SDK

Android SDK

Unity SDK

HTTP
Online
Collectors
S3

MapReduce
MongoDB

Website

Work queue
Special stuff

Offline
Streaming architecture, v2.0
iOS SDK

Android SDK

Unity SDK

HTTP

Collectors
S3

or
H
ist

Real-time

DynamoDB
y

Stream

Query API
Website

Customer...
Today
2013-10-14

2013-10-15
DynamoDB

RAM
Streaming
‣

Partition on game

‣

One process per game, autonomous

‣

Prototype very promising
Implementation
Game process
‣

Process files sequentially

‣

Keep running results in RAM

‣

Flush to DB when window closes

‣

Answer re...
“Metric DSL”
HyperLogLog
‣

Estimate cardinality

‣

Clever hash tricks

‣

Millions unique with <1% error in ~40k words

‣

Unions

‣
...
Recordinality[0]
‣

Estimate frequency of values

‣

User session count

‣

Keeps a reservoir, clever hash tricks

‣

Will...
Next step: More parallelization
Game #123
loop(State) ->
NewState = process(next_file(), State),
loop(NewState).
Worker #1
Part = process(next_file(), empty()),
game_123 ! Part.

Game #123
loop(State) ->
receive
Part ->
NewState = merg...
Worker #1

Game #123

Worker #2

Worker #N

Manager
Mapper #1

Reducer #123

Mapper #2

Mapper #N

Scheduler
Scheduler
‣

Take ideas from Hadoop, Riak Pipe

‣

Manage limited resources

‣

Distribute work across nodes

‣

Colocate ...
Implementation
‣

DIY?

‣

riak_core for state processes?

‣

riak_pipe for managing work?
Conclusion
Erlang: The bad parts
‣

Big process state not great

‣

Lots of updates, lots of garbage
Erlang: The good parts
‣

Total allocated RAM >30GB

‣

Per process heap is gold

‣

Easy parallelization, distribution

‣...
Questions?
Knut Nesheim · @knutin
GameAnalytics
www.gameanalytics.com
Upcoming SlideShare
Loading in...5
×

Big Data Real-Time Analytics in Erlang

2,765

Published on

Game Analytics is building the next-generation analytics platform for the gaming industry. We want to make advanced machine learning available to even the small studios with just three guys in a garage. We want to have all the analytics in real-time.

We're building a real-time stream processing analytical engine in Erlang. As data comes in from our customers games, we process it and make it immediately available to our customers. In this presentation we will present the challenges we're facing, the architecture we have come up with and some really clever implementation details. We're preparing for massive volumes of data and need to be ready when big customers start using our product. Come see the challenges we're facing and how we're using Erlang to build the next-generation analytics platform.

Published in: Technology, Business
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,765
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
45
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide

Transcript of "Big Data Real-Time Analytics in Erlang"

  1. 1. Big Data Real-Time Analytics Knut Nesheim · @knutin GameAnalytics
  2. 2. Topics ‣ What we do ‣ What we want ‣ Our solution
  3. 3. Me ‣ Klarna ‣ Wooga ‣ Game Analytics ‣ github.com/knutin
  4. 4. Analytics SaaS for games
  5. 5. Game Analytics ‣ Startup, venture funded, ~2 years old ‣ HQ in Copenhagen, engineering in Berlin ‣ Need to move fast ‣ Willing to take on technical debt ‣ Be ready for traffic growth ‣ Big games are big, millions of DAU
  6. 6. GA.Event.Business(“sheep”, “gold”, 200);
  7. 7. Metrics ‣ Daily Active Users, Monthly Active Users ‣ Revenue ‣ Histogram of event values
  8. 8. Idea: some real-time metrics Suitable subset
  9. 9. Scope creep: real-time everything GA v2.0
  10. 10. MapReduce architecture, v1.0
  11. 11. iOS SDK Android SDK Unity SDK HTTP Online Collectors S3 MapReduce MongoDB Website Work queue Special stuff Offline
  12. 12. Streaming architecture, v2.0
  13. 13. iOS SDK Android SDK Unity SDK HTTP Collectors S3 or H ist Real-time DynamoDB y Stream Query API Website Customers
  14. 14. Today 2013-10-14 2013-10-15 DynamoDB RAM
  15. 15. Streaming ‣ Partition on game ‣ One process per game, autonomous ‣ Prototype very promising
  16. 16. Implementation
  17. 17. Game process ‣ Process files sequentially ‣ Keep running results in RAM ‣ Flush to DB when window closes ‣ Answer real-time queries ‣ Put all in one node
  18. 18. “Metric DSL”
  19. 19. HyperLogLog ‣ Estimate cardinality ‣ Clever hash tricks ‣ Millions unique with <1% error in ~40k words ‣ Unions ‣ “hyper”: Erlang HLL++ from Google paper[0] ‣ Will open source Soon (TM) [0]: “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”
  20. 20. Recordinality[0] ‣ Estimate frequency of values ‣ User session count ‣ Keeps a reservoir, clever hash tricks ‣ Will open source Soon (TM) [0]: “Data Streams as Random Permutations: the Distinct Element Problem”
  21. 21. Next step: More parallelization
  22. 22. Game #123 loop(State) -> NewState = process(next_file(), State), loop(NewState).
  23. 23. Worker #1 Part = process(next_file(), empty()), game_123 ! Part. Game #123 loop(State) -> receive Part -> NewState = merge(Part, State)), loop(NewState) end.
  24. 24. Worker #1 Game #123 Worker #2 Worker #N Manager
  25. 25. Mapper #1 Reducer #123 Mapper #2 Mapper #N Scheduler
  26. 26. Scheduler ‣ Take ideas from Hadoop, Riak Pipe ‣ Manage limited resources ‣ Distribute work across nodes ‣ Colocate processing with state storage
  27. 27. Implementation ‣ DIY? ‣ riak_core for state processes? ‣ riak_pipe for managing work?
  28. 28. Conclusion
  29. 29. Erlang: The bad parts ‣ Big process state not great ‣ Lots of updates, lots of garbage
  30. 30. Erlang: The good parts ‣ Total allocated RAM >30GB ‣ Per process heap is gold ‣ Easy parallelization, distribution ‣ Looking forward to maps!
  31. 31. Questions? Knut Nesheim · @knutin GameAnalytics
  32. 32. www.gameanalytics.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×