1. Scala Conference in Japan 2013
Data Stream Processing and Analysis with Akka
CyberAgent Inc., Ameba Technology Laboratory
○ Roman Y. Shtykh, Mitsuharu Makita
2. Self Introduction
• Roman Y. Shtykh (シュティフ・ロマン)
• R&D Engineer
• Ameba Technology Lab
• Interests:
• Distributed Computing
• Information Retrieval
3. Contents
• Ameba Services
• Ameba Technology Lab
• Onix: Data Stream Processing and Analysis
• Choice of Akka
4. Ameba Services
・ Monthly page views (PV): 33.1 billion
・ Number of users: about 24 million
* As of Sep 2012
7. Ameba Technology Laboratory
・ Since April 2011
・ About 20 engineers
8. Technology Areas
[Diagram: Ameba services (Smartphone, News, Blog, Games, Pigg, Profile, Now, Messaging, Gruppo, Others) and the lab's technology areas: Search, Data Mining, Large-Scale Distributed Data Processing, Recommenders, Filtering]
9. Onix
• Generic real-time event processing and analysis platform
• Emphasis on
• Low-latency event processing with delivery guarantees
• => correct analysis in real time
• Replayable if a failure still occurs
• Scalability
• Extensibility
• Built on Akka 2.0.x
10. System Overview
• Collectors
• data receivers
• Queue
• Sprinklers
• pulling and scattering event data
• Processors
• processing according to the specified tasks
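The flow above (collectors feed a queue, sprinklers pull from it and scatter events to processors) can be sketched in plain Java. This is an illustration only, not Onix code: it uses no Akka, every class and method name is hypothetical, and round-robin scattering is an assumption, since the slides do not say how sprinklers choose a processor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Plain-Java sketch (no Akka); every name here is hypothetical.
class PipelineSketch {

    /** Processor: applies its task to each event; here it only records them. */
    static class Processor {
        final List<String> handled = new ArrayList<>();

        void process(String event) {
            handled.add(event);
        }
    }

    /** Collector: receives incoming events and appends them to the queue. */
    static void collect(Queue<String> queue, List<String> incoming) {
        queue.addAll(incoming);
    }

    /** Sprinkler: pulls events off the queue and scatters them round-robin (assumed policy). */
    static int sprinkle(Queue<String> queue, List<Processor> processors) {
        int delivered = 0;
        String event;
        while ((event = queue.poll()) != null) {
            processors.get(delivered % processors.size()).process(event);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        List<Processor> processors = Arrays.asList(new Processor(), new Processor());
        collect(queue, Arrays.asList("login", "post", "comment"));
        int delivered = sprinkle(queue, processors);
        System.out.println(delivered + " events delivered");
        System.out.println("processor 0 handled " + processors.get(0).handled);
        System.out.println("processor 1 handled " + processors.get(1).handled);
    }
}
```

In an actor-based system each of these roles would presumably be an actor and the method calls would be messages; the sketch keeps everything synchronous so the data flow is easy to follow.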
23. Scaling Out: Sprinklers and Processors
Larger View
24. Extensibility
Adding new functionality is as simple as
• Writing a processor plugin that extends the base class
• Registering it with the processor pool
The new processing functionality then becomes available to all services
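The two steps above can be sketched as follows. The slides do not show the Onix base class or pool API, so BaseProcessor, ProcessorPool, register and run are hypothetical names chosen for this illustration, written in plain Java without Akka.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java sketch; the base class and pool API are assumptions, not the Onix API.
class ProcessorPoolSketch {

    /** Hypothetical base class that every processor plugin extends. */
    abstract static class BaseProcessor {
        abstract String name();

        abstract String process(String event);
    }

    /** Step 1: a plugin written by extending the base class. */
    static class UpperCaseProcessor extends BaseProcessor {
        @Override
        String name() {
            return "uppercase";
        }

        @Override
        String process(String event) {
            return event.toUpperCase(Locale.ROOT);
        }
    }

    /** Hypothetical pool: any service can look up a registered processor by name. */
    static class ProcessorPool {
        private final Map<String, BaseProcessor> byName = new LinkedHashMap<>();

        /** Step 2: registering the plugin makes it available to all callers. */
        void register(BaseProcessor processor) {
            byName.put(processor.name(), processor);
        }

        String run(String name, String event) {
            BaseProcessor processor = byName.get(name);
            if (processor == null) {
                throw new IllegalArgumentException("No processor registered as " + name);
            }
            return processor.process(event);
        }
    }

    public static void main(String[] args) {
        ProcessorPool pool = new ProcessorPool();
        pool.register(new UpperCaseProcessor());
        System.out.println(pool.run("uppercase", "hello onix"));
    }
}
```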
26. Possible Scenario: Integration with Patriot
• Log processing platform at Ameba
• since 2010
• Hadoop/Hive, HBase
• Working with
• User behavioral logs
• Access logs
• Textual data (blogs, etc.)
Enabling a lambda architecture with Onix!
27. Possible Scenario: Integration with Patriot
• Log processing platform at Ameba
• since 2010
• Hadoop/Hive, HBase, Onix (speed layer)
• Working with
• User behavioral logs
• Access logs
• Textual data (blogs, etc.)
• Supplementing Patriot with real-time analytics
• Incremental update of real-time views
• Emphasis on the latest data
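A minimal sketch of the idea, assuming simple per-key counters: the batch layer (Patriot in this scenario) periodically produces a batch view, the speed layer (Onix) incrementally updates a real-time view for events that arrived since the last batch run, and a query combines the two. The class, method and key names are hypothetical, and the sketch is plain Java rather than Hadoop, HBase or Akka code.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of combining a batch view with a real-time view; names are hypothetical.
class LambdaViewSketch {

    /** Speed layer: incrementally update the real-time view for one incoming event. */
    static void recordEvent(Map<String, Long> realtimeView, String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    /** Query: batch result plus whatever the speed layer has seen since the batch run. */
    static long query(Map<String, Long> batchView, Map<String, Long> realtimeView, String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        // Pretend the last batch job counted 100 views of this (made-up) page.
        Map<String, Long> batchView = new HashMap<>();
        batchView.put("blog/123", 100L);

        // Events arriving after the batch run are counted incrementally.
        Map<String, Long> realtimeView = new HashMap<>();
        recordEvent(realtimeView, "blog/123");
        recordEvent(realtimeView, "blog/123");

        System.out.println(query(batchView, realtimeView, "blog/123")); // prints 102
    }
}
```

When the next batch run completes, the real-time view for the covered period would be discarded, which is why the speed layer only needs to hold the latest data.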
28. Current State and Challenges
• Auto-scaling is not available yet
• Needed to reduce management complexity
• Additional APIs and protocols to support
• Only basic processing tasks are implemented
• In beta
29. Processing Tasks
• Spam Detection
• Identifying spam in articles, messages, comments, etc.
• Outlier Action Detection
• Detecting fraudulent acts in online games
• Unauthorized users, etc. (e.g., R15+ and R18+ age restrictions)
• Unauthorized tools
• Abuse
• Detecting other anomalies
• DDoS attacks
• Intrusions
• Failures
30. Why Akka?
• Gives loose coupling and location transparency by design
• Actor Model
• Easy scaling
• Asynchronous and Distributed by design
• Remoting
// Deploy an Echo actor on the remote node akka://sys@host:1234 (Akka 2.0.x Java API)
Address addr = new Address("akka", "sys", "host", 1234);
ActorRef ref = system.actorOf(
    new Props(RemoteDeploymentDocSpec.Echo.class)
        .withDeploy(new Deploy(new RemoteScope(addr))));
• Microkernel
bin/akka sample.kernel.hello.HelloKernel
• No memory-visibility or race-condition concerns
-> lower complexity of the system
• Isolated mutability provided by the actor concurrency model
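Isolated mutability can be illustrated without Akka. In the sketch below a single-threaded executor stands in for an actor's mailbox (an assumption made for illustration, not how Akka is implemented): many threads may send messages, but only the mailbox thread ever touches the mutable counter, so no locks or volatile fields are needed. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Plain-Java stand-in for an actor: a single-threaded executor plays the mailbox.
class CounterActorSketch {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0; // mutable state, read and written only on the mailbox thread

    /** Fire-and-forget message: the increment runs later on the mailbox thread. */
    void tellIncrement() {
        mailbox.execute(() -> count++);
    }

    /** Request-reply message: the read also runs on the mailbox thread, in arrival order. */
    long askCount() {
        try {
            return mailbox.submit(() -> count).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void stop() {
        mailbox.shutdown();
    }

    /** Several sender threads message one counter concurrently; returns the final count. */
    static long demo(int senders, int perSender) {
        CounterActorSketch actor = new CounterActorSketch();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < senders; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perSender; j++) {
                    actor.tellIncrement();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // all increments are enqueued before we ask for the count
            }
            return actor.askCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            actor.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 250)); // prints 1000
    }
}
```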
31. Why Akka?
• “Let It Crash” Philosophy
• Surviving failures in a simple way instead of doing defensive programming
• Don’t be afraid to crash!
• Java API
• Open Source
• Responsive Community
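The "Let It Crash" idea can be sketched in plain Java, again without Akka and with hypothetical names. The worker does no defensive checking beyond failing loudly; a supervisor catches the failure, replaces the worker with a fresh instance and carries on with the next message. Dropping the failed message and restarting with clean state is an assumption made for this sketch, loosely modeled on a restart-style supervision policy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java sketch of supervision; not the Akka supervisor API.
class SupervisorSketch {

    /** Worker with no defensive code: bad input simply makes it crash. */
    static class Worker {
        int processed = 0;

        void handle(String message) {
            if (message.isEmpty()) {
                throw new IllegalArgumentException("empty message");
            }
            processed++;
        }
    }

    /**
     * Supervisor: on failure, replace the worker with a fresh one and continue.
     * Returns {number of restarts, messages processed by the current worker}.
     */
    static int[] supervise(List<String> messages, Supplier<Worker> factory) {
        Worker worker = factory.get();
        int restarts = 0;
        for (String message : messages) {
            try {
                worker.handle(message);
            } catch (RuntimeException e) {
                worker = factory.get(); // restart: fresh instance, clean state
                restarts++;
            }
        }
        return new int[] {restarts, worker.processed};
    }

    public static void main(String[] args) {
        int[] result = supervise(Arrays.asList("a", "", "b", "c"), Worker::new);
        System.out.println("restarts=" + result[0] + ", processed since restart=" + result[1]);
    }
}
```

The point of the design is that recovery logic lives in one place (the supervisor) instead of being spread through every handler as defensive checks.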
32. Some Java-related ‘problems’
• The Scala API is richer than the Java API
• SBT is required if you want to do something 'fancy'
• application.conf must be merged manually with multiple reference.conf files if you need a fat jar