1. Lambda Architecture and Open Source Tools for Real-time Big Data
● Concepts & Techniques “Thinking with Lambda”
● Case studies in Practice
Trieu Nguyen - http://nguyentantrieu.info or @tantrieuf31
Principal Engineer at eClick Data Analytics team, FPT Online
All contents and thoughts in this slide are my subjective ideas and compiled from
2. Just a little introduction
● 2008: Java developer, developed a social
trading network for a small startup (Yopco)
● 2011: worked at FPT Online as a software engineer
on the Banbe project, RESTful API for VnExpress
● 2012: joined Greengar Studios for 6 months,
scaling the backend API for mobile games (iOS, Android)
● 2013: back at FPT Online, R&D on Big Data
& Analytics, developing the new core
Analytics Platform (on the JVM platform)
3. Contents for this talk
The lessons from history
Problems In Practice
What is the Lambda Architecture?
Why lambda architecture for real-time big
Open Source Technology Stack
Lambda in Practice (Mobile Data and Web Data)
Lessons I have learned
Questions & Answers
4. History ?
The best way to predict the future is
to look at the past and the present?
5. Big data is a buzzword for
6. Explaining Big Data
7. Learning ?
8. Working ?
9. Big Data + Old History
10. This is Big DATA
This is the most valuable thing!
11. We can't solve problems
by using the same kind of
thinking we used when we
Think more with
Lambda and Reactive
12. Where Big Data
can be used
13. BBC Horizon 2013 The Age of Big Data
14. Google’s mission is to organize
the world’s information and make it
universally accessible and useful.
15. Organize the world’s
16. How did Google scale their search engine ?
How does Hadoop really work ?
18. Trends of Now and the Future
=> All just the special cases of Lambda
19. So what is the λ
20. The Lambda Architecture:
● applies the Lambda (λ) philosophy in designing big data
● rests on the equation “query = function(all data)”, which is the basis of
all data systems
● was proposed by Nathan Marz (http://nathanmarz.com/), a
software engineer from Twitter, in his book “Big Data”.
● is based on three main design principles:
○ human fault-tolerance – the system is not susceptible to data loss or data
corruption, because at scale such damage could be irreparable. (BUGS?)
○ data immutability – store data in its rawest form, immutable and in
perpetuity. (INSERT/SELECT/DELETE but no UPDATE!)
○ recomputation – with the two principles above, it is always possible to
(re)compute results by running a function on the raw data.
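The equation and the three principles above can be illustrated with a minimal Python sketch. It is not taken from Marz's book or from any production system: the `PageView` schema, the `append` helper, and the two example functions are hypothetical names chosen only for illustration. Raw facts are frozen (immutable), the log only grows, and every result is recomputed by running a function over all the data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class PageView:
    """One immutable raw fact (hypothetical schema, for illustration only)."""
    user: str
    url: str
    timestamp: int


def append(log: Tuple[PageView, ...], event: PageView) -> Tuple[PageView, ...]:
    """Return a new log with the event added; existing facts are never updated."""
    return log + (event,)


def query(function: Callable[[Iterable[PageView]], T],
          all_data: Iterable[PageView]) -> T:
    """query = function(all data)"""
    return function(all_data)


def views_per_url(events: Iterable[PageView]) -> dict:
    """Count page views per URL by scanning the raw events."""
    counts: dict = {}
    for event in events:
        counts[event.url] = counts.get(event.url, 0) + 1
    return counts


def unique_users(events: Iterable[PageView]) -> int:
    """Count distinct users seen in the raw events."""
    return len({event.user for event in events})


log: Tuple[PageView, ...] = ()
log = append(log, PageView("alice", "/home", 1))
log = append(log, PageView("bob", "/home", 2))
log = append(log, PageView("alice", "/news", 3))

# Any view can be (re)computed at any time from the same raw data.
print(query(views_per_url, log))
print(query(unique_users, log))
```

Because the raw events are never modified, a bug in `views_per_url` can be fixed and the view simply recomputed, which is the "human fault-tolerance" idea in miniature.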
21. Lambda In Practice
2 case studies from my experience
22. Case Study 1:
Monitor API Backend + System KPI
Inside “mobile data”,
What's the most
valuable piece of
24. I applied
Backend System for mobile app
25. Web vs Mobile App
26. Metrics: Cause and Effect
Screen Size => App Design, UI/UX, Usability
App version => Deployment, Marketing
Connectivity => Code, User Experience
Location => Marketing, User Behaviour
OS => Marketing, Cost, Development
Memory => User Experience
Feature Session => How to engage app users
27. The data and the size, not too big for a small
Where is the lambda ?
I used Groovy + GPars (Groovy Parallel Systems) + MongoDB for fast
parallel computation (actor model) on statistical data
The GPars framework offers Java developers intuitive and safe ways to handle
Java or Groovy tasks concurrently.
Actor programming model
Agent - a thread-safe reference to mutable state
Concurrent collection processing
Composable asynchronous functions
STM (Software Transactional Memory)
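The idea of parallel computation on statistical data can be sketched as follows. This is not GPars or Groovy code: it is a Python stand-in that uses the standard `concurrent.futures` module, and the `RECORDS` sample, `count_by_os`, and `parallel_count` are hypothetical names. Each worker computes a partial count over its own chunk without touching shared state, and the partial results are merged afterwards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical API-log records: (os, app_version). Sample values only.
RECORDS = [
    ("iOS", "1.0"), ("Android", "1.1"), ("iOS", "1.1"),
    ("Android", "1.1"), ("iOS", "1.0"), ("Android", "1.0"),
]


def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_by_os(chunk):
    """Pure function: counts records per OS for one chunk, no shared state."""
    return Counter(os_name for os_name, _version in chunk)


def parallel_count(records, chunk_size=2, workers=3):
    """Count each chunk in a worker thread, then merge the partial counters."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_by_os, chunks(records, chunk_size))
        # Merging happens in the calling thread only.
        return sum(partials, Counter())


totals = parallel_count(RECORDS)
print(totals)
```

In CPython, threads do not speed up CPU-bound counting, so this shows only the structure (split, compute independently, merge); an actor framework such as GPars organizes the same steps around message passing.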
28. Mobile Apps => Backend APIs =>
Statistics => Find the Trends & Insights?
29. Reactive Data
It means real-time recommendation
➔ context (location, time)
➔ user profile (preferences, level,
30. Big Data on Small Devices: Data Science goes Mobile
31. Case Study 2:
● Real-time Data Analytics
● Monitoring Stream Data (Reactive)
32. at eClick we must
check campaigns in
at eClick we have
30~40 GB Logs in Stream
10~20 GB Bandwidth
just for tracking user
in ONE day !
at eClick we have many types of log (video, web,
mobile, system logs, ad-campaign, articles, … )
33. “lambda architecture”
proposed by @nathanmarz
the open-source lambda architecture at eClick
35. The big-data technology stack
● Netty (http://netty.io/): a framework that uses a reactive programming
pattern to make scaling HTTP systems easier, by JBoss http://www.jboss.org
● Kafka (http://kafka.apache.org/): publish-subscribe messaging
rethought as a distributed commit log, open-sourced by LinkedIn
● Storm (http://storm-project.net/): a framework for distributed
real-time computation, open-sourced by Twitter
● Redis (http://redis.io/): an advanced in-memory key-value NoSQL
database; all the fast statistical computations happen here.
● Groovy: scripting layer on the JVM, ad-hoc queries on Redis
● Hadoop ecosystem: HDFS, Hive, HBase for batch processing
● RxJava (https://github.com/Netflix/RxJava): a library for composing
asynchronous and event-based programs
● Hystrix (https://github.com/Netflix/Hystrix): latency and fault
tolerance for distributed systems
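How a batch layer and a real-time layer can work together in such a stack is sketched below. This is a simplified illustration, not the eClick implementation: a plain Python list stands in for the log stream and HDFS, a `Counter` stands in for Redis, and `SpeedLayer`, `run_batch`, and `clicks` are hypothetical names. A query merges the precomputed batch view with the counters for events that arrived after the last batch run.

```python
from collections import Counter


class SpeedLayer:
    """In-memory counters for events since the last batch run (Redis stand-in)."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, campaign_id):
        """Update the real-time view incrementally as each event arrives."""
        self.counts[campaign_id] += 1

    def reset(self):
        """Drop real-time counts once a batch run has absorbed the events."""
        self.counts.clear()


def run_batch(master_log):
    """Recompute the batch view from the full raw log (batch-job stand-in)."""
    return Counter(master_log)


def clicks(campaign_id, batch_view, speed_layer):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view[campaign_id] + speed_layer.counts[campaign_id]


# Raw click log, one campaign id per click (hypothetical sample data).
master_log = ["c1", "c2", "c1"]
batch_view = run_batch(master_log)

speed = SpeedLayer()
for campaign_id in ["c1", "c3"]:
    master_log.append(campaign_id)   # raw data is always appended
    speed.on_event(campaign_id)      # and counted immediately

before = clicks("c1", batch_view, speed)

# The next batch run absorbs the recent events; the speed layer starts over.
batch_view = run_batch(master_log)
speed.reset()
after = clicks("c1", batch_view, speed)

print(before, after)
```

The answer stays the same before and after the batch run; only the layer that supplies the most recent part of it changes.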
36. My new ideas for the future
Connecting the active functor pattern + reactive programming +
stream computation + in-memory computing to make:
● real-time data analytics easier
● a better recommendation system
● big data more profitable
● http://activefunctor.blogspot.com/ (a special case of Lambda
that actively searches for the best connections to form an optimal
topology) - from ideas during my internship at DRD with my
● Can a function be persistent (stored as data), distributed in a
cluster (cloud), and reactive to the right data (best value in the network)?
● http://www.reactivemanifesto.org/ (reactive pattern)
What I have learned from Lambda and Big
38. What I have learned
Study lambda and read some books
Ask questions => analytics => Profit & Value
Collect any data you can and learn what is inside!
Implement it! Use the right tools for the right jobs.
Turn your data into things everyone can
"look & feel"
39. read papers
40. Study the “lambda”
I studied Haskell in 2007 with Dr. Peter Gammie (http://peteg.org/) during my
internship at DRD (a non-profit organization).
● Imperative programs will always be vulnerable to data races because
they contain mutable variables.
● There are no data races in purely functional languages because they
don't have mutable variables.
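The two statements above can be made concrete with a small sketch. Python is not a purely functional language, and real data races are nondeterministic, so the first half hand-writes one unlucky interleaving of two workers instead of using threads; all names here are hypothetical. The second half computes the same total from values that are returned rather than written into shared state.

```python
from functools import reduce

# Imperative style: two workers do read-modify-write on one mutable counter.
# The interleaving is written out by hand so the outcome is deterministic.
shared = {"hits": 0}
read_a = shared["hits"]        # worker A reads 0
read_b = shared["hits"]        # worker B reads 0 before A has written
shared["hits"] = read_a + 1    # A writes 1
shared["hits"] = read_b + 1    # B overwrites with 1: A's update is lost
lost_update_total = shared["hits"]


# Functional style: each worker returns a value; nothing is overwritten.
def count_hit(_event):
    """Pure function: maps one event to its contribution to the total."""
    return 1


events = ("hit-from-A", "hit-from-B")   # immutable input
functional_total = reduce(lambda acc, n: acc + n, map(count_hit, events), 0)

print(lost_update_total, functional_total)
```

With no variable to overwrite, the order in which the workers finish cannot change the functional result.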
41. Reading some books
42. Improve your business knowledge!
=> read behavioral economics books
43. Collect the data ?
44. Using your imagination is worth more than just
the knowledge you have
45. Think more about Butterfly Effect!
“Logic will get you from A to B.
Imagination will get you everywhere.” - Albert Einstein
with da imagination
just log analytics, not
47. Questions & Answers
The link of this slide is here:
More useful resources: