Lambda Architecture and open source technology stack for real time big data

Concepts & Techniques “Thinking with Lambda”
Case studies in Practice using Lambda architecture

Published in: Technology

Transcript of "Lambda Architecture and open source technology stack for real time big data"

  1. Lambda Architecture and Open Source Tools for Real-time Big Data
     ● Concepts & Techniques: “Thinking with Lambda”
     ● Case studies in practice
     Trieu Nguyen - http://nguyentantrieu.info or @tantrieuf31
     Principal Engineer, eClick Data Analytics team, FPT Online
     All contents and thoughts in these slides are my subjective ideas, compiled from communities.
  2. Just a little introduction
     ● 2008: Java developer; developed a social trading network for a small startup (Yopco)
     ● 2011: worked at FPT Online as a software engineer on the Banbe project; RESTful API for the VnExpress mobile app
     ● 2012: joined Greengar Studios for 6 months, scaling backend APIs for mobile games (iOS, Android)
     ● 2013: back at FPT Online, doing R&D on Big Data & Analytics and developing the new core Analytics Platform (on the JVM)
  3. Contents for this talk
     ● The lessons from history
     ● Problems in practice
     ● What is the Lambda Architecture?
     ● Why Lambda Architecture for real-time big data?
     ● Open source technology stack
     ● Lambda in practice (mobile data and web data)
     ● Lessons I have learned
     ● Questions & Answers
  4. History? Is the best way to predict the future to look at the past and the present?
  5. Big data is a buzzword for old problems
  6. Explaining Big Data: http://www.youtube.com/watch?v=7D1CQ_LOizA
  7. Learning?
  8. Working?
  9. Big Data + Old History: http://www.youtube.com/watch?v=tp4y-_VoXdA
  10. This is Big Data. This is the most valuable thing!
  11. “We can't solve problems by using the same kind of thinking we used when we created them.” - Albert Einstein
      Think more with Lambda and Reactive
  12. Where Big Data can be used
  13. BBC Horizon 2013, The Age of Big Data: http://www.youtube.com/watch?v=RE0ITQ7XQjM
  14. Google’s mission is to organize the world’s information and make it universally accessible and useful.
  15. Organize the world’s information?
  16. How did Google scale their search engine? How does Hadoop really work?
  17. http://stackoverflow.com/questions/6087834/howscalable-is-mapreduce-in-the-original-functionallanguages
  18. Trends of Now and the Future
      ● MapReduce Programming
      ● Reactive Programming
      ● Functional Programming
      ● Streaming Computation
      => All are just special cases of Lambda
  19. So what is the λ (Lambda) Architecture?
  20. The Lambda Architecture:
      ● applies the λ (Lambda) philosophy to designing big data systems
      ● rests on the equation “query = function(all data)”, which is the basis of all data systems
      ● was proposed by Nathan Marz (http://nathanmarz.com/), a software engineer from Twitter, in his “Big Data” book
      ● is based on three main design principles:
        ○ human fault-tolerance: the system is unsusceptible to data loss or data corruption, because at scale it could be irreparable. (BUGS?)
        ○ data immutability: store data in its rawest form, immutable and in perpetuity. (INSERT/SELECT/DELETE but no UPDATE!)
        ○ recomputation: with the two principles above, it is always possible to (re)compute results by running a function on the raw data.
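
The principles on slide 20 can be illustrated with a minimal Python sketch. It is not taken from the talk: the `Event` type, the `MasterDataset` class and the `clicks_per_user` query are hypothetical names chosen for illustration. The sketch assumes an in-memory, append-only list in place of a real raw-data store.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """One raw fact. NamedTuple instances are immutable (hypothetical schema)."""
    user: str
    action: str


class MasterDataset:
    """Append-only store: events can be added and read, never updated in place."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def all_data(self):
        # Hand out an immutable snapshot so callers cannot alter history.
        return tuple(self._events)


def clicks_per_user(events):
    """query = function(all data): a pure function over the raw events."""
    return Counter(e.user for e in events if e.action == "click")


log = MasterDataset()
for user, action in [("a", "click"), ("b", "impression"), ("a", "click"), ("b", "click")]:
    log.append(Event(user, action))

view = clicks_per_user(log.all_data())
# Recomputation: if the view is lost, or the query had a bug that has since
# been fixed, running the function again on the raw data rebuilds it.
recomputed = clicks_per_user(log.all_data())
print(view, recomputed == view)
```

Because the raw events are never modified, a mistake in `clicks_per_user` can be corrected and the view rebuilt, which is the sense of "human fault-tolerance" on the slide.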
  21. Lambda in Practice: 2 case studies from my experience
  22. Case Study 1: Mobile Data. Monitor API Backend + System KPI
  23. Problem: Inside “mobile data”, what is the most valuable piece of information?
  24. I applied “Lambda” here: backend system for a mobile app
  25. Web vs Mobile App
      Web: Visitors, Visits, Pageviews, Events
      Mobile App: Users, Sessions, Events
  26. Metrics: Cause and Effect
      ● Screen Size => App Design, UI/UX, Usability
      ● App version => Deployment, Marketing
      ● Connectivity => Code, User Experience
      ● Location => Marketing, User Behaviour
      ● OS => Marketing, Cost, Development
      ● Memory => User Experience
      ● Feature Session => How to engage app users
  27. The data and its size: not too big for a small startup!
      Where is the lambda? I used Groovy + GPars (Groovy Parallel Systems) + MongoDB for fast parallel computation (actor model) on statistical data.
      http://gpars.codehaus.org/
      The GPars framework offers Java developers intuitive and safe ways to handle Java or Groovy tasks concurrently. It supports:
      ● Dataflow concurrency
      ● Actor programming model
      ● CSP
      ● Agent: a thread-safe reference to mutable state
      ● Concurrent collection processing
      ● Composable asynchronous functions
      ● Fork/Join
      ● STM (Software Transactional Memory)
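
Slide 27 names Groovy + GPars for parallel computation on statistical data but shows no code. The sketch below is a Python analogue of the "concurrent collection processing" idea, not GPars itself. The `SESSIONS` records, their field names and the helper functions are all hypothetical. Threads are used only to show the split, map and merge structure; in CPython they will not speed up CPU-bound counting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical session records, as a mobile backend might log them.
SESSIONS = [
    {"os": "iOS", "app_version": "1.2"},
    {"os": "Android", "app_version": "1.2"},
    {"os": "iOS", "app_version": "1.1"},
    {"os": "Android", "app_version": "1.2"},
    {"os": "Android", "app_version": "1.0"},
]


def count_by_os(chunk):
    """Pure function: summarize one chunk without touching shared state."""
    return Counter(session["os"] for session in chunk)


def parallel_count(sessions, workers=2):
    """Split the collection, summarize chunks concurrently, merge the results."""
    size = max(1, len(sessions) // workers)
    chunks = [sessions[i:i + size] for i in range(0, len(sessions), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_by_os, chunks):
            total += partial  # merging happens in the calling thread only
    return total


print(parallel_count(SESSIONS))
```

Each worker returns its own partial result and only the caller merges them, so no locking is needed.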
  28. Mobile Apps => Backend APIs => Statistics => Find the Trends & Insights?
  29. Reactive Data Analytics for Mobile Apps
      It means real-time recommendation by:
      ➔ context (location, time)
      ➔ user profile (preferences, level, ...)
  30. Big Data on Small Devices: Data Science goes Mobile http://strataconf.com/strata2013/public/schedule/detail/27605
  31. Case Study 2: Web Data
      ● Real-time Data Analytics
      ● Monitoring Stream Data (Reactive)
      http://eclick.vn
  32. At eClick we must check campaigns in near-real-time (seconds)!
      At eClick we have 30~40 GB of logs in the stream and 10~20 GB of bandwidth just for tracking user actions (click, impression, ...) in ONE day!
      At eClick we have many types of log (video, web, mobile, system logs, ad-campaign, articles, ...)
  33. “lambda architecture” proposed by @nathanmarz
  34. [Diagram: the open-source lambda architecture at eClick. Components shown: Internet, Netty HTTP Server, TCP Connection, Kafka, Akka Workers, Hadoop Tools, Storm, Redis (shown twice), KPI Report]
  35. The big-data technology stack
      ● Netty (http://netty.io/): a framework using the reactive programming pattern to make scaling HTTP systems easier, by JBoss http://www.jboss.org
      ● Kafka (http://kafka.apache.org/): publish-subscribe messaging rethought as a distributed commit log, open-sourced by LinkedIn
      ● Storm (http://storm-project.net/): a framework for distributed real-time computation, by Twitter
      ● Redis (http://redis.io/): an advanced in-memory key-value NoSQL database; all fast statistical computations happen here
      ● Groovy: scripting layer on the JVM, for ad-hoc queries on Redis
      ● Hadoop ecosystem: HDFS, Hive, HBase for batch processing
      ● RxJava (https://github.com/Netflix/RxJava): a library for composing asynchronous and event-based programs
      ● Hystrix (https://github.com/Netflix/Hystrix): latency and fault tolerance for distributed systems
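
How the pieces on slides 34 and 35 might cooperate can be sketched in plain Python. The transcript does not spell out the wiring, so this is an assumption: a queue stands in for Kafka, an incremental counter for a Storm-style speed layer writing to Redis, and a full recount for Hadoop-style batch processing. The `LambdaPipeline` class and its method names are hypothetical, and no real Kafka, Storm, Redis or Hadoop API is used.

```python
from collections import Counter, deque


class LambdaPipeline:
    """Toy single-process model of batch layer + speed layer + merged query."""

    def __init__(self):
        self.queue = deque()            # stand-in for a message queue (Kafka)
        self.master_log = []            # stand-in for raw logs kept for batch jobs
        self.batch_view = Counter()     # stand-in for batch output (Hadoop tools)
        self.realtime_view = Counter()  # stand-in for in-memory counters (Redis)

    def ingest(self, campaign, action):
        """An HTTP tracking request would end up here as a queued event."""
        self.queue.append((campaign, action))

    def process_stream(self):
        """Speed layer: update counters incrementally as events arrive."""
        while self.queue:
            event = self.queue.popleft()
            self.master_log.append(event)   # raw data is only ever appended
            self.realtime_view[event] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all raw data, then reset
        the real-time counters that the new batch view now covers."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, campaign, action):
        """Serving: merge the batch view with the recent real-time view."""
        key = (campaign, action)
        return self.batch_view[key] + self.realtime_view[key]


pipeline = LambdaPipeline()
pipeline.ingest("c1", "click")
pipeline.ingest("c1", "click")
pipeline.ingest("c1", "impression")
pipeline.process_stream()
before_batch = pipeline.query("c1", "click")

pipeline.run_batch()
pipeline.ingest("c1", "click")
pipeline.process_stream()
after_batch = pipeline.query("c1", "click")
print(before_batch, after_batch)
```

The batch view is always rebuilt from the raw log, and the real-time counters only cover events that arrived since the last batch run. This is why a query can answer within seconds while the batch layer remains the source of truth.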
  36. My new ideas for the future
      Connecting the active functor pattern + reactive programming + stream computation + in-memory computing to:
      ● make real-time data analytics easier
      ● build better recommendation systems
      ● make big data more profitable
      More information:
      ● http://activefunctor.blogspot.com/ (a special case of Lambda that actively searches for the best connections to form an optimal topology), from ideas developed during my internship at DRD with my advisor
      ● Can a function be persistent (stored as data), distributed in a cluster (cloud), and reactive to the right data (best value in the network)?
      ● http://www.reactivemanifesto.org/ (reactive pattern)
  37. Lessons: what I have learned from Lambda and the Big Data world
  38. What I have learned
      ● Study lambda and read some books
      ● Ask questions => analytics => Profit & Value
      ● Collect any data you can and learn what is inside!
      ● Implement it! Use the right tools for the right jobs.
      ● Turn your data into things everyone can "look & feel"
  39. Read papers
  40. Study the “lambda”
      I studied Haskell in 2007 with Dr. Peter Gammie (http://peteg.org/) during an internship at DRD (a non-profit organization).
      ● Imperative programs will always be vulnerable to data races because they contain mutable variables.
      ● There are no data races in purely functional languages because they don't have mutable variables.
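
The point on slide 40 about mutable variables can be sketched in Python, even though Python is not a purely functional language. In this illustrative example (all names are hypothetical) every worker thread reads an immutable tuple and returns a new immutable value. No variable is written by more than one thread, so there is nothing to race on and no lock is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Immutable summary value; fields cannot be reassigned after creation."""
    clicks: int = 0
    impressions: int = 0


def summarize(batch):
    """Pure function: reads an immutable tuple, returns a new Stats value."""
    return Stats(clicks=batch.count("click"), impressions=batch.count("impression"))


def merge(a, b):
    """Pure function: combines two summaries into a third, leaving both intact."""
    return Stats(a.clicks + b.clicks, a.impressions + b.impressions)


# Hypothetical batches of logged actions, held in immutable tuples.
BATCHES = (
    ("click", "impression"),
    ("click", "click"),
    ("impression",),
)

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(summarize, BATCHES))

total = reduce(merge, partials, Stats())
print(total)
```

An imperative version that had every thread increment one shared counter would need a lock or an atomic operation to stay correct. Here the only "write" is the construction of new values.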
  41. Reading some books
  42. Improve your business knowledge! => read behavioral economics books http://www.goodreads.com/shelf/show/behavioral-economics
  43. Collect the data?
  44. Use your imagination; it is more than just the knowledge you have
  45. Think more about the Butterfly Effect!
  46. “Logic will get you from A to Z; imagination will get you everywhere.” - Albert Einstein
      Use your imagination with data analytics, not just logic
      Learn Data Visualization
  47. Questions & Answers
      The link to these slides is here:
      ● http://nguyentantrieu.info/blog/lambda-architecture-andopen-source-tools-for-real-time-big-data/
      More useful resources:
      ● http://nguyentantrieu.info/blog
      ● http://www.mc2ads.com