DataScience Lab 2017_Kappa Architecture: How to implement a real-time streaming data analytics engine Juantomás García



DataScience Lab, 13 May 2017. Kappa Architecture: How to implement a real-time streaming data analytics engine. Juantomás García (Data Solutions Manager at OpenSistemas, Madrid, Spain). We will introduce the Kappa architecture and compare it with the Lambda architecture. We will see why the Kappa architecture is a good fit for building (almost) real-time solutions when we need to analyze streaming data. We will walk through a real use case: how the architecture is designed, how the pipelines are organized, and how data scientists use it. Finally, we will review the technologies most often used to implement it, from Apache Kafka + Spark with Scala to newer tools like Apache Beam / Google Dataflow. All materials: http://datascience.in.ua/report2017




1. Kappa Architecture 2.0
Juantomás García - Open Sistemas
DataScience Lab, Odessa

2. Good morning, Odessa!! (Доброго ранку Одеса!!, Dobroho ranku Odesa)

3. Who I am
Juantomás García
• Data Solutions Manager @ OpenSistemas
• GDE (Google Developer Expert) for Cloud
Others
• Co-author of the first Spanish free-software book, “La Pastilla Roja”
• President of Hispalinux (Spanish Linux User Group)
• Organizer of Machine Learning Spain and GDG Cloud Madrid

4. What’s Kappa Architecture?
On July 2, 2014, Jay Kreps coined the term Kappa Architecture in an article for O’Reilly Radar:
“Maybe we could call this the Kappa Architecture, though it may be too simple of an idea to merit a Greek letter.”
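The core idea fits in a few lines: every event is appended to one ordered, immutable log; a single stream-processing function derives a view from it; and "reprocessing" just means replaying the log from offset 0 with a new version of the job into a fresh output table, then pointing readers at that table. A minimal in-memory sketch, assuming a Python list stands in for a Kafka topic and a dict for the serving table; `Log`, `run_job` and the (car_id, km) events are hypothetical names, not Kafka or Spark APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, float]   # hypothetical (car_id, km_driven) event
Table = Dict[str, float]    # hypothetical serving table: key -> value


@dataclass
class Log:
    """Append-only, ordered log: the single source of truth (stands in for a Kafka topic)."""
    entries: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self.entries.append(event)
        return len(self.entries) - 1          # offset of the new entry

    def read_from(self, offset: int) -> List[Event]:
        return self.entries[offset:]


def run_job(log: Log, step: Callable[[Table, Event], None], start: int = 0) -> Table:
    """One stream-processing code path: fold every event from `start` into a fresh table."""
    table: Table = {}
    for event in log.read_from(start):
        step(table, event)
    return table


def total_km_v1(table: Table, event: Event) -> None:
    car, km = event
    table[car] = table.get(car, 0.0) + km


def total_miles_v2(table: Table, event: Event) -> None:
    # New job version with different logic; no separate batch implementation is needed.
    car, km = event
    table[car] = table.get(car, 0.0) + km * 0.621371


log = Log()
for e in [("car-1", 10.0), ("car-2", 5.0), ("car-1", 2.5)]:
    log.append(e)

serving_table = run_job(log, total_km_v1)      # job v1 currently serves readers
rebuilt_table = run_job(log, total_miles_v2)   # reprocessing: replay the same log from offset 0
serving_table = rebuilt_table                  # once caught up, switch readers to the new table
```

In a real deployment the new job would keep tailing the log after the replay, and the old job and its table would be stopped and dropped only after the switch.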

5. Who is Jay Kreps?
Jay has been involved in lots of projects:
✓ Author of the essay “The Log: What every software engineer should know about real-time data's unifying abstraction” (12/16/2013)
✓ Author of the book “I ♥ Logs”

6. Who is Jay Kreps?
• Involved in projects such as:
✓ Apache Kafka
✓ Apache Samza
✓ Voldemort
✓ Azkaban
✓ Ex-LinkedIn
✓ Now co-founder and CEO of Confluent

7. Usual Data Flow

8. Usual Data Flow

9. Usual Data Flow

10. Kappa Architecture Way
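To make the Lambda vs. Kappa contrast concrete, here is a toy sketch, assuming events are hypothetical (car_id, km) pairs and the query is "total km per car". In a Lambda architecture the same logic exists twice: a batch layer periodically recomputes a view over the master dataset, a speed layer keeps an incremental view over events the last batch run has not covered, and a serving layer merges both at query time. In a Kappa architecture one incremental function runs over the log for both live traffic and replays. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, float]  # hypothetical (car_id, km) event


# --- Lambda: two implementations of the same logic, merged at query time ---
def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: periodic full recomputation over all historical events."""
    view: Counter = Counter()
    for car, km in master_dataset:
        view[car] += km
    return view


def speed_view(recent_events: Iterable[Event]) -> Counter:
    """Speed layer: incremental view over events not yet covered by the batch run."""
    view: Counter = Counter()
    for car, km in recent_events:
        view[car] += km
    return view


def lambda_query(master: List[Event], recent: List[Event]) -> Counter:
    """Serving layer: merge both views on every query."""
    return batch_view(master) + speed_view(recent)


# --- Kappa: a single incremental function over the log ---
def kappa_update(view: Counter, event: Event) -> None:
    car, km = event
    view[car] += km


def kappa_query(log: List[Event]) -> Counter:
    view: Counter = Counter()
    for event in log:
        kappa_update(view, event)
    return view


master = [("car-1", 10.0), ("car-2", 5.0)]
recent = [("car-1", 2.5)]
print(lambda_query(master, recent), kappa_query(master + recent))
```

Here the batch and speed functions happen to be identical; in practice they are usually written for different frameworks and must be kept in sync by hand, which is the main cost Kreps' article points out and the one Kappa removes.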

11. Tools we use

12. Tools we use

13. Tools we use

14. Some Favorite Spark Features
✓ If your data has a schema, Spark SQL is perfect.
✓ Spark Streaming integrates well with the rest of Spark and with almost every streaming source.
✓ Structured queries over streams will be a huge advance.
✓ We love Scala, the spirit of Spark.
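Spark Streaming's original (DStream) model treats a stream as a sequence of small batches and carries state from one micro-batch to the next, comparable in spirit to `updateStateByKey`. The plain-Python sketch below only imitates that model; it does not use Spark's API. It assumes hypothetical (arrival_time_s, car_id, speed_kmh) events, a 2-second batch interval, and "maximum speed per car" as the running state.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, int]  # hypothetical (arrival_time_s, car_id, speed_kmh)


def micro_batches(events: Iterable[Event], interval_s: float) -> List[List[Event]]:
    """Cut events into consecutive fixed-length batches by arrival time.

    Empty intervals simply produce no batch in this sketch.
    """
    batches: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        batches[int(event[0] // interval_s)].append(event)
    return [batches[k] for k in sorted(batches)]


def update_state(state: Dict[str, int], batch: List[Event]) -> Dict[str, int]:
    """Return the new per-car state (maximum speed seen so far) after one micro-batch."""
    new_state = dict(state)
    for _, car, speed in batch:
        new_state[car] = max(new_state.get(car, 0), speed)
    return new_state


events = [(0.1, "car-1", 50), (0.7, "car-2", 30), (2.3, "car-1", 80), (4.9, "car-2", 45)]
state: Dict[str, int] = {}
history = []                       # state snapshot after each micro-batch
for batch in micro_batches(events, interval_s=2.0):
    state = update_state(state, batch)
    history.append(state)
```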

15. Some Favorite Spark Features
We love code like this:

16. A Real Use Case
• One of our clients wanted to monitor all of a car's information via OBD-II.
• OBD-II is an interface to the car's electronics.
• Our client developed an app that reads all the car's information through OBD-II over Bluetooth.
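For context on the data itself: OBD-II mode 01 ("show current data") responses carry a PID plus data bytes, and SAE J1979 defines how to turn them into physical values. A minimal decoder, assuming the app receives ELM327-style hex strings such as "41 0C 1A F8"; the talk does not describe the client's actual message format, so `decode_obd_response`, `to_event` and the event fields are hypothetical. Only three standard PIDs are handled: 0x0C engine RPM = (256·A + B) / 4, 0x0D vehicle speed = A km/h, and 0x2F fuel tank level = 100·A / 255 %.

```python
from typing import Dict, List, Tuple

# Standard mode-01 PIDs handled by this sketch: pid -> (field name, decoder over data bytes)
PIDS = {
    0x0C: ("engine_rpm", lambda d: (256 * d[0] + d[1]) / 4),
    0x0D: ("speed_kmh", lambda d: d[0]),
    0x2F: ("fuel_level_pct", lambda d: 100 * d[0] / 255),
}


def decode_obd_response(response: str) -> Tuple[str, float]:
    """Decode one ELM327-style mode-01 response, e.g. '41 0D 3C' -> ('speed_kmh', 60)."""
    data = [int(byte, 16) for byte in response.split()]
    if len(data) < 3 or data[0] != 0x41:   # 0x41 = positive response to mode 0x01
        raise ValueError(f"not a mode-01 response: {response!r}")
    pid, payload = data[1], data[2:]
    if pid not in PIDS:
        raise ValueError(f"unsupported PID 0x{pid:02X}")
    name, decode = PIDS[pid]
    return name, decode(payload)


def to_event(car_id: str, responses: List[str]) -> Dict[str, object]:
    """Build one hypothetical reading event (to be appended to the log) from several responses."""
    event: Dict[str, object] = {"car_id": car_id}
    for response in responses:
        name, value = decode_obd_response(response)
        event[name] = value
    return event


print(to_event("car-1", ["41 0C 1A F8", "41 0D 3C", "41 2F 80"]))
```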

17. A Real Use Case

18. First Problems
• We needed to scale the REST interfaces: there were too many requests.
• MySQL didn't scale.
• The client wanted to run expensive queries in real time.
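One common way a log like Kafka absorbs this kind of write load is to split a topic into partitions and route each record by its key, so that all readings from one car stay in order on one partition while different cars are spread across brokers and consumers. A minimal sketch of key-based routing, assuming the car ID is the record key; it uses CRC32 as a stable hash for illustration, whereas Kafka's default Java partitioner uses murmur2, so the partition numbers will not match a real cluster. `partition_for` and `route` are hypothetical helpers.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, dict]  # hypothetical (key=car_id, value=reading)


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key modulo the partition count (CRC32 here, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Append each record to its partition; per-key order is preserved within a partition."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


records = [
    ("car-1", {"speed_kmh": 60}),
    ("car-2", {"speed_kmh": 30}),
    ("car-1", {"speed_kmh": 70}),
]
partitions = route(records, num_partitions=4)
```

Within a Kafka consumer group each partition is read by at most one consumer, so the partition count also bounds how far the processing side can scale out.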

19. Some metrics

20. Architecture v 2.0

21. Architecture v 3.0

22. Now we’re more flexible!!
We can run queries like: “Which drivers are not customers of gas brand X, are low on fuel, and are near a brand-X gas station? For each of them, send a notification with a discount coupon and a link with the route.”
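The query on this slide can be sketched as a filter over the latest known state of each car. The sketch assumes each car state carries a position, a fuel level (for example from OBD-II PID 0x2F) and the driver's usual gas brand; the 15 % fuel threshold, the 2 km radius, the station data and all names are hypothetical, and actually sending the coupon and route link is left out. Distances use the haversine formula on a spherical Earth of radius 6371 km.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CarState:
    car_id: str
    lat: float
    lon: float
    fuel_level_pct: float
    usual_brand: str          # hypothetical: gas brand the driver is a customer of


@dataclass
class Station:
    name: str
    brand: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_station(car: CarState, stations: List[Station],
                    brand: str) -> Optional[Tuple[Station, float]]:
    """Closest station of `brand` and its distance, or None if the brand has no stations."""
    candidates = [(s, haversine_km(car.lat, car.lon, s.lat, s.lon))
                  for s in stations if s.brand == brand]
    return min(candidates, key=lambda pair: pair[1], default=None)


def coupon_candidates(cars: List[CarState], stations: List[Station], brand: str,
                      max_fuel_pct: float = 15.0,
                      max_km: float = 2.0) -> List[Tuple[str, str, float]]:
    """(car_id, station_name, distance_km) for non-customers of `brand`, low on fuel, near a station."""
    result = []
    for car in cars:
        if car.usual_brand == brand or car.fuel_level_pct >= max_fuel_pct:
            continue
        found = nearest_station(car, stations, brand)
        if found is not None and found[1] <= max_km:
            station, dist = found
            result.append((car.car_id, station.name, dist))
    return result


stations = [Station("Station A", "X", 46.4825, 30.7233)]
cars = [
    CarState("car-A", 46.4825, 30.7233, 10.0, "Y"),   # low fuel, non-customer, at the station
    CarState("car-B", 46.4825, 30.7233, 10.0, "X"),   # already a brand-X customer
    CarState("car-C", 46.6000, 30.7233, 10.0, "Y"),   # low fuel but roughly 13 km away
    CarState("car-D", 46.4825, 30.7233, 50.0, "Y"),   # enough fuel
]
offers = coupon_candidates(cars, stations, brand="X")
```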

23. Takeaways
• Kappa architecture is not a silver bullet, but it helps with a lot of problems.
• Kafka + Spark Streaming are our favorite tools.
• There are lots of possible improvements:
✓ OLAP, like Apache Druid
✓ Graph databases, like Neo4j
✓ Kafka Streams and compacted logs
✓ Apache Beam
✓ Scio Scala bindings
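Of these takeaways, compacted logs are easy to illustrate. With log compaction Kafka retains at least the last record for each key, and a record with a null value (a "tombstone") eventually removes the key, so a compacted topic works as a replayable snapshot of the latest state per key, such as the last known position of each car. A minimal in-memory sketch of that retention rule only, not of Kafka's segment-based cleaner; `compact` and the record layout are hypothetical.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str, Optional[dict]]  # (offset, key, value); value None = tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving original offsets and order.

    Tombstones (value None) remove their key entirely; Kafka keeps them for a
    configurable grace period first, which this sketch ignores.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)
    survivors = [rec for rec in latest.values() if rec[2] is not None]
    return sorted(survivors, key=lambda rec: rec[0])


log = [
    (0, "car-1", {"lat": 46.48, "lon": 30.72}),
    (1, "car-2", {"lat": 46.47, "lon": 30.73}),
    (2, "car-1", {"lat": 46.49, "lon": 30.74}),
    (3, "car-3", {"lat": 46.50, "lon": 30.70}),
    (4, "car-3", None),   # tombstone: car-3 was removed
]
compacted = compact(log)
```

Offsets are kept as they were, so a compacted log has gaps; consumers replaying it still see records in their original order.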

24. Takeaways: Apache Beam
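Apache Beam's model separates event time (when a reading happened in the car) from processing time (when it reached the pipeline) and groups elements into windows by event time. A plain-Python sketch of fixed (tumbling) windows with a per-window, per-key sum, assuming 60-second windows and hypothetical (event_time_s, car_id, km) elements. In the Beam Python SDK this would roughly be `beam.WindowInto(window.FixedWindows(60))` followed by `beam.CombinePerKey(sum)`, but the code below does not use Beam and ignores watermarks and triggers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Element = Tuple[float, str, float]  # hypothetical (event_time_s, car_id, km)
Window = Tuple[float, float]        # [start, end) in event time


def fixed_window(event_time: float, size_s: float) -> Window:
    """Assign an event-time timestamp to its fixed window [start, start + size)."""
    start = event_time - (event_time % size_s)
    return (start, start + size_s)


def sum_per_window_and_key(elements: List[Element],
                           size_s: float) -> Dict[Tuple[Window, str], float]:
    totals: Dict[Tuple[Window, str], float] = defaultdict(float)
    # Arrival order does not matter: each element is placed by its own event time.
    for event_time, car, km in elements:
        totals[(fixed_window(event_time, size_s), car)] += km
    return dict(totals)


# The reading at t=30 s arrives last (late), but still lands in window [0, 60).
elements = [(10.0, "car-1", 1.0), (70.0, "car-1", 2.0), (30.0, "car-1", 0.5)]
result = sum_per_window_and_key(elements, size_s=60.0)
```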

25. Takeaways: Scio Scala Binding

26. Think Big

27. Think Big
• Forget legacy architectures
• Forget old tools
• Use light technologies / serverless
• Use pieces of Lego
• Mix different technologies from diverse sources

28. Spark Use Cases: Not-to-Do List
• Avoid installing and configuring a server, or even a VM.
• Avoid installing tools; use containers and/or cloud services instead.
• In general: consider whether there is a simpler way to do it that needs less effort.

29. Spark Use Cases: Architecture & Tools
• Using cloud services is a no-brainer decision.
• Git + containers + Kubernetes
• Use the best language* for each module.
• Use notebooks: Jupyter, Zeppelin, DSX
(*) Even Java might be an option (improbable)

30. Google Cloud Version

31. Kappa Architecture: Questions?
• email: juantomas@opensistemas.com
• twitter: @juantomas
This talk has a free lifetime questions warranty: if you have any questions or concerns about this talk, feel free to contact me anytime.
Selfie time: if you liked the talk, just smile while I take the selfie ;-)

32. Kappa Architecture
Thank you very much! (велике спасибі)
