DataScience Lab 2017_Kappa Architecture: How to implement a real-time streaming data analytics engine Juantomás García


DataScience Lab, May 13, 2017
Kappa Architecture: How to implement a real-time streaming data analytics engine
Juantomás García (Data Solutions Manager at OpenSistemas, Madrid, Spain)
We will introduce the Kappa architecture and compare it with the Lambda architecture. We will see why the Kappa architecture is a good fit for (near) real-time solutions that need to analyze streaming data. We will walk through a real use case: how the architecture is designed, how the pipelines are organized, and how data scientists use it. We will review the most widely used technologies for implementing it, from Apache Kafka + Spark with Scala to newer tools such as Apache Beam / Google Dataflow.
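As a rough illustration of the idea behind the Kappa architecture (one append-only log as the source of truth, with every view derived by a stream job that can be rebuilt by replaying the log), here is a minimal plain-Python sketch. It does not use the Kafka or Spark APIs; `EventLog`, the `(car_id, speed_kmh)` event shape, and both view functions are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for the example: (car_id, speed_kmh)
Event = Tuple[str, float]


class EventLog:
    """In-memory append-only log standing in for a Kafka topic (illustrative only)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        """Append an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int = 0) -> Iterable[Event]:
        """Read events starting at an offset; offset 0 replays the full history."""
        return iter(self._events[offset:])


def max_speed_view(events: Iterable[Event]) -> Dict[str, float]:
    """Job v1: highest speed seen per car."""
    view: Dict[str, float] = {}
    for car_id, speed in events:
        view[car_id] = max(view.get(car_id, speed), speed)
    return view


def avg_speed_view(events: Iterable[Event]) -> Dict[str, float]:
    """Job v2: average speed per car, built by replaying the same log."""
    totals = defaultdict(lambda: [0.0, 0])  # car_id -> [sum, count]
    for car_id, speed in events:
        totals[car_id][0] += speed
        totals[car_id][1] += 1
    return {car_id: total / count for car_id, (total, count) in totals.items()}


log = EventLog()
for event in [("car-1", 80.0), ("car-2", 50.0), ("car-1", 120.0)]:
    log.append(event)

# Changing the logic means starting a new job at offset 0 rather than
# maintaining a separate batch layer.
view_v1 = max_speed_view(log.read_from(0))
view_v2 = avg_speed_view(log.read_from(0))
```

The point of the sketch is that `view_v2` needs no separate batch pipeline: it is the same stream code run again from the start of the log, which is the main contrast with a Lambda-style design.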
All materials:

Published in: Technology

  1. 1. Juantomás García - Open Sistemas Kappa Architecture 2.0 DataScience Lab, Odessa
  2. 2. First: Good morning, Odessa!! (Dobroho ranku Odesa)
  3. 3. Who I am: Juantomás García • Data Solutions Manager @ OpenSistemas • GDE (Google Developer Expert) for Cloud. Others: • Co-author of the first Spanish free software book “La Pastilla Roja” • President of Hispalinux (Spanish Linux User Group) • Organizer of Machine Learning Spain and GDG Cloud Madrid
  4. 4. What’s Kappa Architecture? On July 2, 2014, Jay Kreps coined the term Kappa Architecture in an article for O’Reilly Radar: “Maybe we could call this the Kappa Architecture, though it may be too simple of an idea to merit a Greek letter.”
  5. 5. Who is Jay Kreps? Jay has been involved in lots of projects: ✓ Author of the essay “The Log: What every software engineer should know about real-time data's unifying abstraction” (12/16/2013) ✓ Author of the book “I Heart Logs”
  6. 6. Who is Jay Kreps? • Involved with projects such as: ✓ Apache Kafka ✓ Apache Samza ✓ Voldemort ✓ Azkaban ✓ Ex-LinkedIn ✓ Now co-founder and CEO of Confluent
  7. 7. Usual Data Flow
  8. 8. Usual Data Flow
  9. 9. Usual Data Flow
  10. 10. Kappa Architecture Way
  11. 11. Tools we use
  12. 12. Tools we use
  13. 13. Tools we use
  14. 14. Some Favorite Spark Features: ✓ If you have a schema, Spark SQL is perfect. ✓ Spark Streaming works very well with Spark and almost every streaming source. ✓ Structured queries will be a huge advance. ✓ We love Scala, the spirit of Spark.
  15. 15. We love code like this: Some Favorite Spark Features
  16. 16. A Real Use Case: • One of our clients wanted to monitor all of a car's information via OBD II. • OBD II is an interface to the car's electronics. • Our client developed an app that reads all the car information through OBD II over Bluetooth.
  17. 17. A Real Use Case
  18. 18. First Problems: • We needed to scale the REST interfaces; there were too many requests. • MySQL didn't scale. • The client wanted to run expensive queries in real time.
  19. 19. Some metrics
  20. 20. Architecture v 2.0
  21. 21. Architecture v 3.0
  22. 22. Now we’re more flexible!! We can run queries like: “Which drivers are not customers of gas brand X, are low on gas, and are near a brand X gas station? If there are any, send them a notification with a discount coupon and a link to the route.”
  23. 23. Takeaways: • Kappa architecture is not a silver bullet, but it helps with a lot of solutions. • Kafka + Spark Streaming are our favorite tools. • There are lots of improvements: ✓ OLAP like Apache Druid ✓ Graph databases like Neo4j ✓ Kafka Streams and compacted logs ✓ Apache Beam ✓ Scio Scala bindings
  24. 24. Takeaways: Apache Beam
  25. 25. Takeaways: Scio Scala Binding
  26. 26. Think Big
  27. 27. Think Big • Forget Legacy Architectures • Forget Old Tools • Use Light Technologies / Serverless • Use pieces of Lego • Mix different technologies from diverse sources
  28. 28. Spark Use Cases Not to do list • Avoid installing and configuring a server, or even a VM. • Avoid installing tools; use containers and/or cloud services instead. • In general: consider whether there is a simpler way to do it that needs less effort.
  29. 29. Spark Use Cases Architecture & Tools • Using cloud services is a no-brainer decision. • Git + Containers + Kubernetes • Use the best language* for each module. • Use notebooks: Jupyter, Zeppelin, DSX (*) Even Java might be an option - unprovable
  30. 30. Google Cloud Version
  31. 31. Kappa Architecture Questions? • email: • twitter: @juantomas This talk has a free lifetime warranty on questions: if you have any questions or concerns about this talk, feel free to contact me anytime. Selfie Time: If you like the talk, just smile while I take the selfie ;-)
  32. 32. Kappa Architecture: Thank you very much
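
The coupon query quoted on slide 22 can be read as a filter applied to each incoming car event. The plain-Python sketch below shows one possible shape for that filter. It is not the speakers' implementation: the `CarEvent` and `Station` fields, the 15% fuel threshold, and the 5 km radius are all assumptions made for the example, since the slides do not quantify "low on gas" or "near".

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CarEvent:
    """Hypothetical event derived from OBD II readings plus phone GPS."""
    driver_id: str
    fuel_pct: float      # remaining fuel, 0-100
    lat: float
    lon: float
    usual_brand: str     # gas brand the driver is a customer of


@dataclass
class Station:
    brand: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def coupon_for(
    event: CarEvent,
    stations: List[Station],
    brand: str,
    low_fuel_pct: float = 15.0,   # assumed threshold for "low on gas"
    radius_km: float = 5.0,       # assumed radius for "near"
) -> Optional[dict]:
    """Return a notification payload if the event matches the query, else None."""
    if event.usual_brand == brand or event.fuel_pct > low_fuel_pct:
        return None
    candidates = [
        (haversine_km(event.lat, event.lon, s.lat, s.lon), s)
        for s in stations
        if s.brand == brand
    ]
    in_range = [(dist, s) for dist, s in candidates if dist <= radius_km]
    if not in_range:
        return None
    dist, station = min(in_range, key=lambda pair: pair[0])
    return {
        "driver_id": event.driver_id,
        "station": (station.lat, station.lon),
        "distance_km": round(dist, 2),
    }


stations = [Station("X", 40.4168, -3.7038), Station("Y", 40.4300, -3.7000)]
match = coupon_for(CarEvent("d1", 8.0, 40.4200, -3.7000, "Y"), stations, "X")
already_client = coupon_for(CarEvent("d2", 8.0, 40.4200, -3.7000, "X"), stations, "X")
full_tank = coupon_for(CarEvent("d3", 90.0, 40.4200, -3.7000, "Y"), stations, "X")
too_far = coupon_for(CarEvent("d4", 8.0, 41.3800, 2.1700, "Y"), stations, "X")
```

In a streaming job, a function like `coupon_for` would be applied to every event read from the log, and the non-`None` results would be forwarded to whatever service sends the notifications.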