Rediscovering the Value of Apache Kafka® in Modern Data Architecture

Ricardo Ferreira, Developer Advocate, Confluent

Published in: Technology

  1. Rediscovering the Value of Apache Kafka® in Modern Data Architectures @riferrei | #kafkameetup | @CONFLUENTINC
  2. About me: Ricardo Ferreira • Works for Confluent • Developer Advocate • ricardo@confluent.io • https://riferrei.net
  3. Origins of Apache Kafka: "There were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle continuous flows of data." – Jay Kreps
  4. (Slide without text)
  5. First realization. Event: "I changed my job from Oracle to Confluent". State: "I work at Confluent".
  6. Events are both notification + state transfer
  7. Event-driven application: job change; recommendation engine, search engine, email service
  8. Let's implement this! Database; log; SQL, SQL, SQL; recommendation engine, search engine, email service
  9. Second realization: database; log; transactional events; non-transactional events; 1000x more volume
  10. Databases, 30 years ago...
  11. Databases, these days... (Developer)
  12. Databases are limited
  13. Limited? Are you kidding me?
  14. Are databases limited? Yes, they are. Why do we have to move data from one DB to another just to do analytics?
  15. Shared state = more DBs: business line 1, business line 2, business line 3
  16. Third realization: user tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, microservices, ..., Hadoop, Elasticsearch, Grafana, machine learning, rec. engine, search, security, email, social graph
  17. "The truth is the log. The database is a cache of a subset of the log." – Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
  18. Log as first-class citizen: database; log with offsets 0 1 2 3 4 5 6 7 8; writes; reads; destination system A (time = 1); destination system B (time = 3)
  19. Solution: build a commit log. Commit log; user tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, microservices, ..., Hadoop, Elasticsearch, Grafana, machine learning, rec. engine, search, security, email, social graph
  20. Do you think it is a table that you run queries against? http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
  21. Streams and tables duality. Stream: {"user":"riferrei","score":"1001"} {"user":"riferrei","score":"1002"} {"user":"riferrei","score":"1003"} {"user":"riferrei","score":"1004"} {"user":"riferrei","score":"1005"}. Table: {"user":"riferrei","score":"1005"}
  22. Origins of Apache Kafka: "We've come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be." – Jay Kreps
  23. Origins of Apache Kafka: Databases; Messaging; Batch; Expensive; Time Consuming; Difficult to Scale; No Persistence After Consumption; No Replay; Highly Scalable; Durable; Persistent; Ordered; Fast (Low Latency)
  24. Origins of Apache Kafka: Databases; Messaging; Batch; Expensive; Time Consuming; Difficult to Scale; No Persistence After Consumption; No Replay; Highly Scalable; Durable; Persistent; Ordered; Fast (Low Latency); Distributed commit log
  25. Origins of Apache Kafka: Databases; Messaging; Batch; Expensive; Time Consuming; Difficult to Scale; No Persistence After Consumption; No Replay; Highly Scalable; Durable; Persistent; Ordered; Fast (Low Latency); Stream processing; Continuous flows; Scalable integration; Distributed streaming platform
  26. @riferrei | @confluentinc | @itau
  27. Origins of Apache Kafka: "The ability to combine these three areas – to bring all the streams of data together across all the use cases – is what makes the idea of a streaming platform so appealing to people." – Jay Kreps
  28. What is Apache Kafka? 01 Well done messaging; 02 Durable storage; 03 Stream processing
  29. Time for some fun: 1. Get the game 2. Name yourself
  30. Source code: https://github.com/confluentinc/demo-scene <<Pacman-ccloud>>
  31. Source: USER_GAME topic
  32. Creating USER_GAME stream
  33. Querying USER_GAME stream
  34. Creating STATS_PER_USER table
  35. Querying STATS_PER_USER table
  36. Low-latency pull queries
  37. Source: USER_LOSSES topic
  38. Creating USER_LOSSES stream
  39. Querying USER_LOSSES stream
  40. Creating LOSSES_PER_USER table
  41. Querying LOSSES_PER_USER table
  42. Creating SCOREBOARD table
  43. Querying SCOREBOARD table
  44. Complete scoreboard: USER_GAME, USER_LOSSES (storage); process; STATS_PER_USER, LOSSES_PER_USER (storage); process; SCOREBOARD (storage)
  45. How can I learn more?
  46. Get Kafka: Confluent Cloud. Try free: https://cnfl.io/confluent-cloud
  47. Get examples: Kafka Tutorials, https://cnfl.io/tutorials
  48. Get books: O'Reilly bundle, https://cnfl.io/books
  49. Join Kafka Summit: https://kafka-summit.org/events/kafka-summit-austin-2020 https://myeventi.events/kafka20/aus Use 25% discount code: KSL20Meetup
  50. Thank you
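
Slide 18 draws the log as a numbered sequence (offsets 0 to 8) that is written at one end and read independently by two destination systems. A minimal sketch of that idea in plain Python follows. It is not Kafka's API: the `Log` class, the `event-N` payloads, and the reading of "time = 1" and "time = 3" as the last offset each destination has processed are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Append-only sequence of records; a record's list index is its offset."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Write a record at the end and return the offset it was stored at."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record at or after the offset; nothing is removed."""
        return self.records[offset:]


log = Log()
for n in range(9):  # offsets 0..8, matching the slide
    log.append(f"event-{n}")

# Each destination keeps its own position (hypothetical values from the slide).
position_a = 1  # destination system A has processed up to offset 1
position_b = 3  # destination system B has processed up to offset 3

# Reading is non-destructive, so A and B progress at their own pace.
pending_a = log.read_from(position_a + 1)
pending_b = log.read_from(position_b + 1)
print(len(pending_a), len(pending_b))  # 7 5
```

Because reads never delete records, a new destination can be added later and replay the log from offset 0.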
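
The streams and tables duality on slide 21 can be illustrated by folding the stream into a table that keeps only the latest value per key. The sketch below reuses the five events from the slide; the `materialize` helper is a hypothetical name, and keying on `user` is an assumption.

```python
import json

# The five events shown on the slide, in arrival order.
stream = [
    '{"user":"riferrei","score":"1001"}',
    '{"user":"riferrei","score":"1002"}',
    '{"user":"riferrei","score":"1003"}',
    '{"user":"riferrei","score":"1004"}',
    '{"user":"riferrei","score":"1005"}',
]


def materialize(events):
    """Fold a stream into a table: the most recent value per key wins."""
    table = {}
    for raw in events:
        event = json.loads(raw)
        table[event["user"]] = event["score"]  # assumed key: "user"
    return table


table = materialize(stream)
print(table)  # {'riferrei': '1005'}
```

Replaying a prefix of the stream yields the table as it stood at that point, which is why the stream can be treated as the source of truth and the table as a derived view.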
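
The scoreboard flow on slides 31 to 44 (two source topics, two per-user aggregate tables, one joined table) can be approximated in plain Python. This is a stand-in for the shape of the pipeline, not the ksqlDB statements shown in the talk. The event fields (`user`, `score`, `level`), the aggregate column names, and the sort order are hypothetical.

```python
from collections import defaultdict

# Hypothetical events standing in for the USER_GAME and USER_LOSSES topics.
user_game = [
    {"user": "alice", "score": 100, "level": 1},
    {"user": "bob", "score": 50, "level": 1},
    {"user": "alice", "score": 300, "level": 2},
]
user_losses = [{"user": "bob"}, {"user": "bob"}, {"user": "alice"}]


def stats_per_user(events):
    """Aggregate game events into one row per user (STATS_PER_USER stand-in)."""
    stats = {}
    for e in events:
        row = stats.setdefault(e["user"], {"highest_score": 0, "highest_level": 0})
        row["highest_score"] = max(row["highest_score"], e["score"])
        row["highest_level"] = max(row["highest_level"], e["level"])
    return stats


def losses_per_user(events):
    """Count loss events per user (LOSSES_PER_USER stand-in)."""
    losses = defaultdict(int)
    for e in events:
        losses[e["user"]] += 1
    return dict(losses)


def scoreboard(stats, losses):
    """Join the two tables on user (SCOREBOARD stand-in), best score first."""
    rows = [
        {"user": user, **row, "total_losses": losses.get(user, 0)}
        for user, row in stats.items()
    ]
    return sorted(rows, key=lambda r: (-r["highest_score"], r["total_losses"]))


board = scoreboard(stats_per_user(user_game), losses_per_user(user_losses))
for row in board:
    print(row)
```

In the talk these steps run continuously over Kafka topics. Here they run once over in-memory lists, which is enough to show how the storage and processing stages alternate.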
