DevFest Nantes 2018 - Create a data pipeline in 20 minutes with Kafka Connect

At iAdvize, we run Apache Kafka and the Kafka Connect framework in production to easily build real-time, scalable, and resilient data pipelines. We will see how Kafka Connect can become an effective solution for your data flows into or out of Kafka. In 20 minutes, we will build a pipeline together from A to Z, based on a concrete use case.

Published in: Software

  1. 1. KAFKA CONNECT CREATE A DATA PIPELINE IN 20 MINUTES
  2. 2. OUR JOURNEY: REMINDERS ABOUT APACHE KAFKA; DISCOVER KAFKA CONNECT; CREATE OUR DATA PIPELINE; KAFKA CONNECT ECOSYSTEM; KAFKA CONNECT AT IADVIZE
  3. 3. WHO I AM? Jocelyn Dréan (@jocelyndrean): Data Engineer, Kafka lover, and also dog lover… Data Engineer at iAdvize; Data Engineer at MindGeek; BI developer. "Jocelyn is the best data engineer I ever met!" (My Mom.) "Vrr vrr ratatata…." (PNL)
  4. 4. WHAT IS APACHE KAFKA ?
  5. 5. APACHE KAFKA IS A DISTRIBUTED STREAMING PLATFORM
  6. 6. Source: courtesy of Confluent - https://medium.com/walmartlabs/apache-kafka-for-item-setup-3fe8f4ba5967
  7. 7. Diagram: Producers (App, App, App) → Kafka Cluster → Consumers (App, App, App)
  8. 8. CONSUMERS: TAKE YOUR TIME BUDDY… (diagram: a log running from OLD to NEW, with consumers MARTINE, MARIE, and JEAN-CLAUDE)
  9. 9. WHAT IS KAFKA CONNECT ?
  10. 10. KAFKA CONNECT (diagram): Kafka Cluster with Producers (App, App, App), Consumers (App, App, App), Connectors (DB, DB), and Stream Processors (App, App)
  11. 11. KAFKA CONNECT CONCEPTS: KAFKA CONNECT IS A FRAMEWORK TO STREAM DATA INTO AND OUT OF KAFKA. Diagram: DATA SOURCE → KAFKA CONNECT → KAFKA CLUSTER → KAFKA CONNECT → DATA SINK
  12. 12. SOME AVAILABLE CONNECTORS
  13. 13. WRITING YOUR OWN CONNECTOR https://docs.confluent.io/current/connect/javadocs/index.html?org/apache/kafka/connect/connector/Connector.html
  14. 14. KAFKA CONNECT CONCEPTS: CONFIGURATION, REST API, WORKERS, OUT-OF-THE-BOX CONNECTORS
  15. 15. KAFKA CONNECT ARCHITECTURE (diagram): SOURCES and SINKS; KAFKA CONNECT with two WORKERS running four tasks; KAFKA CLUSTER
  16. 16. AN OPEN-SOURCE UI https://github.com/Landoop/kafka-connect-ui
  17. 17. KAFKA CONNECT AT IADVIZE ADOPT TRIAL Amazon S3 Debezium
  18. 18. CREATE OUR DATA PIPELINE
  19. 19. OUR DATA PIPELINE ARCHITECTURE: GDPR Service, Producer API, Users DB, Debezium Connector, S3 Connector, Amazon S3, Amazon Athena, Queries
  20. 20. BEFORE WE BEGIN. Ingredients: Kafka cluster*, Zookeeper cluster*, Kafka Connect cluster*, Debezium Connector*, S3 Connector*, MySQL DB (*: https://www.confluent.io/download/)
  21. 21. STEP 1: PRODUCE GDPR DATA
  22. 22. GENERATE FAKE GDPR EVENTS
  23. 23. OUR DATA PIPELINE ARCHITECTURE: GDPR Service, Producer API, Users DB, Amazon S3, Amazon Athena
  24. 24. STEP 2: CHANGE DATA CAPTURE WITH DEBEZIUM
  25. 25. OUR DATABASE SCHEMA
  26. 26. DEBEZIUM CONNECTOR CONFIGURATION
  27. 27. START THE CONNECTOR
  28. 28. LIST KAFKA TOPICS
  29. 29. OUR DATA PIPELINE ARCHITECTURE: GDPR Service, Producer API, Users DB, Debezium Connector, Amazon S3, Amazon Athena
  30. 30. STEP 3: EXPORT DATA FROM KAFKA TOPICS TO S3
  31. 31. LIST TOPICS
  32. 32. S3 CONNECTOR CONFIGURATION
  33. 33. START CONNECTORS
  34. 34. MONITORING USING UI
  35. 35. S3 CONNECTOR CONFIGURATION
  36. 36. START CONNECTORS
  37. 37. OPEN AMAZON S3
  38. 38. OPEN AMAZON S3: YYYY / MM / dd
  39. 39. OPEN A FILE: ONE LINE = ONE EVENT
  40. 40. OUR DATA PIPELINE ARCHITECTURE: GDPR Service, Producer API, Users DB, Debezium Connector, S3 Connector, Amazon S3, Amazon Athena
  41. 41. STEP 4: ANALYZE DATA DIRECTLY IN AMAZON S3 USING SQL
  42. 42. WHAT IS AMAZON ATHENA? ANALYZE: Analyze unstructured, semi-structured, and structured data stored in Amazon S3, for example CSV, JSON, or columnar formats such as Apache Parquet and Apache ORC. It uses Presto as its backend. PRICING: You pay only for the queries you run: $5 per TB of data scanned from Amazon S3. JDBC: A JDBC driver is available, so you can run SQL queries from Python, Java, or BI software such as Tableau.
  43. 43. CREATE WEBSITES TABLE ON ATHENA
  44. 44. CREATE GDPR TABLE ON ATHENA
  45. 45. RUNNING QUERIES
  46. 46. RUNNING QUERIES
  47. 47. RUNNING QUERIES
  48. 48. RUNNING QUERIES
  49. 49. OUR DATA PIPELINE ARCHITECTURE: GDPR Service, Producer API, Users DB, Debezium Connector, S3 Connector, Amazon S3, Amazon Athena, Queries
  50. 50. WHAT’S NEXT? COURSE ON UDEMY: https://www.udemy.com/share/1007rsBUUacVhRRHQ=/ DOCUMENTATION: https://docs.confluent.io/current/connect/index.html QUICKSTART DEMO FROM CONFLUENT: https://github.com/confluentinc/quickstart-demos
  51. 51. WE'RE HIRING :)
  52. 52. JOIN THE CONVERSATION. WE ARE IADVIZE: WE CONNECT PEOPLE WITH PASSIONATE EXPERTS TO MAKE EVERY ONLINE CONVERSATION MATTER. Share your savvy expertise via instant messaging: advise the visitors of your favourite websites via chat by sharing advice on the products and services you love to use. You can do it whenever and wherever you want via our app and get paid for each conversation.
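
The connector configuration screens from slides 26-27 and 32-33 did not survive in this transcript. As a rough sketch only, and not the speaker's actual setup, the two connectors could be registered through the Kafka Connect REST API (`POST /connectors`) along the following lines. Every host name, credential, topic, bucket, and connector name below is a hypothetical placeholder. The property names follow the Debezium MySQL connector and the Confluent S3 sink connector as documented around the time of the talk; newer releases rename some of them, so check the documentation for the version you run. The `path.format` value mirrors the `YYYY / MM / dd` layout shown on slide 38.

```python
import json
import urllib.request

# Hypothetical Debezium MySQL source configuration (cf. slide 26).
# All values are illustrative placeholders, not taken from the talk.
DEBEZIUM_CONFIG = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.server.id": "184054",
    "database.server.name": "usersdb",  # logical name, used as topic prefix
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.usersdb",
}

# Hypothetical S3 sink configuration (cf. slide 32): JSON files,
# partitioned by day under YYYY/MM/dd as on slide 38.
S3_SINK_CONFIG = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "gdpr-events",
    "s3.bucket.name": "my-gdpr-bucket",
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",
    "partitioner.class":
        "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "YYYY/MM/dd",
    "partition.duration.ms": "86400000",  # one partition per day
    "locale": "en-US",
    "timezone": "UTC",
}


def build_create_request(connect_url, name, config):
    """Build (but do not send) the POST /connectors request for a connector."""
    payload = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    connectors = [
        ("users-db-source", DEBEZIUM_CONFIG),
        ("gdpr-s3-sink", S3_SINK_CONFIG),
    ]
    for name, config in connectors:
        req = build_create_request("http://localhost:8083", name, config)
        # Only show what would be sent; no network call is made here.
        print(req.get_method(), req.full_url, name)
```

To actually start a connector, the request object can be passed to `urllib.request.urlopen` against a running Kafka Connect worker; 8083 is the default REST port. Building the request separately from sending it keeps the sketch runnable without a cluster.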
