
Streaming Data Integration - For Women in Big Data Meetup

A stream processing platform is not an island unto itself; it must be connected to all of your existing data systems, applications, and sources. In this talk, we will provide different options for integrating systems and applications with Apache Kafka, with a focus on the Kafka Connect framework and the ecosystem of Kafka connectors. We will discuss the intended use cases for Kafka Connect and share our experience and best practices for building large-scale data pipelines using Apache Kafka.

Published in: Software


  1. Streaming Data Integration with Apache Kafka (Confidential)
  2. About Gwen
     Gwen Shapira – System Architect @ Confluent; PMC member, Apache Kafka; moving data around since 2000.
     Previously:
     • Software Engineer @ Cloudera
     • Oracle Database Consultant
     Find me:
     • gwen@confluent.io
     • @gwenshap
  3. The Plan
     1. What is data integration about?
     2. How have things changed?
     3. What is difficult and important?
     4. How do we solve these problems with Kafka?
  4. Data Integration: making sure the right data gets to the right places.
  5. 10 years ago… Informatica, DataStage, manual optimizations.
  6. 5 years ago…
  7.–8. (image-only slides)
  9. Today…
     • Everything streaming
     • Everything real-time
     • Everything in-memory
     • Everything containers
     • Everything clouds
  10. These Things Matter
      • Reliability – losing data is (usually) not OK
        • Exactly-once vs. at-least-once delivery
      • Timeliness
        • Push vs. pull
      • High throughput, varying throughput
        • Compression, parallelism, back pressure
      • Data formats
        • Flexibility, structure
      • Security
      • Error handling
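Several of these concerns map directly onto standard Kafka producer configuration keys. As a minimal sketch (the config keys below are real Kafka producer settings, but the values and the grouping into dicts are illustrative, not tied to any particular client library):

```python
# Sketch: Kafka producer properties that correspond to the concerns above.
# Key names are standard Kafka producer configs; values are illustrative.

reliability = {
    "acks": "all",                 # wait for all in-sync replicas -> no silent loss
    "enable.idempotence": "true",  # dedupe broker-side retries per partition
    "retries": "2147483647",       # keep retrying transient broker errors
}

throughput = {
    "compression.type": "lz4",     # trade CPU for network/disk throughput
    "batch.size": "65536",         # larger batches -> fewer requests
    "linger.ms": "20",             # wait briefly to fill batches under varying load
}

back_pressure = {
    "buffer.memory": "33554432",   # bound producer memory; send() blocks when full
    "max.block.ms": "60000",       # how long send() may block before raising
}

producer_config = {**reliability, **throughput, **back_pressure}
print(sorted(producer_config))
```

With `acks=all` and idempotence enabled, retried sends no longer duplicate records, which is how at-least-once delivery is tightened toward exactly-once on the producer side.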
  11. (image-only slide)
  12. After: Stream Data Platform with Kafka
      Kafka: distributed, fault tolerant, stores messages, processes streams.
      Feeding in: Espresso, Cassandra, Oracle, user tracking, operational logs, operational metrics.
      Feeding out: Hadoop, log search, monitoring, data warehouse, and applications such as search, security, and fraud detection.
  13.–17. (image-only slides)
  18. Introducing Kafka Connect: large-scale streaming data import/export for Kafka.
  19. (image-only slide)
  20. Overview of Connect
      1. Install a cluster of workers.
      2. Download (or build) and install connector plugins.
      3. Use the REST API to start and configure connectors.
      4. Connectors start tasks; tasks run inside the workers and copy the data.
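Step 3 above is a plain HTTP call against the workers' REST API. A hedged sketch of the payload, using the FileStreamSource demo connector that ships with Kafka (the connector name, file path, topic, and `localhost:8083` address are placeholders for your own deployment):

```python
import json

# Configuration for one connector instance, as posted to POST /connectors.
# FileStreamSourceConnector is the demo connector bundled with Kafka.
connector = {
    "name": "demo-file-source",   # placeholder instance name
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",              # Connect splits the work into this many tasks
        "file": "/var/log/demo.log",   # placeholder: source file to tail
        "topic": "demo-topic",         # placeholder: Kafka topic to write records to
    },
}

payload = json.dumps(connector)
# Equivalent REST call against a live Connect worker:
#   curl -X POST -H "Content-Type: application/json" \
#        --data "$PAYLOAD" http://localhost:8083/connectors
print(payload)
```

Once the POST is accepted, the workers schedule up to `tasks.max` tasks across the cluster; the same REST API can then pause, restart, or delete the connector.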
  21.–30. (image-only slides)
  31. Questions?
