SlideShare a Scribd company logo
1 of 16
Download to read offline
1
Data Pipelines Made Simple
With Apache Kafka
Ewen Cheslack-Postava
Engineer, Apache Kafka Committer
2
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control
Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
3
The Challenge: Streaming Data Pipelines
4
Simplifying Streaming Data Pipelines with Apache Kafka
5
Kafka Connect
6
Streaming ETL
7
Single Message Transforms for Kafka Connect
Modify events before storing in
Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Store lineage
• Remove unnecessary columns
Modify events going out of
Kafka:
• Route high priority events to
faster data stores
• Direct events to different
Elasticsearch indexes
• Cast data types to match
destination
• Remove unnecessary columns
8
Where Single Message Transforms Fit In
9
Built-in Transformations
• InsertField – Add a field using either static data or record metadata
• ReplaceField – Filter or rename fields
• MaskField – Replace field with valid null value for the type (0, empty string, etc)
• ValueToKey – Set the key to one of the value’s fields
• HoistField – Wrap the entire event as a single field inside a Struct or a Map
• ExtractField – Extract a specific field from Struct and Map and include only this field in results
• SetSchemaMetadata – modify the schema name or version
• TimestampRouter – Modify the topic of a record based on original topic and timestamp. Useful
when using a sink that needs to write to different tables or indexes based on timestamps
• RegexpRouter – modify the topic of a record based on original topic, replacement string and a
regular expression
10
Configuring Single Message Transforms
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
transforms=MakeMap,InsertSource
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source
11
Why only single messages?
• Delivery guarantees!
• Always provide at least once semantics
• For supported connectors, provide exactly once semantics
• No additional complication: transformations happens inline with import/export
12
When should I use each tool?
Kafka Connect & Single Message Transforms
• Simple, message at a time
• Transformation can be performed inline
• Transformation does not interact with
external systems
Kafka Streams
• Complex transformations including
• Aggregations
• Windowing
• Joins
• Transformed data stored back in Kafka,
enabling reuse
• Write, deploy, and monitor a Java
application
13
Conclusion
Single Message Transforms in Kafka Connect
• Lightweight transformation of individual messages
• Configuration-only data pipelines
• Pluggable, with lots of built-in transformations
14
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control
Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product Marketing
15
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and quality
assured
More extensible developer
experience
Easy upgrade path to
Confluent Enterprise
16
Discount code: kafcom17
  Use the Apache Kafka community discount code to get $50 off
  www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
Presented by

More Related Content

What's hot

Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 

What's hot (20)

Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
 
Kafka streams - From pub/sub to a complete stream processing platform
Kafka streams - From pub/sub to a complete stream processing platformKafka streams - From pub/sub to a complete stream processing platform
Kafka streams - From pub/sub to a complete stream processing platform
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
 
How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str...
How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str...How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str...
How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str...
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
Apache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACKApache Kafka lessons learned @PAYBACK
Apache Kafka lessons learned @PAYBACK
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafka
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Athens BigData Meetup - Sept 17
Athens BigData Meetup - Sept 17Athens BigData Meetup - Sept 17
Athens BigData Meetup - Sept 17
 
A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology A Practical Guide to Selecting a Stream Processing Technology
A Practical Guide to Selecting a Stream Processing Technology
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 
Riding the Streaming Wave DIY style
Riding the Streaming Wave  DIY styleRiding the Streaming Wave  DIY style
Riding the Streaming Wave DIY style
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
 
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
Select Star: Flink SQL for Pulsar Folks - Pulsar Summit NA 2021
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
Kafka Summit SF 2017 - Keynote - Go Against the Flow: Databases and Stream Pr...
 

Viewers also liked

Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 

Viewers also liked (20)

Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafka
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafka
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Services
 
What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Demystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache KafkaDemystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache Kafka
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
 
Data integration with Apache Kafka
Data integration with Apache KafkaData integration with Apache Kafka
Data integration with Apache Kafka
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Confluent Enterprise Datasheet
Confluent Enterprise DatasheetConfluent Enterprise Datasheet
Confluent Enterprise Datasheet
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafka
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 

Similar to Data Pipelines Made Simple with Apache Kafka

ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
HostedbyConfluent
 
Iccana 2011
Iccana 2011Iccana 2011
Iccana 2011
hanums1
 

Similar to Data Pipelines Made Simple with Apache Kafka (20)

Hopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AIHopsworks - The Platform for Data-Intensive AI
Hopsworks - The Platform for Data-Intensive AI
 
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
 
Microservices, Kafka Streams and KafkaEsque
Microservices, Kafka Streams and KafkaEsqueMicroservices, Kafka Streams and KafkaEsque
Microservices, Kafka Streams and KafkaEsque
 
Trivadis TechEvent 2017 Tools and Methods for DB Migrations by Kim Berg Hansen
Trivadis TechEvent 2017 Tools and Methods for DB Migrations by Kim Berg HansenTrivadis TechEvent 2017 Tools and Methods for DB Migrations by Kim Berg Hansen
Trivadis TechEvent 2017 Tools and Methods for DB Migrations by Kim Berg Hansen
 
Presentation fyp1automationreplicationinopenstack
Presentation fyp1automationreplicationinopenstackPresentation fyp1automationreplicationinopenstack
Presentation fyp1automationreplicationinopenstack
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
 
Prashant Vichare Resume
Prashant Vichare ResumePrashant Vichare Resume
Prashant Vichare Resume
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Scaling People, Not Just Systems, to Take On Big Data Challenges
Scaling People, Not Just Systems, to Take On Big Data ChallengesScaling People, Not Just Systems, to Take On Big Data Challenges
Scaling People, Not Just Systems, to Take On Big Data Challenges
 
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
Manila on CephFS at CERN (OpenStack Summit Boston, 11 May 2017)
 
High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
Patterns of Streaming Applications
Patterns of Streaming ApplicationsPatterns of Streaming Applications
Patterns of Streaming Applications
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
 
Introduction to OpenStack Trove & Database as a Service
Introduction to OpenStack Trove & Database as a ServiceIntroduction to OpenStack Trove & Database as a Service
Introduction to OpenStack Trove & Database as a Service
 
Iccana 2011
Iccana 2011Iccana 2011
Iccana 2011
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
 

More from confluent

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 

Recently uploaded (20)

Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
 
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purityAPVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM Integration
 
5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
SQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionSQL Injection Introduction and Prevention
SQL Injection Introduction and Prevention
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 

Data Pipelines Made Simple with Apache Kafka

  • 1. 1 Data Pipelines Made Simple With Apache Kafka Ewen Cheslack-Postava Engineer, Apache Kafka Committer
  • 2. 2 Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing
  • 3. 3 The Challenge: Streaming Data Pipelines
  • 4. 4 Simplifying Streaming Data Pipelines with Apache Kafka
  • 7. 7 Single Message Transforms for Kafka Connect Modify events before storing in Kafka: • Mask sensitive information • Add identifiers • Tag events • Store lineage • Remove unnecessary columns Modify events going out of Kafka: • Route high priority events to faster data stores • Direct events to different Elasticsearch indexes • Cast data types to match destination • Remove unnecessary columns
  • 8. 8 Where Single Message Transforms Fit In
  • 9. 9 Built-in Transformations • InsertField – Add a field using either static data or record metadata • ReplaceField – Filter or rename fields • MaskField – Replace field with valid null value for the type (0, empty string, etc) • ValueToKey – Set the key to one of the value’s fields • HoistField – Wrap the entire event as a single field inside a Struct or a Map • ExtractField – Extract a specific field from Struct and Map and include only this field in results • SetSchemaMetadata – modify the schema name or version • TimestampRouter – Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps • RegexpRouter – modify the topic of a record based on original topic, replacement string and a regular expression
  • 10. 10 Configuring Single Message Transforms name=local-file-source connector.class=FileStreamSource tasks.max=1 file=test.txt topic=connect-test transforms=MakeMap,InsertSource transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value transforms.MakeMap.field=line transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value transforms.InsertSource.static.field=data_source transforms.InsertSource.static.value=test-file-source
  • 11. 11 Why only single messages? • Delivery guarantees! • Always provide at least once semantics • For supported connectors, provide exactly once semantics • No additional complication: transformations happens inline with import/export
  • 12. 12 When should I use each tool? Kafka Connect & Single Message Transforms • Simple, message at a time • Transformation can be performed inline • Transformation does not interact with external systems Kafka Streams • Complex transformations including • Aggregations • Windowing • Joins • Transformed data stored back in Kafka, enabling reuse • Write, deploy, and monitor a Java application
  • 13. 13 Conclusion Single Message Transforms in Kafka Connect • Lightweight transformation of individual messages • Configuration-only data pipelines • Pluggable, with lots of built-in transformations
  • 14. 14 Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing
  • 15. 15 Get Started with Apache Kafka Today! https://www.confluent.io/downloads/ THE place to start with Apache Kafka! Thoroughly tested and quality assured More extensible developer experience Easy upgrade path to Confluent Enterprise
  • 16. 16 Discount code: kafcom17  Use the Apache Kafka community discount code to get $50 off  www.kafka-summit.org Kafka Summit New York: May 8 Kafka Summit San Francisco: August 28 Presented by