Data Pipelines Made Simple with Apache Kafka

1
Data Pipelines Made Simple
With Apache Kafka
Ewen Cheslack-Postava
Engineer, Apache Kafka Committer

2
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent Control Center
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Speaker: Clarke Patterson, Senior Director, Product Marketing

3
The Challenge: Streaming Data Pipelines

4
Simplifying Streaming Data Pipelines with Apache Kafka

7
Single Message Transforms for Kafka Connect
Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Store lineage
• Remove unnecessary columns
Modify events going out of Kafka:
• Route high priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match destination
• Remove unnecessary columns
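
To make the routing bullets concrete, here is a minimal Python sketch of how a topic name can be rewritten per record before a sink writes it out. It only models the idea; it is not the Kafka Connect API. The function names, the `${topic}-${timestamp}` pattern, the use of UTC, and the example topic names are assumptions chosen for illustration.

```python
import re
from datetime import datetime, timezone


def timestamp_router(topic, timestamp_ms,
                     topic_format="${topic}-${timestamp}",
                     strftime_format="%Y%m%d"):
    """Build a new topic name from the original topic and the record timestamp."""
    # Assumption: timestamps are formatted in UTC.
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    stamp = day.strftime(strftime_format)
    return topic_format.replace("${topic}", topic).replace("${timestamp}", stamp)


def regex_router(topic, regex, replacement):
    """Rewrite the topic if the whole name matches `regex`; otherwise keep it."""
    match = re.fullmatch(regex, topic)
    # Python replacement syntax uses \1 for the first capture group.
    return match.expand(replacement) if match else topic


# Hypothetical record timestamp: noon UTC on 2017-03-30.
ts = int(datetime(2017, 3, 30, 12, 0, tzinfo=timezone.utc).timestamp() * 1000)
print(timestamp_router("events", ts))
print(regex_router("prod-orders", r"prod-(.*)", r"\1"))
```

A sink that derives its index or table name from the topic would then see one destination per day, or a topic name with the prefix stripped, without any change to the producing application.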

8
Where Single Message Transforms Fit In

9
Built-in Transformations
• InsertField – Add a field using either static data or record metadata
• ReplaceField – Filter or rename fields
• MaskField – Replace a field with a valid null value for its type (0, empty string, etc.)
• ValueToKey – Set the key to one of the value’s fields
• HoistField – Wrap the entire event as a single field inside a Struct or a Map
• ExtractField – Extract a specific field from a Struct or Map and include only this field in the result
• SetSchemaMetadata – Modify the schema name or version
• TimestampRouter – Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps
• RegexRouter – Modify the topic of a record based on original topic, replacement string and a regular expression
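
The field-level transformations in this list can be pictured as small functions from one record to one record. The sketch below models a record as a plain Python dict with `key` and `value` entries; that model, the function names, and the sample fields (`user_id`, `email`, `ssn`, `debug`, `source`) are assumptions for illustration and not the Connect data API.

```python
def insert_field(value, field, static_value):
    """InsertField-like: return a copy of the value with one extra static field."""
    return {**value, field: static_value}


def replace_field(value, drop=(), renames=None):
    """ReplaceField-like: drop the listed fields and rename others where asked."""
    renames = renames or {}
    return {renames.get(k, k): v for k, v in value.items() if k not in drop}


def mask_field(value, fields):
    """MaskField-like: swap each listed field for the empty value of its type."""
    # type(v)() builds the type's default: int -> 0, str -> "", float -> 0.0
    return {k: (type(v)() if k in fields else v) for k, v in value.items()}


def value_to_key(record, fields):
    """ValueToKey-like: build the record key from fields of the value."""
    return {**record, "key": {f: record["value"][f] for f in fields}}


# Hypothetical input record.
record = {
    "key": None,
    "value": {"user_id": 42, "email": "a@example.com",
              "ssn": "123-45-6789", "debug": "x"},
}

value = mask_field(record["value"], {"ssn"})
value = replace_field(value, drop={"debug"}, renames={"email": "contact"})
value = insert_field(value, "source", "crm")
record = value_to_key({**record, "value": value}, ["user_id"])
print(record)
```

Each function looks only at the record it is given, which is what lets these steps be chained in any order inside a connector.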

10
Configuring Single Message Transforms
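# File source connector: reads lines from test.txt into the topic connect-test
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
# Transforms are applied in the order listed here
transforms=MakeMap,InsertSource
# MakeMap: wrap each line (a plain string) in a structure with a single field, "line"
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
# InsertSource: add the static field data_source=test-file-source to each value
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source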
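
As a rough illustration of what this configuration does to each line of test.txt, the Python sketch below applies the same two steps in the same order. It models the record value as a string and then a dict; the helper names and the sample line are assumptions, and the real transformations operate on Connect records rather than Python objects.

```python
def hoist_field(value, field):
    """HoistField-like: wrap the whole value as a single named field."""
    return {field: value}


def insert_static_field(value, field, static_value):
    """InsertField-like: add one static field to a map-shaped value."""
    return {**value, field: static_value}


def apply_chain(value, transforms):
    """Apply each transform in order, feeding the output of one into the next."""
    for transform in transforms:
        value = transform(value)
    return value


# Mirrors transforms=MakeMap,InsertSource from the configuration above.
chain = [
    lambda v: hoist_field(v, "line"),
    lambda v: insert_static_field(v, "data_source", "test-file-source"),
]

result = apply_chain("hello world", chain)  # "hello world" is a made-up input line
print(result)  # {'line': 'hello world', 'data_source': 'test-file-source'}
```

The order matters: InsertField needs a map-shaped value to add a field to, which is why the line is hoisted into a field first.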

11
Why only single messages?
• Delivery guarantees!
• Always provide at-least-once semantics
• For supported connectors, provide exactly-once semantics
• No additional complication: transformations happen inline with import/export

12
When should I use each tool?
Kafka Connect & Single Message Transforms
• Simple, message at a time
• Transformation can be performed inline
• Transformation does not interact with external systems
Kafka Streams
• Complex transformations, including aggregations, windowing, and joins
• Transformed data stored back in Kafka, enabling reuse
• Write, deploy, and monitor a Java application
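
One way to see the dividing line is to compare a per-message function with an aggregation that must remember earlier messages. The Python sketch below is only an analogy: it does not use Kafka Connect or the Kafka Streams API, and the event fields (`user`, `ts`, `amount`), the 1000 threshold, and the tumbling-window choice are assumptions for illustration.

```python
from collections import defaultdict


def tag_priority(event):
    """Stateless: the output depends only on this one event."""
    level = "high" if event["amount"] >= 1000 else "normal"
    return {**event, "priority": level}


def count_per_window(events, window_ms):
    """Stateful: counting per (user, window) needs memory across events."""
    counts = defaultdict(int)
    for event in events:
        # Start of the fixed-size (tumbling) window containing this event.
        window_start = event["ts"] - event["ts"] % window_ms
        counts[(event["user"], window_start)] += 1
    return dict(counts)


# Hypothetical events with millisecond timestamps.
events = [
    {"user": "a", "ts": 0, "amount": 5},
    {"user": "a", "ts": 500, "amount": 2000},
    {"user": "a", "ts": 1200, "amount": 1},
]

print([tag_priority(e)["priority"] for e in events])
print(count_per_window(events, window_ms=1000))
```

The first function has the shape of a single message transform. The second keeps a table of counts between events, which is the kind of state that belongs in a stream processing application.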

13
Conclusion
Single Message Transforms in Kafka Connect
• Lightweight transformation of individual messages
• Configuration-only data pipelines
• Pluggable, with lots of built-in transformations

15
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and quality assured
More extensible developer experience
Easy upgrade path to Confluent Enterprise

16
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
