Companies are recognizing the importance of a low-latency, scalable, fault-tolerant data backbone in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, enabling low-latency analytics, event-driven architectures, and the population of multiple downstream systems. These data pipelines can be built using configuration alone.
In this talk we'll see how easy it is to stream data from sources such as databases into Kafka using the Kafka Connect API. We'll use KSQL to filter and aggregate the data and join it to other data, and then stream this enriched data from Kafka out into targets such as Elasticsearch. All of this can be accomplished without a single line of code!
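As a concrete sketch of the configuration-only approach, a Kafka Connect JDBC source connector that streams a database table into Kafka might be configured like this (the connector name, connection URL, credentials, and table are illustrative placeholders, not taken from the talk):

```json
{
  "name": "jdbc-source-orders",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo",
    "connection.user": "connect_user",
    "connection.password": "secret",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "mysql-"
  }
}
```

Submitting this JSON to the Kafka Connect REST API (`POST /connectors`) starts streaming the table into a Kafka topic – no code required.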
Streaming ETL to Elastic with Apache Kafka and KSQL
1. Streaming ETL to Elastic with Kafka and KSQL
San Francisco Elasticon, 1 March 2018
Nick Dearden
2. Apache Kafka®
A distributed commit log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data.
Writes are append only; reads are a single seek & scan.
(Diagram: Kafka cluster)
3. Apache Kafka®
Kafka Streams API: write standard Java applications & microservices to process your data in real-time.
Kafka Connect API: reliable and scalable integration of Kafka with other systems – no coding required.
(Diagram: Orders and Customers topics feeding the Kafka Streams API)
18. KSQL: a Streaming SQL Engine for Apache Kafka® from Confluent
• Enables stream processing with zero coding required
• The simplest way to process streams of data in real-time
• Powered by Kafka: scalable, distributed, battle-tested
• All you need is Kafka – no complex deployments of bespoke systems for stream processing
19. What is it for?
● Streaming ETL
○ Kafka is popular for data pipelines.
○ KSQL enables easy transformations of data within the pipeline
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
20. What is it for?
● Anomaly Detection
○ Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
21. What is it for?
● Real Time Monitoring
○ Log data monitoring, tracking and alerting
○ Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
25. Filtering streams with KSQL
ksql> CREATE STREAM ERROR_LOGS AS
SELECT * FROM LOGS
WHERE RESPONSE >= 400;
Message
----------------------------
Stream created and running
----------------------------
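Once created, the derived stream can itself be queried continuously; for example (the column list depends on the LOGS schema, so `SELECT *` is used here purely for illustration):

```sql
ksql> SELECT * FROM ERROR_LOGS;
```

In KSQL this is a continuous query: it keeps emitting matching records as new log events arrive, until interrupted.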
26. Streaming Transformations with KSQL
(Diagram: raw logs from an app server flow into Kafka; KSQL filters, aggregates and joins them into error logs and SLA breaches, which stream out to Elasticsearch, HDFS / S3 and an alert app)
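To complete the pipeline, the derived topics can be streamed from Kafka into Elasticsearch with the Kafka Connect Elasticsearch sink. A minimal illustrative configuration (connector name and connection URL are placeholders) might look like:

```json
{
  "name": "es-sink-error-logs",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "ERROR_LOGS",
    "connection.url": "http://localhost:9200",
    "type.name": "kafka-connect",
    "key.ignore": "true",
    "schema.ignore": "false"
  }
}
```

As with the source side, this is pure configuration: once submitted to the Connect REST API, every record on the ERROR_LOGS topic is indexed into Elasticsearch as it arrives.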
27. Monitoring thresholds with KSQL
ksql> CREATE TABLE SLA_BREACHES AS
SELECT RESPONSE, COUNT(*) AS REQUEST_COUNT
FROM LOGS
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE RESPONSE >= 400
GROUP BY RESPONSE
HAVING COUNT(*) > 10;
28. Streaming Transformations with KSQL
(Pipeline diagram repeated: raw logs filtered, aggregated and joined by KSQL into error logs and SLA breaches, streamed out to Elasticsearch, HDFS / S3 and an alert app)
29. Confluent Platform: Enterprise Streaming based on Apache Kafka®
Sources: database changes, log events, IoT data, web events, …
Destinations: CRM, data warehouse, database, Hadoop, data integration, monitoring, analytics, custom apps, transformations, real-time applications, …
Apache Open Source – Apache Kafka®: Core | Connect API | Streams API
Confluent Open Source – SQL Stream Processing: KSQL; Data Compatibility: Schema Registry; Development and Connectivity: Clients | Connectors | REST Proxy | CLI
Confluent Enterprise – Monitoring & Administration: Confluent Control Center | Security; Operations: Replicator | Auto Data Balancing