Chti JUG
Florent Ramière
Technical Account Manager
florent@confluent.io
@framiere
Agenda
1. Confluent
2. Streaming
3. KSQL
4. Demo
5. Resources
6. Q&A!
Confluent
About Confluent and Apache Kafka™
70% of active Kafka Committers
Founded September 2014
Technology developed while at LinkedIn
Founded by the creators of Apache Kafka
Jay Kreps, CEO
Neha Narkhede, CTO, VP Engineering
Cheryl Dalrymple, CFO
Luanne Dauber, CMO
Simon Hayes, Head of Corporate & Business Development
Todd Barnett, VP WW Sales
Sarah Sproehnle, VP Customer Success
Why a Streaming Platform?
All your data
Real-time
Fault tolerant
Secure
Confluent Platform: Enterprise Streaming based on Apache Kafka
Sources: Database Changes, Log Events, IoT Data, Web Events, …
Systems and apps around Kafka: CRM, Data Warehouse, Database, Hadoop, Data Integration, Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
Apache Open Source Confluent Open Source Confluent Enterprise
Confluent Platform
Apache Kafka®
Core | Connect API | Streams API
Data Compatibility
Schema Registry
Confluent Platform
Monitoring & Administration
Confluent Control Center | Security
Operations
Replicator | Auto Data Balancing | JMS Client | JMS Connectors
Development and Connectivity
Clients | Connectors | REST Proxy | CLI
SQL Stream Processing
KSQL (Streams API)
Key concepts
Streaming
A Kafka story!
https://github.com/framiere/a-kafka-story
No panic, it's a walkthrough!
https://github.com/framiere/a-kafka-story/tree/master/step6
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
Flexibility Simplicity
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
Flexibility Simplicity
The Streams API runs inside your application, not inside the brokers.
Same app, many instances.
Before: your job runs in a dedicated processing cluster, backed by a shared database, feeding a dashboard.
After: the app embeds the Streams API and feeds the dashboard directly.
Things Kafka Streams Does
• Runs everywhere
• Clustering done for you
• Exactly-once processing
• Event-time processing
• Integrated database
• Joins, windowing, aggregation
• S/M/L/XL/XXL/XXXL sizes
Stream Processing in Kafka
● KStream
KStream<byte[], String> textLines = builder
    .stream("textlines-topic", Consumed.with(Serdes.ByteArray(), Serdes.String()))
    .mapValues(String::toUpperCase);
KTable<String, Long> wordCounts = textLines
    .flatMapValues(textLine -> Arrays.asList(textLine.split("\\W+")))
    .groupBy((key, word) -> word)
    .count();
● KTable
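What the KStream/KTable snippet computes can be mirrored with plain Java collections, without a Kafka cluster. This is only an illustrative sketch of the same logic (upper-case, split on non-word characters, group by word, count); class and method names here are made up for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Same transformation as the KStream topology: upper-case each line,
    // split on non-word characters, group by word, count occurrences.
    static Map<String, Long> wordCounts(List<String> textLines) {
        return textLines.stream()
                .map(String::toUpperCase)
                .flatMap(line -> Arrays.stream(line.split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCounts(List.of("hello kafka", "hello ksql"));
        System.out.println(counts.get("HELLO")); // 2
    }
}
```

The real topology does the same thing continuously over an unbounded stream, with the resulting KTable updated on every new record.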
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
• Copy In
• Copy Out
• SMT
Kafka Connect
Flexibility Simplicity
Apache Kafka™ Connect API – Streaming Data Capture
Sources (JDBC, Mongo, MySQL, …) → Kafka Connect API → Kafka pipeline → Kafka Connect API → Sinks (Elastic, Cassandra, HDFS, …)
Flexible, Integrated, Reliable, Compatible:
• Fault tolerant
• Manage hundreds of data sources and sinks
• Preserves data schema
• Part of the Apache Kafka project
• Integrated within Confluent Platform's Control Center
Connect any source to any target system
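As a concrete "copy in" example, here is a minimal standalone-mode source connector configuration. The FileStreamSource connector ships with Apache Kafka; the file path and topic name below are placeholders:

```properties
# Tail a file and publish each line as a record to a Kafka topic
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/input.txt
topic=file-lines
```

Swap in a JDBC or MongoDB connector class and its own settings to capture from a database instead; the surrounding machinery (offsets, fault tolerance, scaling) stays the same.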
Single Message Transforms
Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Lineage/provenance
• Remove unnecessary columns
Modify events going out of Kafka:
• Route high priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match destination
• Remove unnecessary columns
But…Easy to Implement
/**
 * Single message transformation for Kafka Connect record types.
 *
 * Connectors can be configured with transformations to make lightweight
 * message-at-a-time modifications.
 */
public interface Transformation<R extends ConnectRecord<R>> extends Configurable, Closeable {

    /**
     * Apply transformation to the {@code record} and return another record object.
     *
     * The implementation must be thread-safe.
     */
    R apply(R record);

    /** Configuration specification for this transformation. */
    ConfigDef config();

    /** Signal that this transformation instance will no longer be used. */
    @Override
    void close();
}
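The shape of the interface is small enough to mirror without the Connect jars. Below is a simplified, standalone analogue of a masking transform, in the spirit of the MaskField SMT bundled with Connect; the `Transformation` stand-in, the field name `ssn`, and the mask value are all illustrative, not the real Connect API:

```java
import java.util.HashMap;
import java.util.Map;

public class MaskFieldSketch {
    // Simplified stand-in for Connect's Transformation<R>: one record in, one record out.
    interface Transformation {
        Map<String, Object> apply(Map<String, Object> record);
    }

    // Masks a sensitive field before the record reaches Kafka (or the sink).
    static Transformation maskField(String field) {
        return record -> {
            Map<String, Object> out = new HashMap<>(record);
            if (out.containsKey(field)) {
                out.put(field, "****");
            }
            return out;
        };
    }

    public static void main(String[] args) {
        Transformation mask = maskField("ssn");
        Map<String, Object> masked = mask.apply(Map.of("user", "alice", "ssn", "123-45-6789"));
        System.out.println(masked); // ssn value replaced by ****
    }
}
```

A real SMT implements `apply` against `ConnectRecord`, declares its options in `config()`, and is wired into a connector purely through configuration.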
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
Consumer,
Producer
• filter()
• join()
• aggregate()
Kafka Streams
• Select…from…
• Join…where…
• Group by…
KSQL
Flexibility Simplicity
KSQL is a Declarative Stream Processing Language
KSQL for Data Exploration
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
An easy way to inspect data in a running cluster
KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS
  SELECT userid, page, action FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
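The mechanics behind this query can be sketched in plain Java: bucket each attempt into a 5-second tumbling window by truncating its timestamp, count per (card, window), and keep the groups above the threshold. This is only a batch analogue of what KSQL does incrementally; all names are invented for the sketch:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FraudWindowSketch {
    record Attempt(String cardNumber, long timestampMs) {}

    // Tumbling windows: truncate the timestamp to the window start,
    // count attempts per (card, window), keep counts above the threshold.
    static Map<String, Long> possibleFraud(List<Attempt> attempts, long windowMs, long threshold) {
        return attempts.stream()
                .collect(Collectors.groupingBy(
                        a -> a.cardNumber() + "@" + (a.timestampMs() / windowMs) * windowMs,
                        Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Attempt> attempts = List.of(
                new Attempt("4111", 1000), new Attempt("4111", 2000),
                new Attempt("4111", 3000), new Attempt("4111", 4000),
                new Attempt("4222", 1000));
        System.out.println(possibleFraud(attempts, 5000, 3)); // {4111@0=4}
    }
}
```

KSQL maintains these window counts continuously as records arrive, which is why anomalies surface in milliseconds rather than after a batch run.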
Once again
KSQL implements for you the Kafka Streams
application you would have implemented if you had
• ... the time
• ... the experience
• ... the KSQL as a spec
• ... the willingness to do boring code
Is KSQL really Kafka Streams? ... yes!
./confluent start
./jmc &
echo '{"something":"value"}' | ./kafka-console-producer --broker-list localhost:9092 --topic temp
./kafka-console-consumer --bootstrap-server localhost:9092 --topic temp --from-beginning
{"something":"value"}
./ksql
ksql>SET 'auto.offset.reset' = 'earliest';
ksql>CREATE STREAM TEMP (something varchar) WITH (kafka_topic='temp', value_format='JSON');
ksql>SELECT * FROM TEMP;
1526371655810 | null | value
Where is KSQL not such a great fit?
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)
Ad-hoc queries
• Limited span of time usually retained in Kafka
• No indexes
Demo
Demo fun
https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
Demo ... less fun
https://bit.ly/2KqPZYo
Demo ... less fun
https://bit.ly/2L5l2dj
Demo ... less fun
https://github.com/framiere/a-kafka-story/tree/master/step19
Change Data Capture
Docker
Producer
Consumer
Kafka Stream
KSQL
Event sourcing
Influxdb
Grafana
S3
docker run --rm -it --name dcv -v $(pwd):/input pmsipilot/docker-compose-viz \
  render --horizontal --output-format image --force \
  docker-compose.yml
Demo ... less fun
... without Confluent Control Center links
Confluent 4.2 - Nested Types
SELECT userid, address.city
FROM users
WHERE address.state = 'CA';
https://github.com/confluentinc/ksql/pull/1114
Confluent 4.2 - Remaining joins
SELECT orderid, shipmentid
FROM orders INNER JOIN shipments
ON orders.id = shipmentid;
Where to go from here
● KSQL project page
○ https://www.confluent.io/product/ksql
● Confluent blog
○ http://blog.confluent.io/
● Formula 1 game blog post
○ https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
● KSQL github repo
○ https://github.com/confluentinc/ksql
● CP-Demo
○ https://github.com/confluentinc/cp-demo
● A-Kafka-Story
○ https://github.com/framiere/a-kafka-story
● Un tour de l'environnement Kafka (a tour of the Kafka ecosystem, talk in French)
○ https://www.youtube.com/watch?v=BBo-rqmhpDM
● KSQL Recipes
○ https://github.com/bluemonk3y/ksql-recipe-fraudulent-txns/
Confluent Download – 4.1 – Kafka 1.1 and KSQL GA
KSQL Capacity Planning – Sizing
https://docs.confluent.io/current/ksql/docs/capacity-planning.html
Resources - Confluent Enterprise Reference Architecture
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Resources – Community Slack and Mailing List
https://slackpass.io/confluentcommunity
https://groups.google.com/forum/#!forum/confluent-platform
Devoxx France
https://www.youtube.com/watch?v=BBo-rqmhpDM
Q&A

Chti JUG - 2018-06-26
