Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Chti jug - 2018-06-26

180 views

Published on

KSQL, Kafka Streams, Kafka

Published in: Software
  • Be the first to comment

  • Be the first to like this

Chti jug - 2018-06-26

  1. 1. Chti JUG Florent Ramière Technical Account Manager florent@confluent.io @framiere
  2. 2. Agenda 1.Confluent 2.Streaming 3.KSQL 4.Demo 5.Resources 6.Q&A!
  3. 3. Confluent
  4. 4. About Confluent and Apache Kafka™ 70% of active Kafka Committers Founded
 September 2014 Technology developed 
 while at LinkedIn Founded by the creators of Apache Kafka Cheryl Dalrymple
 CFO Luanne Dauber
 CMO Simon Hayes
 Head of Corporate & Business Development Jay Kreps
 CEO Todd Barnett
 VP WW Sales Neha Narkhede
 CTO, VP Engineering Sarah Sproehnle VP Customer Success
  5. 5. Why a Streaming Platform? All your data Real-time Fault tolerant Secure
  6. 6. Confluent Platform: Enterprise Streaming based on Apache Kafka Database Changes Log Events loT Data Web Events … CRM Data Warehouse Database Hadoop Data
 Integration … Monitoring Analytics Custom Apps Transformations Real-time Applications … Apache Open Source Confluent Open Source Confluent Enterprise Confluent Platform Apache Kafka® Core | Connect API | Streams API Data Compatibility Schema Registry Confluent Platform Monitoring & Administration Confluent Control Center | Security Operations Replicator | Auto Data Balancing | JMS Client | JMS Connectors Development and Connectivity Clients | Connectors | REST Proxy | CLI Apache Open Source Confluent Open Source Confluent Enterprise SQL Stream Processing KSQL (Streams API)
  7. 7. Confidential 7 Key concepts
  8. 8. Streaming
  9. 9. A Kafka story! https://github.com/framiere/a-kafka-story
  10. 10. No Panic, it's a walktrough! https://github.com/framiere/a-kafka-story/tree/master/step6
  11. 11. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer Flexibility Simplicity
  12. 12. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams Flexibility Simplicity
  13. 13. App Streams API Not running inside brokers!
  14. 14. Brokers? Nope! App Streams API App Streams API App Streams API Same app, many instances
  15. 15. Before DashboardProcessing Cluster Your Job Shared Database
  16. 16. After Dashboard APP Streams API
  17. 17. Things Kafka Streams Does Runs everywhere Clustering done for you Exactly-once processing Event-time processing Integrated database Joins, windowing, aggregation S/M/L/XL/XXL/XXXL sizes
  18. 18. Stream Processing in Kafka ● KStream KStream<byte[], String> textLines = builder .stream("textlines-topic", Consumed.with(Serdes.ByteArray(), Serdes.String())) .mapValues(String::toUpperCase)); KTable<String, Long> wordCounts = textLines .flatMapValues(textLine -> Arrays.asList(textLine.split("W+"))) .groupBy((key, word) -> word) .count(); ● KTable
  19. 19. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams • Copy In • Copy Out • SMT Kafka Connect Flexibility Simplicity
  20. 20. Apache Kafka™ Connect API – Streaming Data Capture JDBC Mongo MySQL Elastic Cassandra HDFS
 Kafka Connect API Kafka Pipeline Connector Connector Connector Connector Connector Connector Sources Sinks Fault tolerant Manage hundreds of data sources and sinks Preserves data schema Part of Apache Kafka project Integrated within Confluent Platform’s Control Center Flexible Integrated Reliable Compatible Connect any source to any target system
  21. 21. Single Message Transforms •Mask sensitive information •Add identifiers •Tag events •Lineage/provenance •Remove unnecessary columns •Route high priority events to faster data stores •Direct events to different Elasticsearch indexes •Cast data types to match destination •Remove unnecessary columns Modify events before storing in Kafka: Modify events going out of Kafka:
  22. 22. But…Easy to Implement /**
 * Single message transformation for Kafka Connect record types.
 *
 * Connectors can be configured with transformations to make lightweight * message-at-a-time modifications.
 */
 public interface Transformation<R extends ConnectRecord<R>> extends Configurable, Closeable {
 
 /**
 * Apply transformation to the {@code record} and return another record object.
 *
 * The implementation must be thread-safe.
 */
 R apply(R record); 
 
 /** Configuration specification for this transformation. **/
 ConfigDef config(); 
 
 /** Signal that this transformation instance will no longer will be used. **/
 @Override
 void close();
 
 }
  23. 23. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams • Select…from… • Join…where… • Group by.. KSQL Flexibility Simplicity
  24. 24. Declarative Stream Language Processing KSQLis a
  25. 25. KSQL for Data Exploration SELECT status, bytes FROM clickstream WHERE user_agent = 'Mozilla/5.0 (compatible; MSIE 6.0)'; An easy way to inspect data in a running cluster
  26. 26. KSQL for Streaming ETL • Kafka is popular for data pipelines. • KSQL enables easy transformations of data within the pipe. • Transforming data while moving from Kafka to another system. CREATE STREAM vip_actions AS 
 SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id 
 WHERE u.level = 'Platinum';
  27. 27. KSQL for Anomaly Detection CREATE TABLE possible_fraud AS
 SELECT card_number, count(*)
 FROM authorization_attempts 
 WINDOW TUMBLING (SIZE 5 SECONDS)
 GROUP BY card_number
 HAVING count(*) > 3; Identifying patterns or anomalies in real-time data, surfaced in milliseconds
  28. 28. Once again KSQL implements for you the Kafka Stream application you would have implemented if you had • ... the time • ... the experience • ... the KSQL as a spec • ... the willingness to do boring code
  29. 29. KSQL is really Kafka Stream ? ... yes! ./confluent start ./jmc& echo '{"something":"value"}' | ./kafka-console-producer --broker-list localhost:9092 --topic temp ./kafka-console-consumer --bootstrap-server localhost:9092 --topic temp --from-beginning {"something":"value"} ./ksql ksql>SET 'auto.offset.reset' = 'earliest'; ksql>CREATE STREAM TEMP (something varchar) WITH ( kafka_topic='temp',value_format='JSON'); ksql>SELECT * FROM TEMP; 1526371655810 | null | value
  30. 30. Where is KSQL not such a great fit? BI reports (Tableau etc.) • No indexes • No JDBC (most BI tools are not good with continuous results!) Ad-hoc queries • Limited span of time usually retained in Kafka • No indexes
  31. 31. Demo
  32. 32. Demo fun https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
  33. 33. Demo ... less fun https://bit.ly/2KqPZYo
  34. 34. Demo ... less fun https://bit.ly/2L5l2dj
  35. 35. Demo ... less fun https://github.com/framiere/a-kafka-story/tree/master/step19 Change Data Capture Docker Producer Consumer Kafka Stream KSQL Event sourcing Influxdb Grafana S3 docker run --rm -it --name dcv -v $(pwd):/input pmsipilot/docker-compose-viz render --horizontal --output-format image --force docker-compose.yml
  36. 36. Demo ... less fun ... without Confluent Control Center links
  37. 37. Confluent 4.2 - Nested Types SELECT userid, address.city FROM users WHERE address.state = 'CA' https://github.com/confluentinc/ksql/pull/1114
  38. 38. Confluent 4.2 - Remaining joins SELECT orderid, shipmentid FROM orders INNER JOIN shipments ON order.id = shipmentid;
  39. 39. Where to go from here ● KSQL project page ○ https://www.confluent.io/product/ksql ● Confluent blog ○ http://blog.confluent.io/ ● Blog Formule1 game ○ https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/ ● KSQL github repo ○ https://github.com/confluentinc/ksql ● CP-Demo ○ https://github.com/confluentinc/cp-demo ● A-Kafka-Story ○ https://github.com/framiere/a-kafka-story ● Un tour de l'environement Kafka ○ https://www.youtube.com/watch?v=BBo-rqmhpDM ● KSQL Recipies ○ https://github.com/bluemonk3y/ksql-recipe-fraudulent-txns/
  40. 40. Confluent Download – 4.1 – Kafka 1.1 and KSQL GA
  41. 41. KSQL Capacity Planning – Sizing https://docs.confluent.io/current/ksql/docs/capacity-planning.html
  42. 42. Resources - Confluent Enterprise Reference Architecture https://www.confluent.io/whitepaper/confluent-enterprise-reference- architecture/
  43. 43. Resources – Community Slack and Mailing List https://slackpass.io/confluentcommunity https://groups.google.com/forum/#!forum/confluent-platform
  44. 44. Devoxx France https://www.youtube.com/watch?v=BBo-rqmhpDM
  45. 45. Q&A

×