Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

London Apache Kafka Meetup (Jan 2017)


Published on

Landoop presenting how to simplify your ETL process using Kafka Connect for (E) and (L). Introducing KCQL - the Kafka Connect Query Language & how it can simplify fast-data (ingress & egress) pipelines. How KCQL can be used to set up Kafka Connectors for popular in-memory and analytical systems and live demos with HazelCast, Redis and InfluxDB. How to get started with a fast-data docker kafka development environment. Enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.

Published in: Technology
  • How Bookies CHEAT and How to beat them?●●●
    Are you sure you want to  Yes  No
    Your message goes here

London Apache Kafka Meetup (Jan 2017)

  1. 1. Delivering Fast Data Systems with Kafka LANDOOP Antonios Chalkiopoulos 18/1/2017
  2. 2. @chalkiopoulos Open Source contributor Big Data projects in Media, Betting, Retail and 
 Investment Banks in London Books Author, Programming MapReduce with Scalding 
 Founder of Landoop
  3. 3. DevOps Big Data Scala Automation Distributed Systems Monitoring Hadoop Fast Data / Streams Kafka
  4. 4. KAFKA CONNECT a bit of context
  5. 5. KAFKA CONNECT “a common framework for allowing stream data flow between kafka and other systems”
  6. 6. Data is produced from a source and consumed to a sink. Data Source KafkaConnect KafkaConnect KAFKA Data SinkData Source KafkaConnect KafkaConnect KAFKA Data Sink Stream processing
  7. 7. Data Source KafkaConnect KafkaConnect KAFKA Data Sink Stream processing E T L
  8. 8. Developers don’t care about:
 Move data to/from sink/source Support delivery semantics Offset Management Serialization / de-serialization Partitioning / Scalability Fault tolerance / fail-over Schema Registry integration Developers care about:
 Domain specific transformations
  9. 9. CONNECTORS Kafka Connect’s framework allows developers to create connectors that copy data to/from other systems just by writing configuration files and submitting them to Connect with no code necessary
  10. 10. Connector configurations are key-value mappings name connector’s unique name connector.class connector’s java class tasks.max maximum tasks to create topics list of topics (to source or sink data)
  11. 11. Introducing a query language for the connectors name connector’s unique name connector.class connector’s java class tasks.max maximum tasks to create topics list of topics (to source or sink data) query KCQL query specifies fields/actions for the target system
  12. 12. KCQL Kafka Connect Query Language is a SQL like syntax allowing streamlined configuration of Kafka Sink Connectors and then some more.. Example: Project fields, rename or ignore them and further customise in plain text INSERT INTO transactions SELECT field1 AS column1, field2 AS column2, field3 FROM TransactionTopic; INSERT INTO audits SELECT * FROM AuditsTopic; INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE; INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
  13. 13. So while integrating Kafka with in-memory data grid, key-value, document stores, NoSQL, search etc systems.. INSERT INTO $TARGET SELECT *|columns(i.e col1,col2 | col1 AS column1,col2) FROM $TOPIC_NAME [ IGNORE columns ] [ AUTOCREATE ] [ PK columns ] [ AUTOEVOLVE ] [ BATCH = N ] [ CAPITALIZE ] [ INITIALIZE ] [ PARTITIONBY cola[,colb] ] [ DISTRIBUTEBY cola[,colb] ] [ CLUSTERBY cola[,colb] ] [ TIMESTAMP cola|sys_current ] [ STOREAS $YOUR_TYPE([key=value, .....]) ] [ WITHFORMAT TEXT|AVRO|JSON|BINARY|OBJECT|MAP ] KCQL How does it look like?
  14. 14. Topic to target mapping Field selection Auto creation Auto evolution Error policies Multiple KCQLs / topic 
 - Field extraction
 - Access to Key & Metadata Why KCQL ?
  15. 15. KCQL Advanced Features Examples
  16. 16. KCQL | { "sensor_id": "01" , "temperature": 52.7943, "ts": 1484648810 } { “sensor_id": "02" , "temperature": 28.8597, "ts": 1484648810 } Example Kafka topic with IoT data INSERT INTO sensor_ringbuffer 
 SELECT sensor_id, temperature, ts 
 FROM coap_sensor_topic 
 STOREAS RING_BUFFER INSERT INTO sensor_reliabletopic 
 SELECT sensor_id, temperature, ts
 FROM coap_sensor_topic 
  17. 17. INSERT INTO FXSortedSet 
 SELECT symbol, price 
 FROM yahooFX-topic 
 STOREAS SortedSet(score=ts) SELECT price 
 FROM yahooFX-topic 
 PK symbol 
 STOREAS SortedSet(score=ts) KCQL | { "symbol": "USDGBP" , "price": 0.7943, "ts": 1484648810 } { "symbol": "EURGBP" , "price": 0.8597, "ts": 1484648810 } Example Kafka topic with FX data B:1 A:2 D:3 C:20 Sorted Set -> { value : score }
  18. 18. Stream reactor connectors support KCQL kafka-connect-blockchain kafka-connect-bloomberg kafka-connect-cassandra kafka-connect-coap kafka-connect-druid kafka-connect-elastic kafka-connect-ftp kafka-connect-hazelcast kafka-connect-hbase kafka-connect-influxdb kafka-connect-jms kafka-connect-kudu kafka-connect-mongodb kafka-connect-mqtt kafka-connect-redis kafka-connect-rethink kafka-connect-voltdb kafka-connect-yahoo Source: Integration Tests:
  19. 19. DEMO Kafka Connect InfluxDB We ‘ll need: • Zookeeper • Kafka Broker • Schema Registry • Kafka Connect Distributed • Kafka REST Proxy We ‘ll also use: • StreamReactor connectors • Landoop Fast Data Web Tools docker run --rm -it -p 2181:2181 -p 3030:3030 -p 8081:8081 -p 8082:8082 -p 8083:8083 -p 9092:9092 -e ADV_HOST= landoop/fast-data-dev case class DeviceMeasurements(
 deviceId: Int, temperature: Int, moreData: String, timestamp: Long) We’ll generate some Avro messages
  20. 20. DEMO Kafka Development Environment @ Fast-data-dev docker image
  21. 21. DEMO Integration testing with Coyote for connectors & infrastructure
  22. 22. Schema Registry UI
  23. 23. Kafka Topics UI
  24. 24. Kafka Connect UI
  25. 25. Connectors Performance
  26. 26. Monitoring & Alerting via JMX
  27. 27. Deployment
 apps Containers 
 mesos -kubernetes Hadoop 
 integration * state-less apps = container-friendly
 schema registry, kafka connect How do I IT? Available features: 
 Kafka ecosystem StreamReactor Connectors Landoop web tools Monitoring & Alerting Security features
  28. 28. Wrap up - KCQL - Connectors - Kafka Web Tools - Automation & Integrations
  29. 29. Coming up - Kafka backend enhanced UIs | Timetravel
  30. 30. $ locate
  31. 31. Thank you ;)