Landoop presents how to simplify your ETL process using Kafka Connect for the (E) and the (L). Introducing KCQL, the Kafka Connect Query Language, and how it can simplify fast-data (ingress & egress) pipelines; how KCQL can be used to set up Kafka connectors for popular in-memory and analytical systems, with live demos on Hazelcast, Redis and InfluxDB; how to get started with a fast-data Docker Kafka development environment; and how to enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.
London Apache Kafka Meetup (Jan 2017)
1. Delivering Fast Data Systems with Kafka
LANDOOP
www.landoop.com
Antonios Chalkiopoulos
18/1/2017
2. @chalkiopoulos
Open Source contributor
Big Data projects in Media, Betting, Retail and
Investment Banks in London
Books
Author, Programming MapReduce with Scalding
Founder of Landoop
3. DevOps Big Data Scala
Automation Distributed Systems Monitoring
Hadoop Fast Data / Streams Kafka
5. KAFKA CONNECT
“a common framework
for allowing stream data flow
between kafka and other systems”
6. Data is produced by a source and consumed by a sink.
Data Source → Kafka Connect → KAFKA → Kafka Connect → Data Sink
Data Source → Kafka Connect → KAFKA → Stream processing → KAFKA → Kafka Connect → Data Sink
8. Developers don’t care about:
Move data to/from sink/source
Support delivery semantics
Offset Management
Serialization / de-serialization
Partitioning / Scalability
Fault tolerance / fail-over
Schema Registry integration
Developers care about:
Domain specific transformations
9. CONNECTORS
Kafka Connect’s framework allows developers to create connectors that
copy data to/from other systems just by writing configuration files and
submitting them to Connect with no code necessary
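Submitting a configuration to Connect happens over its REST interface (a Connect worker listens on port 8083 by default, with a POST /connectors endpoint). A minimal sketch of that submission, where the connector class and topic names are purely illustrative:

```python
import json
import urllib.request

def build_connector_payload(name, config):
    """Build the JSON payload the Kafka Connect REST API expects:
    a connector name plus its key-value configuration."""
    return {"name": name, "config": config}

def submit_connector(connect_url, payload):
    """POST the connector configuration to a Connect worker."""
    req = urllib.request.Request(
        connect_url + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Hypothetical sink configuration -- class and topic names are illustrative.
payload = build_connector_payload(
    "redis-sink-example",
    {
        "connector.class": "com.example.RedisSinkConnector",  # hypothetical class
        "tasks.max": "2",
        "topics": "fx-topic",
    },
)
# submit_connector("http://localhost:8083", payload) would deploy it
```

No connector code is written here: the worker instantiates the class named in the configuration and runs it.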
10. Connector configurations are key-value mappings
name connector’s unique name
connector.class connector’s java class
tasks.max maximum tasks to create
topics list of topics (to source or sink data)
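Put together, the common settings above might look like this in a minimal sink connector properties file (the class and topic names are illustrative):

```properties
# A minimal sink connector configuration -- names are illustrative
name=redis-sink-example
connector.class=com.example.RedisSinkConnector
tasks.max=2
topics=fx-topic
```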
11. Introducing a query language for the connectors
name connector’s unique name
connector.class connector’s java class
tasks.max maximum tasks to create
topics list of topics (to source or sink data)
query KCQL query specifies fields/actions for the target system
12. KCQL
Kafka Connect Query Language
is a SQL-like syntax that streamlines the configuration of Kafka sink connectors, and more..
Example:
Project fields, rename or ignore them and further customise in plain text
INSERT INTO transactions SELECT field1 AS column1, field2 AS column2, field3 FROM TransactionTopic;
INSERT INTO audits SELECT * FROM AuditsTopic;
INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
13. So while integrating Kafka with in-memory data grids, key-value stores, document stores, NoSQL, search and similar systems..
INSERT INTO $TARGET
SELECT *|columns(i.e col1,col2 | col1 AS column1,col2)
FROM $TOPIC_NAME
[ IGNORE columns ]
[ AUTOCREATE ]
[ PK columns ]
[ AUTOEVOLVE ]
[ BATCH = N ]
[ CAPITALIZE ]
[ INITIALIZE ]
[ PARTITIONBY cola[,colb] ]
[ DISTRIBUTEBY cola[,colb] ]
[ CLUSTERBY cola[,colb] ]
[ TIMESTAMP cola|sys_current ]
[ STOREAS $YOUR_TYPE([key=value, .....]) ]
[ WITHFORMAT TEXT|AVRO|JSON|BINARY|OBJECT|MAP ]
KCQL
What does it look like?
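A hypothetical query exercising several of the optional clauses from the grammar above (all topic, target and field names are invented for illustration):

```sql
-- Project and rename fields, key on sensorId, auto-create the target,
-- use the ts field as the timestamp and serialize as JSON
INSERT INTO metrics
SELECT sensorId AS id, temperature, ts
FROM iot-topic
PK sensorId
AUTOCREATE
TIMESTAMP ts
WITHFORMAT JSON
```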
14. Topic to target mapping
Field selection
Auto creation
Auto evolution
Error policies
Multiple KCQLs / topic
- Field extraction
- Access to Key & Metadata
Why KCQL ?
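Multiple KCQL statements per topic can be expressed by separating statements with semicolons in a single connector property. A sketch, where the property key and topic/target names are assumptions (each sink exposes its own KCQL setting):

```properties
# Hypothetical property key and names -- two statements route one topic
# into two target structures
connect.redis.kcql=INSERT INTO fx-all SELECT * FROM fx-topic;INSERT INTO fx-by-symbol SELECT symbol, rate FROM fx-topic PK symbol
```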
Thank you very much for coming today. I will be delivering a talk about building Fast Data systems with Kafka
My name is Antonios. I’ve been involved with open-source projects on distributed systems in the Hadoop ecosystem, and these days I hold Apache Kafka close to my heart :)
I have authored a book on MapReduce using Scalding and co-authored another one
Landoop is a start-up focusing on DevOps, distributed systems and particularly Apache Kafka
Today I’d like to start the presentation with Kafka Connect. I guess most of you are already familiar with it, so I will give a quick overview
Kafka Connect was introduced almost one year ago, as a feature of Apache Kafka 0.9+ with the narrow (although very important) scope of copying streaming data from and to a Kafka cluster. I found the concept really interesting and decided to experiment with it to see what this framework introduces.
Kafka Connect is part of the Apache Kafka project, open source under the Apache license, and ships with Kafka. It’s a framework for building connectors between other data systems and Kafka, and the associated runtime to run these connectors in a distributed, fault tolerant manner at scale.
The announcement by confluent
https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
And this is how Kafka Connect fits into the picture on a Kafka based system.
You would normally use a stream processing framework to transform your data streams i.e. Spark, K-Streams, etc
And what Kafka Connect offers is the separation of concerns. It can simplify the key stages of the ETL process, and using simple tools, we can build and maintain distributed streaming data pipelines.
The E (the extraction) and the L (the load) can be taken care of for you, so as a developer you can focus on the T (the transformations)
By combining Kafka Connect and stream processing engines we can perform streaming ETLs. Each does the job it is best at, and Kafka acts as the underlying data storage layer that supports them and allows simple integration with a variety of other applications.
By using a robust framework that delivers scalability and fault tolerance out-of-the-box we can then focus on extracting value in a transformation layer.
deployments to deliver fault tolerance and automated load balancing
As you can see here, the basic pattern is to use Kafka Connect to perform Extraction of the data and load it into Kafka as a temporary, scalable, fault tolerant streaming data store. While you can do this with other, more generic data copy tools, you’ll commonly lose important semantics such as at least once delivery of data. Once the data is extracted, you use stream processing engines to perform Transformation and either this is the endpoint for the data or you can deliver it back into Kafka. Finally, Load the data with another Kafka connector into the destination system. Obviously this is a simplified picture and your pipeline will grow much more complex, have multiple stages of transformation (especially if the intermediate data is useful for reuse by multiple applications, including anything downstream that may not be processed by stream processing engines).
Most configurations are connector dependent, but there are a few settings common to all connectors
What we are introducing to all our Kafka connectors is the KCQL query
Let’s look at some of the more advanced features of KCQL - and in particular regarding some sinks.
Hazelcast for example supports the Ring Buffer Data structure, which is quite popular from the Disruptor pattern. Data can be pushed in a fixed-size buffer, with a particular retention period. If the buffer gets filled - an eviction policy will be triggered - to either evict oldest records, or deny the addition of new records.
So to write some IoT data from a Kafka topic into a Ring Buffer - we can use the STOREAS keyword.
On the right side, you can see how we can store the same data into a RELIABLE TOPIC, another Hazelcast data structure.
*Hazelcast requires data to be serializable, and JSON and Avro are supported.
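Sketched as KCQL, the two variants described above might look like this (topic, target and STOREAS identifiers are assumptions for illustration):

```sql
-- Ring buffer sink for IoT data:
INSERT INTO sensor-ringbuffer SELECT * FROM iot-topic STOREAS RING_BUFFER WITHFORMAT JSON
-- The same data into a reliable topic, serialized as Avro:
INSERT INTO sensor-reliable SELECT * FROM iot-topic STOREAS RELIABLE_TOPIC WITHFORMAT AVRO
```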
Redis provides the Sorted Set data structure. This structure allows only unique elements to be added - and each element is required to be scored - to enforce ordering.
This data structure is often used to preserve time-series data, as Redis allows running time-range queries.
So if we have a Kafka topic with foreign-exchange data, we can either:
- store all the messages in a single SortedSet (the one in blue), or
- create a new SortedSet for each symbol (one SortedSet per currency rate) using the PK syntax on the right
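As KCQL, the two options might be sketched as follows, assuming a field `symbol` holds the currency pair and `ts` the event time (both names invented for illustration):

```sql
-- One SortedSet holding all FX messages, scored by timestamp:
INSERT INTO fx-all SELECT * FROM fx-topic STOREAS SortedSet(score=ts)
-- One SortedSet per symbol, via the PK syntax:
SELECT * FROM fx-topic PK symbol STOREAS SortedSet(score=ts)
```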
So this is a list of Apache 2.0 licensed Kafka Connectors that we have been working on.
Blockchain, Bloomberg, the Cassandra connector that is certified by DataStax, a Constrained Application Protocol connector, Elastic Search, JMS, MQTT and others are some of the connectors already available, and released against the 2 latest releases of Apache Kafka.
https://github.com/Landoop/fast-data-dev
So let’s see a DEMO in real time:
http://fast-data-dev.demo.landoop.com
https://coyote.landoop.com/connect/
http://schema-registry-ui.landoop.com
http://kafka-topics-ui.landoop.com
http://kafka-connect-ui.landoop.com
Connectors look simple overall, and I know a number of people in this room are already using them in production. So what does performance look like?
The image above demonstrates that, depending on the sink system, we can sink 50K records/sec using:
20 partitions
3 connect tasks
5 GB RAM / connector
less than 2 CPUs
On the bottom-left corner - we can see that we have saturated 50% of the available network bandwidth.
Depending on the number of tasks and partitions - we can easily increase sink performance to more than 100K records / sec.
The lessons regarding performance are that:
Kafka Connect can scale really well
It requires a fair amount of memory
and a fair number of CPUs, especially when batching writes
We have also sent pull requests to the Prometheus team to enable GZIP compression and minimise the impact on the running system, which has significantly decreased network I/O
We also provide pre-built dashboards on Grafana
We are using Grafana 4.0, released a few months ago, which adds alerting: a truly revolutionary feature, as it transforms Grafana from a visualisation tool into a mission-critical monitoring tool
We’ll have a demo, but before going into it ..
Before doing a live presentation, I’d like to answer a question:
How do I ship such a complex infrastructure, one that can easily grow into hundreds of running services?
We preferably use:
Deployment apps such as Ansible
Docker-based technologies for stateless micro-services
CDH-based integration with Cloudera Manager for CDH Hadoop clusters
https://docs.landoop.com/
CDH docs - https://docs.landoop.com/
More connectors are added monthly
Time-travel in Kafka topics with KCQL queries, in real time