1. WHAT YOU SEE
IS WHAT YOU GET
Kafka Connect
implementation at GumGum
08.15.2017
2. About GumGum
! Artificial Intelligence company
! 9 years old, 225 employees
! Offices in New York, Chicago, London, Sydney
! Thousands of Publishers and Advertisers
! Process billions of impressions every day
12. Overriding Kafka Connect classes
! Need to compress our events
○ Compressing the data reduces S3 storage costs
○ Custom implementation of the Avro Record Writer Provider using Snappy compression (available in Confluent Platform 3.3.0)
○ Gzip compression for some of our other events
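As a hedged sketch, the Snappy option in Confluent Platform 3.3.0's S3 sink is exposed through the `avro.codec` setting; the connector name, topic, and bucket below are made-up placeholders:

```json
{
  "name": "s3-sink-events",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "rtb-events",
    "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
    "avro.codec": "snappy",
    "s3.bucket.name": "example-bucket",
    "s3.region": "us-west-1"
  }
}
```

Before 3.3.0, reaching the same effect meant overriding the record writer provider, as shown in the code on this slide.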
public class RTBTimestampExtractor implements TimestampExtractor {

    @Override
    public Long extract(ConnectRecord<?> record) {
        Object value = record.value();
        if (value instanceof Struct) {
            Struct struct = (Struct) value;
            value = struct.get("eventMetadata");
            if (value instanceof Struct) {
                Struct eventMetadataStruct = (Struct) value;
                Object timestamp = eventMetadataStruct.get("timestamp");
                if (timestamp instanceof Long) {
                    return (Long) timestamp;
                }
                ...
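A custom extractor like this is wired into the sink through the storage partitioner settings, where `timestamp.extractor` accepts a class name; the package name below is a hypothetical placeholder, and the duration/path values are illustrative only:

```properties
# Sketch of the partitioner settings that would reference the custom extractor
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
timestamp.extractor=com.gumgum.connect.RTBTimestampExtractor
partition.duration.ms=3600000
path.format='dt'=YYYY-MM-dd/'hour'=HH
locale=en-US
timezone=UTC
```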
public class GumGumAvroRecordWriterProvider extends AvroRecordWriterProvider {

    @Override
    public RecordWriter getRecordWriter(final S3SinkConnectorConfig conf,
                                        final String filename) {
        // This is not meant to be a thread-safe writer!
        return new RecordWriter() {
            final DataFileWriter<Object> writer =
                new DataFileWriter<>(new GenericDatumWriter<>())
                    .setCodec(CodecFactory.snappyCodec());
            ...
13. Overriding Kafka Connect classes
! Creating a String format
Tue Jul 04 01:00:00 -0700 2017, {"id":"32237763-4c55-4d35-84df-23f8be320449","t":
1499155200608,"cl":"js","ua":"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X)
AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304 [FBAN/FBIOS;FBAV/99.0.0.57.70;FBBV/
63577032;FBDV/iPhone5,3;FBMD/iPhone;FBSN/iOS;FBSV/10.3.1;FBSS/2;FBCR/Verizon;FBID/phone;FBLC/
en_US;FBOP/5;FBRV/0]","bty":2,"bfa":"Facebook App","bn":"Facebook","bof":"iOS","bon":"iPhone
OS","ip":"141.239.172.162","cc":"US","rg":"HI","ct":"Kailua","pc":"96734","mc":
744,"isp":"Hawaiian Telcom","bf":"704a0c01a4995359fc8c336d5751d0ad17f1c301","lt":"Mon Jul 03
22:00:00 -1000 2017","sip":"10.11.152.18","awsr":"us-west-1"},
{"v":"1.1","pv":"0e27633e-025b-43fd-a971-9ebf854188c0","r":"release-1211-15-
gfa55c30","t":"5e6e2525","a":[{"i":11,"u":"http://wishesndishes.com/images/adthrive/2017/06/
Weekly-Meal-Plan-Week-100-480x480.jpg","w":300,"h":300,"x":10,"y":
10367,"lt":"in","af":false,"lu":"http://wishesndishes.com/weekly-meal-plan-week-100/?
m&m","ia":"Weekly Meal Plan {Week 100} - 10 great bloggers bringing you a full week of summer
recipes including dinner, sides dishes, and desserts!"}],"rf":"http://wishesndishes.com/
creamy-pecan-crunch-grape-salad/","p":"http://wishesndishes.com/creamy-pecan-crunch-grape-
salad/?m","fs":false,"ce":true,"ac":{"25855":5},"vp":{"ii":false,"w":320,"h":546},"sc":{"w":
320,"h":568,"d":2},"tr":0.6,"pid":11685,"pn":"Ad Thrive","vid":16,"ths":["GGT0"],"aevt":
["GGE24-3","GGE24-4","GGE26-1"],"pcat":["IAB8","IAB8-1"],"ss":"0.75","hk":
["pecan","bloggers","bringing","dishes","crunch","desserts","dinner","creamy","salad","dishes
and desserts"],"ut":[1,2,34,3,4,20,6,9,10]}
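A custom output format like this plugs into the S3 sink the same way the Avro format does, via `format.class`; the class name below is a hypothetical placeholder for GumGum's implementation:

```properties
# Sketch: point the sink at the custom String format implementation
format.class=com.gumgum.connect.s3.format.string.StringFormat
```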
19. Schema evolution
! Schema: Defines the possible fields of the message
! Use the Maven plugin when generating your schema
! Make sure you use the schema evolution properties properly
! Kafka Connect performance can decrease drastically because of a schema evolution
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
   {"name": "name", "type": "string"},
   {"name": "favorite_number", "type": ["int", "null"]},
   {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
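To illustrate the evolution properties in practice: a new field can be added to the schema above with a default value, so that readers using the evolved schema can still decode records written with the old one (the `country` field is an invented example):

```json
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
   {"name": "name", "type": "string"},
   {"name": "favorite_number", "type": ["int", "null"]},
   {"name": "favorite_color", "type": ["string", "null"]},
   {"name": "country", "type": ["null", "string"], "default": null}
 ]
}
```

Adding a field without a default, by contrast, breaks this compatibility and is one of the evolution mistakes the slide warns about.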
25. Monitoring Kafka Connect
! Use of ZooKeeper and Kafka monitoring tools to carefully monitor consumer lag
○ AWS CloudWatch alerts
! Monitoring of the connectors with the Kafka Connect REST API
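A minimal sketch of that REST check; the connector name `s3-sink-events` and the default port 8083 are assumptions, and a healthy response reports `"state":"RUNNING"` for the connector and for every task:

```shell
# Query one connector's status (falls through gracefully when no
# Connect worker is reachable locally).
curl -s --max-time 2 http://localhost:8083/connectors/s3-sink-events/status \
  || echo "worker not reachable"

# List all connectors known to this worker.
curl -s --max-time 2 http://localhost:8083/connectors \
  || echo "worker not reachable"
```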
26. Auto remediation
! Monitoring of the connectors with the Kafka-Connect REST API
○ What happens when something fails?
○ Only 8 hours of data retained in Kafka - need to recover quickly
○ Notification on connector failure
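When a failure notification fires, the same REST API can drive the remediation; the restart endpoints below exist in Kafka Connect, while the connector name is a hypothetical placeholder:

```shell
# Restart a failed connector instance (does not restart its tasks).
curl -s --max-time 2 -X POST \
  http://localhost:8083/connectors/s3-sink-events/restart \
  || echo "worker not reachable"

# Restart an individual failed task by id.
curl -s --max-time 2 -X POST \
  http://localhost:8083/connectors/s3-sink-events/tasks/0/restart \
  || echo "worker not reachable"
```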
27. Auto remediation
! In case of a massive Kafka Connect outage, what do we do with invalid offsets?
○ auto.offset.reset property
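In a Connect worker, the property is applied through the worker-level consumer override, where consumer settings carry a `consumer.` prefix; a sketch, assuming the goal is to replay from the oldest data still retained:

```properties
# connect-distributed.properties (fragment)
# When stored offsets are invalid or expired, start from the earliest
# offset still available in Kafka instead of failing.
consumer.auto.offset.reset=earliest
```

With only 8 hours of retention, `earliest` bounds the replay window while avoiding the data loss that `latest` would cause.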