WHAT YOU SEE
IS WHAT YOU GET
Kafka Connect
implementation at GumGum
08.15.2017
About GumGum
! Artificial Intelligence company
! 9 years old, 225 employees
! Offices in New York, Chicago, London, Sydney
! Thousands of Publishers and Advertisers
! Process billions of impressions every day
Advertising
GumGum Sports
GumGum’s
Architecture
Previous Architecture: Pipeline A
Real Time
Primary
AWS
Redshift
Amazon S3
File Upload
Previous Architecture: Pipeline B
AWS
Redshift
Amazon S3
Problems with that architecture
! Stateful Ad Servers
! Data Loss
! Reducing Network Transfer
Migration to
Kafka Connect
Our Constraints
! No duplicate events
! Consume all the messages from Kafka
! Kafka Connect must integrate with the current storage format
Overriding Kafka Connect classes
! Overriding the S3 sink destination
○ From bucket/topic/topicName/ to a key layout that fits our constraints
public class TopicPartitionWriter {
    ...
    private String fileKey(String keyPrefix, String name) {
        // return topicsPrefix + dirDelim + keyPrefix + dirDelim + name;
        return keyPrefix + dirDelim + name;
    }

    private String fileKeyToCommit(String dirPrefix, long startOffset) {
        String name = tp.topic()
            + fileDelim
            + tp.partition()
            + fileDelim
            + String.format(zeroPadOffsetFormat, startOffset)
            + extension;
        // return fileKey(topicsDir, dirPrefix, name);
        return fileKey(dirPrefix, name);
    }
    ...
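The effect of dropping the topics directory from the object key can be sketched in isolation. This is a minimal illustration, not connector code: the class `S3KeyLayout`, its method names, and the delimiter and padding values are assumptions chosen to mirror the field names in the excerpt above.

```java
// Sketch only: S3KeyLayout is a hypothetical helper, not a Kafka Connect class.
final class S3KeyLayout {
    // Assumed values standing in for dirDelim, fileDelim and zeroPadOffsetFormat.
    static final String DIR_DELIM = "/";
    static final String FILE_DELIM = "+";
    static final String ZERO_PAD_OFFSET_FORMAT = "%010d";

    /** File name: topic, partition and zero-padded start offset, then the extension. */
    static String fileName(String topic, int partition, long startOffset, String extension) {
        return topic + FILE_DELIM + partition + FILE_DELIM
            + String.format(ZERO_PAD_OFFSET_FORMAT, startOffset) + extension;
    }

    /** Layout of the commented-out line: topics directory, then prefix, then file. */
    static String keyWithTopicsDir(String topicsDir, String dirPrefix, String name) {
        return topicsDir + DIR_DELIM + dirPrefix + DIR_DELIM + name;
    }

    /** Layout after the override: the topics directory is no longer prepended. */
    static String keyWithoutTopicsDir(String dirPrefix, String name) {
        return dirPrefix + DIR_DELIM + name;
    }
}
```

With a hypothetical prefix such as `rtb/2017/07/04`, the override yields `rtb/2017/07/04/rtb+3+0000000042.avro` instead of the same key nested under a topics directory.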
Overriding Kafka Connect classes
! Need to compress our events
○ Compressing the data reduces S3 costs
○ Custom implementation of the Avro Record Writer Provider using Snappy compression (available in Confluent Platform 3.3.0)
○ Gzip compression for some of our other events
public class RTBTimestampExtractor implements TimestampExtractor {

    @Override
    public Long extract(ConnectRecord<?> record) {
        Object value = record.value();
        if (value instanceof Struct) {
            Struct struct = (Struct) value;
            // The event timestamp lives in the nested "eventMetadata" struct.
            value = struct.get("eventMetadata");
            if (value instanceof Struct) {
                Struct eventMetadataStruct = (Struct) value;
                Object timestamp = eventMetadataStruct.get("timestamp");
                if (timestamp instanceof Long) {
                    return (Long) timestamp;
                }
                ...

public class GumGumAvroRecordWriterProvider extends AvroRecordWriterProvider {

    @Override
    public RecordWriter getRecordWriter(final S3SinkConnectorConfig conf,
                                        final String filename) {
        // This is not meant to be a thread-safe writer!
        return new RecordWriter() {
            // Avro container file writer configured with the Snappy codec.
            final DataFileWriter<Object> writer =
                new DataFileWriter<>(new GenericDatumWriter<>())
                    .setCodec(CodecFactory.snappyCodec());
            ...
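The Gzip path mentioned for the other events is not shown on the slide. As a rough sketch of what compressing a batch of text events before upload might look like, using only the JDK: the class `GzipEvents` and its methods are hypothetical and stand in for whatever writer GumGum actually used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch only: GzipEvents is a hypothetical helper, not part of the connector.
final class GzipEvents {

    /** Compresses a batch of newline-delimited events into a Gzip byte array. */
    static byte[] compress(String events) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the Gzip stream writes the trailer, so it must happen before toByteArray().
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(events.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses compress(); useful for checking that a file round-trips. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```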
Overriding Kafka Connect classes
! Creating a String format
Tue Jul 04 01:00:00 -0700 2017, {"id":"32237763-4c55-4d35-84df-23f8be320449","t":
1499155200608,"cl":"js","ua":"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X)
AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304 [FBAN/FBIOS;FBAV/99.0.0.57.70;FBBV/
63577032;FBDV/iPhone5,3;FBMD/iPhone;FBSN/iOS;FBSV/10.3.1;FBSS/2;FBCR/Verizon;FBID/phone;FBLC/
en_US;FBOP/5;FBRV/0]","bty":2,"bfa":"Facebook App","bn":"Facebook","bof":"iOS","bon":"iPhone
OS","ip":"141.239.172.162","cc":"US","rg":"HI","ct":"Kailua","pc":"96734","mc":
744,"isp":"Hawaiian Telcom","bf":"704a0c01a4995359fc8c336d5751d0ad17f1c301","lt":"Mon Jul 03
22:00:00 -1000 2017","sip":"10.11.152.18","awsr":"us-west-1"},
{"v":"1.1","pv":"0e27633e-025b-43fd-a971-9ebf854188c0","r":"release-1211-15-
gfa55c30","t":"5e6e2525","a":[{"i":11,"u":"http://wishesndishes.com/images/adthrive/2017/06/
Weekly-Meal-Plan-Week-100-480x480.jpg","w":300,"h":300,"x":10,"y":
10367,"lt":"in","af":false,"lu":"http://wishesndishes.com/weekly-meal-plan-week-100/?
m&m","ia":"Weekly Meal Plan {Week 100} - 10 great bloggers bringing you a full week of summer
recipes including dinner, sides dishes, and desserts!"}],"rf":"http://wishesndishes.com/
creamy-pecan-crunch-grape-salad/","p":"http://wishesndishes.com/creamy-pecan-crunch-grape-
salad/?m","fs":false,"ce":true,"ac":{"25855":5},"vp":{"ii":false,"w":320,"h":546},"sc":{"w":
320,"h":568,"d":2},"tr":0.6,"pid":11685,"pn":"Ad Thrive","vid":16,"ths":["GGT0"],"aevt":
["GGE24-3","GGE24-4","GGE26-1"],"pcat":["IAB8","IAB8-1"],"ss":"0.75","hk":
["pecan","bloggers","bringing","dishes","crunch","desserts","dinner","creamy","salad","dishes
and desserts"],"ut":[1,2,34,3,4,20,6,9,10]}
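The sample above appears to be a human-readable timestamp followed by comma-separated JSON objects. A minimal sketch of producing that shape follows; the class `StringEventFormat`, the `", "` separator, and the choice of a fixed UTC offset are assumptions read off the sample rather than GumGum's actual writer.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Sketch only: StringEventFormat is a hypothetical helper.
final class StringEventFormat {
    // Pattern inferred from the sample prefix "Tue Jul 04 01:00:00 -0700 2017".
    static final DateTimeFormatter PREFIX_FORMAT =
        DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.US);

    /** Builds one output line: formatted timestamp, then each JSON part, comma-separated. */
    static String formatLine(long epochMillis, ZoneOffset offset, String... jsonParts) {
        String prefix = ZonedDateTime
            .ofInstant(Instant.ofEpochMilli(epochMillis), offset)
            .format(PREFIX_FORMAT);
        return prefix + ", " + String.join(", ", jsonParts);
    }
}
```

Passing the sample's own `"t"` value, 1499155200608, with an offset of -07:00 reproduces the prefix shown on the slide.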
Previous Architecture: Pipeline A
Now with Kafka Connect: Pipeline A
Previous Architecture: Pipeline B
Now with Kafka Connect: Pipeline B
Production
Issues
Schema evolution
! Schema: defines the possible fields of the message
! Use a Maven plugin when generating your schema
! Make sure you use the schema evolution properties properly
! Kafka Connect performance can decrease drastically because of schema evolution
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
Schema evolution: NONE
[Diagram: a stream of E1 and E2 events, with output labeled S1, S2, S1]
Schema evolution: FORWARD
[Diagram: a stream of E1 and E2 events, with output labeled S1]
Schema evolution: BACKWARD & FULL
[Diagram: a stream of E1 and E2 events, with output labeled S1, S2]
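One way to read the three diagrams is that the compatibility mode decides when the sink closes the current file and starts a new one with a different schema. The sketch below is a simplified model of that reading, not the connector's source: the class `SchemaRotationSketch`, the integer schema versions, and the rotation rules are assumptions chosen so that the event order E1, E2, E2, E1 reproduces the S labels in the diagrams.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy model of file rotation per compatibility mode.
final class SchemaRotationSketch {
    enum Compatibility { NONE, BACKWARD, FORWARD, FULL }

    /** Returns the schema version of each output file, in the order the files are opened. */
    static List<Integer> fileSchemas(Compatibility mode, int... eventVersions) {
        List<Integer> files = new ArrayList<>();
        Integer current = null; // schema version of the file currently being written
        for (int version : eventVersions) {
            boolean rotate;
            if (current == null) {
                rotate = true;                    // first event always opens a file
            } else if (mode == Compatibility.NONE) {
                rotate = version != current;      // any schema change starts a new file
            } else if (mode == Compatibility.FORWARD) {
                rotate = version < current;       // only an older schema starts a new file
            } else {
                rotate = version > current;       // BACKWARD and FULL: only a newer schema does
            }
            if (rotate) {
                current = version;
                files.add(version);
            }
            // Otherwise the event is assumed to be projected onto the current schema.
        }
        return files;
    }
}
```

Under this model, interleaved E1 and E2 events produce a new file at every schema switch with NONE, which would be consistent with the performance warning on the previous slide.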
Monitoring Kafka Connect
! Monitoring health of the Kafka Connect cluster
○ Ganglia monitoring
○ Log ingestion through Sumo Logic / Splunk
! Use of ZooKeeper and Kafka monitoring tools to carefully monitor the lag
○ AWS CloudWatch alerts
! Monitoring of the connectors with the Kafka Connect REST API
Auto remediation
! Monitoring of the connectors with the Kafka-Connect REST API
○ What happens when something fails?
○ Only 8 hours of data in Kafka - Need to recover quickly
○ Notification on connector failure
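A remediation loop along these lines could poll the Kafka Connect REST API's `/connectors/{name}/status` endpoint and call `/connectors/{name}/restart` when a failure shows up. The sketch below assumes Java 11's `java.net.http` client; the class `ConnectorWatchdog`, the regex-based check, and the base URL are hypothetical, and notification is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Pattern;

// Sketch only: ConnectorWatchdog is a hypothetical helper, not GumGum's tooling.
final class ConnectorWatchdog {
    // Matches a FAILED state on the connector or on any task in the status document.
    private static final Pattern FAILED_STATE =
        Pattern.compile("\"state\"\\s*:\\s*\"FAILED\"");

    static URI statusUri(String baseUrl, String connector) {
        return URI.create(baseUrl + "/connectors/" + connector + "/status");
    }

    static URI restartUri(String baseUrl, String connector) {
        return URI.create(baseUrl + "/connectors/" + connector + "/restart");
    }

    /** True when the status JSON reports at least one FAILED connector or task. */
    static boolean hasFailure(String statusJson) {
        return FAILED_STATE.matcher(statusJson).find();
    }

    /** Polls once; requests a connector restart and returns true if a failure was seen. */
    static boolean checkAndRestart(HttpClient client, String baseUrl, String connector)
            throws Exception {
        HttpResponse<String> status = client.send(
            HttpRequest.newBuilder(statusUri(baseUrl, connector)).GET().build(),
            HttpResponse.BodyHandlers.ofString());
        if (!hasFailure(status.body())) {
            return false;
        }
        client.send(
            HttpRequest.newBuilder(restartUri(baseUrl, connector))
                .POST(HttpRequest.BodyPublishers.noBody()).build(),
            HttpResponse.BodyHandlers.discarding());
        return true;
    }
}
```

Failed tasks have their own endpoint, `/connectors/{name}/tasks/{id}/restart`, so a fuller version would likely restart those individually. With only 8 hours of retention, the polling interval would need to be short enough to leave time for manual intervention.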
! In case of a massive outage of Kafka Connect, what to do with invalid offsets?
○ auto.offset.reset property
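The slide does not say which value was chosen. As a hedged illustration of the trade-off: in a Connect worker, consumer settings are passed with a `consumer.` prefix, and the two usual choices pull in opposite directions. The class `OffsetResetSketch` and its flag are hypothetical.

```java
import java.util.Properties;

// Sketch only: OffsetResetSketch is a hypothetical helper for building worker overrides.
final class OffsetResetSketch {

    /**
     * "earliest" restarts from the oldest retained message when the committed offset
     * is invalid (nothing retained is skipped, but already-written data may be re-read);
     * "latest" jumps to the end of the log (no re-reads, but the gap is lost).
     */
    static Properties consumerOverrides(boolean replayFromOldest) {
        Properties props = new Properties();
        props.setProperty("consumer.auto.offset.reset", replayFromOldest ? "earliest" : "latest");
        return props;
    }
}
```

Given the constraints listed earlier (no duplicate events, consume all messages), neither value satisfies both on its own after an outage longer than the retention window.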
THANK YOU!
Karim Lamouri
karim@gumgum.com
