WHAT YOU SEE IS WHAT YOU GET
Kafka Connect implementation at GumGum
08.15.2017
About GumGum
! Artificial Intelligence company
! 9 years old, 225 employees
! Offices in New York, Chicago, London, Sydney
! Thousands of Publishers and Advertisers
! Process billions of impressions every day
Advertising
GumGum Sports
GumGum’s Architecture
Previous Architecture: Pipeline A
[Diagram labels: Real Time, Primary, AWS Redshift, Amazon S3, File Upload]
Previous Architecture: Pipeline B
[Diagram labels: AWS Redshift, Amazon S3]
Problems with that architecture
! Stateful Ad Servers
! Data Loss
! Reducing Network Transfer
Migration to Kafka Connect
Our Constraints
! No duplicate events
! Consume all the messages from Kafka
! Kafka Connect must integrate with the current storage format
Overriding Kafka Connect classes
! Overriding the S3 sink destination
○ From bucket/topic/topicName/ to a layout that meets our constraints
public class TopicPartitionWriter {
  ...
  private String fileKey(String keyPrefix, String name) {
    // Original: return topicsPrefix + dirDelim + keyPrefix + dirDelim + name;
    // Override: drop the topics prefix so the key starts at keyPrefix.
    return keyPrefix + dirDelim + name;
  }

  private String fileKeyToCommit(String dirPrefix, long startOffset) {
    // File name: <topic><fileDelim><partition><fileDelim><zero-padded offset><extension>
    String name = tp.topic()
        + fileDelim
        + tp.partition()
        + fileDelim
        + String.format(zeroPadOffsetFormat, startOffset)
        + extension;
    // Original: return fileKey(topicsDir, dirPrefix, name);
    return fileKey(dirPrefix, name);
  }
  ...
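The effect of this override on the resulting S3 object key can be illustrated with a small standalone sketch. It is not the connector's code: the class name `S3KeySketch`, the `+` file delimiter, the `%010d` offset format, and the example topic and prefix values are all assumptions chosen for illustration.

```java
final class S3KeySketch {
    static final String DIR_DELIM = "/";
    // Assumed values; the real ones come from the connector configuration.
    static final String FILE_DELIM = "+";
    static final String ZERO_PAD_OFFSET_FORMAT = "%010d";

    // File name: <topic><delim><partition><delim><zero-padded start offset><extension>
    static String fileName(String topic, int partition, long startOffset, String extension) {
        return topic + FILE_DELIM + partition + FILE_DELIM
                + String.format(ZERO_PAD_OFFSET_FORMAT, startOffset) + extension;
    }

    // Layout before the override: a topics prefix comes first.
    static String originalKey(String topicsPrefix, String keyPrefix, String name) {
        return topicsPrefix + DIR_DELIM + keyPrefix + DIR_DELIM + name;
    }

    // Layout after the override: the topics prefix is dropped.
    static String overriddenKey(String keyPrefix, String name) {
        return keyPrefix + DIR_DELIM + name;
    }

    public static void main(String[] args) {
        String name = fileName("events", 3, 42L, ".avro");
        System.out.println(originalKey("topics", "events/2017/07/04", name));
        System.out.println(overriddenKey("events/2017/07/04", name));
    }
}
```

Keeping the file name unchanged and altering only the prefix means the offset stays encoded in the key, whatever directory layout the existing storage format requires.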
Overriding Kafka Connect classes
! Need to compress our events
○ Compressing the data reduces S3 costs
○ Custom implementation of the Avro Record Writer Provider using Snappy compression (available in Confluent Platform 3.3.0)
○ Gzip compression for some of our other events
public class RTBTimestampExtractor implements TimestampExtractor {

  @Override
  public Long extract(ConnectRecord<?> record) {
    Object value = record.value();
    if (value instanceof Struct) {
      Struct struct = (Struct) value;
      value = struct.get("eventMetadata");
      if (value instanceof Struct) {
        Struct eventMetadataStruct = (Struct) value;
        Object timestamp = eventMetadataStruct.get("timestamp");
        if (timestamp instanceof Long) {
          return (Long) timestamp;
        }
        ...
public class GumGumAvroRecordWriterProvider extends AvroRecordWriterProvider {

  @Override
  public RecordWriter getRecordWriter(final S3SinkConnectorConfig conf,
                                      final String filename) {
    // This is not meant to be a thread-safe writer!
    return new RecordWriter() {
      final DataFileWriter<Object> writer =
          new DataFileWriter<>(new GenericDatumWriter<>())
              .setCodec(CodecFactory.snappyCodec());
      ...
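For the events that use Gzip rather than Snappy, the compression step itself can be sketched with the JDK alone. This is a minimal illustration, not GumGum's writer: the class `GzipEventsSketch` and its methods are hypothetical, and it compresses an in-memory string instead of streaming records to S3.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipEventsSketch {
    // Compress UTF-8 text into a gzip byte array.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buffer.toByteArray();
    }

    // Decompress a gzip byte array back into UTF-8 text.
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Batches of similar JSON events are highly repetitive, which is why compressing them before upload tends to reduce the stored size considerably.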
Overriding Kafka Connect classes
! Creating a String format
Tue Jul 04 01:00:00 -0700 2017, {"id":"32237763-4c55-4d35-84df-23f8be320449","t":
1499155200608,"cl":"js","ua":"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X)
AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304 [FBAN/FBIOS;FBAV/99.0.0.57.70;FBBV/
63577032;FBDV/iPhone5,3;FBMD/iPhone;FBSN/iOS;FBSV/10.3.1;FBSS/2;FBCR/Verizon;FBID/phone;FBLC/
en_US;FBOP/5;FBRV/0]","bty":2,"bfa":"Facebook App","bn":"Facebook","bof":"iOS","bon":"iPhone
OS","ip":"141.239.172.162","cc":"US","rg":"HI","ct":"Kailua","pc":"96734","mc":
744,"isp":"Hawaiian Telcom","bf":"704a0c01a4995359fc8c336d5751d0ad17f1c301","lt":"Mon Jul 03
22:00:00 -1000 2017","sip":"10.11.152.18","awsr":"us-west-1"},
{"v":"1.1","pv":"0e27633e-025b-43fd-a971-9ebf854188c0","r":"release-1211-15-
gfa55c30","t":"5e6e2525","a":[{"i":11,"u":"http://wishesndishes.com/images/adthrive/2017/06/
Weekly-Meal-Plan-Week-100-480x480.jpg","w":300,"h":300,"x":10,"y":
10367,"lt":"in","af":false,"lu":"http://wishesndishes.com/weekly-meal-plan-week-100/?
m&m","ia":"Weekly Meal Plan {Week 100} - 10 great bloggers bringing you a full week of summer
recipes including dinner, sides dishes, and desserts!"}],"rf":"http://wishesndishes.com/
creamy-pecan-crunch-grape-salad/","p":"http://wishesndishes.com/creamy-pecan-crunch-grape-
salad/?m","fs":false,"ce":true,"ac":{"25855":5},"vp":{"ii":false,"w":320,"h":546},"sc":{"w":
320,"h":568,"d":2},"tr":0.6,"pid":11685,"pn":"Ad Thrive","vid":16,"ths":["GGT0"],"aevt":
["GGE24-3","GGE24-4","GGE26-1"],"pcat":["IAB8","IAB8-1"],"ss":"0.75","hk":
["pecan","bloggers","bringing","dishes","crunch","desserts","dinner","creamy","salad","dishes
and desserts"],"ut":[1,2,34,3,4,20,6,9,10]}
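A line in this format could be assembled roughly as follows. This is a sketch under assumptions: the layout is read off the sample above as a formatted date followed by two JSON documents, the date pattern is inferred from the sample rather than taken from GumGum's code, and `StringFormatSketch` with its methods is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class StringFormatSketch {
    // Pattern inferred from the sample, e.g. "Tue Jul 04 01:00:00 -0700 2017".
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.US);

    // Render an epoch-millisecond timestamp at the given UTC offset.
    static String formatDate(long epochMillis, ZoneOffset offset) {
        return DATE_FORMAT.format(Instant.ofEpochMilli(epochMillis).atOffset(offset));
    }

    // Assumed layout: <date>, <metadata JSON>, <payload JSON>
    static String formatLine(long epochMillis, ZoneOffset offset,
                             String metadataJson, String payloadJson) {
        return formatDate(epochMillis, offset) + ", " + metadataJson + ", " + payloadJson;
    }
}
```

With the `t` value from the sample (1499155200608) and an offset of -07:00, `formatDate` produces the date that opens the sample line.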
Previous Architecture: Pipeline A
Now with Kafka Connect: Pipeline A
Previous Architecture: Pipeline B
Now with Kafka Connect: Pipeline B
Production Issues
Schema evolution
! Schema: defines the possible fields of the message
! Use a Maven plugin when generating your schema
! Make sure you use the schema evolution properties properly
! Kafka Connect performance can decrease drastically because of a schema evolution
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
Schema evolution: NONE
[Diagram labels: events E1 E2 E2 E1 E1; schemas S1 S2 S1]
Schema evolution: FORWARD
[Diagram labels: events E1 E2 E2 E1; schema S1]
Schema evolution: BACKWARD & FULL
[Diagram labels: events E1 E2 E2 E1; schemas S1 S2]
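The three diagrams can be read as a file-rotation rule: each time the connector decides the incoming record's schema cannot share the current file, it closes that file and starts a new one. The sketch below is a simplified model of that rule, not the connector's implementation. It assumes schemas are identified by an integer version, that NONE rotates on any schema change, that BACKWARD and FULL rotate only when a newer version arrives, and that FORWARD rotates only when an older version arrives. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SchemaRotationSketch {
    enum Compatibility { NONE, BACKWARD, FORWARD, FULL }

    // Returns the schema version of each output file, in write order.
    static List<Integer> fileSchemas(int[] recordVersions, Compatibility mode) {
        List<Integer> files = new ArrayList<>();
        Integer current = null;
        for (int version : recordVersions) {
            if (current == null) {
                // First record opens the first file.
                current = version;
                files.add(current);
                continue;
            }
            boolean rotate;
            switch (mode) {
                case NONE:
                    rotate = version != current;  // any change starts a new file
                    break;
                case BACKWARD:
                case FULL:
                    rotate = version > current;   // older records are projected forward
                    break;
                case FORWARD:
                    rotate = version < current;   // newer records are projected back
                    break;
                default:
                    throw new IllegalStateException("Unknown mode: " + mode);
            }
            if (rotate) {
                current = version;
                files.add(current);
            }
        }
        return files;
    }
}
```

Under this model, an interleaved stream of two schema versions produces a new file at every switch with NONE, which is one plausible reading of why performance can drop sharply after a schema evolution.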
Monitoring Kafka Connect
! Monitoring health of the Kafka Connect cluster
○ Ganglia monitoring
○ Log ingestion through Sumo Logic / Splunk
! Use of ZooKeeper and Kafka monitoring tools to carefully monitor the lag
○ AWS CloudWatch alerts
! Monitoring of the connectors with the Kafka Connect REST API
Auto remediation
! Monitoring of the connectors with the Kafka-Connect REST API
○ What happens when something fails?
○ Only 8 hours of data in Kafka, so we need to recover quickly
○ Notification on connector failure
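One way such remediation could work is to read task states from the Kafka Connect REST API and restart the failed tasks. The sketch below is an assumption about the approach, not GumGum's tooling. It relies on the REST API's `POST /connectors/{name}/tasks/{id}/restart` endpoint and the `FAILED` task state; the class name, the connector name, and the base URL passed by the caller are hypothetical, and parsing of the status response is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConnectRemediationSketch {
    // Build the restart path for every task whose reported state is FAILED.
    // taskStates maps task id to the state taken from the connector status response.
    static List<String> restartPaths(String connector, Map<Integer, String> taskStates) {
        List<String> paths = new ArrayList<>();
        // TreeMap gives a stable order by task id.
        for (Map.Entry<Integer, String> task : new TreeMap<>(taskStates).entrySet()) {
            if ("FAILED".equals(task.getValue())) {
                paths.add("/connectors/" + connector + "/tasks/" + task.getKey() + "/restart");
            }
        }
        return paths;
    }

    // Send the restart request and return the HTTP status code.
    // baseUrl is the worker address, for example "http://localhost:8083" (assumed).
    static int postRestart(HttpClient client, String baseUrl, String path) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + path))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

Separating the decision (`restartPaths`) from the HTTP call keeps the remediation logic testable without a running Connect cluster, and the same list can feed the failure notification.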
! In case of a massive outage of Kafka Connect, what to do with invalid offsets?
○ auto.offset.reset property
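The consumer's `auto.offset.reset` setting decides where to resume when the committed offset is missing or no longer within the retained log, which is the situation after an outage longer than the retention window. The following sketch models that decision only. It is an illustration of the consumer semantics rather than Kafka client code, and the class and method names are hypothetical.

```java
final class OffsetResetSketch {
    // Decide where a consumer resumes reading a partition.
    // committed may be null when no offset was ever committed.
    static long startingOffset(Long committed, long logStart, long logEnd, String policy) {
        boolean valid = committed != null && committed >= logStart && committed <= logEnd;
        if (valid) {
            return committed;
        }
        switch (policy) {
            case "earliest":
                return logStart; // reprocess everything still retained
            case "latest":
                return logEnd;   // skip to new data, losing the gap
            default:
                throw new IllegalStateException(
                        "No valid offset and auto.offset.reset=" + policy);
        }
    }
}
```

With a committed offset of 100 and a retained range of 500 to 900, `earliest` resumes at 500 and `latest` at 900. Given the no-data-loss goal, `earliest` loses less data, at the cost of a larger backlog to work through.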
THANK YOU!
Karim Lamouri
karim@gumgum.com

More Related Content

What's hot

An Introduction to Priam
An Introduction to PriamAn Introduction to Priam
An Introduction to PriamJason Brown
 
Taskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerTaskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerRaghavendra Prabhu
 
Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...
Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...
Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...Windows Developer
 
Getting Started with OpenStack from Hong Kong Summit Session November 5
Getting Started with OpenStack from Hong Kong Summit Session November 5Getting Started with OpenStack from Hong Kong Summit Session November 5
Getting Started with OpenStack from Hong Kong Summit Session November 5Niki Acosta
 
Bcn open stack meet up - july 2014
Bcn open stack meet up - july 2014Bcn open stack meet up - july 2014
Bcn open stack meet up - july 2014Jaume Devesa Gomez
 
Node Summit 2018 - Optimize your Lambda functions
Node Summit 2018 - Optimize your Lambda functionsNode Summit 2018 - Optimize your Lambda functions
Node Summit 2018 - Optimize your Lambda functionsMatt Lavin
 
Seastar @ NYCC++UG
Seastar @ NYCC++UGSeastar @ NYCC++UG
Seastar @ NYCC++UGAvi Kivity
 
Quantum Computers and Where to Hide from Them
Quantum Computers and Where to Hide from ThemQuantum Computers and Where to Hide from Them
Quantum Computers and Where to Hide from Themmapmeld
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projectsDmitriy Dumanskiy
 
Seastar @ SF/BA C++UG
Seastar @ SF/BA C++UGSeastar @ SF/BA C++UG
Seastar @ SF/BA C++UGAvi Kivity
 
When webpack -p is not enough
When webpack -p is not enoughWhen webpack -p is not enough
When webpack -p is not enoughMaciej Komorowski
 
Scaling Writes on CockroachDB with Apache NiFi
Scaling Writes on CockroachDB with Apache NiFiScaling Writes on CockroachDB with Apache NiFi
Scaling Writes on CockroachDB with Apache NiFiChris Casano
 
Bsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicum
Bsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicumBsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicum
Bsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicumScott Tsai
 
LMAX Disruptor as real-life example
LMAX Disruptor as real-life exampleLMAX Disruptor as real-life example
LMAX Disruptor as real-life exampleGuy Nir
 
Prezo at-mesos con2015-final
Prezo at-mesos con2015-finalPrezo at-mesos con2015-final
Prezo at-mesos con2015-finalSharma Podila
 
Mobile Programming - 3 UDP
Mobile Programming - 3 UDPMobile Programming - 3 UDP
Mobile Programming - 3 UDPRiza Fahmi
 
Highly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemHighly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemJames Gan
 

What's hot (20)

An Introduction to Priam
An Introduction to PriamAn Introduction to Priam
An Introduction to Priam
 
Taskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerTaskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task Manager
 
JEEConf. Vanilla java
JEEConf. Vanilla javaJEEConf. Vanilla java
JEEConf. Vanilla java
 
Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...
Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...
Build 2017 - P4168 - Managing Secure, Scalable, Azure Service Fabric Clusters...
 
Getting Started with OpenStack from Hong Kong Summit Session November 5
Getting Started with OpenStack from Hong Kong Summit Session November 5Getting Started with OpenStack from Hong Kong Summit Session November 5
Getting Started with OpenStack from Hong Kong Summit Session November 5
 
Bcn open stack meet up - july 2014
Bcn open stack meet up - july 2014Bcn open stack meet up - july 2014
Bcn open stack meet up - july 2014
 
Node Summit 2018 - Optimize your Lambda functions
Node Summit 2018 - Optimize your Lambda functionsNode Summit 2018 - Optimize your Lambda functions
Node Summit 2018 - Optimize your Lambda functions
 
Seastar @ NYCC++UG
Seastar @ NYCC++UGSeastar @ NYCC++UG
Seastar @ NYCC++UG
 
Quantum Computers and Where to Hide from Them
Quantum Computers and Where to Hide from ThemQuantum Computers and Where to Hide from Them
Quantum Computers and Where to Hide from Them
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
 
Seastar @ SF/BA C++UG
Seastar @ SF/BA C++UGSeastar @ SF/BA C++UG
Seastar @ SF/BA C++UG
 
When webpack -p is not enough
When webpack -p is not enoughWhen webpack -p is not enough
When webpack -p is not enough
 
Scaling Writes on CockroachDB with Apache NiFi
Scaling Writes on CockroachDB with Apache NiFiScaling Writes on CockroachDB with Apache NiFi
Scaling Writes on CockroachDB with Apache NiFi
 
Disruptor
DisruptorDisruptor
Disruptor
 
Bsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicum
Bsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicumBsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicum
Bsdtw17: mariusz zaborski: case studies of sandboxing base system with capsicum
 
LMAX Disruptor as real-life example
LMAX Disruptor as real-life exampleLMAX Disruptor as real-life example
LMAX Disruptor as real-life example
 
Logs management
Logs managementLogs management
Logs management
 
Prezo at-mesos con2015-final
Prezo at-mesos con2015-finalPrezo at-mesos con2015-final
Prezo at-mesos con2015-final
 
Mobile Programming - 3 UDP
Mobile Programming - 3 UDPMobile Programming - 3 UDP
Mobile Programming - 3 UDP
 
Highly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemHighly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core System
 

Similar to Kafka Connect implementation at GumGum

(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++Amazon Web Services
 
Lightbend Lagom: Microservices Just Right
Lightbend Lagom: Microservices Just RightLightbend Lagom: Microservices Just Right
Lightbend Lagom: Microservices Just Rightmircodotta
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxpetabridge
 
maxbox starter72 multilanguage coding
maxbox starter72 multilanguage codingmaxbox starter72 multilanguage coding
maxbox starter72 multilanguage codingMax Kleiner
 
The Ring programming language version 1.10 book - Part 10 of 212
The Ring programming language version 1.10 book - Part 10 of 212The Ring programming language version 1.10 book - Part 10 of 212
The Ring programming language version 1.10 book - Part 10 of 212Mahmoud Samir Fayed
 
Building a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkBuilding a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkLuciano Mammino
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinSigma Software
 
Drools, jBPM OptaPlanner presentation
Drools, jBPM OptaPlanner presentationDrools, jBPM OptaPlanner presentation
Drools, jBPM OptaPlanner presentationMark Proctor
 
The Ring programming language version 1.9 book - Part 9 of 210
The Ring programming language version 1.9 book - Part 9 of 210The Ring programming language version 1.9 book - Part 9 of 210
The Ring programming language version 1.9 book - Part 9 of 210Mahmoud Samir Fayed
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKShu-Jeng Hsieh
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Speed up your Web applications with HTML5 WebSockets
Speed up your Web applications with HTML5 WebSocketsSpeed up your Web applications with HTML5 WebSockets
Speed up your Web applications with HTML5 WebSocketsYakov Fain
 
Scaling application with RabbitMQ
Scaling application with RabbitMQScaling application with RabbitMQ
Scaling application with RabbitMQNahidul Kibria
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Samsung WebCL Prototype API
Samsung WebCL Prototype APISamsung WebCL Prototype API
Samsung WebCL Prototype APIRyo Jin
 
Hazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSHazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSuzquiano
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVMSylvain Wallez
 
Docker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline ExecutionDocker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline ExecutionBrennan Saeta
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream ProcessingSuneel Marthi
 

Similar to Kafka Connect implementation at GumGum (20)

(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++(DEV204) Building High-Performance Native Cloud Apps In C++
(DEV204) Building High-Performance Native Cloud Apps In C++
 
Lightbend Lagom: Microservices Just Right
Lightbend Lagom: Microservices Just RightLightbend Lagom: Microservices Just Right
Lightbend Lagom: Microservices Just Right
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptx
 
maxbox starter72 multilanguage coding
maxbox starter72 multilanguage codingmaxbox starter72 multilanguage coding
maxbox starter72 multilanguage coding
 
The Ring programming language version 1.10 book - Part 10 of 212
The Ring programming language version 1.10 book - Part 10 of 212The Ring programming language version 1.10 book - Part 10 of 212
The Ring programming language version 1.10 book - Part 10 of 212
 
Building a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkBuilding a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless framework
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita Galkin
 
Drools, jBPM OptaPlanner presentation
Drools, jBPM OptaPlanner presentationDrools, jBPM OptaPlanner presentation
Drools, jBPM OptaPlanner presentation
 
The Ring programming language version 1.9 book - Part 9 of 210
The Ring programming language version 1.9 book - Part 9 of 210The Ring programming language version 1.9 book - Part 9 of 210
The Ring programming language version 1.9 book - Part 9 of 210
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Speed up your Web applications with HTML5 WebSockets
Speed up your Web applications with HTML5 WebSocketsSpeed up your Web applications with HTML5 WebSockets
Speed up your Web applications with HTML5 WebSockets
 
Scaling application with RabbitMQ
Scaling application with RabbitMQScaling application with RabbitMQ
Scaling application with RabbitMQ
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Samsung WebCL Prototype API
Samsung WebCL Prototype APISamsung WebCL Prototype API
Samsung WebCL Prototype API
 
Hazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSHazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMS
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVM
 
Docker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline ExecutionDocker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline Execution
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 

Recently uploaded

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 

Recently uploaded (20)

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 

Kafka Connect implementation at GumGum

  • 1. WHAT YOU SEE IS WHAT YOU GET Kafka Connect implementation at GumGum 08.15.2017
  • 2. 2 About GumGum ! Artificial Intelligence company ! 9 year old, 225 employees ! Offices in New York, Chicago, London, Sydney ! Thousands of Publishers and Advertisers ! Process billions of impressions every day
  • 6. Previous Architecture: Pipeline A Real Time Primary AWS Redshift Amazon S3File Uplaod
  • 7. Previous Architecture: Pipeline B AWS Redshift Amazon S3
  • 8. 8 ! Stateful Ad Servers ! Data Loss ! Reducing Network Transfer Problems with that architecture
  • 10. Our Constraints 10 ! No duplicate events ! Consume all the messages from Kafka ! Kafka Connect must integrate with the current storage format
  • 11. Overriding Kafka Connect classes
      - Overriding the S3 sink destination
          ○ From bucket/topic/topicName/ to our constraints

      JSON schema fragment on the slide (truncated at the start):

        },
        "lastName": {
          "type": "string"
        },
        "age": {
          "type": "integer",
          "minimum": 0
        }
      },
      "required": ["firstName", "lastName"]}

      public class TopicPartitionWriter {
        ...
        private String fileKey(String keyPrefix, String name) {
          // return topicsPrefix + dirDelim + keyPrefix + dirDelim + name;
          return keyPrefix + dirDelim + name;
        }

        private String fileKeyToCommit(String dirPrefix, long startOffset) {
          String name = tp.topic()
              + fileDelim
              + tp.partition()
              + fileDelim
              + String.format(zeroPadOffsetFormat, startOffset)
              + extension;
          // return fileKey(topicsDir, dirPrefix, name);
          return fileKey(dirPrefix, name);
        }
        ...
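
To make the effect of the override above concrete, here is a minimal standalone sketch that builds an object key both ways. The class `S3KeySketch`, its method names, and the values chosen for the delimiters, the zero-padding format, and the extension are assumptions for illustration; the slide does not show how those fields are set.

```java
import java.util.Locale;

/** Hypothetical stand-in for the key-building logic shown on the slide. */
final class S3KeySketch {
    // Assumed values; the slide does not show how these fields are configured.
    static final String DIR_DELIM = "/";
    static final String FILE_DELIM = "+";
    static final String ZERO_PAD_OFFSET_FORMAT = "%010d";
    static final String EXTENSION = ".avro";

    /** Key layout of the commented-out line: topics prefix, then directory, then file. */
    static String defaultKey(String topicsPrefix, String keyPrefix, String name) {
        return topicsPrefix + DIR_DELIM + keyPrefix + DIR_DELIM + name;
    }

    /** Key layout of the overriding line: the topics prefix is dropped. */
    static String customKey(String keyPrefix, String name) {
        return keyPrefix + DIR_DELIM + name;
    }

    /** File name built from topic, partition, and zero-padded start offset. */
    static String fileName(String topic, int partition, long startOffset) {
        return topic + FILE_DELIM + partition + FILE_DELIM
                + String.format(Locale.ROOT, ZERO_PAD_OFFSET_FORMAT, startOffset)
                + EXTENSION;
    }

    public static void main(String[] args) {
        String name = fileName("events", 3, 42L);
        System.out.println(defaultKey("topics", "logs/2017/07/04", name));
        System.out.println(customKey("logs/2017/07/04", name));
    }
}
```

The only difference between the two layouts is the leading prefix, which is what lets the connector write into a directory structure that already exists.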
  • 12. Overriding Kafka Connect classes
      - Need to compress our events
          ○ Need to compress the data to reduce S3 costs
          ○ Custom implementation of the Avro Record Writer Provider using SNAPPY compression (available in Confluent Platform 3.3.0)
          ○ Gzip compression for some of our other events

      public class RTBTimestampExtractor implements TimestampExtractor {

        @Override
        public Long extract(ConnectRecord<?> record) {
          Object value = record.value();
          if (value instanceof Struct) {
            Struct struct = (Struct) value;
            value = struct.get("eventMetadata");
            if (value instanceof Struct) {
              Struct eventMetadataStruct = (Struct) value;
              Object timestamp = eventMetadataStruct.get("timestamp");
              if (timestamp instanceof Long) {
                return (Long) timestamp;
              }
              ...

      public class GumGumAvroRecordWriterProvider extends AvroRecordWriterProvider {

        @Override
        public RecordWriter getRecordWriter(final S3SinkConnectorConfig conf,
                                            final String filename) {
          // This is not meant to be a thread-safe writer!
          return new RecordWriter() {
            final DataFileWriter<Object> writer =
                new DataFileWriter<>(new GenericDatumWriter<>())
                    .setCodec(CodecFactory.snappyCodec());
            ...
  • 13. Overriding Kafka Connect classes
      - Creating a String format

      Tue Jul 04 01:00:00 -0700 2017, {"id":"32237763-4c55-4d35-84df-23f8be320449","t":1499155200608,"cl":"js","ua":"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304 [FBAN/FBIOS;FBAV/99.0.0.57.70;FBBV/63577032;FBDV/iPhone5,3;FBMD/iPhone;FBSN/iOS;FBSV/10.3.1;FBSS/2;FBCR/Verizon;FBID/phone;FBLC/en_US;FBOP/5;FBRV/0]","bty":2,"bfa":"Facebook App","bn":"Facebook","bof":"iOS","bon":"iPhone OS","ip":"141.239.172.162","cc":"US","rg":"HI","ct":"Kailua","pc":"96734","mc":744,"isp":"Hawaiian Telcom","bf":"704a0c01a4995359fc8c336d5751d0ad17f1c301","lt":"Mon Jul 03 22:00:00 -1000 2017","sip":"10.11.152.18","awsr":"us-west-1"}, {"v":"1.1","pv":"0e27633e-025b-43fd-a971-9ebf854188c0","r":"release-1211-15-gfa55c30","t":"5e6e2525","a":[{"i":11,"u":"http://wishesndishes.com/images/adthrive/2017/06/Weekly-Meal-Plan-Week-100-480x480.jpg","w":300,"h":300,"x":10,"y":10367,"lt":"in","af":false,"lu":"http://wishesndishes.com/weekly-meal-plan-week-100/?m&m","ia":"Weekly Meal Plan {Week 100} - 10 great bloggers bringing you a full week of summer recipes including dinner, sides dishes, and desserts!"}],"rf":"http://wishesndishes.com/creamy-pecan-crunch-grape-salad/","p":"http://wishesndishes.com/creamy-pecan-crunch-grape-salad/?m","fs":false,"ce":true,"ac":{"25855":5},"vp":{"ii":false,"w":320,"h":546},"sc":{"w":320,"h":568,"d":2},"tr":0.6,"pid":11685,"pn":"Ad Thrive","vid":16,"ths":["GGT0"],"aevt":["GGE24-3","GGE24-4","GGE26-1"],"pcat":["IAB8","IAB8-1"],"ss":"0.75","hk":["pecan","bloggers","bringing","dishes","crunch","desserts","dinner","creamy","salad","dishes and desserts"],"ut":[1,2,34,3,4,20,6,9,10]}
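
The sample record above reads as a formatted timestamp followed by two JSON objects, joined with ", ". A minimal sketch of producing a line with that shape follows; the class `EventLineSketch`, the date pattern, and the choice of time zone are assumptions inferred from the sample, not taken from the slides.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical formatter for the "timestamp, json, json" line shape. */
final class EventLineSketch {
    // Assumed pattern, chosen to match "Tue Jul 04 01:00:00 -0700 2017".
    static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.US);

    /** Joins the formatted timestamp and the two JSON strings with ", ". */
    static String format(long epochMillis, ZoneId zone,
                         String metadataJson, String payloadJson) {
        String stamp = STAMP.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
        return stamp + ", " + metadataJson + ", " + payloadJson;
    }

    public static void main(String[] args) {
        // 1499155200608 is the "t" value from the sample record.
        System.out.println(format(1499155200608L, ZoneId.of("America/Los_Angeles"),
                "{\"cl\":\"js\"}", "{\"v\":\"1.1\"}"));
    }
}
```

With the `t` value from the sample and a US Pacific zone, the formatted prefix matches the one at the start of the sample record.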
  • 15. Now with Kafka Connect: Pipeline A
  • 17. Now with Kafka Connect: Pipeline B
  • 19. Schema evolution
      - Schema: defines the possible fields of the message
      - Use the Maven plugin when generating your schema
      - Make sure you use the schema evolution properties properly
      - Kafka Connect performance can decrease drastically because of a schema evolution

      {"namespace": "example.avro",
       "type": "record",
       "name": "User",
       "fields": [
         {"name": "name", "type": "string"},
         {"name": "favorite_number", "type": ["int", "null"]},
         {"name": "favorite_color", "type": ["string", "null"]}
       ]
      }
  • 20. Schema evolution: NONE [diagram: events E1, E2, E2, E1; schemas S1, S2, S1]
  • 22. Schema evolution: BACKWARD & FULL [diagram: events E1, E2, E2, E1; schemas S1, S2]
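
One reading of the two diagrams above is that, with NONE, every change of schema between consecutive records starts a new output file, while with BACKWARD or FULL a record written with an older schema is projected onto the newer one and stays in the current file. The sketch below simulates only that counting rule; `SchemaRotationSketch` and its rotation logic are an illustrative assumption, not the connector's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of file rotation under two compatibility modes. */
final class SchemaRotationSketch {
    enum Compatibility { NONE, BACKWARD }

    /** Returns the schema version of each output file, in the order files are opened. */
    static List<Integer> filesFor(List<Integer> recordSchemaVersions, Compatibility mode) {
        List<Integer> files = new ArrayList<>();
        Integer current = null;
        for (int version : recordSchemaVersions) {
            boolean rotate;
            if (current == null) {
                rotate = true;                 // first record always opens a file
            } else if (mode == Compatibility.NONE) {
                rotate = version != current;   // any schema change rotates
            } else {
                rotate = version > current;    // only a newer schema rotates
            }
            if (rotate) {
                current = version;
                files.add(version);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 2, 1); // E1, E2, E2, E1
        System.out.println(filesFor(events, Compatibility.NONE));
        System.out.println(filesFor(events, Compatibility.BACKWARD));
    }
}
```

Under this reading, a stream that alternates between schema versions produces many small files with NONE, which is one plausible cause of the performance drop mentioned on the schema evolution slide.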
  • 23. Monitoring Kafka Connect
      - Monitoring the health of the Kafka Connect cluster
          ○ Ganglia monitoring
  • 24. Monitoring Kafka Connect
      - Monitoring the health of Kafka Connect
          ○ Log ingestion through Sumo Logic / Splunk
  • 25. Monitoring Kafka Connect
      - Use of ZooKeeper and Kafka monitoring tools to carefully monitor the lag
          ○ AWS CloudWatch alerts
      - Monitoring of the connectors with the Kafka Connect REST API
  • 26. Auto remediation
      - Monitoring of the connectors with the Kafka Connect REST API
          ○ What happens when something fails?
          ○ Only 8 hours of data in Kafka: need to recover quickly
          ○ Notification on connector failure
  • 27. Auto remediation
      - In case of a massive outage of Kafka Connect, what to do with invalid offsets?
          ○ auto.offset.reset property
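
As a rough sketch of REST-based remediation, the class below reads a connector's status and restarts any task reported as FAILED. It uses the Kafka Connect REST endpoints `GET /connectors/{name}/status` and `POST /connectors/{name}/tasks/{id}/restart`. The class `ConnectRemediationSketch`, the base URL, the connector name, and the regex-based parsing are assumptions for illustration; the regex expects `"id"` to come directly before `"state"` in each task object, so a real JSON parser would be the safer choice in production.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that restarts failed Kafka Connect tasks over REST. */
final class ConnectRemediationSketch {
    // Crude match for task objects of the form {"id":N,"state":"FAILED",...}.
    private static final Pattern FAILED_TASK = Pattern.compile(
            "\\{\\s*\"id\"\\s*:\\s*(\\d+)\\s*,\\s*\"state\"\\s*:\\s*\"FAILED\"");

    /** Extracts the ids of tasks reported as FAILED in a status response body. */
    static List<Integer> failedTaskIds(String statusJson) {
        List<Integer> ids = new ArrayList<>();
        Matcher m = FAILED_TASK.matcher(statusJson);
        while (m.find()) {
            ids.add(Integer.parseInt(m.group(1)));
        }
        return ids;
    }

    static URI statusUri(String baseUrl, String connector) {
        return URI.create(baseUrl + "/connectors/" + connector + "/status");
    }

    static URI restartUri(String baseUrl, String connector, int taskId) {
        return URI.create(baseUrl + "/connectors/" + connector + "/tasks/" + taskId + "/restart");
    }

    /** Fetches the status, restarts each failed task, and returns how many were restarted. */
    static int restartFailedTasks(HttpClient client, String baseUrl, String connector)
            throws IOException, InterruptedException {
        HttpResponse<String> status = client.send(
                HttpRequest.newBuilder(statusUri(baseUrl, connector)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        List<Integer> failed = failedTaskIds(status.body());
        for (int taskId : failed) {
            client.send(
                    HttpRequest.newBuilder(restartUri(baseUrl, connector, taskId))
                            .POST(HttpRequest.BodyPublishers.noBody()).build(),
                    HttpResponse.BodyHandlers.discarding());
        }
        return failed.size();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ConnectRemediationSketch http://localhost:8083 my-connector
        int restarted = restartFailedTasks(HttpClient.newHttpClient(), args[0], args[1]);
        System.out.println("Restarted tasks: " + restarted);
    }
}
```

Running such a check on a schedule, and sending a notification whenever the restart count is non-zero, would cover both the recovery and the alerting points listed on the slide.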