SlideShare a Scribd company logo
MyHeritage and Kafka

Author: Ran Levy
Feb 2014
Agenda

• MyHeritage use cases

• Possible solutions
• Kafka overview
• Actual implementation @MyHeritage
• Summary
Use cases

•

Two major use case:

– Indexing to SuperSearch and Record Matching.
– Stats reporting to BI.
Use case 1

•

Indexing to SuperSearch and Record Matching
Use case 1 – con’t

•

Custom and non-scalable solution that involved changes processing and
updating SuperSearch (SOLR over Lucene).

•

Required solution should support:
– Continuous mode.
– High throughput.
– Scaling up.
– Repeating the process from some point.
– Guaranteed order of processed items.
– Reliable.
– Multiple consumers.
Use case 2

•

Statistics reporting to BI system
Use case 2 – con’t

•

Required solution should support:
•
•
•
•

High scale (~500GB of data / day).
Scale up – few hundred millions per day.
Repeating the process from some point.
Multiple consumers.
Agenda

 MyHeritage use cases

•

Possible solutions

•

Kafka overview

•

Actual implementation @MyHeritage

•

Summary
Possible Solutions

•

So what we have considered ….
– DB

•

Queues
Possible Solutions

•

Key point about queues
– Messages are deleted after consumed.
– Messages are duplicated to support multiple readers.
Agenda

 MyHeritage use cases

 Possible solutions
•

Kafka overview

•

Actual implementation @MyHeritage

•

Summary
Kafka Overview

•

A high throughput distributed messaging system

–
–
–
–
–

Fast
Scalable
Durable
Distributed by design
Simplicity (over functionality)
Kafka Overview

•

Fast (very fast) – both for producer and consumer

Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Kafka Overview

•

Main entities
– Producer – push data.
– Consumer – pull data.
– Brokers – load balance producers by partition.
– Topic – feeds of messages belongs to the same logical category.
Kafka Overview – some internals

•

Communication between the clients and the servers is done with a simple,
high-performance TCP protocol.

•

For each topic, the Kafka cluster maintains a partitioned log which is a
commit-log (appends only).
Kafka Overview – some internals

•

Messages stay on disk when consumed, deleted after defined TTL.

•

The partitions of the log are distributed over the servers in the Kafka cluster
with each server handling data and requests for a share of the partitions.

•

Each partition is replicated across a configurable number of servers for fault
tolerance.
Agenda

 MyHeritage use cases
 Possible solutions
 Kafka overview
•

Actual implementation @MyHeritage

•

Summary
High Level Overview

…

Daemons

Family Tree
changes Topic

Family Tree
changes Topic

part 1

part 1

part 2

part 2
DRBD
replica
Of
Broker
2

part 32

Consumers

Activity Topic

Indexing

part 1

part 1

RecordMatching

part 2

part 2

…

part 32

…

Face recog.

Broker 2

…

Web

Broker 1

…

Producers

Logstash reader

part 32

part 32

Activity Topic

DRBD
replica
Of
Broker
1
Kafka @Myheritage - producers

App
App
Module
App
Module
Module

Subscriber
Dispatch event

Events
System

Notify

Subscriber
EventLogger
Subscriber

Activity
Manage
r

ILogWrite
Kafka @Myheritage - producers

Topic

BrokersConfig

IStats
KafkaWriter

ISelector

ILogger

ISerializer
Kafka @Myheritage - producers

App
App
Module
App
Module
Module

Subscriber
Dispatch event

Events
System

Notify

Subscriber
EventLogger
Subscriber

KafkaWriter
(if failed) Attempt 2nd
broker

Broker

Attempt 1st broker

Broker
Kafka @Myheritage – Consumers (Indexing)
1 Per consumer
type, reader per
partition

KafkaWatermark
Get/update watermark

Broker 1

EventProcessor
EventProcessor
EventProcessor
Broker 2

Add event to queue

IndexingQueue
Fetch work

IndexingWorkers
IndexingWorkers
IndexingWorkers

Update item

SOLR
Agenda

 MyHeritage use cases

 Possible solutions
 Kafka overview
 Actual implementation @MyHeritage
•

Summary
Summary

Kafka is very fast and scalable system, that
is extensively used at MyHeritage, and you
would want to consider it for high scale
systems you are using.
Thank you and questions

ranl@myheritage.com

More Related Content

What's hot

How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
HostedbyConfluent
 
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentIntroducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
HostedbyConfluent
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
HostedbyConfluent
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
confluent
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
confluent
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
confluent
 
Upleveling Brownfield Integration
Upleveling Brownfield IntegrationUpleveling Brownfield Integration
Upleveling Brownfield Integration
Asanka Abeyweera
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
HostedbyConfluent
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the Cloud
Hossein Riasati
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
Gwen (Chen) Shapira
 
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
confluent
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environments
confluent
 
Monitoring docker container and dockerized applications
Monitoring docker container and dockerized applicationsMonitoring docker container and dockerized applications
Monitoring docker container and dockerized applications
Ananth Padmanabhan
 
Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!
Paolo Castagna
 
Samza tech talk_2015 - strata
Samza tech talk_2015 - strataSamza tech talk_2015 - strata
Samza tech talk_2015 - strata
Yi Pan
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
StreamNative
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
Rakuten Group, Inc.
 
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward
 

What's hot (19)

How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
 
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentIntroducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
Upleveling Brownfield Integration
Upleveling Brownfield IntegrationUpleveling Brownfield Integration
Upleveling Brownfield Integration
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the Cloud
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
 
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environments
 
Monitoring docker container and dockerized applications
Monitoring docker container and dockerized applicationsMonitoring docker container and dockerized applications
Monitoring docker container and dockerized applications
 
Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!
 
Samza tech talk_2015 - strata
Samza tech talk_2015 - strataSamza tech talk_2015 - strata
Samza tech talk_2015 - strata
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
 

Similar to MyHeritage Kakfa use cases - Feb 2014 Meetup

Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
José Román Martín Gil
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
confluent
 
Timesten Architecture
Timesten ArchitectureTimesten Architecture
Timesten Architecture
SrirakshaSrinivasan2
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
Jay Kreps
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
PivotalOpenSourceHub
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
Saumitra Srivastav
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
Michael Kehoe
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
Apache Apex
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
Jagadish Venkatraman
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scale
Ran Levy
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Ricardo Bravo
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
StreamNative
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
Timothy McCormick
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
Paolo Platter
 
Megastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storageMegastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storage
Niels Claeys
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
Data Con LA
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
Yoni Farin
 

Similar to MyHeritage Kakfa use cases - Feb 2014 Meetup (20)

Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Timesten Architecture
Timesten ArchitectureTimesten Architecture
Timesten Architecture
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scale
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Megastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storageMegastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storage
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 

Recently uploaded

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 

Recently uploaded (20)

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 

MyHeritage Kakfa use cases - Feb 2014 Meetup

  • 1. MyHeritage and Kafka Author: Ran Levy Feb 2014
  • 2. Agenda • MyHeritage use cases • Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary
  • 3. Use cases • Two major use case: – Indexing to SuperSearch and Record Matching. – Stats reporting to BI.
  • 4. Use case 1 • Indexing to SuperSearch and Record Matching
  • 5. Use case 1 – con’t • Custom and non-scalable solution that involved changes processing and updating SuperSearch (SOLR over Lucene). • Required solution should support: – Continuous mode. – High throughput. – Scaling up. – Repeating the process from some point. – Guaranteed order of processed items. – Reliable. – Multiple consumers.
  • 6. Use case 2 • Statistics reporting to BI system
  • 7. Use case 2 – con’t • Required solution should support: • • • • High scale (~500GB of data / day). Scale up – few hundred millions per day. Repeating the process from some point. Multiple consumers.
  • 8. Agenda  MyHeritage use cases • Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary
  • 9. Possible Solutions • So what we have considered …. – DB • Queues
  • 10. Possible Solutions • Key point about queues – Messages are deleted after consumed. – Messages are duplicated to support multiple readers.
  • 11. Agenda  MyHeritage use cases  Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary
  • 12. Kafka Overview • A high throughput distributed messaging system – – – – – Fast Scalable Durable Distributed by design Simplicity (over functionality)
  • 13. Kafka Overview • Fast (very fast) – both for producer and consumer Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  • 14. Kafka Overview • Main entities – Producer – push data. – Consumer – pull data. – Brokers – load balance producers by partition. – Topic – feeds of messages belongs to the same logical category.
  • 15. Kafka Overview – some internals • Communication between the clients and the servers is done with a simple, high-performance TCP protocol. • For each topic, the Kafka cluster maintains a partitioned log which is a commit-log (appends only).
  • 16. Kafka Overview – some internals • Messages stay on disk when consumed, deleted after defined TTL. • The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. • Each partition is replicated across a configurable number of servers for fault tolerance.
  • 17. Agenda  MyHeritage use cases  Possible solutions  Kafka overview • Actual implementation @MyHeritage • Summary
  • 18. High Level Overview … Daemons Family Tree changes Topic Family Tree changes Topic part 1 part 1 part 2 part 2 DRBD replica Of Broker 2 part 32 Consumers Activity Topic Indexing part 1 part 1 RecordMatching part 2 part 2 … part 32 … Face recog. Broker 2 … Web Broker 1 … Producers Logstash reader part 32 part 32 Activity Topic DRBD replica Of Broker 1
  • 19. Kafka @Myheritage - producers App App Module App Module Module Subscriber Dispatch event Events System Notify Subscriber EventLogger Subscriber Activity Manage r ILogWrite
  • 20. Kafka @Myheritage - producers Topic BrokersConfig IStats KafkaWriter ISelector ILogger ISerializer
  • 21. Kafka @Myheritage - producers App App Module App Module Module Subscriber Dispatch event Events System Notify Subscriber EventLogger Subscriber KafkaWriter (if failed) Attempt 2nd broker Broker Attempt 1st broker Broker
  • 22. Kafka @Myheritage – Consumers (Indexing) 1 Per consumer type, reader per partition KafkaWatermark Get/update watermark Broker 1 EventProcessor EventProcessor EventProcessor Broker 2 Add event to queue IndexingQueue Fetch work IndexingWorkers IndexingWorkers IndexingWorkers Update item SOLR
  • 23. Agenda  MyHeritage use cases  Possible solutions  Kafka overview  Actual implementation @MyHeritage • Summary
  • 24. Summary Kafka is very fast and scalable system, that is extensively used at MyHeritage, and you would want to consider it for high scale systems you are using.
  • 25. Thank you and questions ranl@myheritage.com