
Ingestion from Kafka Using Gobblin



Ingesting data from Kafka to HDFS using Gobblin

Published in: Technology


  1. ingestion from kafka using gobblin
     Ziyang Liu
  2. in production @linkedin
     -  3 Kafka clusters
     -  > 2500 topics
     -  daily: > 50 billion records, > 15TB
  3. related systems
     -  copycat
  4. semantics
     -  at least once: publish data before persisting checkpoints
     -  at most once: persist checkpoints before publishing data
     -  exactly once: publish data and persist checkpoints atomically
  5. load balancing
     normal bin packing
     -  Different partitions of a topic usually go to different mappers.
     -  Less skew, more small files & task overhead.
     two level bin packing
     -  First group certain partitions of a topic together.
     -  More skew.
  6. compaction
     -  Dedup or non-dedup
     -  Multiple levels of compaction
     -  Multiple options for handling late events
  7. to learn more
     -  https://github.com/linkedin/gobblin
     -  Kafka-HDFS Ingestion (with end-to-end examples)
     -  Camus → Gobblin Migration
     -  https://groups.google.com/forum/#!forum/gobblin-users
     -  Countless active users and discussions
     -  Post any question, get the answer usually within a day
     -  Join us! Data And Analytics (DNA)
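
The three delivery semantics on the "semantics" slide differ only in how the publish step and the checkpoint step are ordered. The toy model below illustrates why that ordering matters when a job dies between the two steps. It is a sketch and not Gobblin code: `State`, `run_once`, `run_with_one_failure`, and the mode strings are hypothetical names invented for the illustration, and "atomic" is modeled simply as "a crash leaves neither step applied".

```python
from dataclasses import dataclass, field


class SimulatedCrash(Exception):
    """Raised between the two steps to model a job failure."""


@dataclass
class State:
    published: list = field(default_factory=list)  # records visible downstream
    checkpoint: int = 0                            # next source offset to read


def run_once(source, state, mode, crash_between_steps=False):
    """Pull everything after the checkpoint, then publish and checkpoint
    in the order dictated by `mode` (hypothetical mode names)."""
    batch = source[state.checkpoint:]
    new_offset = len(source)

    if mode == "at_least_once":
        state.published.extend(batch)   # publish data first
        if crash_between_steps:
            raise SimulatedCrash
        state.checkpoint = new_offset   # then persist the checkpoint
    elif mode == "at_most_once":
        state.checkpoint = new_offset   # persist the checkpoint first
        if crash_between_steps:
            raise SimulatedCrash
        state.published.extend(batch)   # then publish data
    elif mode == "exactly_once":
        if crash_between_steps:
            raise SimulatedCrash        # atomic: neither step took effect
        state.published.extend(batch)
        state.checkpoint = new_offset
    else:
        raise ValueError(f"unknown mode: {mode}")


def run_with_one_failure(source, mode):
    """The first attempt crashes between the two steps; the retry succeeds."""
    state = State()
    try:
        run_once(source, state, mode, crash_between_steps=True)
    except SimulatedCrash:
        pass
    run_once(source, state, mode)
    return state.published
```

In this model, a retry after the crash republishes the batch under at-least-once (duplicates), skips it under at-most-once (loss), and publishes it a single time under exactly-once.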
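
The trade-off on the "load balancing" slide can be made concrete with a small simulation. The sketch below is not Gobblin's packer: it assumes a greedy "largest unit into the lightest mapper" heuristic, made-up partition sizes, a hypothetical `max_group_size` cap for the grouping step, and one output file per topic per mapper. All function names are invented for the illustration.

```python
import heapq
from collections import defaultdict

# A partition is (topic, partition_id, estimated_size); sizes are made up.


def unit_size(unit):
    """Total estimated size of a work unit (a list of partitions)."""
    return sum(size for _, _, size in unit)


def pack(units, num_mappers):
    """Greedy packing: largest unit first, always into the lightest mapper."""
    heap = [(0, m) for m in range(num_mappers)]
    heapq.heapify(heap)
    mappers = [[] for _ in range(num_mappers)]
    for unit in sorted(units, key=unit_size, reverse=True):
        load, m = heapq.heappop(heap)
        mappers[m].extend(unit)
        heapq.heappush(heap, (load + unit_size(unit), m))
    return mappers


def normal_packing(partitions, num_mappers):
    """Single level: every partition is packed on its own."""
    return pack([[p] for p in partitions], num_mappers)


def two_level_packing(partitions, num_mappers, max_group_size):
    """First group partitions of the same topic (up to an assumed size cap),
    then pack the groups as indivisible units."""
    by_topic = defaultdict(list)
    for p in partitions:
        by_topic[p[0]].append(p)
    groups = []
    for topic_partitions in by_topic.values():
        current = []
        for p in topic_partitions:
            if current and unit_size(current) + p[2] > max_group_size:
                groups.append(current)
                current = []
            current.append(p)
        groups.append(current)
    return pack(groups, num_mappers)


def loads(mappers):
    """Estimated load per mapper."""
    return [unit_size(m) for m in mappers]


def file_count(mappers):
    """Assumes each mapper writes one file per topic it handles."""
    return sum(len({topic for topic, _, _ in m}) for m in mappers)
```

With four size-6 partitions of topic A, four size-2 partitions of topic B, and two mappers, single-level packing balances the load evenly but has both mappers write both topics, while two-level packing with `max_group_size=24` keeps each topic on one mapper at the cost of uneven load.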
