Introduction to Apache Kafka - The Big Data Message Bus
Ashish Singh | Software Engineer, Cloudera
© Cloudera, Inc. All rights reserved.
  
About Me
• Software Engineer @ Cloudera
• Contributed to Kafka, Hive, Parquet and Sentry
• Used to work in HPC
• @singhasdev
  
Why Kafka
[Diagram: one Client, one Source]
Data pipelines start like this.
  
Why Kafka
[Diagram: four Clients, one Source]
Then we reuse them
  
Why Kafka
[Diagram: four Clients, Backend, Another Backend]
Then we add consumers to the existing sources
  
Why Kafka
[Diagram: four Clients, Backend, three boxes labelled Another Backend]
Then it starts to look like this
  
Why Kafka
[Diagram: four Clients, Backend, three boxes labelled Another Backend]
With maybe some of this
  
How we got here
We wanted to do some stuff in Hadoop
[Diagram labels: Application (x4), RDBMS (x4), Batch File transfer, Hadoop, Application, Reporting]
  
Why Kafka
Kafka decouples data pipelines
[Diagram: Producers (four Source Systems), a Kafka Broker, Consumers (Hadoop, Security Systems, Real-time monitoring, Data Warehouse)]
  
About Kafka
• Publish/subscribe messaging system from LinkedIn
• High throughput (hundreds of thousands of messages/sec)
• Low latency (sub-second to low seconds)
• Fault-tolerant (replicated and distributed)
• Supports agnostic messaging
• Standardizes format and delivery
  
Concepts
Basic Kafka Concepts
  
Key terminology
• Kafka maintains feeds of messages in categories called topics.
• Processes that publish messages to a Kafka topic are called producers.
• Processes that subscribe to topics and process the feed of published messages are called consumers.
• Kafka runs as a cluster of one or more servers, each of which is called a broker.
• Communication between all components uses a simple, high-performance binary API over TCP.
  
Architecture
[Diagram: Producers, a Kafka Cluster of four Brokers, Consumers, and Zookeeper (labelled Offsets)]
  
Topics - Partitions
• Topics are broken up into ordered commit logs called partitions.
• Each message in a partition is assigned a sequential id called an offset.
• Data is retained for a configurable period of time
[Diagram: Partition 1, Partition 2 and Partition 3 drawn as rows of numbered offsets (0-13, 0-11 and 0-13), running from Old to New, with an arrow labelled Writes]
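
A partition's append-only behaviour and sequential offsets can be sketched with a small in-memory model. This is a toy illustration rather than Kafka's storage code: the `Partition` class and its method names are assumptions made for the example, and time-based retention is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    """Toy model of one partition: an append-only list of messages."""
    messages: List[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append at the 'new' end and return the offset given to the message."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[str]:
        """Return up to max_messages, starting at the given offset."""
        return self.messages[offset:offset + max_messages]


partition = Partition()
offsets = [partition.append(m) for m in ["a", "b", "c"]]
print(offsets)            # offsets are assigned sequentially
print(partition.read(1))  # reading from offset 1 skips the oldest message
```

Because reads only take an offset, readers never modify the log; they simply choose where to start.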
  
Message Ordering
• Ordering is only guaranteed within a partition for a topic
• To ensure ordering:
  • Group messages in a partition by key (producer)
  • Configure exactly one consumer instance per partition within a consumer group
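
Grouping messages by key can be sketched as hashing the key to pick a partition, so that all messages with the same key land in the same ordered log. The `partition_for` helper and the CRC32 hash are illustrative assumptions, not Kafka's own partitioner.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; equal keys always map to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]

# Each partition is an ordered list, appended to in send order.
partitions = defaultdict(list)
for key, value in events:
    partitions[partition_for(key, 3)].append((key, value))

# All "user-1" events sit in one partition, in the order they were sent.
user1_partition = partitions[partition_for("user-1", 3)]
user1_events = [value for key, value in user1_partition if key == "user-1"]
print(user1_events)
```

Events for different keys may be spread over different partitions, so no ordering between keys is implied.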
  
Guarantees
• Messages sent by a producer to a particular topic partition will be appended in the order they are sent
• A consumer instance sees messages in the order they are stored in the log
• For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log
  
Topics - Replication
• Topics can (and should) be replicated.
• The unit of replication is the partition
• Each partition in a topic has 1 leader and 0 or more replicas.
• A replica is deemed to be “in-sync” if
  • The replica can communicate with Zookeeper
  • The replica is not “too far” behind the leader (configurable)
• The group of in-sync replicas for a partition is called the ISR (In-Sync Replicas)
• The replication factor cannot be lowered
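
The two in-sync conditions can be sketched as a filter over a partition's replicas. The `Replica` fields and the `max_lag_messages` threshold are hypothetical names chosen for this example; they stand in for the "too far behind" setting and are not real Kafka configuration keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    broker_id: int
    lag_messages: int        # how far behind the leader this replica is
    zookeeper_session: bool  # whether the replica can still talk to Zookeeper


def in_sync_replicas(replicas: List[Replica], max_lag_messages: int) -> List[int]:
    """Return the broker ids of replicas that meet both in-sync conditions."""
    return [
        r.broker_id
        for r in replicas
        if r.zookeeper_session and r.lag_messages <= max_lag_messages
    ]


replicas = [
    Replica(broker_id=100, lag_messages=0, zookeeper_session=True),
    Replica(broker_id=101, lag_messages=5, zookeeper_session=True),
    Replica(broker_id=102, lag_messages=5000, zookeeper_session=True),
]
isr = in_sync_replicas(replicas, max_lag_messages=1000)
print(isr)  # broker 102 is too far behind and drops out of the ISR
```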
  
Topics - Replication
• Durability can be configured with the producer configuration request.required.acks
  • 0: The producer never waits for an ack
  • 1: The producer gets an ack after the leader replica has received the data
  • -1: The producer gets an ack after all ISRs receive the data
• Minimum available ISR can also be configured such that an error is returned if enough replicas are not available to replicate data
  
Durable Writes
• Producers can choose to trade throughput for durability of writes:

Durability  Behaviour                         Per Event Latency  Required Acknowledgements (request.required.acks)
Highest     ACK all ISRs have received        Highest            -1
Medium      ACK once the leader has received  Medium             1
Lowest      No ACKs required                  Lowest             0

• Throughput can also be raised with more brokers… (so do this instead)!
• A sane configuration:

Property               Value
replication            3
min.insync.replicas    2
request.required.acks  -1
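
How the acknowledgement setting and the minimum ISR size interact can be sketched as a small decision function. This is a simplified model under stated assumptions, not broker code: `write_outcome` and its return strings are made up for the example, and the property names are the ones used on the slide (newer Kafka clients call the producer setting `acks`).

```python
# Values taken from the "sane configuration" table.
SANE_CONFIG = {
    "replication": 3,
    "min.insync.replicas": 2,
    "request.required.acks": -1,
}


def write_outcome(config: dict, isr_size: int) -> str:
    """Describe what a producer observes for one write in this toy model."""
    acks = config["request.required.acks"]
    if acks == 0:
        return "sent, no ack awaited"
    if acks == 1:
        return "acked by leader"
    if acks == -1:
        # The minimum ISR check only matters when all in-sync replicas must ack.
        if isr_size < config["min.insync.replicas"]:
            return "error: not enough in-sync replicas"
        return "acked by all in-sync replicas"
    raise ValueError(f"unsupported acks value: {acks}")


for isr_size in (3, 2, 1):
    print(isr_size, write_outcome(SANE_CONFIG, isr_size))
```

With three replicas and a minimum ISR of two, one replica can fall out of sync and writes still succeed; losing a second one makes the producer see an error instead of a silently under-replicated write.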
  
Producer
• Producers publish to a topic of their choosing (push)
• Load can be distributed
  • Typically by “round-robin”
  • Can also do “semantic partitioning” based on a key in the message
• Brokers load balance by partition
• Can support async (less durable) sending
• All nodes can answer metadata requests about:
  • Which servers are alive
  • Where leaders are for the partitions of a topic
  
Producer – Load Balancing and ISRs
[Diagram: a Producer and Broker 100, Broker 101 and Broker 102, each holding partitions 0, 1 and 2]

Topic: my_topic
Partitions: 3
Replicas: 3

Partition  Leader  ISR
0          100     101,102
1          101     100,102
2          102     101,100
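
Producer-side load balancing can be sketched as two steps: choose a partition (round-robin when there is no key, hash of the key otherwise), then send to that partition's leader as reported by cluster metadata. The `ToyProducer` class is a hypothetical illustration; the leader table reuses the broker ids shown for my_topic, and the CRC32 hash is an assumption rather than Kafka's partitioner.

```python
import zlib
from itertools import count
from typing import Dict, Optional, Tuple

# Leader broker per partition of "my_topic", as listed on the slide.
LEADERS: Dict[int, int] = {0: 100, 1: 101, 2: 102}


class ToyProducer:
    def __init__(self, leaders: Dict[int, int]) -> None:
        self.leaders = leaders   # partition id -> leader broker id
        self._counter = count()  # drives round-robin for keyless messages

    def choose_partition(self, key: Optional[str]) -> int:
        n = len(self.leaders)
        if key is None:
            return next(self._counter) % n              # round-robin
        return zlib.crc32(key.encode("utf-8")) % n      # semantic partitioning

    def route(self, key: Optional[str] = None) -> Tuple[int, int]:
        """Return (partition, leader broker) that a message would be sent to."""
        partition = self.choose_partition(key)
        return partition, self.leaders[partition]


producer = ToyProducer(LEADERS)
routes = [producer.route() for _ in range(4)]
print(routes)
print(producer.route("user-1") == producer.route("user-1"))
```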
  	
  
Consumer
• Multiple consumers can read from the same topic
• Each consumer is responsible for managing its own offset
• Messages stay on Kafka… they are not removed after they are consumed
[Diagram: a log with offsets 1234567 to 1234577; a Producer sends a write at 1234577 while three Consumers fetch from the log]
  
Consumer
• Consumers can go away
[Diagram: the same log with offsets 1234567 to 1234577; the Producer sends a write at 1234577 while two Consumers fetch from the log]
  
Consumer
• And then come back
[Diagram: the same log with offsets 1234567 to 1234577; the Producer sends a write at 1234577 while three Consumers fetch from the log]
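
The consumer behaviour described above (each consumer tracks its own offset, messages stay in the log, a consumer can leave and resume later) can be sketched in a few lines. `ToyConsumer` and the list-based log are assumptions made for illustration, not the Kafka consumer API.

```python
from typing import List

# The broker-side log; consuming never removes anything from it.
log: List[str] = [f"message-{i}" for i in range(5)]


class ToyConsumer:
    def __init__(self, name: str, offset: int = 0) -> None:
        self.name = name
        self.offset = offset  # the position is owned by the consumer

    def fetch(self, log: List[str], max_messages: int = 2) -> List[str]:
        """Read from the current offset and advance it past what was read."""
        batch = log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


fast = ToyConsumer("fast")
slow = ToyConsumer("slow")

fast.fetch(log, max_messages=5)         # "fast" reads everything
first = slow.fetch(log, max_messages=2)

# "slow" goes away; meanwhile the producer keeps writing.
log.append("message-5")

# "slow" comes back and resumes from its own offset.
resumed = slow.fetch(log, max_messages=10)
print(first, resumed, len(log))
```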
  
Consumer - Groups
• Consumers can be organized into Consumer Groups
• Common patterns:
• 1) All consumer instances in one group
  • Acts like a traditional queue with load balancing
• 2) All consumer instances in different groups
  • All messages are broadcast to all consumer instances
• 3) “Logical Subscriber” - many consumer instances in a group
  • Consumers are added for scalability and fault tolerance
  • Each consumer instance reads from one or more partitions for a topic
  • There cannot be more active consumer instances in a group than partitions (extra instances sit idle)
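
Sharing partitions within a consumer group can be sketched as an assignment function. The round-robin strategy, the `assign` helper and the P/C names are illustrative assumptions; Kafka's actual assignment logic is not reproduced here.

```python
from typing import Dict, List


def assign(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Give each partition to exactly one consumer in the group."""
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

two = assign(partitions, ["C1", "C2"])
five = assign(partitions, ["C1", "C2", "C3", "C4", "C5"])
after_failure = assign(partitions, ["C1"])  # C2 left; the group rebalances

print(two)
print(five)           # C5 receives no partition and sits idle
print(after_failure)
```

Running `assign` separately for each group models the broadcast pattern: every group receives all partitions, while the consumers inside a group split them.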
  
Consumer - Groups
[Diagram: a Kafka Cluster with Broker 1 and Broker 2 holding partitions P0, P3, P1 and P2; consumers C1 to C6 split across Consumer Group A and Consumer Group B]
Consumer Groups provide isolation to topics and partitions
  
Consumer - Groups
[Diagram: the same Kafka Cluster and Consumer Groups A and B, with one element crossed out (X)]
Can rebalance themselves
  
Schema
  
Schema is a MUST HAVE for data integration
  
Data Exchange in Distributed Architectures
• Multiple systems interacting together benefit from a common data exchange format.
• Choosing the correct standard can significantly impact application design
[Diagram: two Clients, each with serialize and deserialize arrows to a Common Data Format]
  
Goals
• Simple
• Flexible
• Efficient
• Change Tolerant
• Interoperable
As systems become more complex, data endpoints need to be decoupled
  
I ♥ Avro
• Define Schema
• Generate code for objects
• Serialize / Deserialize into Bytes or JSON
• Embed schema in files / records… or not
• Support for our favorite languages… Except Go.
• Schema Evolution
  • Add and remove fields without breaking anything
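
Schema evolution can be sketched with plain dictionaries. The example below imitates, in simplified form, the rule that a field present only in the reader's schema is filled from its default and a field present only in the writer's schema is ignored. It does not use the Avro library; the `User` schemas and the `resolve` helper are hypothetical.

```python
from typing import Any, Dict

# Schema the old record was written with.
WRITER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
}

# Newer schema used by the reader: "email" removed, "country" added with a default.
READER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}


def resolve(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a decoded record onto the reader's schema."""
    resolved: Dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]  # added field: use its default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved  # fields unknown to the reader are dropped


old_record = {"id": 7, "email": "a@example.com"}
new_view = resolve(old_record, READER_SCHEMA)
print(new_view)
```

The default on the added field is what keeps old records readable; without it, `resolve` raises instead of guessing a value.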
  
Use Cases
• Real-Time Stream Processing (combined with Spark Streaming)
• General purpose Message Bus
• Collecting User Activity Data
• Collecting Operational Metrics from applications, servers or devices
• Log Aggregation
• Change Data Capture
• Commit Log for distributed systems
  
Frequently Asked Questions
  
FAQs
• Should I use SSDs for Kafka Brokers?
• How do I encrypt the data persisted on my Kafka Brokers?
• Is it true that Zookeeper can become a pain point with a Kafka cluster?
• Does Kafka support cross-data center availability?
• What type of data transformations are supported on Kafka?
• How to send large messages or payloads through Kafka?
• Does Kafka support MQTT or JMS protocols?
  
Thank you
Ashish Singh
asingh@cloudera.com
@singhasdev
  

 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 

Recently uploaded (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bus by Ashish Singh of Cloudera

  • 1. © Cloudera, Inc. All rights reserved. Introduction to Apache Kafka - The Big Data Message Bus. Ashish Singh | Software Engineer, Cloudera
  • 2. About Me • Software Engineer @ Cloudera • Contributed to Kafka, Hive, Parquet and Sentry • Used to work in HPC • @singhasdev
  • 3. Why Kafka [diagram: Client, Source] Data pipelines start like this.
  • 4. Why Kafka [diagram: four Clients, Source] Then we reuse them.
  • 5. Why Kafka [diagram: four Clients, Backend, Another Backend] Then we add consumers to the existing sources.
  • 6. Why Kafka [diagram: four Clients, Backend, three more Backends] Then it starts to look like this.
  • 7. Why Kafka [diagram: four Clients, Backend, three more Backends] With maybe some of this.
  • 8. How we got here [diagram: Applications, RDBMSs, Batch File transfer, Hadoop, Reporting Application] We wanted to do some stuff in Hadoop.
  • 9. How we got here [diagram: Applications, RDBMSs, Batch File transfer, Hadoop, Reporting Application] We wanted to do some stuff in Hadoop.
  • 10. Why Kafka: Kafka decouples data pipelines [diagram: Producers (four Source Systems), Kafka Broker, Consumers (Hadoop, Security Systems, Real-time monitoring, Data Warehouse)]
  • 11. About Kafka • Publish/subscribe messaging system from LinkedIn • High throughput (100s of k messages/sec) • Low latency (sub-second to low seconds) • Fault-tolerant (replicated and distributed) • Supports agnostic messaging • Standardizes format and delivery
  • 12. Concepts: Basic Kafka Concepts
  • 13. Key terminology • Kafka maintains feeds of messages in categories called topics. • Processes that publish messages to a Kafka topic are called producers. • Processes that subscribe to topics and process the feed of published messages are called consumers. • Kafka is run as a cluster comprised of one or more servers, each of which is called a broker. • Communication between all components is done via a high-performance, simple binary API over the TCP protocol.
  • 14. Architecture [diagram: Producers, Kafka Cluster with four Brokers, Consumers, Zookeeper, Offsets]
  • 15. Topics - Partitions • Topics are broken up into ordered commit logs called partitions. • Each message in a partition is assigned a sequential id called an offset. • Data is retained for a configurable period of time. [diagram: Partition 1, Partition 2 and Partition 3 with offset rows 0-13, 0-11 and 0-13; labels: Writes, Old, New]
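As a rough illustration of the partition and offset idea on this slide, the sketch below models a partition as an in-memory append-only list whose index plays the role of the offset. The `Partition` class and its methods are hypothetical names made up for this example, not part of any Kafka client, and retention is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy in-memory stand-in for one partition: an append-only list."""
    messages: list = field(default_factory=list)

    def append(self, message):
        # New messages always go to the end; the offset is the list index.
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset, max_messages=10):
        # Reading never removes anything; it only slices from the offset.
        return self.messages[offset:offset + max_messages]


# A hypothetical topic with three partitions, each with its own offsets.
topic = [Partition() for _ in range(3)]
first = topic[0].append("a")
second = topic[0].append("b")
other = topic[1].append("c")
```

Offsets are per partition here: the first message written to each partition gets offset 0, which is why ordering can only be reasoned about inside one partition.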
  • 16. Message Ordering • Ordering is only guaranteed within a partition for a topic. • To ensure ordering: group messages in a partition by key (producer), and configure exactly one consumer instance per partition within a consumer group.
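Grouping messages by key can be sketched as hashing the key to pick a partition, so that every message with the same key lands in the same ordered log. The example below is a minimal sketch under assumptions: `partition_for` is a hypothetical helper, and `zlib.crc32` is only a stand-in for a stable hash, not the hash a real Kafka producer uses.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash of the key, reduced to a partition index.
    # crc32 is an assumption made for this sketch only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]

# Route every event to a partition chosen from its key.
partitions = {p: [] for p in range(3)}
for key, value in events:
    partitions[partition_for(key, 3)].append((key, value))

# All "user-1" events sit in one partition, in the order they were sent.
user1 = [v for msgs in partitions.values() for k, v in msgs if k == "user-1"]
```

Events for different keys may interleave arbitrarily across partitions; only the per-key order is preserved.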
  • 17. Guarantees • Messages sent by a producer to a particular topic partition will be appended in the order they are sent. • A consumer instance sees messages in the order they are stored in the log. • For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log.
  • 18. Topics - Replication • Topics can (and should) be replicated. • The unit of replication is the partition. • Each partition in a topic has 1 leader and 0 or more replicas. • A replica is deemed to be “in-sync” if the replica can communicate with Zookeeper and the replica is not “too far” behind the leader (configurable). • The group of in-sync replicas for a partition is called the ISR (In-Sync Replicas). • The replication factor cannot be lowered.
  • 19. Topics - Replication • Durability can be configured with the producer configuration request.required.acks (0: the producer never waits for an ack; 1: the producer gets an ack after the leader replica has received the data; -1: the producer gets an ack after all ISRs receive the data). • Minimum available ISR can also be configured such that an error is returned if enough replicas are not available to replicate data.
  • 20. Durable Writes • Producers can choose to trade throughput for durability of writes. Highest durability: ACK once all ISRs have received, highest per-event latency, request.required.acks = -1. Medium durability: ACK once the leader has received, medium per-event latency, request.required.acks = 1. Lowest durability: no ACKs required, lowest per-event latency, request.required.acks = 0. • Throughput can also be raised with more brokers… (so do this instead)! • A sane configuration: replication = 3, min.insync.replicas = 2, request.required.acks = -1.
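The three acknowledgement levels and the minimum-ISR check can be summarized in a small decision function. This is a toy model, not client code: `produce` is a hypothetical function, it returns descriptive strings rather than talking to a broker, and it assumes `isr_size` counts the leader along with its in-sync followers.

```python
def produce(acks: int, isr_size: int, min_insync_replicas: int = 2) -> str:
    """Describe what a producer would observe for one write in this toy model."""
    if acks == 0:
        # Fire and forget: lowest latency, lowest durability.
        return "sent without waiting for an ack"
    if acks == 1:
        # Only the leader has to confirm the write.
        return "acked by the leader"
    if acks == -1:
        # All in-sync replicas must confirm, and there must be enough of them.
        if isr_size < min_insync_replicas:
            return "error: not enough in-sync replicas"
        return f"acked by all {isr_size} in-sync replicas"
    raise ValueError(f"unsupported acks value: {acks}")


healthy = produce(acks=-1, isr_size=3)
degraded = produce(acks=-1, isr_size=1)
```

With the "sane configuration" from the slide (replication 3, min.insync.replicas 2, acks -1), one replica can be lost and writes still succeed; losing two turns writes into errors instead of silently reducing durability.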
  • 21. Producer • Producers publish to a topic of their choosing (push). • Load can be distributed: typically by “round-robin”, or by “semantic partitioning” based on a key in the message. • Brokers load balance by partition. • Can support async (less durable) sending. • All nodes can answer metadata requests about which servers are alive and where the leaders are for the partitions of a topic.
  • 22. Producer - Load Balancing and ISRs [diagram: Producer; Broker 100, Broker 101 and Broker 102, each holding partitions 0, 1, 2] Topic: my_topic, Partitions: 3, Replicas: 3. Partition 0: Leader 100, ISR 101,102. Partition 1: Leader 101, ISR 100,102. Partition 2: Leader 102, ISR 101,100.
  • 23. Consumer • Multiple consumers can read from the same topic. • Each consumer is responsible for managing its own offset. • Messages stay on Kafka; they are not removed after they are consumed. [diagram: log with offsets 1234567 to 1234577; Producer: Send, Write; three Consumers: Fetch]
  • 24. Consumer • Consumers can go away. [diagram: log with offsets 1234567 to 1234577; Producer: Send, Write; two Consumers: Fetch]
  • 25. Consumer • And then come back. [diagram: log with offsets 1234567 to 1234577; Producer: Send, Write; three Consumers: Fetch]
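The consumer slides describe readers that each track their own position in a log that is never drained by reads. A minimal sketch of that behaviour follows, assuming a plain Python list as the log; the `Consumer` class is hypothetical and only keeps an integer offset.

```python
class Consumer:
    """Toy consumer that remembers its own position in a shared log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def fetch(self, max_messages=2):
        # Fetching copies messages out; the log itself is left untouched.
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


log = ["m0", "m1", "m2", "m3", "m4"]

fast = Consumer(log)
slow = Consumer(log)
fast_first = fast.fetch(3)
slow_first = slow.fetch(1)

# The slow consumer goes away; only its offset needs to be remembered.
saved_offset = slow.offset
del slow
log.append("m5")  # the producer keeps writing in the meantime

# It comes back later and resumes from where it stopped.
returned = Consumer(log, offset=saved_offset)
resumed = returned.fetch(10)
```

Because each consumer's progress is just a number, adding or removing readers does not affect the log or the other readers.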
  • 26. Consumer - Groups • Consumers can be organized into Consumer Groups. • Common patterns: 1) All consumer instances in one group: acts like a traditional queue with load balancing. 2) All consumer instances in different groups: all messages are broadcast to all consumer instances. 3) “Logical Subscriber”, many consumer instances in a group: consumers are added for scalability and fault tolerance; each consumer instance reads from one or more partitions for a topic; there cannot be more consumer instances than partitions.
  • 27. Consumer - Groups [diagram: Kafka Cluster with Broker 1 and Broker 2 holding partitions P0, P3, P1, P2; Consumer Group A and Consumer Group B with consumers C1 to C6] Consumer Groups provide isolation to topics and partitions.
  • 28. Consumer - Groups [diagram: Kafka Cluster with Broker 1 and Broker 2 holding partitions P0, P3, P1, P2; Consumer Group A and Consumer Group B with consumers C1 to C6, one marked X] Can rebalance themselves.
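How a group spreads partitions over its members, and what a rebalance does when a member disappears, can be sketched with a simple round-robin assignment. This is an illustration under assumptions: `assign` is a hypothetical helper, the round-robin strategy is chosen for simplicity and is not claimed to be Kafka's own assignment logic, and the group membership below is made up.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# A hypothetical group with four members: one partition each.
before = assign(partitions, ["C3", "C4", "C5", "C6"])

# C4 fails; the surviving members take over its partition.
after = assign(partitions, ["C3", "C5", "C6"])

# A fifth member has nothing to read, since a partition has a single owner.
crowded = assign(partitions, ["C1", "C2", "C3", "C4", "C5"])
```

The last case shows why adding more consumer instances than partitions brings no extra parallelism in this model: the surplus member stays idle.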
  • 29. Schema
  • 30. Schema is a MUST HAVE for data integration
  • 31. Data Exchange in Distributed Architectures • Multiple systems interacting together benefit from a common data exchange format. • Choosing the correct standard can significantly impact application design. [diagram: two Clients, each with serialize and deserialize, around a Common Data Format]
  • 32. Goals • Simple • Flexible • Efficient • Change Tolerant • Interoperable. As systems become more complex, data endpoints need to be decoupled.
  • 33. Click to enter confidentiality
  • 34. I [image] Avro • Define schema • Generate code for objects • Serialize / deserialize into bytes or JSON • Embed schema in files / records… or not • Support for our favorite languages… except Go. • Schema evolution: add and remove fields without breaking anything.
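The schema evolution bullet can be illustrated without the Avro library by mimicking the basic resolution rule: fields the reader does not know are dropped, and fields the writer did not produce are filled from the reader's defaults. Everything here is a hypothetical sketch in plain Python: `resolve`, `MISSING` and the field names are made up, and real Avro schemas are JSON documents with types, which this toy ignores.

```python
MISSING = object()  # marks a reader field that has no default


def resolve(record: dict, reader_fields: dict) -> dict:
    """Project a decoded record onto the reader's view of the schema.

    reader_fields maps field name -> default value, or MISSING if none.
    """
    result = {}
    for name, default in reader_fields.items():
        if name in record:
            result[name] = record[name]
        elif default is not MISSING:
            # Field added after the record was written: use the default.
            result[name] = default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields present in the record but unknown to the reader are ignored.
    return result


# Written with an old schema that still had a "legacy" field.
old_record = {"id": 1, "name": "a", "legacy": True}

# Read with a newer schema: "legacy" was removed, "email" was added.
new_reader = {"id": MISSING, "name": MISSING, "email": None}
evolved = resolve(old_record, new_reader)
```

In this model, adding a field is only safe when the new field carries a default; a new field without one makes old records unreadable.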
  • 35. Use Cases
  • 36. Use Cases • Real-time stream processing (combined with Spark Streaming) • General-purpose message bus • Collecting user activity data • Collecting operational metrics from applications, servers or devices • Log aggregation • Change data capture • Commit log for distributed systems
  • 37. Frequently Asked Questions
  • 38. FAQs • Should I use SSDs for Kafka brokers? • How do I encrypt the data persisted on my Kafka brokers? • Is it true that Zookeeper can become a pain point with a Kafka cluster? • Does Kafka support cross-data center availability? • What type of data transformations are supported on Kafka? • How to send large messages or payloads through Kafka? • Does Kafka support MQTT or JMS protocols?
  • 39. Thank you. Ashish Singh, asingh@cloudera.com, @singhasdev