Kafka




                      Twitter: yanaoki
                        2011/11/27
                 16               +WEB

                                  http://www.flickr.com/photos/devnull/19765635/
2011   11   27
•
                 •
                 •   Kafka

                 •   Kafka




•   Twitter: @yanaoki

                 •
                     •

                 •   Java, Ruby, Hadoop/Mahout, Cassandra




•
                 •

                 •   2011




Facebook Insights




                 •   2011   3

                     •   “Like” ”Share”   CTR

                     •   Facebook

Google Analytics




                 •   2011   9

                     •
                     •   PV UU

                     •
Twitter Web Analytics




                 •   2011    09

                     •             Twitter

                     •   Twitter

                     •
LinkedIn




                 •                    Kafka

                     •   PV

                     •
                         •
                         •
                     •
•   Facebook Insights

                     •   PUMA        Scribe / HDFS / pTail / HBase /Thrift

                     •   http://slidesha.re/ijWfPh

                 •   Twitter Promoted Tweets Reporting

                     •   Rainbird       ZooKeeper / Cassandra

                     •   http://slidesha.re/dRxtIp

                 •   Twitter Web Analytics

                     •   Storm       ZooKeeper

                     •   http://slidesha.re/qbpKbY

                 •   Google Analytics

                     •
                 •   LinkedIn
                     •   Kafka      ZooKeeper


Kafka
            •    Kafka

                 •
                 •   LinkedIn

                 •              http://incubator.apache.org

                 •




LinkedIn
                 •
                 •                              SNS

                 •
                 •   2011     11

                 •   Kafka

                     •   2010      11

                     •   2011      07   Apache incubator project

                     •   a Distributed Messaging System for Log Processing

                         •   http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

Kafka

            •
                 •          Kafka


                 •
                     •
                 •
                     •   Hadoop/HDFS

                     •                 DWH

            •
•
            •    ZooKeeper




•
                     •               SPOF

                     •   ZooKeeper

                     •




Push or Pull
                 •   Push
                     •               Scribe, Flume




                 •   Pull

                     •   Kafka

                         •
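The push/pull contrast above can be sketched as follows. This is a minimal in-memory illustration, not the Kafka client API: `Broker`, `fetch`, and `PullConsumer` are hypothetical names. The idea it tries to show is that in a pull model the consumer keeps its own position and asks for data at its own pace, so a slow consumer falls behind rather than being flooded.

```python
class Broker:
    """Hypothetical in-memory broker: an append-only list of messages."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)

    def fetch(self, offset, max_messages):
        # The broker keeps no per-consumer state; the caller says where to read.
        return self._log[offset:offset + max_messages]


class PullConsumer:
    """Hypothetical consumer that tracks its own offset."""

    def __init__(self, broker, batch_size):
        self.broker = broker
        self.batch_size = batch_size
        self.offset = 0

    def poll(self):
        # Ask for at most batch_size messages, then advance our own position.
        batch = self.broker.fetch(self.offset, self.batch_size)
        self.offset += len(batch)
        return batch


broker = Broker()
for i in range(5):
    broker.append(f"event-{i}")

consumer = PullConsumer(broker, batch_size=2)
first = consumer.poll()
second = consumer.poll()
third = consumer.poll()
```

With a push model the sender would decide the rate instead; here the batch size and polling frequency are entirely the consumer's choice.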




•        Pub/Sub

                 •
                 •   Publish
                     •

                 •   Subscribe
                     •                   subscribe


                     •
                                 Kafka
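A rough sketch of the publish/subscribe idea on this slide, assuming topic-based delivery in which every subscriber reads the full stream independently. `TopicBus` and `Subscriber` are hypothetical names for illustration and the topic names are made up; none of this is Kafka's own API.

```python
from collections import defaultdict


class TopicBus:
    """Hypothetical message bus: one append-only list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def read_from(self, topic, offset):
        return self._topics[topic][offset:]


class Subscriber:
    """Hypothetical subscriber holding one read position per topic."""

    def __init__(self, bus, topics):
        self.bus = bus
        self.offsets = {topic: 0 for topic in topics}

    def poll(self, topic):
        # Positions are per subscriber, so subscribers do not steal
        # messages from each other.
        messages = self.bus.read_from(topic, self.offsets[topic])
        self.offsets[topic] += len(messages)
        return messages


bus = TopicBus()
bus.publish("pageview", "/home")
bus.publish("pageview", "/jobs")
bus.publish("search", "kafka")

realtime = Subscriber(bus, ["pageview"])
batch = Subscriber(bus, ["pageview", "search"])

realtime_views = realtime.poll("pageview")
batch_views = batch.poll("pageview")
realtime_again = realtime.poll("pageview")
```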


•
                     •
                         •
                         •
                 •   Kafka

                     •       ZooKeeper

                     •

•
                     •
                         •
                     •
                         •   ZooKeeper

                 •
                     •
                     •
[Figure: diagram with nodes labeled A, B, C, and D, shown over three consecutive slides]
[Figure: node diagram; surviving labels: ZK, ZooKeeper]
•
                     •   Kafka           O(log n)   O(1)

                 •   OS

                 •   Java        GC

                 •               BTree
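The slide contrasts O(log n) with O(1) and mentions BTree. Assuming this refers to B-tree index lookups versus reading and appending to a log addressed by offset, the O(1) side can be sketched as below. `AppendOnlyLog` is a hypothetical class, and the 4-byte length prefix is an assumption made for illustration, not Kafka's on-disk format.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Hypothetical log file; a message id is simply its byte offset."""

    def __init__(self, path):
        self._file = open(path, "a+b")

    def append(self, payload: bytes) -> int:
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        # 4-byte big-endian length prefix, then the payload.
        self._file.write(struct.pack(">I", len(payload)) + payload)
        self._file.flush()
        return offset

    def read(self, offset: int) -> bytes:
        # One seek plus one sequential read: no index lookup is needed.
        self._file.seek(offset)
        (length,) = struct.unpack(">I", self._file.read(4))
        return self._file.read(length)

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "topic.log")
log = AppendOnlyLog(path)
first = log.append(b"hello")
second = log.append(b"kafka")
second_payload = log.read(second)
first_payload = log.read(first)
log.close()
```

Because reads and writes are sequential on a plain file, caching can be left to the OS page cache rather than to objects on the Java heap.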




•
                     •


                 •
                     •   Java NIO (note: Linux sendfile)
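The slide mentions Java NIO and Linux sendfile. In Java, the usual route to sendfile is `FileChannel.transferTo`. The sketch below shows the same idea in Python, where `socket.sendfile` uses `os.sendfile` when the platform supports it and falls back to ordinary `send()` otherwise. `send_log_segment` and the local socket pair are hypothetical stand-ins for a broker sending a log file to a consumer.

```python
import os
import socket
import tempfile


def send_log_segment(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as segment:
        # The file contents go from the file to the socket without being
        # copied into a Python bytes object when os.sendfile is available.
        return sock.sendfile(segment)


payload = b"message-1\nmessage-2\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

# A local socket pair stands in for a broker-to-consumer connection.
sender, receiver = socket.socketpair()
sent = send_log_segment(sender, tmp.name)
sender.close()

received = b""
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    received += chunk
receiver.close()
os.unlink(tmp.name)
```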




HUG January 2011 Kafka Presentation




                        http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation
Facebook
             •                                                MapReduce (not Hadoop MR)

                 •   Scribe/PTail/Puma                            Map

                 •   HBase                                               Reduce
            http://www.slideshare.net/tatsuya6502/tokyo-hbase-meetup-realtime-big-data-at-facebook-ja




                                                    Map                    Reduce




Kafka HBase

                 •   Twitter

                     •                                     (en       ja

                     [Figure: TwitterStreaming Producer → Kafka Broker (lang) → HBaseImport Consumer → lang / client counts: ja→10, en→32]
Kafka HBase


                 •                                      Twitter


                     [Figure: TwitterStreaming Producer → Kafka Broker (lang, client) → HBaseImport Consumer → lang counts: ja→10, en→32; client counts: web→100, iPhone→10, Android→10]
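The pipeline in the figure can be approximated as follows. This is only a sketch under assumptions: a dictionary of lists stands in for the Kafka broker, a `Counter` stands in for HBase counters, and `produce`, `consume`, and the tweet field names are hypothetical. The sample tweets are made up and do not reproduce the counts shown on the slide.

```python
from collections import Counter, defaultdict

# Stand-in for the Kafka broker: topic name -> list of messages.
broker = defaultdict(list)


def produce(tweet):
    """Hypothetical producer: publish one attribute of the tweet per topic."""
    broker["lang"].append(tweet["lang"])
    broker["client"].append(tweet["client"])


def consume(topic, counters):
    """Hypothetical consumer: count each value seen on a topic.

    `counters` stands in for an HBase table of counters.
    """
    for value in broker[topic]:
        counters[(topic, value)] += 1


tweets = [
    {"lang": "ja", "client": "web"},
    {"lang": "en", "client": "iPhone"},
    {"lang": "en", "client": "web"},
]
for tweet in tweets:
    produce(tweet)

counters = Counter()
consume("lang", counters)
consume("client", counters)
```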

Hadoop


                 •                                     Hadoop



                     •   Hadoop    Map        Kafka



                     •   KafkaETLJob / KafkaETLInputFormat / KafkaETLRecordReader

                     •     MapReduce API




Hadoop
[Figure: Hadoop import flow; surviving labels: Kafka, Offset, Limit, HDFS, Map, Mapper, Reducer]
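A sketch of the offset hand-off suggested by the figure, assuming each run reads its starting offset from storage, pulls at most a fixed number of messages, and saves the next offset for the following run. A local JSON file stands in for HDFS and a Python list stands in for a Kafka topic. `run_import` is a hypothetical helper and is not the API of `KafkaETLJob`.

```python
import json
import os
import tempfile


def run_import(messages, checkpoint_path, limit):
    """One hypothetical batch run: resume from the saved offset, read up to
    `limit` messages, then save the offset for the next run."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]

    batch = messages[offset:offset + limit]

    with open(checkpoint_path, "w") as f:
        json.dump({"offset": offset + len(batch)}, f)
    return batch


kafka_log = [f"event-{i}" for i in range(5)]  # stand-in for a Kafka topic
checkpoint = os.path.join(tempfile.mkdtemp(), "offset.json")

run1 = run_import(kafka_log, checkpoint, limit=3)
run2 = run_import(kafka_log, checkpoint, limit=3)
run3 = run_import(kafka_log, checkpoint, limit=3)
```

Each run picks up where the previous one stopped, so repeated batch jobs neither skip nor re-read messages as long as the checkpoint is written after the batch is processed.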





Real-Time Processing with Kafka
