Airstream: Spark Streaming At Airbnb

6,461 views

Spark Summit 2016 talk by Liyin Tang and Jingwei Lu

Published in: Data & Analytics

  1. AirStream. Liyin Tang & Jingwei Lu
  2. Data Infrastructure at Airbnb
  3. Batch Infrastructure. Diagram labels: Event Logs, MySQL Dumps, Kafka, Sqoop, Gold Cluster, Silver Cluster, Spark Cluster, HDFS, Hive, Yarn, Spark, ReAir, Airflow Scheduling, S3, Presto Cluster, AirPal, Caravel, Tableau.
  4. Streaming at Airbnb. Diagram labels: Event Logging, MySQL Binlog, Spinaltap, Kafka, Spark Streaming, HBase, Datadog, Druid, Cluster, HDFS, Hive, Yarn, Presto Cluster.
  5. Growing Pain
  6. Stateless. Diagram labels: Source, DStream, DF, Computation, Sink.
  7. Stateful. Diagram labels: Source, DStream, DF, Computation, State Storage, RDD, Sink1, Sink2, Sink N.
  8. Multiple Streams. Diagram labels: Source, DStream, …, Align by Time, DataFrame, State, Process A, Process N, Sink1, Sink2, Sink3, …, SinkN.
  9. Streaming + Batch. Diagram labels: DStream, …, Align by Time, DataFrame, State, Process A, Sink1, Sink2, Sink3, …, SinkN.
  10. Simplify and Unify
  11. AirStream Architecture. Diagram labels: Sources (Stream #1, Stream #N, Hive Tables, HBase Tables), Virtual Table Views for Computation, Spark SQL, Customized Computation, Simple Config, Sinks, HBase, Druid, Services, Streaming Sources, ….
  12. AirStream Architecture. Diagram labels: Sources (Stream #1, Stream #N, Hive Tables, HBase Tables), Virtual Table Views for Computation, Spark SQL, Customized Computation, Sinks, HBase, Druid, Services, Streaming Sources, …. Same Computation for Batch processing.
  13. Stateful
  14. State Store • Merge changes • Provide fast lookup • Fast persistent storage across streaming and batch jobs
  15. Why HBase • Rich Functionalities • Rich Integration with Hadoop Ecosystem • Easy Management • Strong Community • Reliable and Scalable
  16. HBase State Store Operators in Airstream • Full Table Scan • Simple Aggregation • Bulk Upload • Key/Prefix Lookup • Update
  17. Computation DAG. Diagram labels: Input Data, Key Lookup, Left Outer Join, Result.
  18. Key Space Design • Hash partition key space for load balance • Composite key for K -> V • Support full key lookup • Prefix lookup supported for all keys used in hash function. Diagram labels: row key of Hash, key1, key2, key3 (hash based on key prefix); lookup on Hash, key1, key2 (lookup based on key prefix, key1 = ‘value1’ and key2 = ‘value2’).
  19. Write Performance • Partition based on key before write • Use bulk upload for large volume update
  20. Case Study: Experiment realtime feedback. Diagram labels: Experiment Assignment Event, Update, HBase with TTL, Booking Event, Lookup, Druid, Datadog, job 1, job 2, one airstream config.
  21. Realtime Data Ingestion
  22. Realtime Ingestion on HBase. Diagram labels: Source (Data Infrastructure, MySQL, Analytical Events), Ingest (Kafka, Spark Streaming), HBase, Realtime Query, Snapshot, HDFS, Batch Query, Presto/Hive/Spark.
  23. Access Data in HBase. Diagram labels: HBase, HDFS Snapshot, Hive, Presto, Spark SQL, Spark Streaming, Batch Jobs, Interactive Query, Streaming, Table Mapping / Unified View on realtime data.
  24. Snapshot & Reseed. Diagram labels: HBase, HDFS, Snapshot (HFile Links), Bulk Upload.
  25. Case Study 1: Events Ingestion. Diagram labels: Kafka (topic, …, topic, topic), Spark (Executor1, …, Executor, Executor), HBase, DeDup, HDFS, Daily, Realtime, Hive, Presto, Events Partition.
  26. Case Study 2: Streaming DB Export. Diagram labels: RDS (Table1, …, Table2, TableN), Kafka (Spinaltap. …, Spinaltap.Table2, Spinaltap.TableN), Spark (Executor1, …, Executor2, Executor K), HBase (Region1, …, Region2, Region M), HDFS, Daily Snapshot, Realtime Query.
  27. Case Study: Streaming DB Export
      Rows | CF: Columns | Version | Value
      <ShardKey><DB_TABLE_#1><PK_a=A> | id | Fri May 19 00:33:19 2016 | 101
      <ShardKey><DB_TABLE_#1><PK_a=A> | city | Fri May 19 00:33:19 2016 | San Francisco
      <ShardKey><DB_TABLE_#1><PK_a=A> | city | Fri May 10 00:34:19 2016 | New York
      <ShardKey><DB_TABLE_#2><PK_a=A’> | id | Fri May 19 00:33:19 2016 | 1
  28. Case Study: Streaming DB Export. Binlog Order: TXN 1 (Commit_TS: 101), TXN 2 (Commit_TS: 102), TXN 3 (Commit_TS: 103), …, TXN N (Commit_TS: N’).
  29. Case Study: Streaming DB Export. Binlog Order: TXN 1 (Commit_TS: 101), TXN 2 (Commit_TS: 103), TXN 3 (Commit_TS: 102), …, TXN N (Commit_TS: N’). Diagram label: NTP.
  30. Case Study: Streaming DB Export. Binlog Order: TXN 1 (Commit_TS: 101), TXN 2 (Commit_TS: 103), TXN 3 (Commit_TS: 102), …, TXN N (Commit_TS: N’). Point-in-Time Restore on TS 102.
  31. Case Study: Streaming DB Export
      Rows | CF: Columns | Version | Value
      <ShardKey><DB_TABLE_#1><PK_a=A> | id | bin100 | 101
      <ShardKey><DB_TABLE_#1><PK_a=A> | city | bin101 | San Francisco
      <ShardKey><DB_TABLE_#1><PK_a=A> | city | bin102 | New York
      <ShardKey><DB_TABLE_#2><PK_a=A’> | id | bin100 | 1
  32. Case Study: Streaming DB Export
      Rows | Version (Logical Offset) | Value
      <ShardKey><DB_TABLE_#1><2016-05-23 23><100> | 100 | mysql-bin.00000:100
      <ShardKey><DB_TABLE_#1><2016-05-23 23><101> | 101 | mysql-bin.00000:101
      <ShardKey><DB_TABLE_#1><2016-05-23 23><103> | 103 | mysql-bin.00000:103
      <ShardKey><DB_TABLE_#1><2016-05-24 00><102> | 102 | mysql-bin.00000:102
  33. Case Study: Streaming DB Export (table repeated from slide 32)
  34. Operation
  35. Job Management: Scaling up. Diagram labels: Config, Driver, Streaming Job, Spark Jobs (shown three times, with … between), Yarn.
  36. Job Management: Scaling up. Diagram labels: Config, Driver, Streaming Job, Concurrent, Spark Job 1, Spark Job 2, …, Spark Job N, Yarn.
  37. Job Management: Fault Tolerant. Diagram labels: Config (three, …), Driver (three), Mesos, Streaming Job, Concurrent, Spark Job 1, Spark Job 2, …, Spark Job N, Yarn, Offset Management, Checkpoint, Rewind.
  38. Job Management: Monitoring & Alerting. Diagram labels: Driver, Streaming Job, Concurrent, Spark Job 1, Spark Job 2, …, Spark Job N, Yarn, AirStreamListener.
  39. Summary • Simplify and Unify Stream Batch Pipeline • Rich Stateful Computation • Rich Integration with Hadoop Ecosystem • Easy Operation
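
The key space design on slide 18 (a hash prefix over the leading key parts, followed by the composite key) can be sketched in plain Python. This is only an illustration under stated assumptions, not AirStream's code: the separator, the bucket count, the choice of SHA-256, and the helper names `salt`, `row_key`, `prefix_for`, and `prefix_lookup` are all hypothetical, and a dict stands in for HBase.

```python
import hashlib

SEP = "\x00"  # assumed separator between key parts (hypothetical choice)


def salt(prefix_parts, buckets=16):
    """Stable bucket id computed only from the key parts that are hashed."""
    digest = hashlib.sha256(SEP.join(prefix_parts).encode("utf-8")).hexdigest()
    return format(int(digest, 16) % buckets, "02x")


def row_key(key1, key2, key3, buckets=16):
    """Layout from the slide: Hash, key1, key2, key3.

    Assumption: the hash covers (key1, key2) only, so rows spread across
    buckets while rows sharing that prefix stay adjacent.
    """
    return SEP.join([salt([key1, key2], buckets), key1, key2, key3])


def prefix_for(key1, key2, buckets=16):
    """Scan prefix for key1 = ... and key2 = ...; computable without key3."""
    return SEP.join([salt([key1, key2], buckets), key1, key2]) + SEP


def prefix_lookup(store, key1, key2):
    """Toy prefix scan over a dict standing in for a sorted key-value table."""
    prefix = prefix_for(key1, key2)
    return {k: v for k, v in sorted(store.items()) if k.startswith(prefix)}
```

Because the hash is derived only from the parts that also appear in the lookup, a caller who knows `key1` and `key2` can rebuild the same prefix; hashing the full key would make prefix lookups impossible without scanning every bucket.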
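
Slides 27 to 31 contrast using a commit timestamp versus a binlog position as the cell version in the streaming DB export. The sketch below illustrates why that matters when commit timestamps are not monotonic in binlog order (slide 29). It is a toy model, not the talk's implementation: `VersionedCells` is a hypothetical stand-in for a versioned cell store where the highest version wins on read, and the pairing of offsets 100 to 102 with timestamps 101, 103, 102 is an assumption made for the example.

```python
from collections import defaultdict


class VersionedCells:
    """Toy versioned cell store: a read returns the value with the highest version."""

    def __init__(self):
        self._cells = defaultdict(dict)

    def put(self, row, column, version, value):
        self._cells[(row, column)][version] = value

    def get_latest(self, row, column):
        versions = self._cells.get((row, column), {})
        return versions[max(versions)] if versions else None


ROW = "<ShardKey><DB_TABLE_#1><PK_a=A>"

# (binlog_offset, commit_ts, column, value), listed in binlog order.
# The commit timestamps are out of order, as on slide 29 (assumed pairing).
CHANGES = [
    (100, 101, "id", "101"),
    (101, 103, "city", "San Francisco"),
    (102, 102, "city", "New York"),
]


def replay(changes):
    """Apply the same changes twice: versioned by timestamp and by binlog offset."""
    by_ts, by_offset = VersionedCells(), VersionedCells()
    for offset, commit_ts, column, value in changes:
        by_ts.put(ROW, column, commit_ts, value)
        by_offset.put(ROW, column, offset, value)
    return by_ts, by_offset
```

After `replay(CHANGES)`, the timestamp-versioned store reports the older write as the latest city, while the offset-versioned store follows binlog order.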
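
Slide 37 lists offset management, checkpoint, and rewind as the fault-tolerance pieces. One common way to combine them is to checkpoint an offset only after the sink write succeeds and to make the sink write idempotent, so replaying from an older offset is harmless. The sketch below shows that pattern in plain Python; whether AirStream works exactly this way is not stated in the slides, and `OffsetManager`, `process_batch`, and the dict-based sink are hypothetical names used for illustration.

```python
class OffsetManager:
    """Tracks the next offset to read per partition (in memory, for illustration)."""

    def __init__(self):
        self._committed = {}

    def committed(self, partition):
        return self._committed.get(partition, 0)

    def checkpoint(self, partition, offset):
        self._committed[partition] = offset

    def rewind(self, partition, offset):
        # Move back to an earlier offset so older events are reprocessed.
        self._committed[partition] = min(offset, self.committed(partition))


def process_batch(log, offsets, sink, partition, batch_size, crash_before_checkpoint=False):
    """Read one batch, upsert it into the sink, then checkpoint the new offset."""
    start = offsets.committed(partition)
    batch = log[start:start + batch_size]
    for offset, event in enumerate(batch, start=start):
        # Idempotent upsert keyed by (partition, offset): replays overwrite, never duplicate.
        sink[(partition, offset)] = event
    if crash_before_checkpoint:
        raise RuntimeError("simulated failure after sink write, before checkpoint")
    offsets.checkpoint(partition, start + len(batch))
    return len(batch)
```

If a batch fails after the sink write but before the checkpoint, the next run starts from the last committed offset and rewrites the same keys, so the sink ends up with one entry per event.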
