Productionizing a 24/7 Spark Streaming Service on YARN

At Ooyala we must process over two billion video events a day and provide rich, near-real-time, always-available analytics to thousands of customers. Spark Streaming is core to our state-of-the-art ingestion pipeline. In developing this system we encountered and resolved a large number of undocumented challenges, which we would like to share: What are some of the challenges and lessons from productionizing a Spark Streaming pipeline over YARN? How do you ensure 24/7 availability and fault tolerance? What are the best practices for Spark Streaming and its integration with Kafka and YARN? How do you monitor and instrument the various stages of the pipeline? We will dive into all these topics and more.

BIO:
Issac Buenrostro is a software engineer at Ooyala creating a new ingestion system for video analytics events using Spark, YARN, Thrift, and Parquet. Before Ooyala he obtained a bachelor's degree from MIT and a master's degree from Stanford in applied mathematics, working on high-performance scientific computing.

Arup Malakar works on the next-generation ETL pipeline for analytics at Ooyala, using Spark Streaming, YARN, and Kafka. Before Ooyala he contributed to Apache Hive and HCatalog and helped build the hosted platform for processing feeds at Yahoo! Arup holds a bachelor's degree in computer science from IIT Guwahati.
