Spark Meetup #2
Tal Sliwowicz
Director, R&D
tal@taboola.com
Who are we?
Ruthy Goldberg
Sr. Software Engineer
ruthy@taboola.com
Collaborative Filtering: Bucketed Consumption Groups
Geo: Region-based Recommendations
Context: Metadata
Social: Facebook/Twitter API
User Behavior: Cookie Data
Engine Focused on Maximizing CTR & Post-Click Engagement
Largest Content Discovery and Monetization Network
550M Monthly Unique Users
240B Monthly Recommendations
10B+ Daily User Events
5TB+ Incoming Daily Data
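To put the daily volumes above in per-second terms, here is a quick back-of-the-envelope sketch. It assumes load is spread evenly over a 24-hour day (real traffic has peaks), treats the "+" figures as lower bounds, and reads 5TB as decimal terabytes. The function and variable names are illustrative only and do not come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average per-second rate for a daily total, assuming uniform load."""
    return daily_total / SECONDS_PER_DAY


# Figures taken from the slide; both are stated as lower bounds ("+").
daily_events = 10e9  # 10B+ daily user events
daily_bytes = 5e12   # 5TB+ incoming daily data (decimal TB assumed)

events_per_sec = per_second(daily_events)
mb_per_sec = per_second(daily_bytes) / 1e6  # bytes/s -> MB/s

print(f"~{events_per_sec:,.0f} events/s")  # ~115,741 events/s
print(f"~{mb_per_sec:,.1f} MB/s")          # ~57.9 MB/s
```

Under these assumptions the averages come to roughly 116 thousand events and about 58 MB per second; peak rates would be higher.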
• Using Spark in production since v0.8
• 6 Data Centers across the globe
• Dedicated Spark & Cassandra (for Spark) cluster consisting of
– 5000+ cores with 35TB of RAM and ~1PB of local SSD storage, across 2 data centers
• Data must be processed and analyzed in real time, for example:
– Real-time, per user content recommendations
– Real-time expenditure reports
– Automated campaign management
– Automated calibration of recommendation algorithms
– Real-time analytics
What Does it Mean?
SPARK SUMMIT SF 2015
Highlights
• Spark DataFrames: Simple and Fast Analysis of Structured Data
https://spark-summit.org/2015/events/spark-dataframes-simple-and-fast-analysis-of-structured-data/
DataFrames
• From DataFrames to Tungsten: A Peek into Spark's Future
https://spark-summit.org/2015/events/keynote-9/
• Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal
https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-spark-closer-to-bare-metal/
Tungsten
• Spark and Spark Streaming at Netflix
https://spark-summit.org/2015/events/spark-and-spark-streaming-at-netflix/
Interesting Users’ Experience - Netflix
• How Spark Fits into Baidu's Scale
https://spark-summit.org/2015/events/keynote-10/
Interesting Users’ Experience - Baidu
• Recipes for Running Spark Streaming Applications in Production
https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/
Databricks Practical Talks – Spark Streaming
• Building, Debugging, and Tuning Spark Machine Learning Pipelines
https://spark-summit.org/2015/events/practical-machine-learning-pipelines-with-mllib-2/
Databricks Practical Talks – Machine Learning
• Making Sense of Spark Performance
https://spark-summit.org/2015/events/making-sense-of-spark-performance/
• Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing
https://spark-summit.org/2015/events/taming-gc-pauses-for-humongous-java-heaps-in-spark-graph-computing/
• IndexedRDD: Efficient Fine-Grained Updates for RDDs
https://spark-summit.org/2015/events/indexedrdd-efficient-fine-grained-updates-for-rdds/
Performance
• All Spark Summit videos and presentations can be found at https://spark-summit.org/2015/
Summary
USING SPARK AND C* TOGETHER FOR DATA ANALYSIS USING DATAFRAMES AND ZEPPELIN
Newsroom Dashboard
Cassandra
Taboola's main framework classes:
– CassandraTableSchemaProvider
– CassandraDataLoader
Cassandra Table → Spark DataFrame
CassandraTableSchemaProvider
Getting Cassandra Metadata
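The slides do not show the internals of CassandraTableSchemaProvider, so the following is not Taboola's implementation. It is only a minimal sketch of the general idea of turning Cassandra column metadata into a DataFrame-style schema. The type table is an illustrative subset, and the function name, column names, and sample table are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative subset of CQL type names mapped to Spark SQL type names.
# This is an assumption for the sketch, not an exhaustive or official table.
CQL_TO_SPARK: Dict[str, str] = {
    "text": "StringType",
    "int": "IntegerType",
    "bigint": "LongType",
    "boolean": "BooleanType",
    "double": "DoubleType",
    "timestamp": "TimestampType",
}


def to_dataframe_schema(columns: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Translate (column name, CQL type) pairs into (column name, Spark type) pairs."""
    schema = []
    for name, cql_type in columns:
        if cql_type not in CQL_TO_SPARK:
            # Fail loudly instead of guessing a type for an unmapped column.
            raise ValueError(f"no mapping for CQL type {cql_type!r} (column {name!r})")
        schema.append((name, CQL_TO_SPARK[cql_type]))
    return schema


# Hypothetical metadata for a sessions table, as it might be read from Cassandra.
sessions_columns = [("user_id", "text"), ("event_time", "timestamp"), ("clicks", "int")]
sessions_schema = to_dataframe_schema(sessions_columns)
print(sessions_schema)
```

Raising on an unmapped type is a deliberate choice in this sketch: a silent fallback to strings would hide schema drift between the table and the DataFrame.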
DF From Cassandra
CassandraDataLoader
CassandraDataLoader
Taboola's main framework classes:
– CassandraTableSchemaProvider
– CassandraDataLoader
Cassandra Table → Spark DataFrame
MySQL Table → Spark DataFrame
Zeppelin
Loading Our Code Into Zeppelin
SparkContext, SQLContext, and ZeppelinContext are automatically created and exposed as the variables 'sc', 'sqlContext', and 'z', respectively, in both the Scala and Python environments.
General Variables In Zeppelin
Executing Staging Code
DEMO
• Connect Zeppelin to the cluster (not standalone)
• Load raw sessions data
• Run code (Python/Scala) for algorithmic analysis
Zeppelin @Taboola - What’s next?
tal@taboola.com
ruthy@taboola.com
Thank You!

Spark meetup2 final (Taboola)

  • 2. Who are we? Tal Sliwowicz, Director, R&D (tal@taboola.com); Ruthy Goldberg, Sr. Software Engineer (ruthy@taboola.com)
  • 3. Engine Focused on Maximizing CTR & Post Click Engagement: Collaborative Filtering (Bucketed Consumption Groups); Geo (Region-based Recommendations); Context (Metadata); Social (Facebook/Twitter API); User Behavior (Cookie Data)
  • 4. Largest Content Discovery and Monetization Network: 550M monthly unique users; 240B monthly recommendations; 10B+ daily user events; 5TB+ incoming daily data
  • 5. What Does it Mean?
      • Using Spark in production since v0.8
      • 6 data centers across the globe
      • Dedicated Spark and Cassandra (for Spark) cluster: 5000+ cores, 35TB of RAM, and ~1PB of local SSD storage, across 2 data centers
      • Data must be processed and analyzed in real time, for example:
          – Real-time, per-user content recommendations
          – Real-time expenditure reports
          – Automated campaign management
          – Automated calibration of recommendation algorithms
          – Real-time analytics
  • 6. SPARK SUMMIT SF 2015 Highlights
  • 7. DataFrames
      • Spark DataFrames: Simple and Fast Analysis of Structured Data https://spark-summit.org/2015/events/spark-dataframes-simple-and-fast-analysis-of-structured-data/
  • 10. Tungsten
      • From DataFrames to Tungsten: A Peek into Spark's Future https://spark-summit.org/2015/events/keynote-9/
      • Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-spark-closer-to-bare-metal/
  • 14. Interesting Users’ Experience - Netflix
      • Spark and Spark Streaming at Netflix https://spark-summit.org/2015/events/spark-and-spark-streaming-at-netflix/
  • 16. Interesting Users’ Experience - Baidu
      • How Spark Fits into Baidu's Scale https://spark-summit.org/2015/events/keynote-10/
  • 18. Databricks Practical Talks – Spark Streaming
      • Recipes for Running Spark Streaming Applications in Production https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/
  • 20. Databricks Practical Talks – Machine Learning
      • Building, Debugging, and Tuning Spark Machine Learning Pipelines https://spark-summit.org/2015/events/practical-machine-learning-pipelines-with-mllib-2/
  • 22. Performance
      • Making Sense of Spark Performance https://spark-summit.org/2015/events/making-sense-of-spark-performance/
      • Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing https://spark-summit.org/2015/events/taming-gc-pauses-for-humongous-java-heaps-in-spark-graph-computing/
      • IndexedRDD: Efficient Fine-Grained Updates for RDDs https://spark-summit.org/2015/events/indexedrdd-efficient-fine-grained-updates-for-rdds/
  • 23. Summary: all Spark Summit videos and presentations can be found at https://spark-summit.org/2015/
  • 24. USING SPARK AND C* TOGETHER FOR DATA ANALYSIS USING DATA FRAMES AND ZEPPELIN
  • 27. Cassandra Table → Spark DataFrame. Taboola's main framework classes: CassandraTableSchemaProvider, CassandraDataLoader
  • 32. Cassandra Table → Spark DataFrame. Taboola's main framework classes: CassandraTableSchemaProvider, CassandraDataLoader
  • 33. MySQL Table → Spark DataFrame
  • 35. Loading Our Code Into Zeppelin
  • 36. General Variables In Zeppelin: SparkContext, SQLContext, and ZeppelinContext are created automatically and exposed as the variables 'sc', 'sqlContext', and 'z', respectively, in both the Scala and Python environments.
  • 38. DEMO
  • 39. Zeppelin @Taboola - What’s next?
      • Connect Zeppelin to the cluster (not standalone)
      • Load raw sessions data
      • Run code (Python/Scala) for algorithmic analysis
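
Slides 27 to 36 describe loading Cassandra and MySQL tables into Spark DataFrames from a Zeppelin notebook, but the code shown on the slides did not survive in this transcript. The following is a minimal Python sketch of one way this could be done with Spark's generic DataFrameReader (Spark 1.4+), assuming the spark-cassandra-connector and a MySQL JDBC driver are on the classpath. It is not Taboola's CassandraDataLoader; the helper names and every keyspace, table, host, and credential value are hypothetical placeholders.

```python
# Hypothetical helpers (not Taboola's framework classes): they only build the
# option dictionaries that Spark's DataFrameReader expects for each source.

# Data source name registered by the spark-cassandra-connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"
# Spark's built-in JDBC data source.
JDBC_FORMAT = "jdbc"


def cassandra_read_options(keyspace, table):
    """Options identifying a Cassandra table for the connector's data source."""
    return {"keyspace": keyspace, "table": table}


def mysql_read_options(host, database, table, user, password, port=3306):
    """Options for reading a MySQL table through the JDBC data source."""
    return {
        "url": "jdbc:mysql://{}:{}/{}".format(host, port, database),
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class of the classic MySQL Connector/J; assumed to be installed.
        "driver": "com.mysql.jdbc.Driver",
    }


def load_table(sql_context, source_format, options):
    """Load a table as a DataFrame via the given SQLContext (e.g. Zeppelin's)."""
    return sql_context.read.format(source_format).options(**options).load()
```

In a Zeppelin `%pyspark` paragraph, where `sqlContext` is already defined, a call such as `load_table(sqlContext, CASSANDRA_FORMAT, cassandra_read_options("analytics", "sessions"))` would return a DataFrame for the (placeholder) `analytics.sessions` table; the MySQL case works the same way with `JDBC_FORMAT` and `mysql_read_options(...)`.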

Editor's Notes

  1. Tungsten motivation: CPU performance has stayed roughly the same for the last 10 years, so the code must be optimized through (1) runtime code generation, (2) exploiting cache locality, and (3) off-heap memory management.
  2. ----- Meeting Notes (2/8/15 22:08) ----- Don't forget Spark 1.4 + Kafka
  3. ----- Meeting Notes (3/8/15 14:00) ----- Lior Chaga