SlideShare a Scribd company logo
1 of 40
Spark Meetup #2
Tal Sliwowicz
Director, R&D
tal@taboola.com
Who are we?
Ruthy Goldberg
Sr. Software Engineer
ruthy@taboola.com
Collaborative Filtering
Bucketed Consumption Groups
Geo
Region-based
Recommendations
Context
Metadata
Social
Facebook/Twitter API
User Behavior
Cookie Data
Engine Focused on Maximizing CTR & Post Click Engagement
Largest Content Discovery and
Monetization Network
550MMonthly Unique
Users
240BMonthly
Recommendations
10B+Daily User Events
5TB+Incoming Daily Data
• Using Spark in production since v0.8
• 6 Data Centers across the globe
• Dedicated Spark & Cassandra (for spark) cluster consists of
– 5000+ cores with 35TB of RAM memory and ~1PB of SSD local
storage, across 2 Data Centers.
• Data must be processed and analyzed in real time, for example:
– Real-time, per user content recommendations
– Real-time expenditure reports
– Automated campaign management
– Automated recommendation algorithms calibration
– Real-time analytics
What Does it Mean?
SPARK SUMMIT SF 2015
Highlights
• Spark DataFrames: Simple and Fast Analysis of
Structured Data
https://spark-summit.org/2015/events/spark-dataframes-simple-and-fast-analysis-
of-structured-data/
DataFrames
• From DataFrames to Tungsten: A Peek into Spark's
Future
https://spark-summit.org/2015/events/keynote-9/
• Deep Dive into Project Tungsten: Bringing Spark
Closer to Bare Metal
https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-
spark-closer-to-bare-metal/
Tungsten
• Spark and Spark Streaming at Netflix
https://spark-summit.org/2015/events/spark-and-spark-streaming-at-netflix/
Interesting Users’ Experience - Netflix
• How Spark Fits into Baidu's Scale
https://spark-summit.org/2015/events/keynote-10/
Interesting Users’ Experience - Baidu
• Recipes for Running Spark Streaming Applications in
Production
https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-
applications-in-production/
Databricks Practical Talks – Spark Streaming
• Building, Debugging, and Tuning Spark Machine
Learning Pipelines
https://spark-summit.org/2015/events/practical-machine-learning-pipelines-with-
mllib-2/
Databricks Practical Talks – Machine Learning
• Making Sense of Spark Performance
https://spark-summit.org/2015/events/making-sense-of-spark-performance/
• Taming GC Pauses for Humongous Java Heaps in
Spark Graph Computing
https://spark-summit.org/2015/events/taming-gc-pauses-for-humongous-java-
heaps-in-spark-graph-computing/
• IndexedRDD: Efficient Fine-Grained Updates for
RDDs
https://spark-summit.org/2015/events/indexedrdd-efficient-fine-grained-updates-
for-rdds/
Performance
• All Spark summit videos and presentations can be
found here https://spark-summit.org/2015/
Summary
USING SPARK AND C* TOGETHER
FOR DATA ANALYSIS USING DATA
FRAMES AND ZEPPELIN
Newsroom Dashboard
Cassandra
Main Taboola’s framework classes:
– CassandraTableSchemaProvider
– CassandraDataLoader
Cassandra Table  Spark DataFrame
CassandraTableSchemaProvider
Getting Cassandra Metadata
DF From Cassandra
CassandraDataLoader
CassandraDataLoader
Main Taboola’s framework classes:
– CassandraTableSchemaProvider
– CassandraDataLoader
Cassandra Table  Spark DataFrame
Mysql Table  Spark DataFrame
Zeppelin
Loading Our Code Into Zeppelin
SparkContext, SQLContext, ZeppelinContext are
automatically created and exposed as variable names
'sc', 'sqlContext' and 'z', respectively, both in scala and
python environments.
General Variables In Zeppelin
Executing Staging Code
DEMO
• Connect Zeppelin to the cluster (not
standalone)
• Load raw sessions data
• Run code (python/scala) for algorithmic
analysis
Zeppelin @Taboola - What’s next?
tal@taboola.com
ruthy@taboola.com
Thank You!

More Related Content

What's hot

Monitoring kubernetes wwith prometheus and grafana azure singapore - 19 aug...
Monitoring kubernetes wwith prometheus and grafana   azure singapore - 19 aug...Monitoring kubernetes wwith prometheus and grafana   azure singapore - 19 aug...
Monitoring kubernetes wwith prometheus and grafana azure singapore - 19 aug...Nilesh Gule
 
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...HostedbyConfluent
 
WSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
WSO2Con ASIA 2016: API Driven Innovation Within the EnterpriseWSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
WSO2Con ASIA 2016: API Driven Innovation Within the EnterpriseWSO2
 
Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga
 
Build Your Own Recommendation Engine
Build Your Own Recommendation EngineBuild Your Own Recommendation Engine
Build Your Own Recommendation EngineSri Ambati
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldSerg Masyutin
 
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...HostedbyConfluent
 
Extending KEDA with External Scalers
Extending KEDA with External ScalersExtending KEDA with External Scalers
Extending KEDA with External ScalersBaltazar Chua
 
Google Charts for native Android apps
Google Charts for native Android appsGoogle Charts for native Android apps
Google Charts for native Android appsChuck Greb
 
Cis 528presentation final
Cis 528presentation finalCis 528presentation final
Cis 528presentation finalpriyalmistry4
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...Databricks
 
01 supermapiportaloverview
01 supermapiportaloverview01 supermapiportaloverview
01 supermapiportaloverviewGeoMedeelel
 
Kafka Summit NYC 2017 - The Rise of the Streaming Platform
Kafka Summit NYC 2017 - The Rise of the Streaming PlatformKafka Summit NYC 2017 - The Rise of the Streaming Platform
Kafka Summit NYC 2017 - The Rise of the Streaming Platformconfluent
 
Up and Running with firebase
Up and Running with firebaseUp and Running with firebase
Up and Running with firebaseMd. Sadhan Sarker
 
Real Time API delivering data @ Scale
Real Time API delivering data @ ScaleReal Time API delivering data @ Scale
Real Time API delivering data @ ScaleAkash Mishra
 
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDealIcinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDealIcinga
 
02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroductionGeoMedeelel
 

What's hot (20)

Monitoring kubernetes wwith prometheus and grafana azure singapore - 19 aug...
Monitoring kubernetes wwith prometheus and grafana   azure singapore - 19 aug...Monitoring kubernetes wwith prometheus and grafana   azure singapore - 19 aug...
Monitoring kubernetes wwith prometheus and grafana azure singapore - 19 aug...
 
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
 
WSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
WSO2Con ASIA 2016: API Driven Innovation Within the EnterpriseWSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
WSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
 
Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions
 
Monitoring cloud applications and containers
Monitoring cloud applications and containersMonitoring cloud applications and containers
Monitoring cloud applications and containers
 
Build Your Own Recommendation Engine
Build Your Own Recommendation EngineBuild Your Own Recommendation Engine
Build Your Own Recommendation Engine
 
Big data and non relational database
Big data and non relational databaseBig data and non relational database
Big data and non relational database
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java World
 
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
 
Extending KEDA with External Scalers
Extending KEDA with External ScalersExtending KEDA with External Scalers
Extending KEDA with External Scalers
 
Google Charts for native Android apps
Google Charts for native Android appsGoogle Charts for native Android apps
Google Charts for native Android apps
 
Cis 528presentation final
Cis 528presentation finalCis 528presentation final
Cis 528presentation final
 
Cis 528 big data
Cis 528 big dataCis 528 big data
Cis 528 big data
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 
01 supermapiportaloverview
01 supermapiportaloverview01 supermapiportaloverview
01 supermapiportaloverview
 
Kafka Summit NYC 2017 - The Rise of the Streaming Platform
Kafka Summit NYC 2017 - The Rise of the Streaming PlatformKafka Summit NYC 2017 - The Rise of the Streaming Platform
Kafka Summit NYC 2017 - The Rise of the Streaming Platform
 
Up and Running with firebase
Up and Running with firebaseUp and Running with firebase
Up and Running with firebase
 
Real Time API delivering data @ Scale
Real Time API delivering data @ ScaleReal Time API delivering data @ Scale
Real Time API delivering data @ Scale
 
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDealIcinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
 
02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction
 

Similar to Spark meetup2 final (Taboola)

Spark Magic Building and Deploying a High Scale Product in 4 Months
Spark Magic Building and Deploying a High Scale Product in 4 MonthsSpark Magic Building and Deploying a High Scale Product in 4 Months
Spark Magic Building and Deploying a High Scale Product in 4 Monthstsliwowicz
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkBurak Yavuz
 
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...HostedbyConfluent
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
Spark Hsinchu meetup
Spark Hsinchu meetupSpark Hsinchu meetup
Spark Hsinchu meetupYung-An He
 
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...Lillian Pierson
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformYao Yao
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Eren Avşaroğulları
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL PerformanceTakuya UESHIN
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Christopher Gutknecht
 
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in dayCherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in dayVishal Pawar
 
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open SourceDesenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open SourceRodrigo Kono
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?Nicolas Georgeault
 
Knowage roadmap-2022 (1)
Knowage roadmap-2022 (1)Knowage roadmap-2022 (1)
Knowage roadmap-2022 (1)KNOWAGE
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkTaras Matyashovsky
 

Similar to Spark meetup2 final (Taboola) (20)

Spark Magic Building and Deploying a High Scale Product in 4 Months
Spark Magic Building and Deploying a High Scale Product in 4 MonthsSpark Magic Building and Deploying a High Scale Product in 4 Months
Spark Magic Building and Deploying a High Scale Product in 4 Months
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
 
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Spark Hsinchu meetup
Spark Hsinchu meetupSpark Hsinchu meetup
Spark Hsinchu meetup
 
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL Performance
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in dayCherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
 
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open SourceDesenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
 
Knowage roadmap-2022 (1)
Knowage roadmap-2022 (1)Knowage roadmap-2022 (1)
Knowage roadmap-2022 (1)
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
 

More from tsliwowicz

Spark war stories taboola
Spark war stories taboolaSpark war stories taboola
Spark war stories taboolatsliwowicz
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolatsliwowicz
 
Using apache spark to fight world hunger - Israel spark meetup at taboola
Using apache spark to fight world hunger - Israel spark meetup at taboolaUsing apache spark to fight world hunger - Israel spark meetup at taboola
Using apache spark to fight world hunger - Israel spark meetup at taboolatsliwowicz
 
Inneractive - Spark meetup2
Inneractive - Spark meetup2Inneractive - Spark meetup2
Inneractive - Spark meetup2tsliwowicz
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)tsliwowicz
 

More from tsliwowicz (6)

Spark war stories taboola
Spark war stories taboolaSpark war stories taboola
Spark war stories taboola
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboola
 
Using apache spark to fight world hunger - Israel spark meetup at taboola
Using apache spark to fight world hunger - Israel spark meetup at taboolaUsing apache spark to fight world hunger - Israel spark meetup at taboola
Using apache spark to fight world hunger - Israel spark meetup at taboola
 
Inneractive - Spark meetup2
Inneractive - Spark meetup2Inneractive - Spark meetup2
Inneractive - Spark meetup2
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
 

Recently uploaded

CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 

Recently uploaded (20)

CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 

Spark meetup2 final (Taboola)

  • 2. Tal Sliwowicz Director, R&D tal@taboola.com Who are we? Ruthy Goldberg Sr. Software Engineer ruthy@taboola.com
  • 3. Collaborative Filtering Bucketed Consumption Groups Geo Region-based Recommendations Context Metadata Social Facebook/Twitter API User Behavior Cookie Data Engine Focused on Maximizing CTR & Post Click Engagement
  • 4. Largest Content Discovery and Monetization Network 550MMonthly Unique Users 240BMonthly Recommendations 10B+Daily User Events 5TB+Incoming Daily Data
  • 5. • Using Spark in production since v0.8 • 6 Data Centers across the globe • Dedicated Spark & Cassandra (for spark) cluster consists of – 5000+ cores with 35TB of RAM memory and ~1PB of SSD local storage, across 2 Data Centers. • Data must be processed and analyzed in real time, for example: – Real-time, per user content recommendations – Real-time expenditure reports – Automated campaign management – Automated recommendation algorithms calibration – Real-time analytics What Does it Mean?
  • 6. SPARK SUMMIT SF 2015 Highlights
  • 7. • Spark DataFrames: Simple and Fast Analysis of Structured Data https://spark-summit.org/2015/events/spark-dataframes-simple-and-fast-analysis- of-structured-data/ DataFrames
  • 8.
  • 9.
  • 10. • From DataFrames to Tungsten: A Peek into Spark's Future https://spark-summit.org/2015/events/keynote-9/ • Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing- spark-closer-to-bare-metal/ Tungsten
  • 11.
  • 12.
  • 13.
  • 14. • Spark and Spark Streaming at Netflix https://spark-summit.org/2015/events/spark-and-spark-streaming-at-netflix/ Interesting Users’ Experience - Netflix
  • 15.
  • 16. • How Spark Fits into Baidu's Scale https://spark-summit.org/2015/events/keynote-10/ Interesting Users’ Experience - Baidu
  • 17.
  • 18. • Recipes for Running Spark Streaming Applications in Production https://spark-summit.org/2015/events/recipes-for-running-spark-streaming- applications-in-production/ Databricks Practical Talks – Spark Streaming
  • 19.
  • 20. • Building, Debugging, and Tuning Spark Machine Learning Pipelines https://spark-summit.org/2015/events/practical-machine-learning-pipelines-with- mllib-2/ Databricks Practical Talks – Machine Learning
  • 21.
  • 22. • Making Sense of Spark Performance https://spark-summit.org/2015/events/making-sense-of-spark-performance/ • Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing https://spark-summit.org/2015/events/taming-gc-pauses-for-humongous-java- heaps-in-spark-graph-computing/ • IndexedRDD: Efficient Fine-Grained Updates for RDDs https://spark-summit.org/2015/events/indexedrdd-efficient-fine-grained-updates- for-rdds/ Performance
  • 23. • All Spark summit videos and presentations can be found here https://spark-summit.org/2015/ Summary
  • 24. USING SPARK AND C* TOGETHER FOR DATA ANALYSIS USING DATA FRAMES AND ZEPPELIN
  • 27. Main Taboola’s framework classes: – CassandraTableSchemaProvider – CassandraDataLoader Cassandra Table  Spark DataFrame
  • 32. Main Taboola’s framework classes: – CassandraTableSchemaProvider – CassandraDataLoader Cassandra Table  Spark DataFrame
  • 33. Mysql Table  Spark DataFrame
  • 35. Loading Our Code Into Zeppelin
  • 36. SparkContext, SQLContext, ZeppelinContext are automatically created and exposed as variable names 'sc', 'sqlContext' and 'z', respectively, both in scala and python environments. General Variables In Zeppelin
  • 38. DEMO
  • 39. • Connect Zeppelin to the cluster (not standalone) • Load raw sessions data • Run code (python/scala) for algorithmic analysis Zeppelin @Taboola - What’s next?

Editor's Notes

  1. Tungsten motivation – CPU stayed the same for the last 10 years, so need to optimize code (1) Runtime code generation (2)  Exploiting cache locality (3)  Off-heap memory management
  2. ----- Meeting Notes (2/8/15 22:08) ----- Dont forget Spark 1.4 + Kafka
  3. ----- Meeting Notes (3/8/15 14:00) ----- Lior Chaga