SlideShare a Scribd company logo
1 of 40
Spark Meetup #2
Tal Sliwowicz
Director, R&D
tal@taboola.com
Who are we?
Ruthy Goldberg
Sr. Software Engineer
ruthy@taboola.com
Collaborative Filtering
Bucketed Consumption Groups
Geo
Region-based
Recommendations
Context
Metadata
Social
Facebook/Twitter API
User Behavior
Cookie Data
Engine Focused on Maximizing CTR & Post Click Engagement
Largest Content Discovery and
Monetization Network
550MMonthly Unique
Users
240BMonthly
Recommendations
10B+Daily User Events
5TB+Incoming Daily Data
• Using Spark in production since v0.8
• 6 Data Centers across the globe
• Dedicated Spark & Cassandra (for spark) cluster consists of
– 5000+ cores with 35TB of RAM memory and ~1PB of SSD local
storage, across 2 Data Centers.
• Data must be processed and analyzed in real time, for example:
– Real-time, per user content recommendations
– Real-time expenditure reports
– Automated campaign management
– Automated recommendation algorithms calibration
– Real-time analytics
What Does it Mean?
SPARK SUMMIT SF 2015
Highlights
• Spark DataFrames: Simple and Fast Analysis of
Structured Data
https://spark-summit.org/2015/events/spark-dataframes-simple-and-fast-analysis-
of-structured-data/
DataFrames
• From DataFrames to Tungsten: A Peek into Spark's
Future
https://spark-summit.org/2015/events/keynote-9/
• Deep Dive into Project Tungsten: Bringing Spark
Closer to Bare Metal
https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-
spark-closer-to-bare-metal/
Tungsten
• Spark and Spark Streaming at Netflix
https://spark-summit.org/2015/events/spark-and-spark-streaming-at-netflix/
Interesting Users’ Experience - Netflix
• How Spark Fits into Baidu's Scale
https://spark-summit.org/2015/events/keynote-10/
Interesting Users’ Experience - Baidu
• Recipes for Running Spark Streaming Applications in
Production
https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-
applications-in-production/
Databricks Practical Talks – Spark Streaming
• Building, Debugging, and Tuning Spark Machine
Learning Pipelines
https://spark-summit.org/2015/events/practical-machine-learning-pipelines-with-
mllib-2/
Databricks Practical Talks – Machine Learning
• Making Sense of Spark Performance
https://spark-summit.org/2015/events/making-sense-of-spark-performance/
• Taming GC Pauses for Humongous Java Heaps in
Spark Graph Computing
https://spark-summit.org/2015/events/taming-gc-pauses-for-humongous-java-
heaps-in-spark-graph-computing/
• IndexedRDD: Efficient Fine-Grained Updates for
RDDs
https://spark-summit.org/2015/events/indexedrdd-efficient-fine-grained-updates-
for-rdds/
Performance
• All Spark summit videos and presentations can be
found here https://spark-summit.org/2015/
Summary
USING SPARK AND C* TOGETHER
FOR DATA ANALYSIS USING DATA
FRAMES AND ZEPPELIN
Newsroom Dashboard
Cassandra
Main Taboola’s framework classes:
– CassandraTableSchemaProvider
– CassandraDataLoader
Cassandra Table  Spark DataFrame
CassandraTableSchemaProvider
Getting Cassandra Metadata
DF From Cassandra
CassandraDataLoader
CassandraDataLoader
Main Taboola’s framework classes:
– CassandraTableSchemaProvider
– CassandraDataLoader
Cassandra Table  Spark DataFrame
Mysql Table  Spark DataFrame
Zeppelin
Loading Our Code Into Zeppelin
SparkContext, SQLContext, ZeppelinContext are
automatically created and exposed as variable names
'sc', 'sqlContext' and 'z', respectively, both in scala and
python environments.
General Variables In Zeppelin
Executing Staging Code
DEMO
• Connect Zeppelin to the cluster (not
standalone)
• Load raw sessions data
• Run code (python/scala) for algorithmic
analysis
Zeppelin @Taboola - What’s next?
tal@taboola.com
ruthy@taboola.com
Thank You!

More Related Content

What's hot

Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
HostedbyConfluent
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Databricks
 

What's hot (20)

Monitoring kubernetes wwith prometheus and grafana azure singapore - 19 aug...
Monitoring kubernetes wwith prometheus and grafana   azure singapore - 19 aug...Monitoring kubernetes wwith prometheus and grafana   azure singapore - 19 aug...
Monitoring kubernetes wwith prometheus and grafana azure singapore - 19 aug...
 
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
Server Sent Events using Reactive Kafka and Spring Web flux | Gagan Solur Ven...
 
WSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
WSO2Con ASIA 2016: API Driven Innovation Within the EnterpriseWSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
WSO2Con ASIA 2016: API Driven Innovation Within the Enterprise
 
Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions Icinga Camp Bangalore - Enterprise exceptions
Icinga Camp Bangalore - Enterprise exceptions
 
Monitoring cloud applications and containers
Monitoring cloud applications and containersMonitoring cloud applications and containers
Monitoring cloud applications and containers
 
Build Your Own Recommendation Engine
Build Your Own Recommendation EngineBuild Your Own Recommendation Engine
Build Your Own Recommendation Engine
 
Big data and non relational database
Big data and non relational databaseBig data and non relational database
Big data and non relational database
 
JEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java WorldJEEConf 2015 Big Data Analysis in Java World
JEEConf 2015 Big Data Analysis in Java World
 
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
 
Extending KEDA with External Scalers
Extending KEDA with External ScalersExtending KEDA with External Scalers
Extending KEDA with External Scalers
 
Google Charts for native Android apps
Google Charts for native Android appsGoogle Charts for native Android apps
Google Charts for native Android apps
 
Cis 528presentation final
Cis 528presentation finalCis 528presentation final
Cis 528presentation final
 
Cis 528 big data
Cis 528 big dataCis 528 big data
Cis 528 big data
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 
01 supermapiportaloverview
01 supermapiportaloverview01 supermapiportaloverview
01 supermapiportaloverview
 
Kafka Summit NYC 2017 - The Rise of the Streaming Platform
Kafka Summit NYC 2017 - The Rise of the Streaming PlatformKafka Summit NYC 2017 - The Rise of the Streaming Platform
Kafka Summit NYC 2017 - The Rise of the Streaming Platform
 
Up and Running with firebase
Up and Running with firebaseUp and Running with firebase
Up and Running with firebase
 
Real Time API delivering data @ Scale
Real Time API delivering data @ ScaleReal Time API delivering data @ Scale
Real Time API delivering data @ Scale
 
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDealIcinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
Icinga Camp Bangalore - Icinga2 and Salt Stack at SnapDeal
 
02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction
 

Similar to Spark meetup2 final (Taboola)

Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
Pavel Hardak
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Eren Avşaroğulları
 

Similar to Spark meetup2 final (Taboola) (20)

Spark Magic Building and Deploying a High Scale Product in 4 Months
Spark Magic Building and Deploying a High Scale Product in 4 MonthsSpark Magic Building and Deploying a High Scale Product in 4 Months
Spark Magic Building and Deploying a High Scale Product in 4 Months
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
 
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
Stateful Microservices with Apache Kafka and Spring Cloud Stream with Jan Svo...
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Spark Hsinchu meetup
Spark Hsinchu meetupSpark Hsinchu meetup
Spark Hsinchu meetup
 
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020Spark Development Lifecycle at Workday - ApacheCon 2020
Spark Development Lifecycle at Workday - ApacheCon 2020
 
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
Apache Spark Development Lifecycle @ Workday - ApacheCon 2020
 
An Insider’s Guide to Maximizing Spark SQL Performance
 An Insider’s Guide to Maximizing Spark SQL Performance An Insider’s Guide to Maximizing Spark SQL Performance
An Insider’s Guide to Maximizing Spark SQL Performance
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in dayCherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
Cherokee nation 2 day AIAD & DIAD - App in a day and Dashboard in day
 
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open SourceDesenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
Desenvolvimento .NET no Linux. Veja porque a Microsoft ama Linux e Open Source
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
 
Knowage roadmap-2022 (1)
Knowage roadmap-2022 (1)Knowage roadmap-2022 (1)
Knowage roadmap-2022 (1)
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache SparkIntroduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
 

More from tsliwowicz

Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboola
tsliwowicz
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
tsliwowicz
 

More from tsliwowicz (6)

Spark war stories taboola
Spark war stories taboolaSpark war stories taboola
Spark war stories taboola
 
Spark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboolaSpark on Dataproc - Israel Spark Meetup at taboola
Spark on Dataproc - Israel Spark Meetup at taboola
 
Using apache spark to fight world hunger - Israel spark meetup at taboola
Using apache spark to fight world hunger - Israel spark meetup at taboolaUsing apache spark to fight world hunger - Israel spark meetup at taboola
Using apache spark to fight world hunger - Israel spark meetup at taboola
 
Inneractive - Spark meetup2
Inneractive - Spark meetup2Inneractive - Spark meetup2
Inneractive - Spark meetup2
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
 

Recently uploaded

Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 

Recently uploaded (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 

Spark meetup2 final (Taboola)

  • 2. Tal Sliwowicz Director, R&D tal@taboola.com Who are we? Ruthy Goldberg Sr. Software Engineer ruthy@taboola.com
  • 3. Collaborative Filtering Bucketed Consumption Groups Geo Region-based Recommendations Context Metadata Social Facebook/Twitter API User Behavior Cookie Data Engine Focused on Maximizing CTR & Post Click Engagement
  • 4. Largest Content Discovery and Monetization Network 550MMonthly Unique Users 240BMonthly Recommendations 10B+Daily User Events 5TB+Incoming Daily Data
  • 5. • Using Spark in production since v0.8 • 6 Data Centers across the globe • Dedicated Spark & Cassandra (for spark) cluster consists of – 5000+ cores with 35TB of RAM memory and ~1PB of SSD local storage, across 2 Data Centers. • Data must be processed and analyzed in real time, for example: – Real-time, per user content recommendations – Real-time expenditure reports – Automated campaign management – Automated recommendation algorithms calibration – Real-time analytics What Does it Mean?
  • 6. SPARK SUMMIT SF 2015 Highlights
  • 7. • Spark DataFrames: Simple and Fast Analysis of Structured Data https://spark-summit.org/2015/events/spark-dataframes-simple-and-fast-analysis- of-structured-data/ DataFrames
  • 8.
  • 9.
  • 10. • From DataFrames to Tungsten: A Peek into Spark's Future https://spark-summit.org/2015/events/keynote-9/ • Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing- spark-closer-to-bare-metal/ Tungsten
  • 11.
  • 12.
  • 13.
  • 14. • Spark and Spark Streaming at Netflix https://spark-summit.org/2015/events/spark-and-spark-streaming-at-netflix/ Interesting Users’ Experience - Netflix
  • 15.
  • 16. • How Spark Fits into Baidu's Scale https://spark-summit.org/2015/events/keynote-10/ Interesting Users’ Experience - Baidu
  • 17.
  • 18. • Recipes for Running Spark Streaming Applications in Production https://spark-summit.org/2015/events/recipes-for-running-spark-streaming- applications-in-production/ Databricks Practical Talks – Spark Streaming
  • 19.
  • 20. • Building, Debugging, and Tuning Spark Machine Learning Pipelines https://spark-summit.org/2015/events/practical-machine-learning-pipelines-with- mllib-2/ Databricks Practical Talks – Machine Learning
  • 21.
  • 22. • Making Sense of Spark Performance https://spark-summit.org/2015/events/making-sense-of-spark-performance/ • Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing https://spark-summit.org/2015/events/taming-gc-pauses-for-humongous-java- heaps-in-spark-graph-computing/ • IndexedRDD: Efficient Fine-Grained Updates for RDDs https://spark-summit.org/2015/events/indexedrdd-efficient-fine-grained-updates- for-rdds/ Performance
  • 23. • All Spark summit videos and presentations can be found here https://spark-summit.org/2015/ Summary
  • 24. USING SPARK AND C* TOGETHER FOR DATA ANALYSIS USING DATA FRAMES AND ZEPPELIN
  • 27. Main Taboola’s framework classes: – CassandraTableSchemaProvider – CassandraDataLoader Cassandra Table  Spark DataFrame
  • 32. Main Taboola’s framework classes: – CassandraTableSchemaProvider – CassandraDataLoader Cassandra Table  Spark DataFrame
  • 33. Mysql Table  Spark DataFrame
  • 35. Loading Our Code Into Zeppelin
  • 36. SparkContext, SQLContext, ZeppelinContext are automatically created and exposed as variable names 'sc', 'sqlContext' and 'z', respectively, both in scala and python environments. General Variables In Zeppelin
  • 38. DEMO
  • 39. • Connect Zeppelin to the cluster (not standalone) • Load raw sessions data • Run code (python/scala) for algorithmic analysis Zeppelin @Taboola - What’s next?

Editor's Notes

  1. Tungsten motivation – CPU stayed the same for the last 10 years, so need to optimize code (1) Runtime code generation (2)  Exploiting cache locality (3)  Off-heap memory management
  2. ----- Meeting Notes (2/8/15 22:08) ----- Dont forget Spark 1.4 + Kafka
  3. ----- Meeting Notes (3/8/15 14:00) ----- Lior Chaga