SlideShare a Scribd company logo
1 of 28
Download to read offline
Michael Zargham, Director Data Science
Stefan Panayotov, Senior Data Engineer
Cadent
SPARK ML WITH HIGH
DIMENSIONAL LABELS
2
Vision
Unified Self Service Media Monetization
Platform for all TV inventory
Our Team has cutting edge expertise
• Hybrid cloud Apache Spark infrastructure
• Analytical rather than rule driven algorithms
• Experience with Machine Learning APIs and
custom mathematics in decision optimization
Cadent has built a
bicoastal
data science and
engineering team
Cadent: Data Empowered Television Advertising
Data Technology Company specializing in Television Advertising
3
Cadent: Data Empowered Television Advertising
Data Technology Company specializing in Television Advertising
Infrastructure
Engineering
Science
Analytics
Decisions
D
A
T
A
4© 2017 Cadent. All rights reserved.
Cadent TV Buying Platform
Business Intelligence Analytic Dashboards
Business Level KPIsCampaign Delivery Reporting
Advanced	Platform	API
Addressable TV
Buying/Planning	UI
Broadcast
3rd Party	DSPs
Agencies	/	Programmatic	Clients
- Audience	Indexing
- Audience	Targeting
Linear Cable
Cable	APIBroadcast	API
Master	API
Audience	Planning	API
Inventory	Availability	Forecasting	and	Pricing	
3rd	party	Data	Services
- Inventory	Forecasting
- Yield	Management
- Audience	Accounting
- Delivery	Projections
- Integrated	Data	Hub
Data	Science
5© 2017 Cadent. All rights reserved.
Zoom In: Local Linear Cable
§ Agency
− Buys Media to Run ads for National advertisers
§ Impressions
− “eyeballs” currency of brand advertising
§ Cable Operator
− Media Companies providing TV service
− Loosely segregated Geographically
§ Subscribers
− Consumers of Cable Operator Services
§ Ratings
− Fraction of Subscribers tuned in
− Not known until after the fact
− Ill conditioned: log-scale variance
− O(10k) dimensions: variation in Demographics,
geography and television content
𝐼" = $ $ $ 𝑅&,(,)," ∗ 𝑆&,(,),"
)∈)./0(∈10)&∈203
6© 2017 Cadent. All rights reserved.
§ Relevant Time Scales
− Weather-like View
§ Shows
§ Twitter trends
§ Spectacle Events
− Climate-like View
§ Seasonality
§ Subscriber trends
§ Daypart Variation
§ Why High Dimensional?
− In the climate view the features represent
days but there significant intraday patterns
− Pivot the daily pattern into vectors so that
ML models can directly capture statistical
correlations
Models for Television Ratings
7© 2017 Cadent. All rights reserved.
A sense of Daily Patterns: Log Mean & Variance
Values shown in Log-like coordinate system:
value 0 = rating 0
value 3 = rating 10^(-5)
value 5 = rating 10^(-3)
mean variance
8© 2017 Cadent. All rights reserved.
An Intuitive Coordinate System for Human
Interpretation of Ratings Data
This coordinate system is used to eliminate bias in error metrics,
In the domain the errors in large value ratings swamp those of small value ratings
DISCUSSION OUTLINE
01 THEORY
• Math
02 PRACTICE
• Code
THEORY: Reframe the Problem with Math
11© 2017 Cadent. All rights reserved.
Mathemagic… AKA Linear Algebra
LocalMeansasaLinearTransform
PCA reference: https://arxiv.org/pdf/1404.1100.pdf
InterpretingPCAasChangeofBasis
12© 2017 Cadent. All rights reserved.
Principal Components
& Captured Variance
Warning:
Uncaptured Variance is strictly lost from
the predictive model
13© 2017 Cadent. All rights reserved.
§ The noise reduction and correlations between values captured by
reducing to principal components adds more value than variance lost
§ Apache Spark ML API doesn’t support n-Dimensional regression so k
dimensional regression is computationally efficient for k<<n
§ Since the 95% of the variance is captured by only the first few principal
components there is little to no loss in modelling accuracy (we’ll come
back to this)
§ Independent Component Analysis (ICA) would be even better than PCA
because value chained regressors are treated as independent variables.
ICA not yet available in Spark ML
Why Reduce Label Dimension?
14© 2017 Cadent. All rights reserved.
r𝑐6
Local Means Coordinate Transform
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc
QuarterHourofDay
R:RatingforQuarterHour
𝑅7 = 𝑇 𝑅 :		
;<r 𝑐
r 𝑐
∀𝑟
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc
QuarterHourofDay
R’:RatingforQuarterHour
Chosen
Feature
based
Context
“c”
QuarterHour
MeanRating
GB:
c
Store values 𝑐: r𝑐6
15© 2017 Cadent. All rights reserved.
Pivot our Data into to Vectors (day profiles)
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc
QuarterHourofDay
R’:RatingforQuarterHour
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc
Quarter Hour of Day
Rating Vectors
• 96 positive real values
16© 2017 Cadent. All rights reserved.
Label Dimensionality Reduction
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc
Quarter Hour of Day
Rating Vectors
• 96 positive real values PCA(k)
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc
Coef of
Principals
• k real values
Component
17© 2017 Cadent. All rights reserved.
Principal Component Analysis (PCA): to reduce the
dimensionality of the problem
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc.
Features
• unique combinations of
targetable
characteristics
• Network, Age, Gender,
Category, Season, etc.
Component1
Componentk
…
Train Regressor 1 Train Regressor k
PCA
Transform
PCA
pseudo-
inverse
Pipeline of k single label regression models
Vector
Disassemble
Vector
Assembler
PRACTICE: Code Demonstration
19© 2017 Cadent. All rights reserved.
§ Data	preparation
− Ratings	Local	Coordinate	Transformation
− Vectorization
§ ML	Pipeline	creation	and	Execution
− Custom Transformers and Estimators
§ DropColumnsStage
§ PCA2 (show the Scala and pyspark versions)
− Custom UDFs
§ Used for Vector Disassembler
§ Used for Pseudo-inverse PCA
− Train	&	Test
§ Undo	custom	Coordinate	transforms	to	evaluate
§ Results
− Show	Code	for	Model	evaluation
− Review	some	Graphical	results
§ Aside:
− PySpark Version
Technical Breakout to DataBricks Notebook
20© 2017 Cadent. All rights reserved.
Preliminary Experimental Results
§ 20 workers each
with 60 GB RAM
and 8 cores
§ Auto-scaling on
Data= ~6 Million (96 dimensional Vectors) and associated features
Data Size: 6.6 GB
21© 2017 Cadent. All rights reserved.
§ Support for arbitrary Regression model class
− Comparable results with Gradient Boosted Trees and Random
Forests
§ Daily Ratings Phenomenon is actually low Rank
− No Measureable loss in accuracy as we reduce from K=12 to
K=5 for the n=96 dimensional version
− No Measureable loss in accuracy as we reduce from K=12 to
K=2 for the n=12 dimensional version
§ Reducing the number of dimensions saves
runtime
− More experiments are needed but preliminary results are still
significant
§ Inclusion of PCA when K=n provides no
measureable improvement
− Tested with K=n=12
− Would like to test with ICA instead of PCA
§ Next Steps:
− More exhaustive repeated trials for run times
− Further reduce K until we start to see performance degrade
Insights from Initial Results
22© 2017 Cadent. All rights reserved.
Monitoring & Maintaining
Machine Learning Models
22
23© 2017 Cadent. All rights reserved.
Validation and Verification
IEEE 1012-2012 standard
definition of Validation and Verification
Validation: The assurance that a product, service, or
system meets the needs of the customer and other
identified stakeholders. It often involves acceptance
and suitability with external customers.
Verification: The evaluation of whether or not a
product, service, or system complies with a regulation,
requirement, specification, or imposed condition.
It is often an internal process.
System design, implementation and test (QA) are only
part of the system acquisition process
What did it do?
How well did it do it?
24© 2017 Cadent. All rights reserved.
Validation of Ratings via Performance Dashboard
25© 2017 Cadent. All rights reserved.
Verification of Ratings via Delivery KPI Dashboard
WRAP UP
27© 2017 Cadent. All rights reserved.
§ Video on Demand Campaign
Management
− Forecasting Supply, Demand and Competition
− Yield Management: Dynamic Inflight
Optimization (feedback controller)
− Pricing and Packaging of Inventory
§ Extending Linear Advertising
− Targeted Advertising Insertions in Linear
Cable
− Addressable Backfill for Linear Cable
− Multicast Advertising on Broadcast Stations
§ Unified Unicast/Multicast Advertising
− Cross Platform Audience Based Planning
− Flexible Hybrid-cloud Data Platform
Other Projects
Thank You.
Michael Zargham mzargham@cadent.tv
Stefan Panayotov spanayotov@cadent.tv

More Related Content

What's hot

Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Databricks
 
Just-in-Time Analytics and the Need for Autonomous Database Administration wi...
Just-in-Time Analytics and the Need for Autonomous Database Administration wi...Just-in-Time Analytics and the Need for Autonomous Database Administration wi...
Just-in-Time Analytics and the Need for Autonomous Database Administration wi...Databricks
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Spark Summit
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Databricks
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataDatabricks
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache SparkDatabricks
 
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...Spark Summit
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkLi Jin
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit
 
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Databricks
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopDataWorks Summit
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkDatabricks
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkDatabricks
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
Spark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit
 
Extending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkExtending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkDatabricks
 

What's hot (20)

Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
 
Just-in-Time Analytics and the Need for Autonomous Database Administration wi...
Just-in-Time Analytics and the Need for Autonomous Database Administration wi...Just-in-Time Analytics and the Need for Autonomous Database Administration wi...
Just-in-Time Analytics and the Need for Autonomous Database Administration wi...
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
 
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
Parallelizing Large Simulations with Apache SparkR with Daniel Jeavons and Wa...
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
 
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
 
Practical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on HadoopPractical Distributed Machine Learning Pipelines on Hadoop
Practical Distributed Machine Learning Pipelines on Hadoop
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
Spark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick PentreathSpark Summit EU talk by Nick Pentreath
Spark Summit EU talk by Nick Pentreath
 
Extending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkExtending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySpark
 

Similar to Spark ML with High Dimensional Labels Michael Zargham and Stefan Panayotov

1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterpriseVMware Tanzu
 
Resume anh chu data analyst
Resume anh chu data analystResume anh chu data analyst
Resume anh chu data analystANH CHU
 
A Journey to Profitability with Oracle PCMCS
A Journey to Profitability with Oracle PCMCSA Journey to Profitability with Oracle PCMCS
A Journey to Profitability with Oracle PCMCSAlithya
 
ODTUG NYC Meetup 2017 – PCMCS and ITFM
ODTUG NYC Meetup 2017 – PCMCS and ITFMODTUG NYC Meetup 2017 – PCMCS and ITFM
ODTUG NYC Meetup 2017 – PCMCS and ITFMJoseph Alaimo Jr
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solutionRevolution Analytics
 
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...Codemotion
 
Anirban chakraborty(m eng., cssgb)
Anirban chakraborty(m eng., cssgb)Anirban chakraborty(m eng., cssgb)
Anirban chakraborty(m eng., cssgb)Anirban Chakraborty
 
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...Nesma
 
Sr. QA Eng. with 8.2+ Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...
Sr. QA Eng. with  8.2+  Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...Sr. QA Eng. with  8.2+  Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...
Sr. QA Eng. with 8.2+ Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...vaibhav pawar
 
Smashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSmashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSamanthaBerlant
 
The Future of Design-to-Deliver Supply Chains - 56560
The Future of Design-to-Deliver Supply Chains - 56560The Future of Design-to-Deliver Supply Chains - 56560
The Future of Design-to-Deliver Supply Chains - 56560SAP Ariba Live 2018
 
3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD
3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD
3DCS Dimensional Variation Analysis Integrated in Siemens NX CADBenjamin Reese
 
Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...
Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...
Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...Paulo Lacerda
 
Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Paul Brebner
 
Resume anh chu
Resume anh chuResume anh chu
Resume anh chuANH CHU
 
Wayne State University & DataStax: World's best data modeling tool for Apache...
Wayne State University & DataStax: World's best data modeling tool for Apache...Wayne State University & DataStax: World's best data modeling tool for Apache...
Wayne State University & DataStax: World's best data modeling tool for Apache...DataStax Academy
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computingBAINIDA
 

Similar to Spark ML with High Dimensional Labels Michael Zargham and Stefan Panayotov (20)

1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
 
Resume anh chu data analyst
Resume anh chu data analystResume anh chu data analyst
Resume anh chu data analyst
 
A Journey to Profitability with Oracle PCMCS
A Journey to Profitability with Oracle PCMCSA Journey to Profitability with Oracle PCMCS
A Journey to Profitability with Oracle PCMCS
 
ODTUG NYC Meetup 2017 – PCMCS and ITFM
ODTUG NYC Meetup 2017 – PCMCS and ITFMODTUG NYC Meetup 2017 – PCMCS and ITFM
ODTUG NYC Meetup 2017 – PCMCS and ITFM
 
Warranty Predictive Analytics solution
Warranty Predictive Analytics solutionWarranty Predictive Analytics solution
Warranty Predictive Analytics solution
 
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
David García, Rubén Aguilera Díaz-Heredero | A microservices experience in th...
 
Anirban chakraborty(m eng., cssgb)
Anirban chakraborty(m eng., cssgb)Anirban chakraborty(m eng., cssgb)
Anirban chakraborty(m eng., cssgb)
 
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
 
Sr. QA Eng. with 8.2+ Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...
Sr. QA Eng. with  8.2+  Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...Sr. QA Eng. with  8.2+  Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...
Sr. QA Eng. with 8.2+ Yrs of Exp.in ERP_Manual_ Functional _System_Integrat...
 
Smashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and SnowflakeSmashing Through Big Data Barriers with Tableau and Snowflake
Smashing Through Big Data Barriers with Tableau and Snowflake
 
Validation
ValidationValidation
Validation
 
The Future of Design-to-Deliver Supply Chains - 56560
The Future of Design-to-Deliver Supply Chains - 56560The Future of Design-to-Deliver Supply Chains - 56560
The Future of Design-to-Deliver Supply Chains - 56560
 
3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD
3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD
3DCS Dimensional Variation Analysis Integrated in Siemens NX CAD
 
Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...
Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...
Case Study: How Caixa Econômica in Brazil Uses IBM® Rational® Insight and Per...
 
PEnDAR webinar 2 with notes
PEnDAR webinar 2 with notesPEnDAR webinar 2 with notes
PEnDAR webinar 2 with notes
 
Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...
 
Resume anh chu
Resume anh chuResume anh chu
Resume anh chu
 
Wayne State University & DataStax: World's best data modeling tool for Apache...
Wayne State University & DataStax: World's best data modeling tool for Apache...Wayne State University & DataStax: World's best data modeling tool for Apache...
Wayne State University & DataStax: World's best data modeling tool for Apache...
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Recently uploaded (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

Spark ML with High Dimensional Labels Michael Zargham and Stefan Panayotov

  • 1. Michael Zargham, Director Data Science Stefan Panayotov, Senior Data Engineer Cadent SPARK ML WITH HIGH DIMENSIONAL LABELS
  • 2. 2 Vision Unified Self Service Media Monetization Platform for all TV inventory Our Team has cutting edge expertise • Hybrid cloud Apache Spark infrastructure • Analytical rather than rule driven algorithms • Experience with Machine Learning APIs and custom mathematics in decision optimization Cadent has built a bicoastal data science and engineering team Cadent: Data Empowered Television Advertising Data Technology Company specializing in Television Advertising
  • 3. 3 Cadent: Data Empowered Television Advertising Data Technology Company specializing in Television Advertising Infrastructure Engineering Science Analytics Decisions D A T A
  • 4. 4© 2017 Cadent. All rights reserved. Cadent TV Buying Platform Business Intelligence Analytic Dashboards Business Level KPIsCampaign Delivery Reporting Advanced Platform API Addressable TV Buying/Planning UI Broadcast 3rd Party DSPs Agencies / Programmatic Clients - Audience Indexing - Audience Targeting Linear Cable Cable APIBroadcast API Master API Audience Planning API Inventory Availability Forecasting and Pricing 3rd party Data Services - Inventory Forecasting - Yield Management - Audience Accounting - Delivery Projections - Integrated Data Hub Data Science
  • 5. 5© 2017 Cadent. All rights reserved. Zoom In: Local Linear Cable § Agency − Buys Media to Run ads for National advertisers § Impressions − “eyeballs” currency of brand advertising § Cable Operator − Media Companies providing TV service − Loosely segregated Geographically § Subscribers − Consumers of Cable Operator Services § Ratings − Fraction of Subscribers tuned in − Not known until after the fact − Ill conditioned: log-scale variance − O(10k) dimensions: variation in Demographics, geography and television content 𝐼" = $ $ $ 𝑅&,(,)," ∗ 𝑆&,(,)," )∈)./0(∈10)&∈203
  • 6. 6© 2017 Cadent. All rights reserved. § Relevant Time Scales − Weather-like View § Shows § Twitter trends § Spectacle Events − Climate-like View § Seasonality § Subscriber trends § Daypart Variation § Why High Dimensional? − In the climate view the features represent days but there significant intraday patterns − Pivot the daily pattern into vectors so that ML models can directly capture statistical correlations Models for Television Ratings
  • 7. 7© 2017 Cadent. All rights reserved. A sense of Daily Patterns: Log Mean & Variance Values shown in Log-like coordinate system: value 0 = rating 0 value 3 = rating 10^(-5) value 5 = rating 10^(-3) mean variance
  • 8. 8© 2017 Cadent. All rights reserved. An Intuitive Coordinate System for Human Interpretation of Ratings Data This coordinate system is used to eliminate bias in error metrics, In the domain the errors in large value ratings swamp those of small value ratings
  • 9. DISCUSSION OUTLINE 01 THEORY • Math 02 PRACTICE • Code
  • 10. THEORY: Reframe the Problem with Math
  • 11. 11© 2017 Cadent. All rights reserved. Mathemagic… AKA Linear Algebra LocalMeansasaLinearTransform PCA reference: https://arxiv.org/pdf/1404.1100.pdf InterpretingPCAasChangeofBasis
  • 12. 12© 2017 Cadent. All rights reserved. Principal Components & Captured Variance Warning: Uncaptured Variance is strictly lost from the predictive model
  • 13. 13© 2017 Cadent. All rights reserved. § The noise reduction and correlations between values captured by reducing to principal components adds more value than variance lost § Apache Spark ML API doesn’t support n-Dimensional regression so k dimensional regression is computationally efficient for k<<n § Since the 95% of the variance is captured by only the first few principal components there is little to no loss in modelling accuracy (we’ll come back to this) § Independent Component Analysis (ICA) would be even better than PCA because value chained regressors are treated as independent variables. ICA not yet available in Spark ML Why Reduce Label Dimension?
  • 14. 14© 2017 Cadent. All rights reserved. r𝑐6 Local Means Coordinate Transform Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc QuarterHourofDay R:RatingforQuarterHour 𝑅7 = 𝑇 𝑅 : ;<r 𝑐 r 𝑐 ∀𝑟 Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc QuarterHourofDay R’:RatingforQuarterHour Chosen Feature based Context “c” QuarterHour MeanRating GB: c Store values 𝑐: r𝑐6
  • 15. 15© 2017 Cadent. All rights reserved. Pivot our Data into to Vectors (day profiles) Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc QuarterHourofDay R’:RatingforQuarterHour Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc Quarter Hour of Day Rating Vectors • 96 positive real values
  • 16. 16© 2017 Cadent. All rights reserved. Label Dimensionality Reduction Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc Quarter Hour of Day Rating Vectors • 96 positive real values PCA(k) Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc Coef of Principals • k real values Component
  • 17. 17© 2017 Cadent. All rights reserved. Principal Component Analysis (PCA): to reduce the dimensionality of the problem Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc. Features • unique combinations of targetable characteristics • Network, Age, Gender, Category, Season, etc. Component1 Componentk … Train Regressor 1 Train Regressor k PCA Transform PCA pseudo- inverse Pipeline of k single label regression models Vector Disassemble Vector Assembler
  • 19. 19© 2017 Cadent. All rights reserved. § Data preparation − Ratings Local Coordinate Transformation − Vectorization § ML Pipeline creation and Execution − Custom Transformers and Estimators § DropColumnsStage § PCA2 (show the Scala and pyspark versions) − Custom UDFs § Used for Vector Disassembler § Used for Pseudo-inverse PCA − Train & Test § Undo custom Coordinate transforms to evaluate § Results − Show Code for Model evaluation − Review some Graphical results § Aside: − PySpark Version Technical Breakout to DataBricks Notebook
  • 20. 20© 2017 Cadent. All rights reserved. Preliminary Experimental Results § 20 workers each with 60 GB RAM and 8 cores § Auto-scaling on Data= ~6 Million (96 dimensional Vectors) and associated features Data Size: 6.6 GB
  • 21. 21© 2017 Cadent. All rights reserved. § Support for arbitrary Regression model class − Comparable results with Gradient Boosted Trees and Random Forests § Daily Ratings Phenomenon is actually low Rank − No Measureable loss in accuracy as we reduce from K=12 to K=5 for the n=96 dimensional version − No Measureable loss in accuracy as we reduce from K=12 to K=2 for the n=12 dimensional version § Reducing the number of dimensions saves runtime − More experiments are needed but preliminary results are still significant § Inclusion of PCA when K=n provides no measureable improvement − Tested with K=n=12 − Would like to test with ICA instead of PCA § Next Steps: − More exhaustive repeated trials for run times − Further reduce K until we start to see performance degrade Insights from Initial Results
  • 22. 22© 2017 Cadent. All rights reserved. Monitoring & Maintaining Machine Learning Models 22
  • 23. 23© 2017 Cadent. All rights reserved. Validation and Verification IEEE 1012-2012 standard definition of Validation and Verification Validation: The assurance that a product, service, or system meets the needs of the customer and other identified stakeholders. It often involves acceptance and suitability with external customers. Verification: The evaluation of whether or not a product, service, or system complies with a regulation, requirement, specification, or imposed condition. It is often an internal process. System design, implementation and test (QA) are only part of the system acquisition process What did it do? How well did it do it?
  • 24. 24© 2017 Cadent. All rights reserved. Validation of Ratings via Performance Dashboard
  • 25. 25© 2017 Cadent. All rights reserved. Verification of Ratings via Delivery KPI Dashboard
  • 27. 27© 2017 Cadent. All rights reserved. § Video on Demand Campaign Management − Forecasting Supply, Demand and Competition − Yield Management: Dynamic Inflight Optimization (feedback controller) − Pricing and Packaging of Inventory § Extending Linear Advertising − Targeted Advertising Insertions in Linear Cable − Addressable Backfill for Linear Cable − Multicast Advertising on Broadcast Stations § Unified Unicast/Multicast Advertising − Cross Platform Audience Based Planning − Flexible Hybrid-cloud Data Platform Other Projects
  • 28. Thank You. Michael Zargham mzargham@cadent.tv Stefan Panayotov spanayotov@cadent.tv