This talk is a detailed extension of our Spark Summit East talk on the same topic. We will review the hurdles we faced and the workarounds we developed, with help from Databricks support, in our attempt to build a custom machine learning model and use it to predict TV ratings for different networks and demographics. Attendees should leave this session with enough knowledge to recognize situations where our method is applicable and to implement it.
Specifically, we dig into the data characteristics that make our problem inherently challenging and show how we compose existing tools in the ML and DataFrames APIs to create a machine learning pipeline capable of learning real-valued vector labels despite relatively low-dimensional feature spaces. Our deep dive covers the feature engineering techniques we employed, the reference architecture for our n-dimensional regression technique, and the extra data formatting steps required to apply the built-in evaluator tools to n-dimensional models.
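As a rough illustration of the idea of learning vector labels by composing scalar tools, the sketch below fits one independent scalar regressor per label dimension and then flattens the vector predictions into (dimension, prediction, label) rows so that a scalar metric such as RMSE can be applied. This is not the architecture presented in the talk, which the abstract does not spell out. It is a plain-Python stand-in under stated assumptions: a single feature, ordinary least squares, and hypothetical helper names (`fit_vector_labels`, `flatten_for_scalar_evaluator`, `rmse`) and toy data invented for this example.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature and one scalar label."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_vector_labels(xs, label_vectors):
    """Assumed decomposition: one independent scalar model per label dimension."""
    n_dims = len(label_vectors[0])
    return [
        fit_simple_linear(xs, [vec[d] for vec in label_vectors])
        for d in range(n_dims)
    ]


def predict_vector(models, x):
    """Reassemble the per-dimension predictions into one vector."""
    return [slope * x + intercept for slope, intercept in models]


def flatten_for_scalar_evaluator(models, xs, label_vectors):
    """Explode each vector prediction into (dimension, prediction, label) rows."""
    rows = []
    for x, labels in zip(xs, label_vectors):
        preds = predict_vector(models, x)
        for d, (pred, label) in enumerate(zip(preds, labels)):
            rows.append((d, pred, label))
    return rows


def rmse(rows):
    """Scalar root-mean-square error over the flattened rows."""
    return (sum((p - l) ** 2 for _, p, l in rows) / len(rows)) ** 0.5


# Toy data (hypothetical): two label dimensions, each linear in the feature.
xs = [0.0, 1.0, 2.0, 3.0]
label_vectors = [[1.0, 0.0], [3.0, -1.0], [5.0, -2.0], [7.0, -3.0]]

models = fit_vector_labels(xs, label_vectors)
rows = flatten_for_scalar_evaluator(models, xs, label_vectors)
print(models, rmse(rows))
```

The flattening step is the part that mirrors the "extra data formatting" mentioned above: scalar evaluators, such as Spark ML's `RegressionEvaluator`, compare one numeric prediction column against one numeric label column, so vector outputs have to be reshaped into that form before scoring.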
Spark ML with High Dimensional Labels Michael Zargham and Stefan Panayotov
1. Michael Zargham, Director Data Science
Stefan Panayotov, Senior Data Engineer
Cadent
SPARK ML WITH HIGH DIMENSIONAL LABELS
2. Vision
Unified Self-Service Media Monetization Platform for all TV inventory
Our team has cutting-edge expertise
• Hybrid cloud Apache Spark infrastructure
• Analytical rather than rule-driven algorithms
• Experience with Machine Learning APIs and custom mathematics in decision optimization
Cadent has built a bicoastal data science and engineering team
Cadent: Data Empowered Television Advertising
Data Technology Company specializing in Television Advertising
3. Cadent: Data Empowered Television Advertising
Data Technology Company specializing in Television Advertising
Infrastructure
Engineering
Science
Analytics
Decisions
DATA