SlideShare a Scribd company logo
1 of 25
Download to read offline
User Behavior Hashing
for Audience Expansion
Praveen Pratury
Yingnan Zhu
Agenda
Praveen Pratury
▪ Overview of Samsung
▪ Samsung Audience platform
▪ Lookalike modeling introduction
Yingnan Zhu
▪ Lookalike approaches
▪ Speed up with Pandas UDF
▪ Model performance
▪ Results
▪ Q & A
Director of Engineering, Samsung
Research America
Lead Data Scientist, Samsung Research
America
Samsung Overview
Samsung Electronics Today
Samsung Audience Platform
LookAlike Modeling – Samsung Context
Improve Incremental Reach and Improved Targeting for:
▪ TV Networks (Identify new audiences to promote new shows)
▪ Samsung New TV purchases (8K, QLED, Terrace etc)U ffhoffef
The goal is to improve Reach and increase conversion for TV shows and New TV purchases
Goals
Approach
By leveraging Samsung’s rich ACR viewership data on 50+ M TVs in US and by applying User Behavior Hashing techniques:
▪ Identify TV viewers similar to existing audiences based on user behavior
- Find audiences that will respond favorably to show-specific TV ads
▪ Identify existing premium TV owners to expand to future buyer
Look Alike Audience Expansion Example
A: seed segment
B: expanded segment
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* *
*
* *
*
* *
* * *
*
*
++ +++
+
+
+ +
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+
+
+ + +
++
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* * *
*
*
*
++ +++
+
+
+ +
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+
+
+ + +
++
A*
B
All TV Users’ hash code space All TV Users’ hash code space
*
+
8K TV users
non 8K TV users
A
LookALike example for 8K TV campaign. The goal is to identify users who look alike to 8K TV owners, but have not owned an 8K TV yet.
Targeted size to expand
Challenges and solutions
▪ Challenges
▪ Large-scale data:
▪ Search space is huge: Hundreds of millions of Smart TV and Mobile users
▪ Efficiency:
▪ Each device could generate thousands of logs per day
▪ Look alike user retrieval has time constraint
▪ Possible solutions
▪ LSH, K-nearest neighbor, similar user search in recommender system and etc.
▪ These solutions sacrifice accuracy, efficiency, do not consider contextual information, or not optimized for time sequence data
▪ Our solution
▪ Heterogeneous user behavior hash code
▪ Provides LSH like bucketized fast searching and maintain high accuracy of user similarity
Look Alike Work Flow
Online/Offline
Offline
Raw Data
Deep Binary
Hashing Models
(various bit length)
Deep Binary
Hashing Models
(various bit length)
User Hash
Codes
Lookalike ServiceSeed
Segments
Expended
Segments
Processed
User Behavior
Data
Hashing Model Training Flow
Df
User 1
Hash LayerNetwork LayersInput Similarity Label
1
0
Similar
Dissimilar
y
x
+1
-1
SGN
User 2
y
x
+1
-1
• By given two user pair, it first generates user embedding (continuous vector)
from Network Layers. After that we make K-bit dimension from Hash Layer.
Finally we apply SGN for binary representation. The output is 1: similar, 0:
dissimilar
Approach # 1: Time-Aware Attention CNN Model
Model Explained
▪ Input layer:
▪ The input layer is the data pre-processing layer. In this layer, we will map sequential behavior data input into a 3D structure that can
be processed by CNN.
▪ The first step in our data pre-processing step is to embed each item into a D dimension vector. The next step is to sessionize user’s
history by a specific time unit (e.g., hour). For each session, we aggregate all items that the user in consideration had interacted
with using the multi-hot encoding of the corresponding items. This will represent the summary of user’s behavior for the given
session. After sessionization, we map each user’s behavior input into the high dimensional space.
▪ Embedding layer:
▪ Since the multi-hot encoding scheme used during our pre-processing step is a sparse and hand-crafted encoding scheme, it carries
more conceptual information than similarity information itself. This would affect the overall performance of TAACNN, particularly its
ability to preserve similarity information at large scale. To overcome this limitation, we introduce an embedding layer as part of our
model.
▪ Time-Aware attention layer:
▪ The time-aware attention is used to abstract time-aware attention features in our TAACNN model. This layer separates attention
features into short-term and long-term features.
Approach #2: Categorical Attention Model
Distributed Inference
▪ Issues:
▪ Large data scales and hundreds of millions user’s profile need update within limited time and computation resource
▪ Current Spark UDF is processed row-at-a-time and it won’t satisfy the requirements
▪ Need efficient distributed inference methods
▪ Solution: Pandas UDF
▪ Scalar
▪ Scalar iterator
▪ Group map
▪ Group aggregate
Code Snippet
Model Performance
▪ We used the accuracy measure as the main performance metric for all
binary hashing algorithms because each user has the identical number
of similar and dissimilar user pairs.
Conclusion
▪ A novel deep binary hashing architecture to derive similarity
preserving binary hash codes for sequential behavior data.
▪ TAACNN explores evolving user’s attention preferences across
different time awareness level separately. Experiments results show
significant over-performance compared to other well-known hashing
methods
▪ Pandas UDF improved efficiency significantly. They have been
adopted in many of our projects.
Thank you !!
We are hiring:
www.sra.samsung.com/open-positions
Contact: Praveen.Pratury@Samsung.com
Yingnan.z@Samsung.com
https://www.linkedin.com/in/praveenpratury
https://www.linkedin.com/in/yingnan-zhu-66651113/
Q & A
User Behavior Hashing for Audience Expansion

More Related Content

What's hot

Machine Learning Applications in NLP.ppt
Machine Learning Applications in NLP.pptMachine Learning Applications in NLP.ppt
Machine Learning Applications in NLP.pptbutest
 
Object-oriented Programming-with C#
Object-oriented Programming-with C#Object-oriented Programming-with C#
Object-oriented Programming-with C#Doncho Minkov
 
2CPP09 - Encapsulation
2CPP09 - Encapsulation2CPP09 - Encapsulation
2CPP09 - EncapsulationMichael Heron
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learningbutest
 
SOLID - Principles of Object Oriented Design
SOLID - Principles of Object Oriented DesignSOLID - Principles of Object Oriented Design
SOLID - Principles of Object Oriented DesignRiccardo Cardin
 
Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...
Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...
Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...Vidit Goyal
 
Toxic Comment Classification
Toxic Comment ClassificationToxic Comment Classification
Toxic Comment Classificationijtsrd
 
Circle generation algorithm
Circle generation algorithmCircle generation algorithm
Circle generation algorithmAnkit Garg
 
Realisation de controlleur VGA(VHDL)
Realisation de controlleur VGA(VHDL)Realisation de controlleur VGA(VHDL)
Realisation de controlleur VGA(VHDL)yahya ayari
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learningAntonio Rueda-Toicen
 
C++ - Constructors,Destructors, Operator overloading and Type conversion
C++ - Constructors,Destructors, Operator overloading and Type conversionC++ - Constructors,Destructors, Operator overloading and Type conversion
C++ - Constructors,Destructors, Operator overloading and Type conversionHashni T
 
Java: Inheritance
Java: InheritanceJava: Inheritance
Java: InheritanceTareq Hasan
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Prakhar Rastogi
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowOswald Campesato
 
Systèmes multi agents concepts et mise en oeuvre avec le middleware jade
Systèmes multi agents concepts et mise en oeuvre avec le middleware jadeSystèmes multi agents concepts et mise en oeuvre avec le middleware jade
Systèmes multi agents concepts et mise en oeuvre avec le middleware jadeENSET, Université Hassan II Casablanca
 
Machine model to classify dogs and cat
Machine model to classify dogs and catMachine model to classify dogs and cat
Machine model to classify dogs and catAkash Parui
 
Image encryption
Image encryptionImage encryption
Image encryptionrakshit2105
 
Abstraction in java
Abstraction in javaAbstraction in java
Abstraction in javasawarkar17
 

What's hot (20)

Machine Learning Applications in NLP.ppt
Machine Learning Applications in NLP.pptMachine Learning Applications in NLP.ppt
Machine Learning Applications in NLP.ppt
 
Object-oriented Programming-with C#
Object-oriented Programming-with C#Object-oriented Programming-with C#
Object-oriented Programming-with C#
 
2CPP09 - Encapsulation
2CPP09 - Encapsulation2CPP09 - Encapsulation
2CPP09 - Encapsulation
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Intelligent agents
Intelligent agentsIntelligent agents
Intelligent agents
 
SOLID - Principles of Object Oriented Design
SOLID - Principles of Object Oriented DesignSOLID - Principles of Object Oriented Design
SOLID - Principles of Object Oriented Design
 
Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...
Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...
Ghaziabad, India - Early Detection of Various Types of Skin Cancer Using Deep...
 
Toxic Comment Classification
Toxic Comment ClassificationToxic Comment Classification
Toxic Comment Classification
 
Circle generation algorithm
Circle generation algorithmCircle generation algorithm
Circle generation algorithm
 
Realisation de controlleur VGA(VHDL)
Realisation de controlleur VGA(VHDL)Realisation de controlleur VGA(VHDL)
Realisation de controlleur VGA(VHDL)
 
Complexity
ComplexityComplexity
Complexity
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learning
 
C++ - Constructors,Destructors, Operator overloading and Type conversion
C++ - Constructors,Destructors, Operator overloading and Type conversionC++ - Constructors,Destructors, Operator overloading and Type conversion
C++ - Constructors,Destructors, Operator overloading and Type conversion
 
Java: Inheritance
Java: InheritanceJava: Inheritance
Java: Inheritance
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
 
Systèmes multi agents concepts et mise en oeuvre avec le middleware jade
Systèmes multi agents concepts et mise en oeuvre avec le middleware jadeSystèmes multi agents concepts et mise en oeuvre avec le middleware jade
Systèmes multi agents concepts et mise en oeuvre avec le middleware jade
 
Machine model to classify dogs and cat
Machine model to classify dogs and catMachine model to classify dogs and cat
Machine model to classify dogs and cat
 
Image encryption
Image encryptionImage encryption
Image encryption
 
Abstraction in java
Abstraction in javaAbstraction in java
Abstraction in java
 

Similar to User Behavior Hashing for Audience Expansion

A flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVA flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVIntoTheMinds
 
A Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVA Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVFrancisco Couto
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureHuxing Zhang
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
 
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationOut With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationAcquia
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEFeng Zhu
 
From prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioMáté Lang
 
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...Happiest Minds Technologies
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...Joshwa Philip
 
Do I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxDo I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxPINGXIONG3
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreePradeeban Kathiravelu, Ph.D.
 
WDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptxWDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptxArthur240715
 
A Case for Outside-In Design
A Case for Outside-In DesignA Case for Outside-In Design
A Case for Outside-In DesignSandro Mancuso
 
OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014Seth Fox
 
Software engineering lecture notes
Software engineering lecture notesSoftware engineering lecture notes
Software engineering lecture notesSiva Ayyakutti
 

Similar to User Behavior Hashing for Audience Expansion (20)

A flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVA flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TV
 
A Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVA Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TV
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architecture
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
1710 track3 zhu
1710 track3 zhu1710 track3 zhu
1710 track3 zhu
 
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationOut With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
 
From prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.io
 
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Arcadia overview nr2
Arcadia overview nr2Arcadia overview nr2
Arcadia overview nr2
 
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
 
Do I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxDo I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptx
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
WDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptxWDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptx
 
A Case for Outside-In Design
A Case for Outside-In DesignA Case for Outside-In Design
A Case for Outside-In Design
 
OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014
 
Software engineering lecture notes
Software engineering lecture notesSoftware engineering lecture notes
Software engineering lecture notes
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

User Behavior Hashing for Audience Expansion

  • 1.
  • 2. User Behavior Hashing for Audience Expansion Praveen Pratury Yingnan Zhu
  • 3. Agenda Praveen Pratury ▪ Overview of Samsung ▪ Samsung Audience platform ▪ Lookalike modeling introduction Yingnan Zhu ▪ Lookalike approaches ▪ Speed up with Pandas UDF ▪ Model performance ▪ Results ▪ Q & A Director of Engineering, Samsung Research America Lead Data Scientist, Samsung Research America
  • 6.
  • 8.
  • 9.
  • 10.
  • 11. LookAlike Modeling – Samsung Context Improve Incremental Reach and Improved Targeting for: ▪ TV Networks (Identify new audiences to promote new shows) ▪ Samsung New TV purchases (8K, QLED, Terrace etc)U ffhoffef The goal is to improve Reach and increase conversion for TV shows and New TV purchases Goals Approach By leveraging Samsung’s rich ACR viewership data on 50+ M TVs in US and by applying User Behavior Hashing techniques: ▪ Identify TV viewers similar to existing audiences based on user behavior - Find audiences that will respond favorably to show-specific TV ads ▪ Identify existing premium TV owners to expand to future buyer
  • 12. Look Alike Audience Expansion Example A: seed segment B: expanded segment * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ++ +++ + + + + + + + + + + + + + + + + + + + + + + + ++ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ++ +++ + + + + + + + + + + + + + + + + + + + + + + + ++ A* B All TV Users’ hash code space All TV Users’ hash code space * + 8K TV users non 8K TV users A LookALike example for 8K TV campaign. The goal is to identify users who look alike to 8K TV owners, but have not owned an 8K TV yet. Targeted size to expand
  • 13. Challenges and solutions ▪ Challenges ▪ Large-scale data: ▪ Search space is huge: Hundreds of millions of Smart TV and Mobile users ▪ Efficiency: ▪ Each device could generate thousands of logs per day ▪ Look alike user retrieval has time constraint ▪ Possible solutions ▪ LSH, K-nearest neighbor, similar user search in recommender system and etc. ▪ These solutions sacrifice accuracy, efficiency, do not consider contextual information, or not optimized for time sequence data ▪ Our solution ▪ Heterogeneous user behavior hash code ▪ Provides LSH like bucketized fast searching and maintain high accuracy of user similarity
  • 14. Look Alike Work Flow Online/Offline Offline Raw Data Deep Binary Hashing Models (various bit length) Deep Binary Hashing Models (various bit length) User Hash Codes Lookalike ServiceSeed Segments Expended Segments Processed User Behavior Data
  • 15. Hashing Model Training Flow Df User 1 Hash LayerNetwork LayersInput Similarity Label 1 0 Similar Dissimilar y x +1 -1 SGN User 2 y x +1 -1 • By given two user pair, it first generates user embedding (continuous vector) from Network Layers. After that we make K-bit dimension from Hash Layer. Finally we apply SGN for binary representation. The output is 1: similar, 0: dissimilar
  • 16. Approach # 1: Time-Aware Attention CNN Model
  • 17. Model Explained ▪ Input layer: ▪ The input layer is the data pre-processing layer. In this layer, we will map sequential behavior data input into a 3D structure that can be processed by CNN. ▪ The first step in our data pre-processing step is to embed each item into a D dimension vector. The next step is to sessionize user’s history by a specific time unit (e.g., hour). For each session, we aggregate all items that the user in consideration had interacted with using the multi-hot encoding of the corresponding items. This will represent the summary of user’s behavior for the given session. After sessionization, we map each user’s behavior input into the high dimensional space. ▪ Embedding layer: ▪ Since the multi-hot encoding scheme used during our pre-processing step is a sparse and hand-crafted encoding scheme, it carries more conceptual information than similarity information itself. This would affect the overall performance of TAACNN, particularly its ability to preserve similarity information at large scale. To overcome this limitation, we introduce an embedding layer as part of our model. ▪ Time-Aware attention layer: ▪ The time-aware attention is used to abstract time-aware attention features in our TAACNN model. This layer separates attention features into short-term and long-term features.
  • 18. Approach #2: Categorical Attention Model
  • 19. Distributed Inference ▪ Issues: ▪ Large data scales and hundreds of millions user’s profile need update within limited time and computation resource ▪ Current Spark UDF is processed row-at-a-time and it won’t satisfy the requirements ▪ Need efficient distributed inference methods ▪ Solution: Pandas UDF ▪ Scalar ▪ Scalar iterator ▪ Group map ▪ Group aggregate
  • 21. Model Performance ▪ We used the accuracy measure as the main performance metric for all binary hashing algorithms because each user has the identical number of similar and dissimilar user pairs.
  • 22. Conclusion ▪ A novel deep binary hashing architecture to derive similarity preserving binary hash codes for sequential behavior data. ▪ TAACNN explores evolving user’s attention preferences across different time awareness level separately. Experiments results show significant over-performance compared to other well-known hashing methods ▪ Pandas UDF improved efficiency significantly. They have been adopted in many of our projects.
  • 23. Thank you !! We are hiring: www.sra.samsung.com/open-positions Contact: Praveen.Pratury@Samsung.com Yingnan.z@Samsung.com https://www.linkedin.com/in/praveenpratury https://www.linkedin.com/in/yingnan-zhu-66651113/
  • 24. Q & A