SlideShare a Scribd company logo
1 of 25
Download to read offline
User Behavior Hashing
for Audience Expansion
Praveen Pratury
Yingnan Zhu
Agenda
Praveen Pratury
▪ Overview of Samsung
▪ Samsung Audience platform
▪ Lookalike modeling introduction
Yingnan Zhu
▪ Lookalike approaches
▪ Speed up with Pandas UDF
▪ Model performance
▪ Results
▪ Q & A
Director of Engineering, Samsung
Research America
Lead Data Scientist, Samsung Research
America
Samsung Overview
Samsung Electronics Today
Samsung Audience Platform
LookAlike Modeling – Samsung Context
Improve Incremental Reach and Improved Targeting for:
▪ TV Networks (Identify new audiences to promote new shows)
▪ Samsung New TV purchases (8K, QLED, Terrace etc)U ffhoffef
The goal is to improve Reach and increase conversion for TV shows and New TV purchases
Goals
Approach
By leveraging Samsung’s rich ACR viewership data on 50+ M TVs in US and by applying User Behavior Hashing techniques:
▪ Identify TV viewers similar to existing audiences based on user behavior
- Find audiences that will respond favorably to show-specific TV ads
▪ Identify existing premium TV owners to expand to future buyer
Look Alike Audience Expansion Example
A: seed segment
B: expanded segment
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* *
*
* *
*
* *
* * *
*
*
++ +++
+
+
+ +
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+
+
+ + +
++
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* *
*
*
* *
* * *
*
*
*
++ +++
+
+
+ +
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+
+
+ + +
++
A*
B
All TV Users’ hash code space All TV Users’ hash code space
*
+
8K TV users
non 8K TV users
A
LookALike example for 8K TV campaign. The goal is to identify users who look alike to 8K TV owners, but have not owned an 8K TV yet.
Targeted size to expand
Challenges and solutions
▪ Challenges
▪ Large-scale data:
▪ Search space is huge: Hundreds of millions of Smart TV and Mobile users
▪ Efficiency:
▪ Each device could generate thousands of logs per day
▪ Look alike user retrieval has time constraint
▪ Possible solutions
▪ LSH, K-nearest neighbor, similar user search in recommender system and etc.
▪ These solutions sacrifice accuracy, efficiency, do not consider contextual information, or not optimized for time sequence data
▪ Our solution
▪ Heterogeneous user behavior hash code
▪ Provides LSH like bucketized fast searching and maintain high accuracy of user similarity
Look Alike Work Flow
Online/Offline
Offline
Raw Data
Deep Binary
Hashing Models
(various bit length)
Deep Binary
Hashing Models
(various bit length)
User Hash
Codes
Lookalike ServiceSeed
Segments
Expended
Segments
Processed
User Behavior
Data
Hashing Model Training Flow
Df
User 1
Hash LayerNetwork LayersInput Similarity Label
1
0
Similar
Dissimilar
y
x
+1
-1
SGN
User 2
y
x
+1
-1
• By given two user pair, it first generates user embedding (continuous vector)
from Network Layers. After that we make K-bit dimension from Hash Layer.
Finally we apply SGN for binary representation. The output is 1: similar, 0:
dissimilar
Approach # 1: Time-Aware Attention CNN Model
Model Explained
▪ Input layer:
▪ The input layer is the data pre-processing layer. In this layer, we will map sequential behavior data input into a 3D structure that can
be processed by CNN.
▪ The first step in our data pre-processing step is to embed each item into a D dimension vector. The next step is to sessionize user’s
history by a specific time unit (e.g., hour). For each session, we aggregate all items that the user in consideration had interacted
with using the multi-hot encoding of the corresponding items. This will represent the summary of user’s behavior for the given
session. After sessionization, we map each user’s behavior input into the high dimensional space.
▪ Embedding layer:
▪ Since the multi-hot encoding scheme used during our pre-processing step is a sparse and hand-crafted encoding scheme, it carries
more conceptual information than similarity information itself. This would affect the overall performance of TAACNN, particularly its
ability to preserve similarity information at large scale. To overcome this limitation, we introduce an embedding layer as part of our
model.
▪ Time-Aware attention layer:
▪ The time-aware attention is used to abstract time-aware attention features in our TAACNN model. This layer separates attention
features into short-term and long-term features.
Approach #2: Categorical Attention Model
Distributed Inference
▪ Issues:
▪ Large data scales and hundreds of millions user’s profile need update within limited time and computation resource
▪ Current Spark UDF is processed row-at-a-time and it won’t satisfy the requirements
▪ Need efficient distributed inference methods
▪ Solution: Pandas UDF
▪ Scalar
▪ Scalar iterator
▪ Group map
▪ Group aggregate
Code Snippet
Model Performance
▪ We used the accuracy measure as the main performance metric for all
binary hashing algorithms because each user has the identical number
of similar and dissimilar user pairs.
Conclusion
▪ A novel deep binary hashing architecture to derive similarity
preserving binary hash codes for sequential behavior data.
▪ TAACNN explores evolving user’s attention preferences across
different time awareness level separately. Experiments results show
significant over-performance compared to other well-known hashing
methods
▪ Pandas UDF improved efficiency significantly. They have been
adopted in many of our projects.
Thank you !!
We are hiring:
www.sra.samsung.com/open-positions
Contact: Praveen.Pratury@Samsung.com
Yingnan.z@Samsung.com
https://www.linkedin.com/in/praveenpratury
https://www.linkedin.com/in/yingnan-zhu-66651113/
Q & A
User Behavior Hashing for Audience Expansion

More Related Content

What's hot

[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례BJ Jang
 
Introduction to CQL and Data Modeling with Apache Cassandra
Introduction to CQL and Data Modeling with Apache CassandraIntroduction to CQL and Data Modeling with Apache Cassandra
Introduction to CQL and Data Modeling with Apache CassandraJohnny Miller
 
JanusGraph DataBase Concepts
JanusGraph DataBase ConceptsJanusGraph DataBase Concepts
JanusGraph DataBase ConceptsSanil Bagzai
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkSpark Summit
 
Starbase: Graph-Based Security Analysis for Everyone
Starbase: Graph-Based Security Analysis for EveryoneStarbase: Graph-Based Security Analysis for Everyone
Starbase: Graph-Based Security Analysis for EveryoneNeo4j
 
SLF4J (Simple Logging Facade for Java)
SLF4J (Simple Logging Facade for Java)SLF4J (Simple Logging Facade for Java)
SLF4J (Simple Logging Facade for Java)Guo Albert
 
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInThe Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInrajappaiyer
 
網路廣告的基本架構
網路廣告的基本架構網路廣告的基本架構
網路廣告的基本架構Chen-en Lu
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Let's talk about Garbage Collection
Let's talk about Garbage CollectionLet's talk about Garbage Collection
Let's talk about Garbage CollectionHaim Yadid
 
Base de données graphe et Neo4j
Base de données graphe et Neo4jBase de données graphe et Neo4j
Base de données graphe et Neo4jBoris Guarisma
 
Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYCNeo4j
 

What's hot (14)

[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
[Foss4 g2013 korea]postgis와 geoserver를 이용한 대용량 공간데이터 기반 일기도 서비스 구축 사례
 
Introduction to CQL and Data Modeling with Apache Cassandra
Introduction to CQL and Data Modeling with Apache CassandraIntroduction to CQL and Data Modeling with Apache Cassandra
Introduction to CQL and Data Modeling with Apache Cassandra
 
JanusGraph DataBase Concepts
JanusGraph DataBase ConceptsJanusGraph DataBase Concepts
JanusGraph DataBase Concepts
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
Starbase: Graph-Based Security Analysis for Everyone
Starbase: Graph-Based Security Analysis for EveryoneStarbase: Graph-Based Security Analysis for Everyone
Starbase: Graph-Based Security Analysis for Everyone
 
SLF4J (Simple Logging Facade for Java)
SLF4J (Simple Logging Facade for Java)SLF4J (Simple Logging Facade for Java)
SLF4J (Simple Logging Facade for Java)
 
The Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedInThe Big Data Analytics Ecosystem at LinkedIn
The Big Data Analytics Ecosystem at LinkedIn
 
Cassandra ppt 1
Cassandra ppt 1Cassandra ppt 1
Cassandra ppt 1
 
Apache Helix presentation at Vmware
Apache Helix presentation at VmwareApache Helix presentation at Vmware
Apache Helix presentation at Vmware
 
網路廣告的基本架構
網路廣告的基本架構網路廣告的基本架構
網路廣告的基本架構
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Let's talk about Garbage Collection
Let's talk about Garbage CollectionLet's talk about Garbage Collection
Let's talk about Garbage Collection
 
Base de données graphe et Neo4j
Base de données graphe et Neo4jBase de données graphe et Neo4j
Base de données graphe et Neo4j
 
Applying Network Analytics in KYC
Applying Network Analytics in KYCApplying Network Analytics in KYC
Applying Network Analytics in KYC
 

Similar to User Behavior Hashing for Audience Expansion

A flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVA flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVIntoTheMinds
 
A Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVA Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVFrancisco Couto
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureHuxing Zhang
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
 
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationOut With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationAcquia
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEFeng Zhu
 
From prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioMáté Lang
 
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...Happiest Minds Technologies
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...Joshwa Philip
 
Do I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxDo I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxPINGXIONG3
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreePradeeban Kathiravelu, Ph.D.
 
WDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptxWDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptxArthur240715
 
A Case for Outside-In Design
A Case for Outside-In DesignA Case for Outside-In Design
A Case for Outside-In DesignSandro Mancuso
 
OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014Seth Fox
 
Software engineering lecture notes
Software engineering lecture notesSoftware engineering lecture notes
Software engineering lecture notesSiva Ayyakutti
 

Similar to User Behavior Hashing for Audience Expansion (20)

A flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVA flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TV
 
A Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVA Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TV
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architecture
 
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...
 
1710 track3 zhu
1710 track3 zhu1710 track3 zhu
1710 track3 zhu
 
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS MigrationOut With the Old, in With the Open-source: Brainshark's Complete CMS Migration
Out With the Old, in With the Open-source: Brainshark's Complete CMS Migration
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
 
From prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.ioFrom prototype to production - The journey of re-designing SmartUp.io
From prototype to production - The journey of re-designing SmartUp.io
 
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
Whitepaper: Software Defined Data Center – An Implementation view - Happiest ...
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Arcadia overview nr2
Arcadia overview nr2Arcadia overview nr2
Arcadia overview nr2
 
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
OmniSuggest: A Ubiquitous Cloud-Based Context-Aware Recommendation System for...
 
Do I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxDo I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptx
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
WDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptxWDS trainer presentation - MLOps.pptx
WDS trainer presentation - MLOps.pptx
 
A Case for Outside-In Design
A Case for Outside-In DesignA Case for Outside-In Design
A Case for Outside-In Design
 
OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014OpenStack in the Enterprise - Interop Las Vegas 2014
OpenStack in the Enterprise - Interop Las Vegas 2014
 
Software engineering lecture notes
Software engineering lecture notesSoftware engineering lecture notes
Software engineering lecture notes
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 

Recently uploaded (20)

Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

User Behavior Hashing for Audience Expansion

  • 1.
  • 2. User Behavior Hashing for Audience Expansion Praveen Pratury Yingnan Zhu
  • 3. Agenda Praveen Pratury ▪ Overview of Samsung ▪ Samsung Audience platform ▪ Lookalike modeling introduction Yingnan Zhu ▪ Lookalike approaches ▪ Speed up with Pandas UDF ▪ Model performance ▪ Results ▪ Q & A Director of Engineering, Samsung Research America Lead Data Scientist, Samsung Research America
  • 6.
  • 8.
  • 9.
  • 10.
  • 11. LookAlike Modeling – Samsung Context Improve Incremental Reach and Improved Targeting for: ▪ TV Networks (Identify new audiences to promote new shows) ▪ Samsung New TV purchases (8K, QLED, Terrace etc)U ffhoffef The goal is to improve Reach and increase conversion for TV shows and New TV purchases Goals Approach By leveraging Samsung’s rich ACR viewership data on 50+ M TVs in US and by applying User Behavior Hashing techniques: ▪ Identify TV viewers similar to existing audiences based on user behavior - Find audiences that will respond favorably to show-specific TV ads ▪ Identify existing premium TV owners to expand to future buyer
  • 12. Look Alike Audience Expansion Example A: seed segment B: expanded segment * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ++ +++ + + + + + + + + + + + + + + + + + + + + + + + ++ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ++ +++ + + + + + + + + + + + + + + + + + + + + + + + ++ A* B All TV Users’ hash code space All TV Users’ hash code space * + 8K TV users non 8K TV users A LookALike example for 8K TV campaign. The goal is to identify users who look alike to 8K TV owners, but have not owned an 8K TV yet. Targeted size to expand
  • 13. Challenges and solutions ▪ Challenges ▪ Large-scale data: ▪ Search space is huge: Hundreds of millions of Smart TV and Mobile users ▪ Efficiency: ▪ Each device could generate thousands of logs per day ▪ Look alike user retrieval has time constraint ▪ Possible solutions ▪ LSH, K-nearest neighbor, similar user search in recommender system and etc. ▪ These solutions sacrifice accuracy, efficiency, do not consider contextual information, or not optimized for time sequence data ▪ Our solution ▪ Heterogeneous user behavior hash code ▪ Provides LSH like bucketized fast searching and maintain high accuracy of user similarity
  • 14. Look Alike Work Flow Online/Offline Offline Raw Data Deep Binary Hashing Models (various bit length) Deep Binary Hashing Models (various bit length) User Hash Codes Lookalike ServiceSeed Segments Expended Segments Processed User Behavior Data
  • 15. Hashing Model Training Flow Df User 1 Hash LayerNetwork LayersInput Similarity Label 1 0 Similar Dissimilar y x +1 -1 SGN User 2 y x +1 -1 • By given two user pair, it first generates user embedding (continuous vector) from Network Layers. After that we make K-bit dimension from Hash Layer. Finally we apply SGN for binary representation. The output is 1: similar, 0: dissimilar
  • 16. Approach # 1: Time-Aware Attention CNN Model
  • 17. Model Explained ▪ Input layer: ▪ The input layer is the data pre-processing layer. In this layer, we will map sequential behavior data input into a 3D structure that can be processed by CNN. ▪ The first step in our data pre-processing step is to embed each item into a D dimension vector. The next step is to sessionize user’s history by a specific time unit (e.g., hour). For each session, we aggregate all items that the user in consideration had interacted with using the multi-hot encoding of the corresponding items. This will represent the summary of user’s behavior for the given session. After sessionization, we map each user’s behavior input into the high dimensional space. ▪ Embedding layer: ▪ Since the multi-hot encoding scheme used during our pre-processing step is a sparse and hand-crafted encoding scheme, it carries more conceptual information than similarity information itself. This would affect the overall performance of TAACNN, particularly its ability to preserve similarity information at large scale. To overcome this limitation, we introduce an embedding layer as part of our model. ▪ Time-Aware attention layer: ▪ The time-aware attention is used to abstract time-aware attention features in our TAACNN model. This layer separates attention features into short-term and long-term features.
  • 18. Approach #2: Categorical Attention Model
  • 19. Distributed Inference ▪ Issues: ▪ Large data scales and hundreds of millions user’s profile need update within limited time and computation resource ▪ Current Spark UDF is processed row-at-a-time and it won’t satisfy the requirements ▪ Need efficient distributed inference methods ▪ Solution: Pandas UDF ▪ Scalar ▪ Scalar iterator ▪ Group map ▪ Group aggregate
  • 21. Model Performance ▪ We used the accuracy measure as the main performance metric for all binary hashing algorithms because each user has the identical number of similar and dissimilar user pairs.
  • 22. Conclusion ▪ A novel deep binary hashing architecture to derive similarity preserving binary hash codes for sequential behavior data. ▪ TAACNN explores evolving user’s attention preferences across different time awareness level separately. Experiments results show significant over-performance compared to other well-known hashing methods ▪ Pandas UDF improved efficiency significantly. They have been adopted in many of our projects.
  • 23. Thank you !! We are hiring: www.sra.samsung.com/open-positions Contact: Praveen.Pratury@Samsung.com Yingnan.z@Samsung.com https://www.linkedin.com/in/praveenpratury https://www.linkedin.com/in/yingnan-zhu-66651113/
  • 24. Q & A