HOANG ANH TUAN
CTO Admicro | VCCorp
tuanhoanganh@vccorp.vn
ML & AI APPROACH TO USER UNDERSTANDING ECOSYSTEM AT VCCORP
Applications to News, Ads & E-commerce
Agenda
1. Overview
o VCCorp in a nutshell
o Milestones
2. Main Applications
o News Distribution
o News Recommendation Engine
o A/B Testing for Evaluation
o Advertising technology
1. Overview
VCCORP in a nutshell
43M
PC audience
Media
PC, Mobile, Smart TV
38M
Mobile audience
Ad Network
Payment Platform | Cloud Computing & Big Data Analysis
VCCorp – Innovation. Non-stop!
OVERVIEW
o First mover DNA
o 50% YoY Growth
o 43M web audience
o 38M mobile audience
o 1700 employees
INVESTORS
Games
Mobile games, web games,
client games
E-commerce 12M
Visitors & Buyers
VCCORP Milestones
VCCORP Computing farm
Storage Engine
6 PB
Storage Engine
40TB
GPU Farm
120 GPUs GTX 1080 Ti
Compute Engine
20,000 CPUs
30 Tb processing
per day
1500 models selected
10 billion calculations
per day
2 billion records
processing per day
2. Main Applications
Cross Device
User Content
Consumption
Fraud Detection
Analytic Framework
Anomaly
Detection
Video Analytic
Cyber Security
Advertising Technology
CTR Prediction
Bidding Strategy
Video Content Distribution
News Distribution
Breaking
news
Social news
Events
Detection
Recommendation Engine
News
Streaming
E-Commerce
User
Personalization
Text Mining
Sentiment Analysis
Core ML & AI Core NLP
Deep Learning
Computer Vision
Image
Processing
Voice
Recognition
Pattern
Recognition
Object
Detection
A next generation
of Digital magazine
VCCORP
News distribution & News recommendation engine
AI-powered News platform - Ecosystem
Consumption
Articles
Feeds
Reader Logs
Reader Rating
Other
endpoint...
Hadoop
Deep learning cluster Neural net
Creation
Articles
Images
Video
Editors
Other
endpoint...
Process, Analyze, Data-Mine, Understand, Organize
Bigdata Platform
AI Infrastructure, Machine Learning Services, Algorithms
AI-assisted
Content
Consumption
AI-assisted
Content
Creation
AI INFRASTRUCTURE TO DISTRIBUTE AND PUBLISH NEWS
News distribution engine (NDE)
VCCORP owns many large online newspapers in Vietnam
• 14 sites in top 100 websites in various categories: news, finance, family, youth, auto, hi-tech…
AI-powered
News Platform
Digital magazine empowered by AI Platform
Broadsheets
• Dantri.com.vn
• SohaNews
• Vtv.vn
• Kenh14.vn
SYSTEM INFO
• > 3.6B requests per day
• > 120,000 concurrent requests
CHALLENGES
• 50M readers
• 40% loyal readers
• 20% subscribed readers
• Publish over 300 articles, blog posts and interactive stories per day
Tech, Game
• Gamek.vn
• Genk.vn
Finance
• Cafef.vn
• Cafebiz.vn
Family
• Afamily.vn
Editors’ challenges
1. On average, the sites publish over 300 articles, blog posts and interactive stories a day
2. Editors’ workload: reviewing content 24 hours a day, seven days a week
3. Editors select articles manually and intuitively.
Bad news Good news
What kinds of breaking news stories
will be selected?
Is my decision good? How can I know?
Editor Emotions
Solution: AI-assisted content creation
ACTIONS
RESULTS
Bob selects articles each hour depending on his knowledge and mood!
Few knowledge articles, limited content,
poor curation, lots of dead links, and no
semantic relationships
EDITOR
Bob
EDITOR
with
AI-assisted
Content Creation
Bob’s replacement generates hundreds of
higher quality, highly curated, and
semantically inter-linked articles,
in the time it takes Bob to create just
one… at a fraction of the cost
Good content, greater
curation, almost no dead links,
and semantic relationships
Features:
• Measure reaction using Sampling technique
• Select the news stories for the targeted audience
• Select the high quality news stories
• Breaking news, trending news and story development supported
• Automatically rank news stories and replace low-quality ones with good ones
We applied:
User profile
● Demographic (age,
gender, location)
● User Interest
● User Lookalike
Content consumption
● Relation extraction
● Semantic representation
● Content Profile
Context understanding
● Trending (long, short)
● Breaking news
● Timeliness
Solution: AI-assisted content creation
NDE – Technical Architecture
NDE – Algorithms working flow
Event detection
Feedback
NewsDB
Social
Resources
Analytic and
Aggregation Engine
Context Understanding
breaking
news
Trending
news
Content Consumption
Semantic
representation
Relation extraction
User (Mass Reader)
User Demographic
Optimize
algorithms
and model
Feedback
Online Compute Service
Selection and ranking
Sampling
News Algorithms
Service
Sampling box
Streaming Box
Algorithms Selection
NDE – Sampling technique
[Figure: News Sampling. Inputs: viral news stories, breaking news stories, in-depth and story development news stories, ...; User Profiles; Algorithms Selection. Signals shown:]
- Time on read
- Scroll & mouse movement
- Social engagement
NDE – Ranking score
Score time decay
Sampling
Monte Carlo sampling
Viral score
Hot news stories ranking
Total score
Learning to rank algorithms
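A minimal sketch of how the components named on this slide (time decay, Monte Carlo sampling, viral score, total score) could be combined. The slide does not give the actual formula, so the half-life, the Beta posterior, the blending weight, and every function and field name below are assumptions made for illustration.

```python
import random

def time_decay(age_hours, half_life_hours=6.0):
    """Exponential decay: 1.0 for a brand-new story, 0.5 after one half-life (assumed value)."""
    return 0.5 ** (age_hours / half_life_hours)

def sampled_ctr(clicks, impressions, rng):
    """Monte Carlo draw from a Beta(clicks + 1, misses + 1) posterior over the click rate."""
    return rng.betavariate(clicks + 1, impressions - clicks + 1)

def total_score(story, rng, viral_weight=0.3):
    """Blend sampled engagement with the viral score, then apply time decay (hypothetical weights)."""
    engagement = sampled_ctr(story["clicks"], story["impressions"], rng)
    blended = (1 - viral_weight) * engagement + viral_weight * story["viral"]
    return blended * time_decay(story["age_hours"])

def rank(stories, seed=0):
    """Return story ids ordered from highest to lowest total score."""
    rng = random.Random(seed)
    scored = [(total_score(s, rng), s["id"]) for s in stories]
    return [story_id for _, story_id in sorted(scored, reverse=True)]
```

Drawing the engagement term from a posterior instead of using the raw click rate gives stories with few impressions a chance to surface, which is one common reason to pair sampling with ranking.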
NDE – Contextual understanding
Problems:
• Hot news detection
• Breaking news detection
• Trending detection
• Event detection
Contents
Timeline
Articles - 1
Articles - 2
Breaking period
Trending news
Emotion, Facts
Location
Text Base
Event detection and Tracking new stories
• Event detection & Tracking in News stream
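One simple way to approach the trending and breaking-news detection problems listed above is burst detection on per-interval article or mention counts. The sketch below flags a burst when the current count is several standard deviations above its recent history; the z-score rule, the threshold, and the function name are assumptions, not the method used in the deck.

```python
from statistics import mean, pstdev

def is_bursting(history, current, threshold=3.0):
    """Flag a burst when `current` is at least `threshold` std devs above the history mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: any increase counts as a burst.
        return current > mu
    return (current - mu) / sigma >= threshold
```

For example, `is_bursting([10, 12, 11, 9, 10, 11], 40)` flags a burst, while a count of 12 against the same history does not.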
NDE – Developing Story
News’
timeline
Image Source: Breaking News Detection and Tracking in Twitter
Editor setting
Sampling Box
Hot trending news
Event and average
trending news
stories
A/B Testing – Framework for Evaluation
Is Option A better than Option B? Let’s test:
• Increase the rate of learning
• Accelerate ranking algorithm
• Evaluate many algorithms and models
TRAFFIC
A
B
Old
Model
New
Model
A/B Testing – Algorithm Assessments
Base-line Algorithm
Experimental Algorithm
ALG No.1
Group Users A
ALG No.2
Group Users B
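Splitting traffic between a baseline and an experimental algorithm needs a stable assignment, so that a given user always lands in the same group. A common approach, sketched here under the assumption that users carry a string identifier, is to hash the user id together with an experiment name. The experiment name and the 50/50 share are hypothetical.

```python
import hashlib

def assign_group(user_id, experiment="ranking-v2", share_b=0.5):
    """Deterministically map a user to group "A" (baseline) or "B" (experimental)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # The first 8 hex digits form a 32-bit integer; scale it into [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "B" if bucket < share_b else "A"
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in group B of one test is not systematically in group B of the next.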
NDE – Evaluation
CTR (Click-Through Rate)
Pageview
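When CTR is the evaluation metric, the difference between the two groups should be checked for statistical significance before declaring a winner. The sketch below uses a standard two-proportion z-test; the deck does not say which test was used, so this is an illustration only, and the function names are hypothetical. It assumes both groups have impressions and at least one click overall.

```python
import math

def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions

def two_proportion_z(clicks_a, imps_a, clicks_b, imps_b):
    """z statistic for the difference CTR_B - CTR_A under a pooled-variance null."""
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    return (ctr(clicks_b, imps_b) - ctr(clicks_a, imps_a)) / se

def p_value_two_sided(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```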
News recommendation engine (NRE)
USER PREFERENCES AND HIS CONTEXT? GIVE IT TO
OUR RECOMMENDATION ENGINE
Some Evidence
GOOGLE NEWS
38% more click-through-rate due to
recommendation
NEMO (VCCORP)
40% sales from recommendation
KENH14.VN | GENK.VN
• 30% more page views from recommendation boxes
• 14% more reading time
NETFLIX
2/3 rented movies from
recommendation
AMAZON
35% sales from recommendation
News recommendation engine (NRE)
Choose news that fits with many contexts and user preferences
• Improve user experience
• Choose high quality-content for each user
• Discover new contents for each user
Relatedness News
● Relevant content
● Tag based recommender
● Semantic Entity Graph
based recommender
Personalized News
● Light-Personalization
● Deep Interest personalization
Trending and Popular News
● Trending (long, short)
● Breaking news
● Developing Story
NRE – General workflow
Graph processor
• Aggregate edges relationship
• Trims irrelevant ones
• Find user look-alike
Digital News ranking engine
• Collect hot and breaking news
• Ranking event by time and context
Model Selection Algorithms
• Model optimization
MODEL
parameters/ graphs
User Graph DB
Push news
Machine Learning
• Model selection and optimization
• Optimize real time-decision
Learning Framework (tensorflow, theano,...)
Feedback
update
update
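The graph processor steps above (aggregate edges, trim irrelevant ones, find user look-alikes) can be illustrated with a similarity threshold over user interest vectors. This is a minimal sketch assuming each user is a sparse dictionary of topic weights; cosine similarity, the threshold value, and the function names are assumptions rather than the deck's actual method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def look_alikes(target, profiles, min_similarity=0.5):
    """Return users similar to `target`, best first, trimming edges below the threshold."""
    scored = [
        (cosine(profiles[target], vec), uid)
        for uid, vec in profiles.items()
        if uid != target
    ]
    return [uid for sim, uid in sorted(scored, reverse=True) if sim >= min_similarity]
```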
NRE – Architecture high level
Offline Learning
Online Learning
Bigdata-
HDFS
User’s session
and Profile
query
Event
Distribution
logs
Learning
management
Services
Recommended
contents
feedback
Model Training
Online
Computation
model
Online
machine learning
Machine learning
Algorithms
update
update
User Modeling
User Profiling
Behavior
Analyzing
User
Profile
load/
update
feeds
Content Analysing
Relevant content
Semantic Entity
Graph
NRE – How to build user modeling
1. User’s network profile
2. User’s GraphDB
3. User profile representation
4. User behavior prediction
NRE – User’s network profile
> 50M web logs
profiles
> 10M VietID profiles
>1M E-commerce
profiles
NRE – User’s GraphDB
• dantri.com.vn
• kenh14.vn
• User Demographic (Sex, Age)
• User Interest
• User Relationship
• User Behavior
• 100 M profiles and graph
• 10 billion transactions per day
• 30-35 MB/sec disk data-transfer
rate = 4 months to read the web
E-Commerce Sites
Other
sources
NRE – User profile representation
Browser’s
logs
Event’s
logs
Rating
Service
Feature
Extraction
User Interest
User Representation
User Segmentation
User Demographic
Domain Model
User model
Inferencer
Semantic
Understanding
Personalization
User Graph
Model
Collaborative
Filtering
NRE – User behavior prediction
User
Profile
Interactive Contents
LS t-1
LS t
LS t+1
Time-sequence features
Observer(CNN)
Inferencer(RNN)
User
LS1 ,LS2 ,LS3...
L (Learning dimension| P)
Purpose: Find news stories relevant to each user: the right news at the right time, plus personalized supplementary news stories
NRE – Content analysing
Personalised News
Relatedness News
• Hybrid model
(User Interest +
Collaborative Topic Modeling +
Hot Trending news)
• Related news stories
• Automated keyword
tagging
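The hybrid model above combines three signals: user interest, collaborative topic modeling, and hot trending news. The deck does not say how they are combined; a weighted sum is one simple possibility, sketched below. The weights, field names, and function names are hypothetical, and each signal is assumed to be pre-normalised to the range 0 to 1.

```python
def hybrid_score(interest, collaborative, trending, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three signals; the weights are illustrative only."""
    w_interest, w_collab, w_trend = weights
    return w_interest * interest + w_collab * collaborative + w_trend * trending

def recommend(candidates, top_k=2):
    """Return the ids of the `top_k` candidates with the highest hybrid score."""
    ranked = sorted(
        candidates,
        key=lambda c: hybrid_score(c["interest"], c["collaborative"], c["trending"]),
        reverse=True,
    )
    return [c["id"] for c in ranked[:top_k]]
```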
NRE – Personalised news
Article Representation: Convolutional Neural Network + word2vec/fasttext
User Representation: Temporal Convolutional Networks
Figure: implemented and customized from paper: Embedding-based News Recommendation for Millions of Users - KDD 2017
Old model Customized Model
Article Representation: Autoencoder
User Representation: GRU
NRE – Relatedness news
Image Title
Description
Feature extraction
NLP Core
Matching score
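The pipeline on this slide extracts features from an article's title and description and produces a matching score between articles. As a stand-in for the NLP core, which the deck does not describe, the sketch below uses keyword sets and Jaccard overlap. The tokenizer, the stopword list, and the function names are assumptions.

```python
import re

# Tiny illustrative stopword list; a real system would use a language-specific one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "on"}

def keywords(text):
    """Lower-cased word set with stopwords removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}

def matching_score(article_a, article_b):
    """Jaccard overlap between the keyword sets of two articles (0.0 to 1.0)."""
    ka = keywords(article_a["title"] + " " + article_a["description"])
    kb = keywords(article_b["title"] + " " + article_b["description"])
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)
```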
NRE – Relatedness news
Box Relatedness news
Box Relatedness news
Keyword Extraction
NRE – Real Estate Recommendation System
● Recommend box based
○ Kind: apartments/house
○ Properties:
■ prices
■ location
■ F1/F2
Recommend Box
NRE – Sites
Box recommend
Click &
show popup
Box Relatedness news
NRE – Evaluation for Digital magazine
NRE – E-commerce and more…
Box recommend
45% increase in traffic from the Recommendation Engine boxes
Advertising Technology
ADMICRO – overview
#1 ad network in Vietnam
200+ top publishers in Vietnam
10,000+ advertisers
Formats
Leading the market share with
22 formats (including display,
video, native)
Traffic
4B pageviews/month
1.5B impressions/day
Coverage
38% market share
43M internet user reach (97.6% of VN internet users)
38M mobile user reach (95% of VN smartphone users)
AI-powered Advertising Technology
CHALLENGES
• 80 ms is the maximum time
of a transaction
• >1000 sites in Vietnam
• 4.5B requests/day
• Transaction volume: $300,000/day
Ad Optimization
Retargeting ads
User profiles
• Age
• Gender
• Location
• Behavior
Realtime Bidding
Advertising Technology – Retargeting
Recommender System
o CF, Content-based Filtering
o Collaborative Topic
Modeling
Production Extraction
o Deep Knowledge Network
o Semantic Web and
Ontology Analysis
Advertising Technology – Realtime bidding
A transaction (sale/purchase) of an ad impression is processed immediately when a visitor triggers an ad zone
CHALLENGES
• Serving more than 40M users
• 80 ms is the maximum time of a transaction
• >1000 sites in Vietnam
• 4.5B requests/day
• Transaction volume: $300,000/day
• 430B estimated operations for each time
SOLUTIONS
• High load capacity web server
• Optimization algorithms
• Estimation and prediction algorithms
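The figures above imply a rough capacity target, worked through here as arithmetic. The sketch assumes traffic is spread evenly over the day (real peaks will be higher) and uses Little's law to estimate how many transactions are in flight at the 80 ms limit. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate(requests_per_day):
    """Average requests per second, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY

def max_in_flight(rate_per_second, latency_seconds):
    """Little's law: concurrent transactions = arrival rate x time spent in the system."""
    return rate_per_second * latency_seconds

rate = average_rate(4.5e9)            # about 52,083 requests per second
in_flight = max_in_flight(rate, 0.080)  # about 4,167 concurrent transactions
```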
Advertising Technology – Bidding algorithms (automated buying)
CTR/CVR Estimation
• Logistic Regression
• Factorization Machine
• Online Bayesian Logistic Regression
• Convolutional Click Prediction Model
• Neural Networks Prediction Model
Bid Landscape Forecasting
• XGBoost (Gradient Boosting Tree)
• Survival Tree Models
Bidding Strategy
Non-truthful Linear bidding
Optimal RTB Bidding Strategy
Contextual Bandits (Online Learning)
Image adapted from: Real-Time Bidding based Display Advertising: Mechanisms and Algorithms - Jun Wang
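To make the first and third stages concrete, here is a minimal sketch of logistic regression CTR estimation trained by stochastic gradient descent, followed by a linear bidding rule that scales a base bid by the ratio of predicted CTR to average CTR. The feature names, learning rate, and base bid are assumptions, and the code is a toy illustration, not Admicro's implementation.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_ctr(weights, bias, features):
    """Predicted click probability for a sparse feature dict."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)

def sgd_step(weights, bias, features, clicked, lr=0.1):
    """One gradient step on log loss; updates `weights` in place and returns the new bias."""
    error = predict_ctr(weights, bias, features) - clicked
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) - lr * error * value
    return bias - lr * error

def linear_bid(base_bid, predicted_ctr, average_ctr):
    """Linear bidding: bid in proportion to predicted CTR relative to the average."""
    return base_bid * predicted_ctr / average_ctr
```

With this rule an impression predicted to be twice as likely to be clicked as average receives twice the base bid.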
Advertising Technology – Performance
[Chart: Conversion Rate (CVR) for paid search, network display ads, and targeted ads through Admarket DSP]
[Chart: Click-Through Rate (CTR) before and after optimization]
CTR (Click-Through Rate)
The CTR was 41% higher than before, thanks to the machine learning mechanism of the bidding engine.
CVR (Conversion Rate)
The CVR of targeted Ads through Admarket DSP
was 7.4 higher than that of display network ads
and 1.4 higher than that of paid search
Thank you

ML&AI APPROACH TO USER UNDERSTANDING ECOSYSTEM AT VCCORP Applications to News, Ads & E-commerce

  • 1. HOANG ANH TUAN CTO Admicro | VCCorp tuanhoanganh@vccorp.vn M L & A I A P P R O A C H T O U S E R U N D E R S TA N D I N G E C O S Y S T E M AT V C C O R P A p p l i c a t i o n s t o N e w s , A d s & E - c o m m e r c e
  • 2. A g e n d a 1. Overview o VCCorp in a nutshell o Milestones 2. Main Applications o News Distribution o News Recommendation Engine o A/B Testing for Evaluation o Advertising technology
  • 3. 1 . O ve r v iew
  • 4. V C C O R P i n a n u t s h e l l 43M PC audience Media PC, Mobile, Smart TV 38M Mobile audience Ad Network Payment Platform | Cloud Computing & Big Data Analysis VCCorp – Innovation. Non-stop! OVERVIEW o First mover DNA o 50% YoY Growth o 43M web audience o 38M mobile audience o 1700 employees INVESTORS Games Mobile games, web games, client games E-commerce 12M Visitors & Buyers
  • 5. V C C O R P M i l e s t o n e s
  • 6. V C C O R P C o m p u t i n g f a r m Storage Engine 6 PB Storage Engine 40TB GPU Farm 120 GPUs GTX 1080 Ti Compute Engine 20.000 CPUs 30 Tb processing per day 1500 models selected 10 billion calculations per day 2 billion records processing per day
  • 8. Cross Device User Content Consumption Fraud Detection Analytic Framework Anomaly Detection Video Analytic Cyber Security Advertising Technology CTR PredictionBidding Strategy Video Content Distribution News Distribution Breaking news Social news Events Detection Recommendation Engine News Streaming E-Commerce User Personalization Text Mining Sentiment Analysis Core ML & AI Core NLP Deep Learning Computer Vision Image Processing Voice Recognition Pattern Recognition Object Detection
  • 9. A next generation of Digital magazine VCCORP News distribution & News recommendation engine
  • 10. A I - p o w e r e d N e w s p l a t f o r m - E c o s y s t e m Consumption Articles Feeds Reader Logs Reader Rating Other endpoint... Hadoop Deep learning cluster Neural net Creation Articles Images Video Editors Other endpoint... Process, Analyze, Data-Mine, Understand, Organize Bigdata Platform AI Infrastructure, Machine Learning Services, Algorithms AI-assisted Content Consumption AI-assisted Content Creation
  • 11. AI INFRASTRUCTURE TO DISTRIBUTE AND PUBLISH NEWS News distribution engine (NDE)
  • 12. News distribution engine (NDE) VCCORP owns many large online newspapers in Vietnam • 14 sites in the top 100 websites across various categories: news, finance, family, youth, auto, hi-tech… AI-powered News Platform
  • 13. Digital magazine empowered by AI Platform Broadsheets • Dantri.com.vn • SohaNews • Vtv.vn • Kenh14.vn SYSTEM INFO • > 3.6B requests per day • > 120,000 concurrent requests CHALLENGES • 50M readers • 40% loyal readers • 20% subscribed readers • Publish over 300 articles, blog posts and interactive stories per day Tech, Game • Gamek.vn • Genk.vn Finance • Cafef.vn • Cafebiz.vn Family • Afamily.vn
  • 14. Editors’ challenges 1. On average, editors publish over 300 articles, blog posts and interactive stories a day 2. Editors' workload: reviewing content 24 hours a day, seven days a week 3. Editors select articles manually and intuitively. Bad news Good news Which kinds of breaking news stories will be selected? Is my decision good? How can I know? Editor Emotions
  • 15. Solution: AI-assisted content creation EDITOR Bob ACTIONS: Bob selects articles each hour depending on his knowledge and mood! RESULTS: Few knowledge articles, limited content, poor curation, lots of dead links, and no semantic relationships EDITOR with AI-assisted Content Creation ACTIONS: Bob’s replacement generates hundreds of higher-quality, highly curated, and semantically inter-linked articles in the time it takes Bob to create just one… at a fraction of the cost RESULTS: Good content, greater curation, almost no dead links, and semantic relationships
  • 16. Solution: AI-assisted content creation Features: • Measure reader reaction using a sampling technique • Select news stories for the targeted audience • Select high-quality news stories • Support breaking news, trending news and story development • Automatically rank news stories and replace low-quality ones with good ones We applied: User profile ● Demographic (age, gender, location) ● User Interest ● User Lookalike Content consumption ● Relation extraction ● Semantic representation ● Content Profile Context understanding ● Trending (long, short) ● Breaking news ● Timeliness
  • 17. NDE – Technical Architecture
  • 18. NDE – Algorithms working flow Event detection Feedback NewsDB Social Resources Analytic and Aggregation Engine Context Understanding Breaking news Trending news Content Consumption Semantic representation Relation extraction User (Mass Reader) User Demographic Optimize algorithms and model Feedback Online Compute Service Selection and ranking Sampling News Algorithms Service Sampling box Streaming Box Algorithms Selection
  • 19. NDE – Sampling technique News Sampling: viral news stories, breaking news stories, ..., in-depth and story development news stories User Profiles Algorithms Selection - Time on read - Scroll & mouse movement - Social engagement
  • 20. NDE – Ranking score Score time decay Sampling Monte Carlo sampling Viral score Hot news stories ranking Total score Learning to rank algorithms
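The slide names the ingredients of the ranking score (time decay, Monte Carlo sampling, viral score, total score) but not how they combine. The following is only a minimal sketch of one plausible combination, not VCCorp's formula: the exponential half-life decay, the Beta-posterior draw used as the Monte Carlo step, the `viral_weight` blend, and every field name are assumptions made for illustration.

```python
import random


def time_decay(age_hours, half_life_hours=6.0):
    """Exponential decay: 1.0 for a new story, 0.5 after one half-life (assumed form)."""
    return 0.5 ** (age_hours / half_life_hours)


def sampled_ctr(clicks, impressions, rng):
    """Monte Carlo draw from a Beta posterior over the story's click rate (assumed)."""
    return rng.betavariate(1 + clicks, 1 + impressions - clicks)


def total_score(story, rng, viral_weight=0.3):
    """Blend sampled engagement with the viral score, then apply time decay."""
    engagement = sampled_ctr(story["clicks"], story["impressions"], rng)
    blended = (1 - viral_weight) * engagement + viral_weight * story["viral"]
    return blended * time_decay(story["age_hours"])


def rank(stories, seed=0):
    """Return stories ordered from highest to lowest total score."""
    rng = random.Random(seed)
    return sorted(stories, key=lambda s: total_score(s, rng), reverse=True)


if __name__ == "__main__":
    stories = [
        {"id": "old", "clicks": 1, "impressions": 1000, "viral": 0.0, "age_hours": 48},
        {"id": "fresh", "clicks": 900, "impressions": 1000, "viral": 0.9, "age_hours": 0},
    ]
    print([s["id"] for s in rank(stories)])
```

Drawing the click rate from a posterior rather than using the raw ratio lets stories with few impressions occasionally rank high, which is one way a sampling box can gather feedback on new stories.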
  • 21. NDE – Contextual understanding Problems: • Hot news detection • Breaking news detection • Trending detection • Event detection Contents Timeline Articles - 1 Articles - 2 Breaking period Trending news Emotion, Facts Location Text Base Event detection and tracking of news stories • Event detection & Tracking in News stream
  • 22. NDE – Developing Story News’ timeline Image Source: Breaking News Detection and Tracking in Twitter
  • 23. Editor setting Sampling Box Hot trending news Event and average trending news stories
  • 24. A/B Testing – Framework for Evaluation Is Option A better than Option B? Let’s test: • Increase the rate of learning • Accelerate the ranking algorithm • Evaluate many algorithms and models TRAFFIC A: Old Model B: New Model
  • 25. A/B Testing – Algorithm Assessments Baseline Algorithm (ALG No.1): Group Users A Experimental Algorithm (ALG No.2): Group Users B
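The slides describe splitting traffic between a baseline algorithm (group A) and an experimental one (group B) without saying how users are assigned. A common approach, shown here only as a sketch under that assumption, is deterministic hashing of the user ID so that each user always sees the same variant. The experiment name, the 50/50 split, and the function name are hypothetical.

```python
import hashlib


def assign_group(user_id, experiment="ranking-v2", treatment_share=0.5):
    """Map a user to "A" (baseline) or "B" (experimental) deterministically.

    Hashing the experiment name together with the user ID keeps assignments
    stable across requests and independent between experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "B" if bucket < treatment_share else "A"


if __name__ == "__main__":
    groups = [assign_group(f"user-{i}") for i in range(10000)]
    print("share of B:", groups.count("B") / len(groups))
```

Because the assignment needs no stored state, any serving node can compute it independently.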
  • 26. NDE – Evaluation CTR (Click-Through Rate) Pageview
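The evaluation slide lists CTR as a metric without a formula. As a hedged illustration, CTR is clicks divided by impressions, and two groups from an A/B test can be compared with a standard two-proportion z-test. The choice of test and all function names are assumptions; the slides do not say which statistical test was used.

```python
import math


def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0


def relative_lift(ctr_a, ctr_b):
    """Relative change of B over A, e.g. 0.2 means B is 20% higher."""
    return (ctr_b - ctr_a) / ctr_a


def two_proportion_z(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a, p_b = ctr(clicks_a, imps_a), ctr(clicks_b, imps_b)
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z(500, 10000, 600, 10000)
    print(f"lift={relative_lift(ctr(500, 10000), ctr(600, 10000)):.2%} z={z:.2f} p={p:.4f}")
```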
  • 27. News recommendation engine (NRE) USER PREFERENCES AND THEIR CONTEXT? GIVE THEM TO OUR RECOMMENDATION ENGINE
  • 28. Some Evidence GOOGLE NEWS: 38% higher click-through rate due to recommendation NEMO (VCCORP): 40% of sales from recommendation KENH14.VN | GENK.VN • 30% more page views from recommendation boxes • 14% more reading time NETFLIX: 2/3 of rented movies from recommendation AMAZON: 35% of sales from recommendation
  • 29. News recommendation engine (NRE) Choose news that fits many contexts and user preferences • Improve user experience • Choose high-quality content for each user • Discover new content for each user Related News ● Relevant content ● Tag-based recommender ● Semantic Entity Graph based recommender Personalized News ● Light personalization ● Deep Interest personalization Trending and Popular News ● Trending (long, short) ● Breaking news ● Developing Story
  • 30. NRE – General workflow Graph processor • Aggregates edge relationships • Trims irrelevant ones • Finds user look-alikes Digital News ranking engine • Collects hot and breaking news • Ranks events by time and context Model Selection Algorithms • Model optimization MODEL parameters/graphs User Graph DB Push news Machine Learning • Model selection and optimization • Optimize real-time decisions Learning Framework (TensorFlow, Theano, ...) Feedback update update
  • 31. NRE – High-level architecture Offline Learning Online Learning Bigdata - HDFS User’s session and Profile query Event Distribution logs Learning management Services Recommended contents feedback Model Training Online Computation model Online machine learning Machine learning Algorithms update update User Modeling User Profiling Behavior Analyzing User Profile load/update feeds Content Analysing Relevant content Semantic Entity Graph
  • 32. NRE – How to build user modeling 1. User’s network profile 2. User’s GraphDB 3. User profile representation 4. User behavior prediction
  • 33. NRE – User’s network profile > 50M web log profiles > 10M VietID profiles > 1M E-commerce profiles
  • 34. NRE – User’s GraphDB • dantri.com.vn • kenh14.vn • User Demographic (Sex, Age) • User Interest • User Relationship • User Behavior • 100M profiles and graph • 10 billion transactions per day • 30-35 MB/sec disk data-transfer rate = 4 months to read the web E-Commerce Sites Other sources
  • 35. NRE – User profile representation Browser logs Event logs Rating Service Feature Extraction User Interest User Representation User Segmentation User Demographic Domain Model User model Inferencer Semantic Understanding Personalization User Graph Model Collaborative Filtering
  • 36. NRE – User behavior prediction User Profile Interactive Contents LS t-1 LS t LS t+1 Time sequence features Observer (CNN) Inferencer (RNN) User LS1, LS2, LS3... L (Learning dimension | P)
  • 37. NRE – Content analysing Purpose: find news stories relevant to users, such as the right news at the right time and personalized supplementary news stories Personalised News • Hybrid model (User Interest + Collaborative Topic Modeling + Hot Trending news) Related News • Related news stories • Automated keyword tagging
  • 38. NRE – Personalised news Old model: Article Representation: Autoencoder; User Representation: GRU Customized Model: Article Representation: Convolutional Neural Network + word2vec/fastText; User Representation: Temporal Convolutional Networks Figure: implemented and customized from the paper "Embedding-based News Recommendation for Millions of Users" - KDD 2017
  • 39. NRE – Relatedness news Image Title Description Feature extraction NLP Core Matching score
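The slide shows features extracted from an article's image, title and description feeding a matching score, but it does not define the score. The sketch below is one simple stand-in, assuming text features only: cosine similarity between term-frequency vectors of title plus description. The tokenizer, the lack of weighting, and the function names are illustrative assumptions, and the image features mentioned on the slide are not modeled.

```python
import math
import re
from collections import Counter


def term_vector(title, description):
    """Bag-of-words term frequencies over the title and description."""
    tokens = re.findall(r"\w+", f"{title} {description}".lower())
    return Counter(tokens)


def matching_score(vec_a, vec_b):
    """Cosine similarity between two term-frequency vectors, in [0, 1]."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    a = term_vector("Bank raises interest rates", "The central bank raised rates today")
    b = term_vector("Interest rates rise again", "Rates went up after the bank meeting")
    print(round(matching_score(a, b), 3))
```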
  • 40. NRE – Relatedness news Box Relatedness news Box Relatedness news Keyword Extraction
  • 41. NRE – Real Estate Recommendation System ● Recommendation box based on ○ Kind: apartments/houses ○ Properties: ■ prices ■ location ■ F1/F2 Recommend Box
  • 42. NRE – Sites Box recommend Click & show popup Box Relatedness news
  • 43. NRE – Evaluation for Digital magazine
  • 44. NRE – E-commerce and more… Box recommend 45% increase in traffic from the Recommend Engine boxes
  • 45. Advertising Technology
  • 46. ADMICRO – overview #1 ad network in Vietnam 200+ top publishers in Vietnam 10,000+ advertisers Formats: leading the market with 22 formats (including display, video, native) Traffic: 4B pageviews/month, 1.5B impressions/day Coverage: 38% market share, 43M internet users reached (97.6% of VN internet users), 38M mobile users reached (95% of VN smartphone users)
  • 47. AI-powered Advertising Technology CHALLENGES • 80 ms is the maximum time for a transaction • >1000 sites in Vietnam • 4.5B requests/day • Transactions: $300,000/day Ad Optimization Retargeting ads User profiles • Age • Gender • Location • Behavior Realtime Bidding
  • 48. Advertising Technology – Retargeting Recommender System o CF, Content-based Filtering o Collaborative Topic Modeling Production Extraction o Deep Knowledge Network o Semantic Web and Ontology Analysis
  • 49. Advertising Technology – Realtime bidding A transaction (sale/purchase) of ad impressions is processed immediately when an audience member triggers an ad zone CHALLENGES • Serving more than 40M users • 80 ms is the maximum time for a transaction • >1000 sites in Vietnam • 4.5B requests/day • Transactions: $300,000/day • 430B estimated operations for each time SOLUTIONS • High load capacity web server • Optimization algorithms • Estimation and prediction algorithms
  • 50. Advertising Technology – Bidding algorithms (automated buying) CTR/CVR Estimation • Logistic Regression • Factorization Machine • Online Bayesian Logistic Regression • Convolutional Click Prediction Model • Neural Networks Prediction Model Bid Landscape Forecasting • XGBoost (Gradient Boosting Tree) • Survival Tree Models Bidding Strategy • Non-truthful Linear bidding • Optimal RTB Bidding Strategy • Contextual Bandits (Online Learning) Image adapted from: Real-Time Bidding based Display Advertising: Mechanisms and Algorithms - Jun Wang
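Two of the listed techniques are easy to illustrate together: logistic regression for CTR estimation and linear bidding, in which the bid scales with predicted CTR relative to the campaign's average CTR. This is a minimal sketch and not Admicro's bidding engine; the feature names, weights, base bid and bid cap below are invented for illustration.

```python
import math


def predict_ctr(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of feature values."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def linear_bid(p_ctr, avg_ctr, base_bid, max_bid):
    """Linear bidding: bid in proportion to predicted CTR over average CTR, capped."""
    return min(max_bid, base_bid * p_ctr / avg_ctr)


if __name__ == "__main__":
    # Hypothetical weights; in practice they would be learned from click logs.
    weights = {"is_mobile": 0.4, "interest_match": 1.2}
    p = predict_ctr({"is_mobile": 1.0, "interest_match": 1.0}, weights, bias=-5.0)
    print(f"pCTR={p:.4f} bid={linear_bid(p, avg_ctr=0.01, base_bid=1.0, max_bid=5.0):.2f}")
```

Both functions are constant-time arithmetic per request, which matters under a latency budget such as the 80 ms per transaction mentioned on the earlier slides.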
  • 51. Advertising Technology – Performance Paid Network display Ads Targeted ads through Admarket DSP Conversion Rate (CVR) Before Optimization After Optimization Click-Through Rate (CTR) Optimization CTR (Click-Through Rate): the CTR was 41% higher than before, thanks to the machine learning mechanism of the bidding engine. CVR (Conversion Rate): the CVR of targeted ads through Admarket DSP was 7.4 higher than that of display network ads and 1.4 higher than that of paid search