SlideShare a Scribd company logo
1 of 40
Download to read offline
HOANG ANH TUAN
CTO Admicro | VCCorp
tuanhoanganh@vccorp.vn
A I C O N T E N T P L AT F O R M
AT V C C O R P
A g e n d a
1. Overview
o VCCorp in a nutshell
o Milestones
2. Main Applications
o News Distribution &
Recommendation Engine
o A/B Testing for Evaluation
1. O ver view
V C C O R P i n a n u t s h e l l
43M
PC audience
Media
PC, Mobile, Smart TV
38M
Mobile audience
Ad Network
Payment Platform | Cloud Computing & Big Data Analysis
VCCorp – Innovation. Non-stop!
OVERVIEW
o First mover DNA
o 50% YoY Growth
o 43M web audience
o 38M mobile audience
o 1700 employees
INVESTORS
Games
Mobile games, web games,
client games
E-commerce 12M
Visitors & Buyers
V C C O R P M i l e s t o n e s
2007
HadoopCluster
Bamboo
searchserving
|10Musers
Analytic
Platform
2010
MachineLearningCore
Adnetwork
CoreNLP/
DeepLearning
Adnetwork2.0Bidding
Web4.0
AutopubNews
2014
AIPlatform
V C C O R P C o m p u t i n g f a r m
Storage Engine
6 PB
Storage Engine
40TB
GPU Farm
120 GPUs GTX 1080 Ti
Compute Engine
20.000 CPUs
30 Tb processing
per day
500 models selected
10 billion calculations
per day
7 billion records
processing per day
2. Main Applications
Cross Device
User Content
Consumption
Fraud Detection
Analytic Framework
Anomaly
Detection
Video Analytic
Cyber Security
Advertising Technology
CTR Prediction
Bid
Landscape
Bidding
Strategy
Retargeting
Video Content Distribution
News Distribution
Breaking
news
Social news
Events
Detection
Recommendation Engine
News
Streaming
E-Commerce
User
Personalization
Text Mining
Sentiment Analysis
Core ML & AI Core NLP
Deep Learning
Computer Vision
Image
Processing
Voice
Recognition
Pattern
Recognition
Object
Detection
A next generation
of Digital content
VCCORP
News distribution & News recommendation engine
A I - p o w e r e d C o n t e n t p l a t f o r m - E c o s y s t e m
Consumption
Articles
Feeds
Reader Logs
Reader Rating
Other
endpoint...
Hadoop
Deep learning cluster Neural net
Creation
Articles
Images
Video
Editors
Other
endpoint...
Process, Analyze, Data-Mine, Understand, Organize
Bigdata Platform
AI Infrastructure, Machine Learning Services, Algorithms
AI-assisted
Content
Consumption
AI-assisted
Content
Creation
AI INFRASTRUCTURE TO DISTRIBUTE AND PUBLISH NEWS
N ew s d i s t r i b u t i o n e n g i n e
( N D E )
I n t e r a t i v e V i d e o A d sN e w d i s t r i b u t i o n e n g i n e ( N D E )
VCCORP possesses many large online newspapers in Vietnam
• 14 sites in top 100 websites in various categories: news, finance, family, youth, auto, hi-tech…
AI-powered
News Platform
D i g i t a l m a g a z i n e e m p o w e r e d b y
A I P l a t f o r m
Broadsheets
• Dantri.com.vn
• SohaNews
• Vtv.vn
• Kenh14.vn
SYSTEM INFO
• > 3.6B request per day
• > 120,000 concurrent requests
CHALLENGES
• 50M readers
• 40% loyalty readers
• 20% subscribe readers
• Publish over 300 articles, blog
spots and interactive stories
per day
Tech, Game
• Gamek.vn
• Genk.vn
Finance
• Cafef.vn
• Cafebiz.vn
Family
• Afamily.vn
E d i t o r s ’ c h a l l e n g e s
1. On an average, publishes
over 300 articles, blog posts
and interactive stories a day
2. Workload of editors:
24h/day, seven days a week
sitting to review contents
3. Editors manually and and
intuitively selected articles.
Bad news Good news
What kinds of breaking news stories
will be selected?
My decision is good? How to
know that?
Editor Emotions
S o l u t i o n : A I - a s s i s t e d c o n t e n t d i s t r i b u t i o n
RESULTSACTIONS
Bob Select article per hour depending on
his knowledge and mood!
Few knowledge articles, limited content,
poor curation, lots of dead links, and no
semantic relationships
EDITOR
Bob
EDITOR
with
AI-assisted
Distribution
Bob’s replacement generates hundred of
higher quality, highly curated, and
semantically inter-linked articles,
in the time it takes Bob to create just
one… at a fraction of the cost
Good content, greater
curation, almost no dead links,
and semantic relationships
N ew s r e c o m m e n d a t i o n
e n g i n e ( N R E )
USER PREFERENCES AND HIS CONTEXT? GIVE IT TO
OUR RECOMMENDATION ENGINE
S o m e E v i d e n c e s
GOOGLE NEWS
38% more click-through-rate due to
recommendation
NEMO (VCCORP)
40% sales from recommendation
KENH14.VN | GENK.VN
• 30% page-views more
from recommendation
boxes
• 14% reading time more
NETFLIX
2/3 rented movies from
recommendation
AMAZON
35% sales from recommendation
N e w r e c o m m e n d a t i o n e n g i n e ( N R E ) –
I m p o r t a n t t o m e
Deep learning
Model - Ranking
For you
Hot & Popular
In case you
missed it
News Timeline
I n t e r a t i v e V i d e o A d sN e w r e c o m m e n d a t i o n e n g i n e ( N R E )
Choose news that fits with many contexts and user preferences
• Increase user experiments
• Choose high quality-content for each user
• Discover new contents for each user
Relatedness News
● Relevant content
● Tag based
recommender
● Semantic Entity Graph
based recommender
Personalized News
● Light-Personalization
● Deep Interest
personalization
Trending and Popular
News
● Trending (long, short)
● Breaking news
● Developing Story
User Profiles
● Demographic (age,
gender, location)
● User interest
● User lookalike
N R E – G e n e r a l w o r k f l o w
Graph processor
• Aggregate edges relationship
• Trims irrelevant ones
• Find user look-alike
Digital News ranking engine
• Collect hot and breaking news
• Ranking event by time and context
Model Selection Algorithms
• Model optimization
MODEL
parameters/ graphs
User Graph DB
Push news
Machine Learning
• Model selection and optimization
• Optimize real time-decision
Learning Framework (tensorflow, theano,...)
Feedback
update
update
N R E – A r c h i t e c t u r e h i g h l e v e l
Offline Learning
Online Learning
Bigdata-
HDFS
User’s session
and Profile
query
Event
Distribution
logs
Learning
management
Services
Recommended
contents
feedback
Model Training
Online
Computeration
model
Online
machine learning
Machine learning
Algorithms
update
update
User Modeling
User Profiling
Behavior
Analyzing
User
Profile
load/
update
feeds
Content Analysing
Relevant content
Semantic Entity
Graph
N R E – H o w t o b u i l d u s e r m o d e l i n g
1. User’s network
profile
2. User’s GraphDB 3. User profile
representation
4. User behavior
prediction
N R E – U s e r p r o f i l e r e p r e s e n t a t i o n
Brower’s
logs
Event’s
logs
Rating
Service
Feature
Extraction
User Interesting
User Representation
User Segmentation
User Demographic
Domain Model
User model
Inferencer
Semantic
Understanding
Personalization
User Graph
Model
Collaborative
Filtering
N R E – U s e r b e h a v i o r p r e d i c t i o n
User
Profile
Interactive Contents
LS t-1
LS t
LS t+1
Times
sequence
features
Observer(CNN)
Inferencer(RNN)
User
LS1 ,LS2 ,LS3...
L (Learning dimension| P)
I n t e r a t i v e V i d e o A d s
Purpose: Find new stories relevant to users, such as the right news at the
right time, personalized supplements news stories
N R E – C o n t e n t a n a l y s i n g
Personalised News Relatedness News
• Hybrid model
(User Interesting +
Collaborative Topic Modeling +
Hot Trending news)
• Relatedness new stories
• Automated keyword
tagging
N R E – P e r s o n a l i s e d n e w s
Article Representation: Convolution
Neural Network + word2vec/fasttext
User Representation: Temporal
Convolutional Networks
Figure: implemented and customized from paper: Embedding-based News Recommendation for Millions of Users - KDD 2017
Old model Customized Model
Article Representation: Autoencoder
User Representation: GRU
N R E – R e l a t e d n e s s n e w s
Box Relatedness news
Box Relatedness news
Keyword Extraction
N R E – R e l a t e d n e s s n e w s
Image Title
Description
Feature extraction
NLP Core
Matching score
N R E – C o r e N L P ( S o m e r e s u l t s )
Task Data Evaluate Note
Segmentation Social (VCCorp Dataset) F1 = 94.20
joint learning segmentation +
POS Tagging
News (VLSP 2013)
News (Viet Treebank)
News (VCTreebank)
Social (VCCorp Dataset)
POS Tagging F1 = 96.30
F1 = 93.52
F1 = 93.20
F1 = 80.91
SOTA (State of the art)
SOTA (State of the art)
joint Dependency Parsing + POS
Tagging
joint learning segmentation +
POS Tagging
News (VLSP 2016)
News (VCCorp Dataset)
NER F1= 94.88
F1= 93.08
BiLSTM-CRFs + POS + Chunk
BiLSTM-CRFs + POS + Chunk
News (VCTreebank)
News (VCTreebank)
Dependency
Grammar
News (VCTreebank)
UAS = 86.01
LAS = 81.79
UAS = 89.21
LAS = 87.38
joint Dependency Parsing + POS
Tagging
BiLSTM with deep biaffine
attention + Word2vec
N R E – K e y w o r d e x t r a c t i o n
Text Rank (Customized version)
RAKE (Customized version)
Keyword
Extraction
N R E – T e x t s u m m a r i z a t i o n
http://soha.vn/truc-thang-mi-4-phuc-vu-chien-dau-tai-chien-
truong-lao-20180702143936172.htm
Vào những năm cuối thập niên 60 của thế kỉ 20, tình hình nước bạn
Lào có nhiều diễn biến phức tạp, theo yêu cầu của mặt trận Lào
yêu nước, một bộ phận quân tình nguyện Việt Nam được cử sang
phối hợp chiến đấu với bạn. Lực lượng Không quân Mỹ đã thử
nghiệm thành công bom điều khiển hạt nhân B-61-12. Trong quá
trình thử nghiệm, các máy bay ném bom chiến lược B-2 Spirit đã
được triển khai sử dụng. Hoạt động kiểm tra chuyến bay này cho
thấy thiết kế của B61-12 đáp ứng đòi hỏi của hệ thống và tiến bộ
tiếp theo của chương trình mở rộng thời hạn hoạt động B61-12
theo yêu cầu an ninh quốc gia. Mỹ dự định triển khai B61-12 tại các
căn cứ quân sự ở châu Âu như ở Đức, Ý,…
Summarization
model
Image source: Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
N R E – I m a g e u n d e r s t a n d i n g
Object
Detection
Crowd
detection
Logo
Detection
Facial Detection
(Quyền Linh)
I n t e r a t i v e V i d e o A d sN R E – T r e n d i n g & p o p u l a r N e w s
Problems:
• Hot news detection
• Breaking news detection
• Trending detection
• Event detection
Contents
Timeline
Articles - 1
Articles - 2
Breaking period
Trending news
Emotion, Facts
Location
Text Base
Event detection and Tracking new stories
• Event detection & Tracking in News stream
N R E – S i t e s
Box recommend
Click &
show popup
Box Relatedness
news
I n t e r a t i v e V i d e o A d sA / B T e s t i n g – F r a m e w o r k f o r E v a l u a t i o n
Is Option A better than Option B? Let’s test:
• Increase the rate of learning
• Accelerate ranking algorithm
• Evaluate many algorithms and models
TRAFFIC
A
B
Old
Model
New
Model
50%
50%
I n t e r a t i v e V i d e o A d sA / B T e s t i n g – A l g o r i t h m A s s e s s m e n t s
Base-line Algorithm
Experimental Algorithm
ALG No.1
Group Users A
ALG No.2
Group Users B
A / B T e s t i n g – B a n d i t s e l e c t i o n
AB Testing: Random selection Bandit: Value weighted selection
Source: https://conductrics.com/balancing-earning-with-learning-bandits-and-adaptive-optimization/
M e t r i c s o f S u c c e s s
Optim
izationm
etrics
Behavioral
Metrics
(CTR, CTP, Time on Read)
Businessmetrics
What are new metrics of success ?
Based on reward functions:
• Mean (as threshold)
• Additive (based on mean and standard deviation)
• Ex:
I n t e r a t i v e V i d e o A d sN R E – E v a l u a t i o n f o r D i g i t a l m a g a z i n e
Thank you

More Related Content

Similar to VCCORP SoICT 2018

La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 
Metro design primer
Metro design primerMetro design primer
Metro design primer
Andy Chiang
 

Similar to VCCORP SoICT 2018 (20)

Prateek Agnihotri5
Prateek Agnihotri5Prateek Agnihotri5
Prateek Agnihotri5
 
La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
 
Building a Cognitive Business – Josh Sutton, SapientRazorfish Data & Artifici...
Building a Cognitive Business – Josh Sutton, SapientRazorfish Data & Artifici...Building a Cognitive Business – Josh Sutton, SapientRazorfish Data & Artifici...
Building a Cognitive Business – Josh Sutton, SapientRazorfish Data & Artifici...
 
[Partner TechShift 2017] AWS와 함께하는 글로벌 클라우드 소프트웨어 사업
[Partner TechShift 2017] AWS와 함께하는 글로벌 클라우드 소프트웨어 사업[Partner TechShift 2017] AWS와 함께하는 글로벌 클라우드 소프트웨어 사업
[Partner TechShift 2017] AWS와 함께하는 글로벌 클라우드 소프트웨어 사업
 
Recommendations and Statistics with Graph Databases
Recommendations and Statistics with Graph DatabasesRecommendations and Statistics with Graph Databases
Recommendations and Statistics with Graph Databases
 
Technology and UX,UI design trends for 2023
Technology and UX,UI design trends for 2023Technology and UX,UI design trends for 2023
Technology and UX,UI design trends for 2023
 
Crowd Documentation - How Programmer Social Communities are Flipping Software...
Crowd Documentation - How Programmer Social Communities are Flipping Software...Crowd Documentation - How Programmer Social Communities are Flipping Software...
Crowd Documentation - How Programmer Social Communities are Flipping Software...
 
Clicks, Conversions and Crawls
Clicks, Conversions and CrawlsClicks, Conversions and Crawls
Clicks, Conversions and Crawls
 
Data Science, Personalisation & Product management
Data Science, Personalisation & Product managementData Science, Personalisation & Product management
Data Science, Personalisation & Product management
 
Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018Deep Recommender Systems - PAPIs.io LATAM 2018
Deep Recommender Systems - PAPIs.io LATAM 2018
 
Graphs in Action: In-depth look at Neo4j in Production
Graphs in Action: In-depth look at Neo4j in ProductionGraphs in Action: In-depth look at Neo4j in Production
Graphs in Action: In-depth look at Neo4j in Production
 
Actual cases of applying AI related technologiesin Rakuten
Actual cases of applying AI related technologiesin RakutenActual cases of applying AI related technologiesin Rakuten
Actual cases of applying AI related technologiesin Rakuten
 
Keynote: Graphs in Government_Lance Walter, CMO
Keynote:  Graphs in Government_Lance Walter, CMOKeynote:  Graphs in Government_Lance Walter, CMO
Keynote: Graphs in Government_Lance Walter, CMO
 
Metro design primer
Metro design primerMetro design primer
Metro design primer
 
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social Media
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
 
Analytics what to look for sustaining your growing business-
Analytics   what to look for sustaining your growing business-Analytics   what to look for sustaining your growing business-
Analytics what to look for sustaining your growing business-
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
Data Science for Online Services: Problems & Frontiers (Changbal Conference 2...
Data Science for Online Services: Problems & Frontiers (Changbal Conference 2...Data Science for Online Services: Problems & Frontiers (Changbal Conference 2...
Data Science for Online Services: Problems & Frontiers (Changbal Conference 2...
 
Geschäftliches Potential für System-Integratoren und Berater - Graphdatenban...
Geschäftliches Potential für System-Integratoren und Berater -  Graphdatenban...Geschäftliches Potential für System-Integratoren und Berater -  Graphdatenban...
Geschäftliches Potential für System-Integratoren und Berater - Graphdatenban...
 

Recently uploaded

原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
pwgnohujw
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 

Recently uploaded (20)

Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
DS Lecture-1 about discrete structure .ppt
DS Lecture-1 about discrete structure .pptDS Lecture-1 about discrete structure .ppt
DS Lecture-1 about discrete structure .ppt
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
👉 Tirunelveli Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top Class Call Gir...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 

VCCORP SoICT 2018

  • 1. HOANG ANH TUAN CTO Admicro | VCCorp tuanhoanganh@vccorp.vn A I C O N T E N T P L AT F O R M AT V C C O R P
  • 2. A g e n d a 1. Overview o VCCorp in a nutshell o Milestones 2. Main Applications o News Distribution & Recommendation Engine o A/B Testing for Evaluation
  • 3. 1. O ver view
  • 4. V C C O R P i n a n u t s h e l l 43M PC audience Media PC, Mobile, Smart TV 38M Mobile audience Ad Network Payment Platform | Cloud Computing & Big Data Analysis VCCorp – Innovation. Non-stop! OVERVIEW o First mover DNA o 50% YoY Growth o 43M web audience o 38M mobile audience o 1700 employees INVESTORS Games Mobile games, web games, client games E-commerce 12M Visitors & Buyers
  • 5. V C C O R P M i l e s t o n e s 2007 HadoopCluster Bamboo searchserving |10Musers Analytic Platform 2010 MachineLearningCore Adnetwork CoreNLP/ DeepLearning Adnetwork2.0Bidding Web4.0 AutopubNews 2014 AIPlatform
  • 6. V C C O R P C o m p u t i n g f a r m Storage Engine 6 PB Storage Engine 40TB GPU Farm 120 GPUs GTX 1080 Ti Compute Engine 20.000 CPUs 30 Tb processing per day 500 models selected 10 billion calculations per day 7 billion records processing per day
  • 8. Cross Device User Content Consumption Fraud Detection Analytic Framework Anomaly Detection Video Analytic Cyber Security Advertising Technology CTR Prediction Bid Landscape Bidding Strategy Retargeting Video Content Distribution News Distribution Breaking news Social news Events Detection Recommendation Engine News Streaming E-Commerce User Personalization Text Mining Sentiment Analysis Core ML & AI Core NLP Deep Learning Computer Vision Image Processing Voice Recognition Pattern Recognition Object Detection
  • 9. A next generation of Digital content VCCORP News distribution & News recommendation engine
  • 10. A I - p o w e r e d C o n t e n t p l a t f o r m - E c o s y s t e m Consumption Articles Feeds Reader Logs Reader Rating Other endpoint... Hadoop Deep learning cluster Neural net Creation Articles Images Video Editors Other endpoint... Process, Analyze, Data-Mine, Understand, Organize Bigdata Platform AI Infrastructure, Machine Learning Services, Algorithms AI-assisted Content Consumption AI-assisted Content Creation
  • 11. AI INFRASTRUCTURE TO DISTRIBUTE AND PUBLISH NEWS N ew s d i s t r i b u t i o n e n g i n e ( N D E )
  • 12. I n t e r a t i v e V i d e o A d sN e w d i s t r i b u t i o n e n g i n e ( N D E ) VCCORP possesses many large online newspapers in Vietnam • 14 sites in top 100 websites in various categories: news, finance, family, youth, auto, hi-tech… AI-powered News Platform
  • 13. D i g i t a l m a g a z i n e e m p o w e r e d b y A I P l a t f o r m Broadsheets • Dantri.com.vn • SohaNews • Vtv.vn • Kenh14.vn SYSTEM INFO • > 3.6B request per day • > 120,000 concurrent requests CHALLENGES • 50M readers • 40% loyalty readers • 20% subscribe readers • Publish over 300 articles, blog spots and interactive stories per day Tech, Game • Gamek.vn • Genk.vn Finance • Cafef.vn • Cafebiz.vn Family • Afamily.vn
  • 14. E d i t o r s ’ c h a l l e n g e s 1. On an average, publishes over 300 articles, blog posts and interactive stories a day 2. Workload of editors: 24h/day, seven days a week sitting to review contents 3. Editors manually and and intuitively selected articles. Bad news Good news What kinds of breaking news stories will be selected? My decision is good? How to know that? Editor Emotions
  • 15. S o l u t i o n : A I - a s s i s t e d c o n t e n t d i s t r i b u t i o n RESULTSACTIONS Bob Select article per hour depending on his knowledge and mood! Few knowledge articles, limited content, poor curation, lots of dead links, and no semantic relationships EDITOR Bob EDITOR with AI-assisted Distribution Bob’s replacement generates hundred of higher quality, highly curated, and semantically inter-linked articles, in the time it takes Bob to create just one… at a fraction of the cost Good content, greater curation, almost no dead links, and semantic relationships
  • 16. N ew s r e c o m m e n d a t i o n e n g i n e ( N R E ) USER PREFERENCES AND HIS CONTEXT? GIVE IT TO OUR RECOMMENDATION ENGINE
  • 17. S o m e E v i d e n c e s GOOGLE NEWS 38% more click-through-rate due to recommendation NEMO (VCCORP) 40% sales from recommendation KENH14.VN | GENK.VN • 30% page-views more from recommendation boxes • 14% reading time more NETFLIX 2/3 rented movies from recommendation AMAZON 35% sales from recommendation
  • 18. N e w r e c o m m e n d a t i o n e n g i n e ( N R E ) – I m p o r t a n t t o m e Deep learning Model - Ranking For you Hot & Popular In case you missed it News Timeline
  • 19. I n t e r a t i v e V i d e o A d sN e w r e c o m m e n d a t i o n e n g i n e ( N R E ) Choose news that fits with many contexts and user preferences • Increase user experiments • Choose high quality-content for each user • Discover new contents for each user Relatedness News ● Relevant content ● Tag based recommender ● Semantic Entity Graph based recommender Personalized News ● Light-Personalization ● Deep Interest personalization Trending and Popular News ● Trending (long, short) ● Breaking news ● Developing Story User Profiles ● Demographic (age, gender, location) ● User interest ● User lookalike
  • 20. N R E – G e n e r a l w o r k f l o w Graph processor • Aggregate edges relationship • Trims irrelevant ones • Find user look-alike Digital News ranking engine • Collect hot and breaking news • Ranking event by time and context Model Selection Algorithms • Model optimization MODEL parameters/ graphs User Graph DB Push news Machine Learning • Model selection and optimization • Optimize real time-decision Learning Framework (tensorflow, theano,...) Feedback update update
  • 21. N R E – A r c h i t e c t u r e h i g h l e v e l Offline Learning Online Learning Bigdata- HDFS User’s session and Profile query Event Distribution logs Learning management Services Recommended contents feedback Model Training Online Computeration model Online machine learning Machine learning Algorithms update update User Modeling User Profiling Behavior Analyzing User Profile load/ update feeds Content Analysing Relevant content Semantic Entity Graph
  • 22. N R E – H o w t o b u i l d u s e r m o d e l i n g 1. User’s network profile 2. User’s GraphDB 3. User profile representation 4. User behavior prediction
  • 23. N R E – U s e r p r o f i l e r e p r e s e n t a t i o n Brower’s logs Event’s logs Rating Service Feature Extraction User Interesting User Representation User Segmentation User Demographic Domain Model User model Inferencer Semantic Understanding Personalization User Graph Model Collaborative Filtering
  • 24. N R E – U s e r b e h a v i o r p r e d i c t i o n User Profile Interactive Contents LS t-1 LS t LS t+1 Times sequence features Observer(CNN) Inferencer(RNN) User LS1 ,LS2 ,LS3... L (Learning dimension| P)
  • 25. I n t e r a t i v e V i d e o A d s Purpose: Find new stories relevant to users, such as the right news at the right time, personalized supplements news stories N R E – C o n t e n t a n a l y s i n g Personalised News Relatedness News • Hybrid model (User Interesting + Collaborative Topic Modeling + Hot Trending news) • Relatedness new stories • Automated keyword tagging
  • 26. N R E – P e r s o n a l i s e d n e w s Article Representation: Convolution Neural Network + word2vec/fasttext User Representation: Temporal Convolutional Networks Figure: implemented and customized from paper: Embedding-based News Recommendation for Millions of Users - KDD 2017 Old model Customized Model Article Representation: Autoencoder User Representation: GRU
  • 27. N R E – R e l a t e d n e s s n e w s Box Relatedness news Box Relatedness news Keyword Extraction
  • 28. N R E – R e l a t e d n e s s n e w s Image Title Description Feature extraction NLP Core Matching score
  • 29. N R E – C o r e N L P ( S o m e r e s u l t s ) Task Data Evaluate Note Segmentation Social (VCCorp Dataset) F1 = 94.20 joint learning segmentation + POS Tagging News (VLSP 2013) News (Viet Treebank) News (VCTreebank) Social (VCCorp Dataset) POS Tagging F1 = 96.30 F1 = 93.52 F1 = 93.20 F1 = 80.91 SOTA (State of the art) SOTA (State of the art) joint Dependency Parsing + POS Tagging joint learning segmentation + POS Tagging News (VLSP 2016) News (VCCorp Dataset) NER F1= 94.88 F1= 93.08 BiLSTM-CRFs + POS + Chunk BiLSTM-CRFs + POS + Chunk News (VCTreebank) News (VCTreebank) Dependency Grammar News (VCTreebank) UAS = 86.01 LAS = 81.79 UAS = 89.21 LAS = 87.38 joint Dependency Parsing + POS Tagging BiLSTM with deep biaffine attention + Word2vec
  • 30. N R E – K e y w o r d e x t r a c t i o n Text Rank (Customized version) RAKE (Customized version) Keyword Extraction
  • 31. N R E – T e x t s u m m a r i z a t i o n http://soha.vn/truc-thang-mi-4-phuc-vu-chien-dau-tai-chien- truong-lao-20180702143936172.htm Vào những năm cuối thập niên 60 của thế kỉ 20, tình hình nước bạn Lào có nhiều diễn biến phức tạp, theo yêu cầu của mặt trận Lào yêu nước, một bộ phận quân tình nguyện Việt Nam được cử sang phối hợp chiến đấu với bạn. Lực lượng Không quân Mỹ đã thử nghiệm thành công bom điều khiển hạt nhân B-61-12. Trong quá trình thử nghiệm, các máy bay ném bom chiến lược B-2 Spirit đã được triển khai sử dụng. Hoạt động kiểm tra chuyến bay này cho thấy thiết kế của B61-12 đáp ứng đòi hỏi của hệ thống và tiến bộ tiếp theo của chương trình mở rộng thời hạn hoạt động B61-12 theo yêu cầu an ninh quốc gia. Mỹ dự định triển khai B61-12 tại các căn cứ quân sự ở châu Âu như ở Đức, Ý,… Summarization model Image source: Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
  • 32. N R E – I m a g e u n d e r s t a n d i n g Object Detection Crowd detection Logo Detection Facial Detection (Quyền Linh)
  • 33. I n t e r a t i v e V i d e o A d sN R E – T r e n d i n g & p o p u l a r N e w s Problems: • Hot news detection • Breaking news detection • Trending detection • Event detection Contents Timeline Articles - 1 Articles - 2 Breaking period Trending news Emotion, Facts Location Text Base Event detection and Tracking new stories • Event detection & Tracking in News stream
  • 34. N R E – S i t e s Box recommend Click & show popup Box Relatedness news
  • 35. I n t e r a t i v e V i d e o A d sA / B T e s t i n g – F r a m e w o r k f o r E v a l u a t i o n Is Option A better than Option B? Let’s test: • Increase the rate of learning • Accelerate ranking algorithm • Evaluate many algorithms and models TRAFFIC A B Old Model New Model 50% 50%
  • 36. I n t e r a t i v e V i d e o A d sA / B T e s t i n g – A l g o r i t h m A s s e s s m e n t s Base-line Algorithm Experimental Algorithm ALG No.1 Group Users A ALG No.2 Group Users B
  • 37. A / B T e s t i n g – B a n d i t s e l e c t i o n AB Testing: Random selection Bandit: Value weighted selection Source: https://conductrics.com/balancing-earning-with-learning-bandits-and-adaptive-optimization/
  • 38. M e t r i c s o f S u c c e s s Optim izationm etrics Behavioral Metrics (CTR, CTP, Time on Read) Businessmetrics What are new metrics of success ? Based on reward functions: • Mean (as threshold) • Additive (based on mean and standard deviation) • Ex:
  • 39. I n t e r a t i v e V i d e o A d sN R E – E v a l u a t i o n f o r D i g i t a l m a g a z i n e