Data science & natural language processing (NLP) for content recommendation & personalized experience
Kim Ming
Media Prima Digital (MPD)
WHO ARE YOUR USERS?
WHAT DO THEY LOOK FOR?
Overview
1. Business goal
2. Data & technology
3. Data science & NLP
4. Impact
Understand the business:
01 Business goal
- Understand the behavior of 10+ million users & serve them better
User behavior shift
Customer journey & decision
Personalized user experience
02 Data & technology
- Capture, store & process data, and build data-driven applications
Data pipelines
[Diagram components: Source Systems; Main Source Platform | API; Data Prep & ETL Layer; Enterprise Data Warehouse; Modeling | Analytic Layer; Presentation | Serving Layer]
User & content & interaction data
Machine learning on GCP
03 Data Science
- Understand user behavior & serve users relevant content
- Recommendation engine
User profiling & personalized content & ads
User behavior & interaction -> User behavior profile and segment -> Personalized content & ads
Audience profile & segment
Music
Celebrities
Entertainment Seekers
Entertainment
English Literate
Malay Literate
Home Improvement
NST Users
Harian Metro Users
Berita Harian Users
Travel
Automobile
Tonton monthly subs – last visit 20 days
And many more: 400+ audience profiles in total
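As a rough illustration of how interest-based profiles like those above could be derived from viewing behavior, here is a minimal sketch. It is not the method used at MPD: the function name `assign_segments` and the `min_share` threshold are hypothetical, and it assumes each page view has already been labeled with a content category.

```python
from collections import Counter


def assign_segments(viewed_categories, min_share=0.3):
    """Return the categories that make up at least `min_share` of a user's views.

    `viewed_categories` is a list with one content category per page view.
    The 0.3 threshold is an illustrative assumption, not a value from the talk.
    """
    counts = Counter(viewed_categories)
    total = sum(counts.values())
    if total == 0:
        return []  # no behavior recorded, so no segment can be assigned
    return sorted(cat for cat, n in counts.items() if n / total >= min_share)


# Hypothetical viewing history for one user.
history = ["Music", "Music", "Travel", "Automobile", "Music", "Travel"]
print(assign_segments(history))  # ['Music', 'Travel']
```

A share-based rule keeps heavy and light users comparable; recency-based profiles (such as "last visit 20 days") would need timestamps, which this sketch leaves out.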
Content Recommendation engine
User behavior & viewership -> User profile and segment -> Recommendation engine
Trending
Content-based
User-based collaborative filtering (UBCF)
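To make the UBCF idea concrete, here is a minimal sketch under stated assumptions: interactions are stored as a dictionary of user -> {item: score}, user similarity is cosine similarity, and unseen items are scored by the similarity-weighted interactions of the nearest neighbours. The names `recommend_ubcf`, `k` and `top_n` and the sample data are hypothetical; the production engine shown in the talk may work differently.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_ubcf(interactions, user, k=2, top_n=3):
    """Recommend items the `k` most similar users interacted with."""
    target = interactions[user]
    sims = [
        (cosine_similarity(target, vec), other)
        for other, vec in interactions.items()
        if other != user
    ]
    neighbours = sorted(sims, reverse=True)[:k]

    scores = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, value in interactions[other].items():
            if item not in target:  # only suggest unseen items
                scores[item] = scores.get(item, 0.0) + sim * value

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical article views (1 = viewed).
views = {
    "u1": {"a1": 1, "a2": 1},
    "u2": {"a1": 1, "a2": 1, "a3": 1},
    "u3": {"a4": 1},
}
print(recommend_ubcf(views, "u1"))  # ['a3']
```

Here `u1` is recommended `a3` because `u2` has the same reading history plus that article, while `u3` shares nothing with `u1` and is ignored.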
DEMO:
Recommendation engine
GCP CloudML
04 Natural language processing
- Building a local Malay text processing library
- Document classification, brand-safe score & more
Same meaning, but people say it differently:
● “I want a fresh apple”
● “Where can I find a fresh apple in this city to eat”
● “Ooi, give me apple la ….”
Languages are ambiguous
● “I love Blackberry”.
Existing solution providers & cloud services do not support the Malay language (yet)
Build Malay text processing & NLP library
Stemming
Stopwords
Word segmentation
Part-of-speech tagging
Named entity recognition
Abbreviations
Ambiguous terms
Word similarity
Translation
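Two of the components above, stopword removal and stemming, can be sketched in a few lines. This is a toy illustration, not the library described in the talk: the stopword, prefix and suffix lists are small hand-picked assumptions, and real Malay stemming also has to handle nasal assimilation (for example "menulis" -> "tulis"), which this sketch does not attempt.

```python
import re

# Illustrative, deliberately incomplete lists (assumptions, not the real dictionary).
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "pada"}
PREFIXES = ("meng", "mem", "men", "ber", "ter", "me", "di", "pe", "se")
SUFFIXES = ("nya", "kan", "lah", "kah", "an")
MIN_STEM_LEN = 4  # never cut a word below this length


def stem(word):
    """Strip at most one prefix and one suffix, longest candidates first."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("Saya membeli makanan di kedai itu"))
# ['saya', 'beli', 'makan', 'kedai']
```

The minimum stem length is a crude guard against over-stemming short roots such as "jalan"; a dictionary of known root words would be a more reliable check.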
Automate Document classification & Brand-Safe score
Automate Document classification & Brand-Safe score (2)
Not safe for advertisers!
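One simple way to think about a brand-safe score is a dictionary lookup: each sensitive term carries a risk weight, and a document is only as safe as its riskiest term. The sketch below assumes exactly that; the term list, the weights and the function name `brand_safe_score` are hypothetical, and the scoring model demonstrated in the talk is not described in enough detail to reproduce.

```python
import re

# Hypothetical risk weights in [0, 1] for English and Malay terms.
UNSAFE_TERMS = {
    "accident": 0.5,
    "kemalangan": 0.5,  # Malay: accident
    "murder": 1.0,
    "bunuh": 1.0,       # Malay: kill
}


def brand_safe_score(text, unsafe_terms=UNSAFE_TERMS):
    """Return 1.0 for fully safe text, lower values for riskier text.

    The score is 1 minus the highest risk weight found in the text.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    risk = max((unsafe_terms.get(t, 0.0) for t in tokens), default=0.0)
    return 1.0 - risk


print(brand_safe_score("New phone launched in Kuala Lumpur"))  # 1.0
print(brand_safe_score("Lelaki mati dalam kemalangan"))        # 0.5
```

A keyword lookup like this ignores context, so in practice it would more likely serve as one feature alongside a trained document classifier.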
Build local data dictionary
600+ content categories
1.1 million entities/documents
English & Malay terms
Build local knowledge graph
Local word embedding & word similarity
Local word embedding & word similarity (2)
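Word similarity on top of embeddings usually comes down to comparing vectors with cosine similarity. The sketch below shows that lookup with tiny hand-made 3-dimensional vectors; the words, the vector values and the function name `most_similar` are illustrative assumptions, whereas real embeddings would be trained on a local English and Malay corpus and have far more dimensions.

```python
from math import sqrt

# Toy embeddings (made-up values, for illustration only).
EMBEDDINGS = {
    "kereta": [0.9, 0.1, 0.0],     # Malay: car
    "motosikal": [0.8, 0.2, 0.1],  # Malay: motorcycle
    "nasi": [0.0, 0.1, 0.9],       # Malay: rice
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=2):
    """Rank all other words by cosine similarity to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


for other, score in most_similar("kereta", EMBEDDINGS):
    print(other, round(score, 3))
```

With these toy vectors "motosikal" ranks well above "nasi" for the query "kereta", which is the behavior one would hope to see from embeddings trained on local text.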
DEMO:
Document classification &
brand-safe score
05 Impact & Benefit
- Higher user engagement, better campaign performance
- Automate manual work such as document classification & labeling
+20% users
+30% page views
+25% average time spent per user
User profiling & content recommendation
50,000+ documents/content items auto-tagged
50+ man-hours saved per week
Improved content categorization
Improved campaign performance
NLP & automate content classification
Questions?
#nextxkl #next18extended
Thanks! 
ML Workflow
NSTP article recommendation

Google Next '18 extended -- data science & nlp for content recommendation