Data science @
RTL Nederland
Longhow Lam
@longhowlam
1st Master Search
Advanced Analytics meetup
5 July 2017
Agenda
• RTL Data science set up
• Some data science topics @RTL
  • Text mining
  • Computer vision
  • Association rules
  • Power BI
RTL Data science set up
Source Data
• Click data
• Heartbeat data
• Account data
• Location
• Metadata
• Campaign data
• Etc.
Data science team
4 data engineers
4 data scientists
The main businesses at RTL we work for
ETL processes
Find out for yourself:
https://www.rtl.nl/werkenbij/
Use cases
• Churn modeling
• Response modeling
• Customer segmentation
• Look-alikes for advertisers
• Recommendation engines
Content similarity
Which movies on Videoland are close to each other?
Which news articles on RTL Nieuws are close to each other?
For movies, we can look at the movie summaries or video captures.
For news articles, we can look at the text of the articles or the corresponding news image.
Hence text mining and computer vision.
Text mining
2,000 movie plots / summaries on Videoland
For each movie plot: count the words / terms
Put the counts in a so-called term document matrix
There are around 50,000 terms in the 2,000 movie plots
Usually this matrix is very sparse
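Term document matrix (rows are movies, columns are terms; empty cells are zero counts, and the column position of each count is not shown here)
Terms: Aap, Film, Auto, …, Leven, …, …, Zwaar
Film1: 1, 4, 8
Film2: 10
Film3: 5
…: 1
…
…: 6, 6
Film2000: 1, 8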
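As a rough illustration of the counting step, here is a minimal Python sketch that builds a sparse term document matrix from a few plots. The three toy plots and the function name `term_document_matrix` are made up for this example; the talk itself does not show how the matrix was built.

```python
from collections import Counter

# Hypothetical toy plots; the real input would be the ~2,000 Videoland summaries.
plots = {
    "Film1": "leven leven auto film",
    "Film2": "aap aap film",
    "Film3": "zwaar leven",
}

def term_document_matrix(docs):
    """Return (terms, rows): the sorted vocabulary and one sparse count dict per document."""
    rows = {name: Counter(text.lower().split()) for name, text in docs.items()}
    terms = sorted(set().union(*rows.values()))
    return terms, rows

terms, rows = term_document_matrix(plots)

# Dense view, only for display; with ~50,000 terms you would keep the sparse dicts.
for name, counts in rows.items():
    print(name, [counts[t] for t in terms])
```

Storing only the non-zero counts per movie is what makes a 2,000 by 50,000 matrix manageable, since most cells are empty.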
Similarity: cosine similarity
Between movies (so between rows of the matrix) we can now calculate similarities
A measure that is often used is cosine similarity
Visually we can see this similarity in the following figure:
Suppose we only have two terms:
1. Leven
2. Spannend
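[Figure: Film 1 and Film 2 drawn as vectors in a plane with axes "# Leven" and "# Spannend"; the cosine similarity is the cosine of the angle between the two vectors]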
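For the two-term case, cosine similarity can be sketched in a few lines of Python. The counts for the three films below are invented for illustration only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical counts of (leven, spannend) per film.
film1 = [3, 1]
film2 = [6, 2]  # same direction as film1, only longer
film3 = [0, 4]  # mostly "spannend"

print(cosine_similarity(film1, film2))
print(cosine_similarity(film1, film3))
```

Because only the angle matters, a long plot and a short plot that use the same words in the same proportions score close to 1.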
Videoland: to get a feeling for movie similarities we created a small Shiny app
RTL Nieuws: Power BI dashboard to see article similarities
Computer vision
Two approaches used @RTL
1. Computer Vision API from vendors (Microsoft, Clarifai, Google, …)
+ Ready and easy to use (just send your image to them)
+ Not too expensive ($0.84 / 1000 images)
- No control over what is returned
2. Tweak things ourselves with Keras/TensorFlow
- Takes more effort to set up
- Needs more knowledge
+ More control over what you are doing
RTL Nieuws image: API examples
Feature Name | Value
Description | { "type": 0, "captions": [ { "text": "a group of people sitting on a table", "confidence": 0.4894670976127814 } ] }
Tags | [ { "name": "person", "confidence": 0.996391236782074 }, { "name": "indoor", "confidence": 0.9104063510894775 }, { "name": "people", "confidence": 0.7057779431343079 } ]
Image Format | Jpeg
Image Dimensions | 4096 x 3078
Clip Art Type | 0 Non-clipart
Line Drawing Type | 0 Non-LineDrawing
Black & White Image | False
Is Adult Content | False
Adult Score | 0.042066238820552826
Is Racy Content | False
Racy Score | 0.061784882098436356
Categories | [ { "name": "people_many", "score": 0.9296875 } ]
Faces | [ { "age": 52, "gender": "Male", "faceRectangle": { "width": 298, "height": 298, "left": 433, "top": 1370 } }, { "age": 78, "gender": "Male", "faceRectangle": { "width": 269, "height": 269, "left": 3212, "top": 1410 } }, { "age": 64, "gender": "Male", "faceRectangle": { "width": 241, "height": 241, "left": 2108, "top": 1534 } } ]
RTL Nieuws image: API examples
Feature Name | Value
Description | { "type": 0, "captions": [ { "text": "Linda de Mol talking on a cell phone", "confidence": 0.46178352459016536 } ] }
Tags | [ { "name": "person", "confidence": 0.9999904632568359 }, { "name": "outdoor", "confidence": 0.9974232912063599 }, { "name": "woman", "confidence": 0.9967917799949646 }, { "name": "lady", "confidence": 0.7658315896987915 } ]
Image Format | Jpeg
Image Dimensions | 1024 x 421
Clip Art Type | 0 Non-clipart
Line Drawing Type | 0 Non-LineDrawing
Black & White Image | False
Is Adult Content | False
Adult Score | 0.009753250516951084
Is Racy Content | False
Racy Score | 0.014254707843065262
Categories | [ { "name": "people_portrait", "score": 0.96875 } ]
Faces | [ { "age": 28, "gender": "Female", "faceRectangle": { "width": 282, "height": 282, "left": 286, "top": 35 } } ]
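Since the vendor decides what is returned, a typical next step is to filter the response yourself. The sketch below parses the Tags field from the second example and keeps only confident tags. The 0.9 cut-off and the function name `confident_tags` are assumptions for illustration, not something the talk prescribes.

```python
import json

# Tags field copied from the second API example above.
tags_json = """
[ { "name": "person", "confidence": 0.9999904632568359 },
  { "name": "outdoor", "confidence": 0.9974232912063599 },
  { "name": "woman", "confidence": 0.9967917799949646 },
  { "name": "lady", "confidence": 0.7658315896987915 } ]
"""

def confident_tags(raw, threshold=0.9):
    """Return tag names whose confidence is at least `threshold` (an assumed cut-off)."""
    return [tag["name"] for tag in json.loads(raw) if tag["confidence"] >= threshold]

print(confident_tags(tags_json))
```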
See TrelliscopeJS app
Tweak things ourselves with Keras
• Keras is a high-level neural networks API running on top of either TensorFlow or Theano, and now also CNTK
• Developed for fast experimentation
• Easier to use than TensorFlow, but you still have lots of options
• There is now also an R interface (of course created by RStudio…)
Keras: simple set-up “architecture”
TensorFlow installed on a (Linux) machine
Ideally with lots of GPUs
pip install keras
You’re good to go in Python (Jupyter notebooks)
install_github("rstudio/keras")
You’re good to go in R / RStudio
Example in R: Neural network with two hidden layers
[Figure: network diagram with inputs Pixel 1 … Pixel 784 and outputs Label 0 … Label 9]
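To make the shape of such a network concrete, here is a plain-Python sketch of one forward pass through 784 inputs, two hidden layers and 10 output labels. It is not the R/Keras code from the slide: the hidden sizes (32 and 16), the ReLU activations and the random, untrained weights are all assumptions chosen only to show the data flow.

```python
import math
import random

random.seed(0)

def dense(n_in, n_out):
    """Randomly initialised weights and zero biases for one fully connected layer."""
    weights = [[random.gauss(0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layer, x, activation):
    """Apply one layer: weighted sums plus bias, then the activation."""
    weights, biases = layer
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return activation(z)

def relu(z):
    return [max(0.0, v) for v in z]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# 784 pixels -> 32 -> 16 -> 10 labels; the hidden sizes 32 and 16 are assumptions.
hidden1, hidden2, output = dense(784, 32), dense(32, 16), dense(16, 10)

pixels = [random.random() for _ in range(784)]  # stand-in for one flattened image
probs = forward(output, forward(hidden2, forward(hidden1, pixels, relu), relu), softmax)
print(len(probs), round(sum(probs), 6))
```

The ten outputs are probabilities for Label 0 to Label 9; with untrained weights they are close to uniform, and training is what Keras would take care of.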
Using pre-trained models
Image classifiers have been trained on big GPU machines for weeks, with millions of pictures, on very large networks.
Not many people do that from scratch. Instead, one can use pre-trained networks and start from there.
VGG19 deep learning model: 143 million weights!
Predict image class using pre-trained models
RTL Nieuws images labeled with ResNet and VGG16
Link to TrelliscopeJS app
Extract features using pre-trained models
Remove top layers for feature extraction
We have a 7*7*512 ‘feature’ tensor = 25,088 values
Only a few lines of R code
RTL Nieuws image similarity
1,024 RTL Nieuws sample pictures. Compute for each image the 25,088 feature values.
Calculate for each image the top 10 closest images, based on cosine similarity.
Little Shiny app
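The top-10 lookup can be sketched as follows in Python. The image ids and the tiny 3-value vectors are hypothetical stand-ins for the 25,088-value feature vectors; the talk does not show the actual implementation.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_similar(features, query, k=10):
    """Return the k image ids closest to `query`, most similar first."""
    scores = [
        (cosine_similarity(features[query], vec), image_id)
        for image_id, vec in features.items()
        if image_id != query
    ]
    return [image_id for _, image_id in sorted(scores, reverse=True)[:k]]

# Hypothetical 3-value feature vectors; the real ones would have 25,088 values each.
features = {
    "img_a": [1.0, 0.0, 0.2],
    "img_b": [0.9, 0.1, 0.3],
    "img_c": [0.0, 1.0, 0.0],
    "img_d": [0.5, 0.5, 0.0],
}
print(top_k_similar(features, "img_a", k=2))
```

For roughly a thousand images a brute-force comparison of all pairs like this is cheap enough; larger collections would need an approximate nearest-neighbour index.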
Examples RTL Nieuws image similarities
The Brad Pitt similarity index
Take five Brad Pitt pictures.
Run them through the pre-trained VGG16 and extract feature vectors.
This is a 5 by 25,088 matrix.
The Brad Pitt index
Take other images and run them through the VGG16.
Calculate the distances with the five Brad Pitt pictures and average:
0.771195 0.802654 0.714752 0.792587 0.8291976 0.8096944 0.665990 0.9737212
Focusing on only the face:
0.6273 0.5908 0.8231 0.7711 0.8839 0.8975 0.6934 0.9659
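The averaging step can be sketched in Python as below. This assumes the "distance" is cosine similarity, as used elsewhere in the talk, and uses two invented 3-value reference vectors in place of the five 25,088-value ones.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_index(reference_vectors, candidate):
    """Average cosine similarity between `candidate` and every reference vector."""
    sims = [cosine_similarity(ref, candidate) for ref in reference_vectors]
    return sum(sims) / len(sims)

# Hypothetical stand-ins for the reference feature vectors of the Brad Pitt pictures.
references = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
candidate = [1.0, 0.0, 0.0]
print(similarity_index(references, candidate))
```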
A little statistical experiment
Can you shake hands with your neighbor?
50.1% of people don’t wash their hands after visiting the toilet
84.6% of all statistics are just made up on the spot!!
Association rules mining
Market basket analysis
• Association rules mining (ARM)
Mixture of different methods
• Ensemble
ARM is one of several so-called collaborative filtering algorithms.
Collaborative filtering is a method of making recommendations about the interests of one user (filtering) by collecting preferences or behavior from many users (collaborating).
Memory-based algorithms
• Slope one (slope1)
• K nearest neighbors (knn)
Model-based algorithms
• Matrix factorization methods
Association rule mining
The basics
• Identify frequent item sets (or rules) in the customer transaction data:
  • IF item X THEN item Y
  • IF items A and B THEN very likely item C
• Not all rules are interesting; use ‘support’ and ‘lift’ to judge the importance of a rule
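Support & Lift
$$\mathrm{Support}(X,Y) = \frac{\#\,\text{trxs. } \{X\} \rightarrow \{Y\}}{\text{Total } \#\,\text{trxs.}}$$
$$\mathrm{Lift}(X,Y) = \frac{\mathrm{Support}(X,Y)}{\mathrm{Support}(X) \cdot \mathrm{Support}(Y)}$$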
GTST → Nieuwe Tijden: 10.8%
Star Trek → GTST: 0.018%
For example, a lift of 2.5 means:
If people have watched movie X, they are 2.5 times more likely to watch movie Y than the average viewer.
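Support and lift are easy to compute directly from a list of transactions. The Python sketch below uses four invented viewing histories, so the resulting numbers are not the RTL figures quoted above.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def lift(transactions, x, y):
    """Support(X,Y) / (Support(X) * Support(Y)); above 1 means X and Y co-occur more than chance."""
    return support(transactions, {x, y}) / (
        support(transactions, {x}) * support(transactions, {y})
    )

# Hypothetical viewing histories, one set of titles per user.
transactions = [
    {"GTST", "Nieuwe Tijden"},
    {"GTST", "Nieuwe Tijden", "Star Trek"},
    {"GTST"},
    {"Star Trek"},
]
print(support(transactions, {"GTST", "Nieuwe Tijden"}))
print(lift(transactions, "GTST", "Nieuwe Tijden"))
```

Dividing by Support(Y) is what turns the conditional probability of Y given X into a comparison with the overall popularity of Y.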
Association rules: virtual items
User | Movie
1 | Blacklist
1 | Star Trek
1 | James Bond
2 | Kill Bill
2 | Pulp Fiction
3 | Stargate
3 | Men in Black
An old trick with association rules mining is to add ‘virtual’ items:
User | Virtual item
1 | Blacklist
1 | Star Trek
1 | James Bond
1 | Male
1 | [25-30) Y
2 | Kill Bill
2 | Pulp Fiction
2 | Female
2 | [40-45) Y
3 | Stargate
3 | Men in Black
3 | Male
3 | [50-55) Y
Rules that now might appear are for example:
• Male, [40-45), Star Trek → James Bond
• Female, [20-25), Kill Bill → Pulp Fiction
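The virtual-item trick amounts to appending profile attributes to each user's basket before mining. A minimal Python sketch, assuming a hypothetical `profiles` lookup with gender and age (the talk does not say where these fields come from) and 5-year age bands:

```python
views = {
    1: ["Blacklist", "Star Trek", "James Bond"],
    2: ["Kill Bill", "Pulp Fiction"],
}
# Hypothetical profile fields; only gender and an age band appear in the tables above.
profiles = {
    1: {"gender": "Male", "age": 27},
    2: {"gender": "Female", "age": 42},
}

def age_band(age, width=5):
    """Half-open band such as [25-30) for age 27."""
    low = age // width * width
    return f"[{low}-{low + width})"

def with_virtual_items(views, profiles):
    """Append gender and age band to each user's basket as extra items."""
    return {
        user: items + [profiles[user]["gender"], age_band(profiles[user]["age"])]
        for user, items in views.items()
    }

baskets = with_virtual_items(views, profiles)
print(baskets[1])
```

Any rule miner can then treat "Male" or "[25-30)" like ordinary items, which is how rules with demographic conditions on the left-hand side appear.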
Association rules with R and Gephi
Power BI
Survival curves
Survival curve
At which moment in an episode do people stop watching?
Can we compare different episodes and series?
Survival Curves!!
For a specific episode of a specific series:
• Take all Videoland streams: Starts / Stops from
• Determine completion rate, and rank all streams on completion rate
• Calculate the empirical distribution F
• Survival: S = 1 - F
• Do this for all episodes and series
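The steps above can be sketched in Python as follows. It assumes the completion rate is the fraction of the episode that a stream played, and the five rates are invented; how starts and stops are turned into a completion rate is not spelled out in the talk.

```python
def survival_curve(completion_rates, grid):
    """S(t) = 1 - F(t), with F the empirical distribution of completion rates."""
    n = len(completion_rates)
    curve = []
    for t in grid:
        f_t = sum(rate <= t for rate in completion_rates) / n  # empirical F(t)
        curve.append((t, 1 - f_t))
    return curve

# Hypothetical completion rates (fraction of the episode watched) for five streams.
rates = [0.1, 0.4, 0.4, 0.9, 1.0]
for t, s in survival_curve(rates, [0.0, 0.25, 0.5, 0.75, 1.0]):
    print(t, s)
```

S(t) reads as the share of streams still watching after fraction t of the episode, so curves for different episodes and series can be compared on the same 0 to 1 axis.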