CIKM CUP 2016: Track 1
Cross-Device Linking
Alexey Grigorev
Berlin Machine Learning
2016.12.05
About Me
Software Developer, BI Masters @ TU Berlin, Data Scientist
CIKM Cup 2016: Cross-Device Linking
[Diagram: user, advertisements, ad providers]
Goal: Restore the Graph
● Training data: the links are known
● New unseen devices: no links
Data
240k train “users” (devices), 100k test users
500k train device-device pairs, 215k test pairs
Denormalized: 2.5 GB click logs + 1 GB URLs & titles
67m clicks in total, 197 clicks per user on average
How to Approach?
● Machine Learning?
● First, optimize Recall
○ IR, unsupervised
○ Select “candidate” device-device pairs
○ Build a design matrix
● Then, optimize Precision
○ ML, supervised
○ Push the true pairs up the list
● Select top K pairs s.t. F1 is max
Optimizing Recall
● Recall: fraction of all true device pairs we discover
● Information Retrieval problem!
● For each device, we need to find the most similar ones
● Device == Document, built from
○ Tokens from all visited URLs + tokens from all titles
○ All put together into one single document
● Then use standard IR methods like TF-IDF
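The idea of treating each device as one document and ranking other devices by TF-IDF similarity can be sketched in a few lines. This is a minimal illustration in plain Python, not the search-engine pipeline used in the talk: the helper names (`tfidf_vectors`, `cosine`, `top_candidates`), the smoothed IDF formula and the toy token lists are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF dicts (term -> weight)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption; other variants exist).
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_candidates(docs, query_idx, k):
    """Indices of the k documents most similar to docs[query_idx]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_idx], v), i)
        for i, v in enumerate(vectors)
        if i != query_idx
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Toy "devices": tokens from visited URLs and page titles (made-up data).
devices = [
    ["news", "example", "com", "football", "scores"],
    ["news", "example", "com", "football", "results"],
    ["shop", "example", "org", "shoes", "sale"],
]
best = top_candidates(devices, 0, 1)
```

On real data the pairwise loop would be far too slow, which is presumably why an inverted index (a search engine) is the practical choice for this step.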
Optimizing Recall
● IR: Device → ES MLT (More Like This) query → top 70 candidates, ranked from most similar to least similar
● (240k + 100k) * 70 = 24m candidate pairs
Optimizing Precision
[Figure: candidate list with labels, 1 = true pair, 0 = not a pair]
● Now we have high Recall, but low Precision
○ Recall: fraction of all positive pairs we discover
○ Precision: fraction of positive pairs within our results
● Use supervised Machine Learning to improve Precision
Next steps:
● Create features for each device pair
● Train a ranking ML model
● Take the top, most reliable predictions
Features: Profiling
● Create a profile for each user
● Profile from sessions (30 minutes inactivity cut):
○ Session duration
○ Number of visits per session
○ Number of sessions with only one visit
○ Duration of breaks between and within sessions
○ Number of consecutive requests with a ≤ 1ms delay
○ Starts and ends of sessions
○ Number of unique domains per session
○ Similarity of domains/urls/titles within each session
○ For all features: min, mean, max and std
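A session-based profile of this kind might be computed roughly as follows. This is a sketch under stated assumptions: timestamps are taken to be in seconds, a gap strictly longer than 30 minutes starts a new session, and the function names (`split_sessions`, `summarize`, `profile`) and the choice of population standard deviation are illustrative, not taken from the original solution. Only two of the listed features are shown.

```python
from statistics import mean, pstdev

SESSION_GAP = 30 * 60  # 30 minutes of inactivity, in seconds (assumed unit)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group click timestamps into sessions separated by long inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # inactivity cut: start a new session
    return sessions


def summarize(values):
    """min, mean, max and std, as applied to every profile feature."""
    return {
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
        "std": pstdev(values),
    }


def profile(timestamps):
    """Build a small profile from one device's click timestamps."""
    sessions = split_sessions(timestamps)
    durations = [s[-1] - s[0] for s in sessions]
    visits = [len(s) for s in sessions]
    return {
        "n_sessions": len(sessions),
        "n_single_visit_sessions": sum(1 for s in sessions if len(s) == 1),
        "duration": summarize(durations),
        "visits": summarize(visits),
    }


clicks = [0, 60, 120, 4000, 4030, 10000]  # made-up timestamps
p = profile(clicks)
```

With these toy timestamps the clicks fall into three sessions, one of which contains a single visit.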
Features: Device-Device Similarities
● |profile1.feature - profile2.feature|
● TF-IDF similarity of
○ Domains
○ Titles
○ URLs
● LSA similarity of the same
● 54 features in total
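The profile-difference part of the pair features, |profile1.feature - profile2.feature|, is simple enough to show directly. The sketch below assumes profiles are flat dicts of numeric values; the feature names and numbers are invented for illustration.

```python
def pair_features(profile1, profile2):
    """Absolute difference for every feature present in both profiles."""
    shared = sorted(set(profile1) & set(profile2))
    return {f"diff_{name}": abs(profile1[name] - profile2[name]) for name in shared}


# Hypothetical profiles of two devices.
device_a = {"mean_session_duration": 310.0, "mean_visits_per_session": 4.5}
device_b = {"mean_session_duration": 295.0, "mean_visits_per_session": 6.0}
features = pair_features(device_a, device_b)
```

Each candidate pair gets one such row; the similarity features (TF-IDF, LSA) would be appended as further columns.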
Optimizing Precision: Ranking Model
● Train: 240k * 70 = 17m candidate pairs, labeled 1 or 0
● Test: 100k * 70 = 7m candidate pairs, scored by the model (e.g. 0.90, 0.87, 0.7, 0.3, 0.2)
● Top K: 100k-200k
Features: Importance
[Figure: XGB feature importance (# times used in split) for pair features and profile difference features]
Cross-Validation
FOLD 1 vs FOLD 2: information leak!
Cross-Validation
● Split the graph into non-overlapping regions
● For each region separately
○ Build ES index (i.e. apply filter)
○ Build a model
● Evaluation (AUC + F1 score):
○ Apply the fold 1 model to fold 2 data
○ And vice versa
[Figure: folds F1 and F2, each with its own scored pairs (0.90, 0.87, 0.7)]
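One way to get non-overlapping regions is to keep every connected component of the known device graph inside a single fold, so that no true link crosses the fold boundary. The slides do not say how the regions were actually built, so the sketch below is an assumption; `connected_components` and `assign_folds` are hypothetical helpers.

```python
from collections import defaultdict


def connected_components(pairs):
    """Connected components of an undirected graph given as an edge list."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:                      # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def assign_folds(pairs, n_folds=2):
    """Map every device to a fold so that linked devices share a fold."""
    fold_of = {}
    sizes = [0] * n_folds
    # Largest components first, each into the currently smallest fold.
    for component in sorted(connected_components(pairs), key=len, reverse=True):
        target = sizes.index(min(sizes))
        for device in component:
            fold_of[device] = target
        sizes[target] += len(component)
    return fold_of


pairs = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]  # toy links
folds = assign_folds(pairs)
```

Candidate retrieval and model training would then run per fold, using only the devices assigned to it.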
Evaluation
Public/private test split: 50/50
● During the competition:
○ Evaluation on 1st half of data
● After the competition: 2nd half
● P = 0.5 of real P
[Figure: normal F1 vs the “real” evaluation function on the test set]
Choosing K
● Order the pairs by the probability
● For each K calculate P, R and F1
● Select best K such that F1 is max
● 8th position
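The K selection can be sketched as a single pass over the pairs sorted by predicted probability. The function name `best_k`, the input layout and the toy scores are assumptions for illustration; the number of true pairs is only known on validation data, so K would be chosen there and reused on the test set.

```python
def best_k(scored_pairs, n_true_pairs):
    """scored_pairs: list of (probability, is_true_pair). Returns (k, f1)."""
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    best = (0, 0.0)
    hits = 0
    for k, (_, is_true) in enumerate(ranked, start=1):
        hits += is_true
        if hits == 0:
            continue                      # F1 is 0 until the first true pair
        precision = hits / k              # true pairs among the top k
        recall = hits / n_true_pairs      # true pairs found so far
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (k, f1)
    return best


# Toy validation pairs: (model probability, label).
pairs = [(0.90, 1), (0.87, 1), (0.70, 0), (0.30, 1), (0.20, 0)]
k, f1 = best_k(pairs, n_true_pairs=3)
```

For these five toy pairs the maximum is reached at K = 4, where precision is 0.75 and recall is 1.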
Post-Competition
What did others do?
● Using several candidate selection methods
● Stacking with rank features (by D. Dremov)
● Markov Clustering (by I. Bendyna)
Rank Features
source: http://gh.mltrainings.ru/presentations/Dremov_CIKMCup2016_DCA.pdf slide 9
● Relative position of a node within a group
● Motivation: “local” within-group effect instead of global
● df_train.groupby('user_1')[feature].rank()
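What the `groupby(...).rank()` call computes can be spelled out without pandas. The sketch below ranks a feature within each `user_1` group, giving tied values their average rank (the default tie handling in pandas); the function name `group_rank` and the toy rows are assumptions.

```python
from collections import defaultdict


def group_rank(rows, group_key, feature):
    """Rank `feature` within each group (1 = smallest); ties share the average rank."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[group_key]].append(idx)
    ranks = [0.0] * len(rows)
    for members in groups.values():
        ordered = sorted(members, key=lambda i: rows[i][feature])
        pos = 0
        while pos < len(ordered):
            end = pos
            # Extend over a run of tied values.
            while (end + 1 < len(ordered)
                   and rows[ordered[end + 1]][feature] == rows[ordered[pos]][feature]):
                end += 1
            average = (pos + end) / 2 + 1   # 1-based average position of the run
            for i in ordered[pos:end + 1]:
                ranks[i] = average
            pos = end + 1
    return ranks


rows = [
    {"user_1": "a", "user_2": "x", "score": 0.9},
    {"user_1": "a", "user_2": "y", "score": 0.4},
    {"user_1": "a", "user_2": "z", "score": 0.4},
    {"user_1": "b", "user_2": "x", "score": 0.2},
]
ranks = group_rank(rows, "user_1", "score")
```

The rank says how a candidate compares with the other candidates of the same device, which is the "local" effect the slide refers to.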
Stacking (post competition)
[Diagram: stacked XGBoost and ET models built on all features, best features and rank features]
8th → 5th position
Markov Clustering
source: http://gh.mltrainings.ru/presentations/Bendyna_CIKMCup2016_DCA.pdf
● take a connected component
● add loops
● put into a Markov Matrix M
○ also called “Stochastic Matrix”
○ values in cols sum up to 1
● calculate M ** n
○ ~ n Random Walk steps
● for each element v of M: v = v ** p
○ makes weak links weaker
● re-normalize and repeat
Animation: http://micans.org/mcl/ani/mcl-animation.html
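The loop described above (add loops, normalize columns, take M ** n, raise each element to the power p, re-normalize, repeat) can be sketched in plain Python. This is an illustrative toy, not the implementation from the cited presentation: the parameter defaults, the fixed iteration count, the `clusters` read-out and the example graph are all assumptions.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    size = len(m)
    col_sums = [sum(m[r][c] for r in range(size)) for c in range(size)]
    return [[m[r][c] / col_sums[c] for c in range(size)] for r in range(size)]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]


def markov_clustering(adjacency, n=2, p=2.0, iterations=50):
    size = len(adjacency)
    # Add self-loops, then make every column sum to 1.
    m = [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(size)]
         for r in range(size)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = m
        for _step in range(n - 1):        # M ** n: roughly n random-walk steps
            expanded = matmul(expanded, m)
        # Element-wise power: weak links get weaker.
        inflated = [[v ** p for v in row] for row in expanded]
        m = normalize_columns(inflated)
    return m


def clusters(m, eps=1e-6):
    """Rows that keep mass act as attractors; their non-zero columns form a cluster."""
    found = set()
    for row in m:
        members = frozenset(c for c, v in enumerate(row) if v > eps)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Toy connected component: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    graph[a][b] = graph[b][a] = 1.0

m = markov_clustering(graph)
groups = clusters(m)
```

On a graph like this one would expect the weak bridge between the two triangles to fade out, leaving each triangle as its own cluster; a real implementation would also check for convergence instead of running a fixed number of iterations.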
Links & Further Info
● Competition website: http://cikmcup.org/
● Competition platform: https://competitions.codalab.org/competitions/11171
● My solution: https://github.com/alexeygrigorev/cikm-cup-2016-cross-device
● Reports: http://cikmcup.org/workshop.html
Self-promotion:
● http://alexeygrigorev.com/
● contact@alexeygrigorev.com
Thank you. Questions?

More Related Content

What's hot

Unit test demo for calculatechinesenamenumber
Unit test demo for calculatechinesenamenumberUnit test demo for calculatechinesenamenumber
Unit test demo for calculatechinesenamenumberJuggernaut Liu
 
3D webservices - where do we stand? (ENG)
3D webservices - where do we stand? (ENG)3D webservices - where do we stand? (ENG)
3D webservices - where do we stand? (ENG)Camptocamp
 
D422 7-2 string hadeling
D422 7-2  string hadelingD422 7-2  string hadeling
D422 7-2 string hadeling
Omkar Rane
 
[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...
[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...
[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...
Mumbai B.Sc.IT Study
 
ملخص البرمجة المرئية - الوحدة الثالثة
ملخص البرمجة المرئية - الوحدة الثالثةملخص البرمجة المرئية - الوحدة الثالثة
ملخص البرمجة المرئية - الوحدة الثالثة
جامعة القدس المفتوحة
 
Demo the reactive jargons
Demo the reactive jargonsDemo the reactive jargons
Demo the reactive jargonsThoughtworks
 
Grails workshops
Grails workshopsGrails workshops
Grails workshops
Łukasz Tenerowicz
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
Hayahide Yamagishi
 
Clojure/conj 2017
Clojure/conj 2017Clojure/conj 2017
Clojure/conj 2017
Darren Kim
 
Formalising Graph Pattern Matching Gremlin traversals in Graph Alegra
Formalising Graph Pattern Matching Gremlin traversals in Graph AlegraFormalising Graph Pattern Matching Gremlin traversals in Graph Alegra
Formalising Graph Pattern Matching Gremlin traversals in Graph Alegra
Harsh Thakkar
 
Implementation
ImplementationImplementation
Implementation
Syed Zaid Irshad
 
Globe Infographics
Globe InfographicsGlobe Infographics
Globe Infographics
LINE Corporation
 
A hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsA hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documents
Hayahide Yamagishi
 
Date and time on the internet
Date and time on the internetDate and time on the internet
Date and time on the internet
Igalia
 
PFDet: 2nd Place Solutions to Open Images Competition
PFDet: 2nd Place Solutions to Open Images CompetitionPFDet: 2nd Place Solutions to Open Images Competition
PFDet: 2nd Place Solutions to Open Images Competition
Shotaro Sano
 
JavaScript Getting Started
JavaScript Getting StartedJavaScript Getting Started
JavaScript Getting Started
Hazem Hagrass
 
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Keisuke Hosaka
 

What's hot (17)

Unit test demo for calculatechinesenamenumber
Unit test demo for calculatechinesenamenumberUnit test demo for calculatechinesenamenumber
Unit test demo for calculatechinesenamenumber
 
3D webservices - where do we stand? (ENG)
3D webservices - where do we stand? (ENG)3D webservices - where do we stand? (ENG)
3D webservices - where do we stand? (ENG)
 
D422 7-2 string hadeling
D422 7-2  string hadelingD422 7-2  string hadeling
D422 7-2 string hadeling
 
[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...
[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...
[Question Paper] Object Oriented Programming With C++ (Revised Course) [Janua...
 
ملخص البرمجة المرئية - الوحدة الثالثة
ملخص البرمجة المرئية - الوحدة الثالثةملخص البرمجة المرئية - الوحدة الثالثة
ملخص البرمجة المرئية - الوحدة الثالثة
 
Demo the reactive jargons
Demo the reactive jargonsDemo the reactive jargons
Demo the reactive jargons
 
Grails workshops
Grails workshopsGrails workshops
Grails workshops
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
 
Clojure/conj 2017
Clojure/conj 2017Clojure/conj 2017
Clojure/conj 2017
 
Formalising Graph Pattern Matching Gremlin traversals in Graph Alegra
Formalising Graph Pattern Matching Gremlin traversals in Graph AlegraFormalising Graph Pattern Matching Gremlin traversals in Graph Alegra
Formalising Graph Pattern Matching Gremlin traversals in Graph Alegra
 
Implementation
ImplementationImplementation
Implementation
 
Globe Infographics
Globe InfographicsGlobe Infographics
Globe Infographics
 
A hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documentsA hierarchical neural autoencoder for paragraphs and documents
A hierarchical neural autoencoder for paragraphs and documents
 
Date and time on the internet
Date and time on the internetDate and time on the internet
Date and time on the internet
 
PFDet: 2nd Place Solutions to Open Images Competition
PFDet: 2nd Place Solutions to Open Images CompetitionPFDet: 2nd Place Solutions to Open Images Competition
PFDet: 2nd Place Solutions to Open Images Competition
 
JavaScript Getting Started
JavaScript Getting StartedJavaScript Getting Started
JavaScript Getting Started
 
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
 

Viewers also liked

Todoencaja - Presentacion Betabeers Galicia
Todoencaja - Presentacion Betabeers GaliciaTodoencaja - Presentacion Betabeers Galicia
Todoencaja - Presentacion Betabeers Galicia
WeKCo Coworking
 
050516 La Prensa Libre
050516 La Prensa Libre050516 La Prensa Libre
050516 La Prensa LibreJose Lopez
 
Actas 8ª reunion mesa del convenio (consolidación de empleo)
Actas 8ª reunion mesa del convenio (consolidación de empleo)Actas 8ª reunion mesa del convenio (consolidación de empleo)
Actas 8ª reunion mesa del convenio (consolidación de empleo)
Cgt Sevilla
 
COASTAL ESSENCE MAGAZINE
COASTAL ESSENCE MAGAZINECOASTAL ESSENCE MAGAZINE
COASTAL ESSENCE MAGAZINE
johnwillie1956
 
Antologia oral poesia hispanoamericana del siglo xx
Antologia oral poesia hispanoamericana del siglo xxAntologia oral poesia hispanoamericana del siglo xx
Antologia oral poesia hispanoamericana del siglo xx
Claribel Pereira
 
Comandos basicosunix
Comandos basicosunixComandos basicosunix
Comandos basicosunixvampiregv
 
Estudio Economía Colaborativa en Valencia OuiShare marzo 2015
Estudio Economía Colaborativa en Valencia OuiShare marzo 2015Estudio Economía Colaborativa en Valencia OuiShare marzo 2015
Estudio Economía Colaborativa en Valencia OuiShare marzo 2015
OuiShare
 
2017 dossier eventos empresas Los Angeles de San Rafael
2017 dossier eventos empresas Los Angeles de San Rafael2017 dossier eventos empresas Los Angeles de San Rafael
2017 dossier eventos empresas Los Angeles de San Rafael
Los Angeles de San Rafael
 
SMART CARDS
SMART CARDSSMART CARDS
SMART CARDS
salman khan
 
DIAPOSITIVA CONCLUSIONES
DIAPOSITIVA CONCLUSIONESDIAPOSITIVA CONCLUSIONES
DIAPOSITIVA CONCLUSIONESalimancheno
 
Adicción a las tecnologías
Adicción a las tecnologíasAdicción a las tecnologías
Adicción a las tecnologías
Juan Luis Hueso
 
As cidades e o mundo urbano (2º ESO)
As cidades e o mundo urbano (2º ESO)As cidades e o mundo urbano (2º ESO)
As cidades e o mundo urbano (2º ESO)
rubempaul
 

Viewers also liked (15)

Todoencaja - Presentacion Betabeers Galicia
Todoencaja - Presentacion Betabeers GaliciaTodoencaja - Presentacion Betabeers Galicia
Todoencaja - Presentacion Betabeers Galicia
 
Trabajo cabañeros
Trabajo cabañerosTrabajo cabañeros
Trabajo cabañeros
 
050516 La Prensa Libre
050516 La Prensa Libre050516 La Prensa Libre
050516 La Prensa Libre
 
Actas 8ª reunion mesa del convenio (consolidación de empleo)
Actas 8ª reunion mesa del convenio (consolidación de empleo)Actas 8ª reunion mesa del convenio (consolidación de empleo)
Actas 8ª reunion mesa del convenio (consolidación de empleo)
 
COASTAL ESSENCE MAGAZINE
COASTAL ESSENCE MAGAZINECOASTAL ESSENCE MAGAZINE
COASTAL ESSENCE MAGAZINE
 
Sant fruitós
Sant fruitósSant fruitós
Sant fruitós
 
Antologia oral poesia hispanoamericana del siglo xx
Antologia oral poesia hispanoamericana del siglo xxAntologia oral poesia hispanoamericana del siglo xx
Antologia oral poesia hispanoamericana del siglo xx
 
Comandos basicosunix
Comandos basicosunixComandos basicosunix
Comandos basicosunix
 
Estudio Economía Colaborativa en Valencia OuiShare marzo 2015
Estudio Economía Colaborativa en Valencia OuiShare marzo 2015Estudio Economía Colaborativa en Valencia OuiShare marzo 2015
Estudio Economía Colaborativa en Valencia OuiShare marzo 2015
 
2017 dossier eventos empresas Los Angeles de San Rafael
2017 dossier eventos empresas Los Angeles de San Rafael2017 dossier eventos empresas Los Angeles de San Rafael
2017 dossier eventos empresas Los Angeles de San Rafael
 
SMART CARDS
SMART CARDSSMART CARDS
SMART CARDS
 
DIAPOSITIVA CONCLUSIONES
DIAPOSITIVA CONCLUSIONESDIAPOSITIVA CONCLUSIONES
DIAPOSITIVA CONCLUSIONES
 
Adicción a las tecnologías
Adicción a las tecnologíasAdicción a las tecnologías
Adicción a las tecnologías
 
As cidades e o mundo urbano (2º ESO)
As cidades e o mundo urbano (2º ESO)As cidades e o mundo urbano (2º ESO)
As cidades e o mundo urbano (2º ESO)
 
Curriculum actualizado nuevo 1 1
Curriculum actualizado nuevo 1 1Curriculum actualizado nuevo 1 1
Curriculum actualizado nuevo 1 1
 

Similar to CIKM Cup 2016: Cross-Device Linking

Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ Signal
Joachim Draeger
 
Click prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real lifeClick prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real life
Alexey Grigorev
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018
Karthik Murugesan
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talk
Faisal Siddiqi
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
Databricks
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 review
ManageIQ
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
MLconf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
Xavier Amatriain
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
Xavier Amatriain
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
GeeksLab Odessa
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result Reordering
Varun Thacker
 
Continuous Profiling for Android Game Performance Optimization
Continuous Profiling for Android Game Performance OptimizationContinuous Profiling for Android Game Performance Optimization
Continuous Profiling for Android Game Performance Optimization
KLab Inc. / Tech
 
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Nikhil Dandekar
 
Safer’s Tips & Tricks to Optimize Top FME Transformers
Safer’s Tips & Tricks to Optimize Top FME TransformersSafer’s Tips & Tricks to Optimize Top FME Transformers
Safer’s Tips & Tricks to Optimize Top FME Transformers
Safe Software
 
Peer sim (p2p network)
Peer sim (p2p network)Peer sim (p2p network)
Peer sim (p2p network)
Hein Min Htike
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
Ido Shilon
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
Stepan Pushkarev
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Provectus
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
Active Learning on Question Answering with Dialogues
 Active Learning on Question Answering with Dialogues Active Learning on Question Answering with Dialogues
Active Learning on Question Answering with Dialogues
Jinho Choi
 

Similar to CIKM Cup 2016: Cross-Device Linking (20)

Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ Signal
 
Click prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real lifeClick prediction: kaggle competitions vs real life
Click prediction: kaggle competitions vs real life
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talk
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 review
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result Reordering
 
Continuous Profiling for Android Game Performance Optimization
Continuous Profiling for Android Game Performance OptimizationContinuous Profiling for Android Game Performance Optimization
Continuous Profiling for Android Game Performance Optimization
 
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
 
Safer’s Tips & Tricks to Optimize Top FME Transformers
Safer’s Tips & Tricks to Optimize Top FME TransformersSafer’s Tips & Tricks to Optimize Top FME Transformers
Safer’s Tips & Tricks to Optimize Top FME Transformers
 
Peer sim (p2p network)
Peer sim (p2p network)Peer sim (p2p network)
Peer sim (p2p network)
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Active Learning on Question Answering with Dialogues
 Active Learning on Question Answering with Dialogues Active Learning on Question Answering with Dialogues
Active Learning on Question Answering with Dialogues
 

More from Alexey Grigorev

MLOps week 1 intro
MLOps week 1 introMLOps week 1 intro
MLOps week 1 intro
Alexey Grigorev
 
Codementor - Data Science at OLX
Codementor - Data Science at OLX Codementor - Data Science at OLX
Codementor - Data Science at OLX
Alexey Grigorev
 
Data Monitoring with whylogs
Data Monitoring with whylogsData Monitoring with whylogs
Data Monitoring with whylogs
Alexey Grigorev
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
Alexey Grigorev
 
AI in Fashion - Size & Fit - Nour Karessli
 AI in Fashion - Size & Fit - Nour Karessli AI in Fashion - Size & Fit - Nour Karessli
AI in Fashion - Size & Fit - Nour Karessli
Alexey Grigorev
 
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia PavlovaAI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
Alexey Grigorev
 
ML Zoomcamp 10 - Kubernetes
ML Zoomcamp 10 - KubernetesML Zoomcamp 10 - Kubernetes
ML Zoomcamp 10 - Kubernetes
Alexey Grigorev
 
Paradoxes in Data Science
Paradoxes in Data ScienceParadoxes in Data Science
Paradoxes in Data Science
Alexey Grigorev
 
ML Zoomcamp 8 - Neural networks and deep learning
ML Zoomcamp 8 - Neural networks and deep learningML Zoomcamp 8 - Neural networks and deep learning
ML Zoomcamp 8 - Neural networks and deep learning
Alexey Grigorev
 
Algorithmic fairness
Algorithmic fairnessAlgorithmic fairness
Algorithmic fairness
Alexey Grigorev
 
MLOps at OLX
MLOps at OLXMLOps at OLX
MLOps at OLX
Alexey Grigorev
 
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 6 - Decision Trees and Ensemble LearningML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
Alexey Grigorev
 
ML Zoomcamp 5 - Model deployment
ML Zoomcamp 5 - Model deploymentML Zoomcamp 5 - Model deployment
ML Zoomcamp 5 - Model deployment
Alexey Grigorev
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
Alexey Grigorev
 
ML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 4 - Evaluation Metrics for ClassificationML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 4 - Evaluation Metrics for Classification
Alexey Grigorev
 
ML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp 3 - Machine Learning for ClassificationML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp 3 - Machine Learning for Classification
Alexey Grigorev
 
ML Zoomcamp Week #2 Office Hours
ML Zoomcamp Week #2 Office HoursML Zoomcamp Week #2 Office Hours
ML Zoomcamp Week #2 Office Hours
Alexey Grigorev
 
AMLD2021 - ML in online marketplaces
AMLD2021 - ML in online marketplacesAMLD2021 - ML in online marketplaces
AMLD2021 - ML in online marketplaces
Alexey Grigorev
 
ML Zoomcamp 2 - Slides
ML Zoomcamp 2 - SlidesML Zoomcamp 2 - Slides
ML Zoomcamp 2 - Slides
Alexey Grigorev
 
ML Zoomcamp 2.1 - Car Price Prediction Project
ML Zoomcamp 2.1 - Car Price Prediction ProjectML Zoomcamp 2.1 - Car Price Prediction Project
ML Zoomcamp 2.1 - Car Price Prediction Project
Alexey Grigorev
 

More from Alexey Grigorev (20)

MLOps week 1 intro
MLOps week 1 introMLOps week 1 intro
MLOps week 1 intro
 
Codementor - Data Science at OLX
Codementor - Data Science at OLX Codementor - Data Science at OLX
Codementor - Data Science at OLX
 
Data Monitoring with whylogs
Data Monitoring with whylogsData Monitoring with whylogs
Data Monitoring with whylogs
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
 
AI in Fashion - Size & Fit - Nour Karessli
 AI in Fashion - Size & Fit - Nour Karessli AI in Fashion - Size & Fit - Nour Karessli
AI in Fashion - Size & Fit - Nour Karessli
 
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia PavlovaAI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
AI-Powered Computer Vision Applications in Media Industry - Yulia Pavlova
 
ML Zoomcamp 10 - Kubernetes
ML Zoomcamp 10 - KubernetesML Zoomcamp 10 - Kubernetes
ML Zoomcamp 10 - Kubernetes
 
Paradoxes in Data Science
Paradoxes in Data ScienceParadoxes in Data Science
Paradoxes in Data Science
 
ML Zoomcamp 8 - Neural networks and deep learning
ML Zoomcamp 8 - Neural networks and deep learningML Zoomcamp 8 - Neural networks and deep learning
ML Zoomcamp 8 - Neural networks and deep learning
 
Algorithmic fairness
Algorithmic fairnessAlgorithmic fairness
Algorithmic fairness
 
MLOps at OLX
MLOps at OLXMLOps at OLX
MLOps at OLX
 
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 6 - Decision Trees and Ensemble LearningML Zoomcamp 6 - Decision Trees and Ensemble Learning
ML Zoomcamp 6 - Decision Trees and Ensemble Learning
 
ML Zoomcamp 5 - Model deployment
ML Zoomcamp 5 - Model deploymentML Zoomcamp 5 - Model deployment
ML Zoomcamp 5 - Model deployment
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
ML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 4 - Evaluation Metrics for ClassificationML Zoomcamp 4 - Evaluation Metrics for Classification
ML Zoomcamp 4 - Evaluation Metrics for Classification
 
ML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp 3 - Machine Learning for ClassificationML Zoomcamp 3 - Machine Learning for Classification
ML Zoomcamp 3 - Machine Learning for Classification
 
ML Zoomcamp Week #2 Office Hours
ML Zoomcamp Week #2 Office HoursML Zoomcamp Week #2 Office Hours
ML Zoomcamp Week #2 Office Hours
 
AMLD2021 - ML in online marketplaces
AMLD2021 - ML in online marketplacesAMLD2021 - ML in online marketplaces
AMLD2021 - ML in online marketplaces
 
ML Zoomcamp 2 - Slides
ML Zoomcamp 2 - SlidesML Zoomcamp 2 - Slides
ML Zoomcamp 2 - Slides
 
ML Zoomcamp 2.1 - Car Price Prediction Project
ML Zoomcamp 2.1 - Car Price Prediction ProjectML Zoomcamp 2.1 - Car Price Prediction Project
ML Zoomcamp 2.1 - Car Price Prediction Project
 


CIKM Cup 2016: Cross-Device Linking

  • 1. CIKM CUP 2016: Track 1, Cross-Device Linking. Alexey Grigorev, Berlin Machine Learning, 2016.12.05
  • 2. About Me: Software Developer, BI Masters @ TU Berlin, Data Scientist
  • 3. CIKM Cup 2016: Cross-Device Linking [diagram: user, advertisements, ad providers]
  • 4. Goal: Restore the Graph
      ● Training data: know the links
      ● New unseen devices: no links (?)
  • 5. Data
      ● 240k train “users” (devices), 100k test users
      ● 500k train device-device pairs, 215k test pairs
      ● Denormalized: 2.5 GB click logs + 1 GB URLs & titles
      ● 67m clicks in total, 197 clicks per user on average
  • 6. How to Approach?
      ● Machine Learning?
      ● First, optimize Recall
          ○ IR, unsupervised
          ○ Select “candidate” device-device pairs
          ○ Build a design matrix
      ● Then, optimize Precision
          ○ ML, supervised
          ○ Push the true pairs up the list
      ● Select top K pairs s.t. F1 is max
      [diagram: Train, Test]
  • 7. Optimizing Recall
      ● Recall: fraction of all true device pairs we discover
      ● Information Retrieval problem!
      ● For each device, we need to find the most similar ones
      ● Device == Document with
          ○ Tokens from all visited URLs + tokens from all titles
          ○ Put them together into one single document
      ● Then use standard IR methods like TF-IDF
  • 8. Optimizing Recall
      [diagram: Device → ES MLT query → IR → top 70 candidates, ordered from most similar to least similar, labeled 1 / 1 / 0]
      (240k + 100k) * 70 = 24m
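
The candidate-selection idea from slides 7 and 8 can be sketched roughly as follows. This is a minimal illustration only: the slides use an ES MLT query for retrieval, whereas the brute-force cosine loop below is a stand-in that would not scale to 340k devices. The function names, the smoothed IDF formula, and the toy tokens are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {device_id: [token, ...]} -> {device_id: {token: weight}}, L2-normalised.

    Each device is treated as one document built from its URL and title tokens.
    The smoothed IDF used here is an assumption, not taken from the slides.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    vectors = {}
    for device, tokens in docs.items():
        term_freq = Counter(tokens)
        vec = {
            t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in term_freq.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[device] = {t: w / norm for t, w in vec.items()}
    return vectors


def top_candidates(vectors, k):
    """For every device, return the k most similar other devices (cosine similarity)."""
    result = {}
    for a, vec_a in vectors.items():
        scored = []
        for b, vec_b in vectors.items():
            if a == b:
                continue
            sim = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
            scored.append((sim, b))
        # Highest similarity first; ties broken by device id for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        result[a] = [b for _, b in scored[:k]]
    return result


# Toy usage with hypothetical tokens; the slides keep the top 70 per device.
docs = {
    "d1": ["news", "berlin", "weather", "berlin"],
    "d2": ["berlin", "weather", "forecast"],
    "d3": ["cars", "price", "used"],
}
candidates = top_candidates(tfidf_vectors(docs), k=1)
```

Every (device, candidate) pair produced this way becomes one row of the design matrix, which is where the (240k + 100k) * 70 = 24m figure comes from.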
  • 9. Optimizing Precision
      [diagram: candidate list with 1/0 labels]
      ● Now have high Recall, but low Precision
          ○ Recall: fraction of all positive pairs we discover
          ○ Precision: fraction of positive pairs within our results
      ● Use Supervised Machine Learning for improving it
      Next steps:
      ● Create features for each device pair
      ● Train a ranking ML model
      ● Take top most reliable predictions
  • 10. Features: Profiling
      ● Create a profile for each user
      ● Profile from sessions (30 minutes inactivity cut):
          ○ Session duration
          ○ Number of visits per session
          ○ Number of sessions with only one visit
          ○ Duration of breaks between and within sessions
          ○ Number of consecutive requests with a ≤ 1ms delay
          ○ Starts and ends of sessions
          ○ Number of unique domains per session
          ○ Similarity of domains/urls/titles within each session
          ○ For all features: min, mean, max and std
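
A minimal sketch of the sessionization step behind these profile features, under stated assumptions: timestamps are in seconds, a gap strictly longer than 30 minutes starts a new session, and "std" is the population standard deviation. None of these details are given on the slide, and the function and key names are hypothetical. Only a few of the listed features are computed.

```python
from statistics import mean, pstdev

SESSION_GAP_SECONDS = 30 * 60  # the 30-minute inactivity cut from the slide


def split_sessions(timestamps, gap=SESSION_GAP_SECONDS):
    """Group click timestamps into sessions; a pause longer than `gap` opens a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def summarize(values):
    """The min / mean / max / std aggregation applied to every per-session feature."""
    return {
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
        "std": pstdev(values),
    }


def session_profile(timestamps):
    """Build a small profile for one device; assumes at least one timestamp."""
    sessions = split_sessions(timestamps)
    durations = [s[-1] - s[0] for s in sessions]
    visits = [len(s) for s in sessions]
    return {
        "n_sessions": len(sessions),
        "n_single_visit_sessions": sum(1 for s in sessions if len(s) == 1),
        "duration": summarize(durations),
        "visits": summarize(visits),
    }


# Toy usage: three sessions, one of them with a single visit.
profile = session_profile([0, 60, 120, 4000, 10000, 10030])
```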
  • 11. Features: Device-Device Similarities
      ● |profile1.feature - profile2.feature|
      ● TF-IDF similarity of
          ○ Domains
          ○ Titles
          ○ URLs
      ● LSA similarity of the same
      ● 54 features in total
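
The first bullet, |profile1.feature - profile2.feature|, could look roughly like this in code. This is a sketch assuming each profile is a flat dict of numeric features; the feature names are hypothetical, and the TF-IDF and LSA similarities are not covered here.

```python
def pair_features(profile1, profile2):
    """Absolute difference of every numeric feature that both profiles share."""
    shared = sorted(set(profile1) & set(profile2))
    return {f"abs_diff_{name}": abs(profile1[name] - profile2[name]) for name in shared}


# Toy usage with hypothetical profile features for two devices.
row = pair_features(
    {"mean_session_duration": 310.0, "n_sessions": 12},
    {"mean_session_duration": 250.0, "n_sessions": 15},
)
```

Each candidate pair yields one such row, which is then extended with the similarity features.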
  • 12. Optimizing Precision: Ranking Model
      [diagram: labeled candidate pairs (1/0) and predicted scores such as 0.90, 0.87, 0.7, 0.3, 0.2]
      ● Train: 240k * 70 = 17m
      ● Test: 100k * 70 = 7m
      ● Top K: 100k-200k
  • 14. Cross-Validation [diagram: FOLD 1, FOLD 2 vs information leak!]
  • 15. Cross-Validation
      ● Split the graph into non-overlapping regions
      ● For each region separately
          ○ Build ES index (i.e. apply filter)
          ○ Build a model
      ● Evaluation (AUC + F1 score):
          ○ Apply the fold 1 (F1) model to fold 2 (F2) data
          ○ And vice versa
      [diagram: folds F1 and F2 with predicted scores 0.90, 0.87, 0.7]
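
One way to split the graph into non-overlapping regions is to keep every connected component of the known links inside a single fold, so that no true pair crosses the fold boundary. The slides do not say how the regions were built, so the sketch below is an assumption; the function names and the greedy size balancing are hypothetical.

```python
from collections import defaultdict


def connected_components(pairs):
    """Union-find over known device-device links; returns a list of node lists."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    return list(groups.values())


def assign_folds(pairs, n_folds=2):
    """Assign whole components to folds, always filling the currently smallest fold."""
    components = sorted(connected_components(pairs), key=len, reverse=True)
    fold_sizes = [0] * n_folds
    fold_of = {}
    for component in components:
        target = fold_sizes.index(min(fold_sizes))
        for node in component:
            fold_of[node] = target
        fold_sizes[target] += len(component)
    return fold_of


# Toy usage: linked devices always share a fold.
fold_of = assign_folds([("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")])
```

With folds built like this, the index and the model for one fold never see devices from the other, which avoids the information leak shown on slide 14.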
  • 16. Evaluation
      Public/private test split: 50/50
      ● During the competition:
          ○ Evaluation on 1st half of data
      ● After the competition: 2nd half
      ● P = 0.5 of real P
      [plot: Test, normal F1 vs “real” evaluation function]
  • 17. Choosing K
      ● Order the pairs by the probability
      ● For each K calculate P, R and F1
      ● Select best K such that F1 is max
      ● 8th position
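
The K-selection procedure above can be sketched as a single pass over the ranked pairs. This is an illustration only: it assumes pairs are hashable tuples with a consistent device order and that the set of true pairs is non-empty; the function name and toy data are hypothetical.

```python
def best_k(scored_pairs, true_pairs):
    """scored_pairs: [(pair, probability)], true_pairs: set of pairs.

    Returns (K, F1) for the cut-off K that maximises F1 over the ranked list.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    best = (0, 0.0)
    true_positives = 0
    for k, (pair, _) in enumerate(ranked, start=1):
        if pair in true_pairs:
            true_positives += 1
        if true_positives == 0:
            continue  # F1 is 0 until the first true pair is found
        precision = true_positives / k
        recall = true_positives / len(true_pairs)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (k, f1)
    return best


# Toy usage with hypothetical pairs and probabilities.
scored = [
    (("a", "b"), 0.90),
    (("c", "d"), 0.87),
    (("a", "c"), 0.70),
    (("e", "f"), 0.30),
    (("b", "d"), 0.20),
]
k, f1 = best_k(scored, {("a", "b"), ("c", "d"), ("e", "f")})
```

On the test set the true pairs are unknown, so K would have to be chosen on validation folds and carried over.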
  • 18. Post-Competition
      What did others do?
      ● Using several candidate selection methods
      ● Stacking with rank features (by D. Dremov)
      ● Markov Clustering (by I. Bendyna)
  • 19. Rank Features
      source: http://gh.mltrainings.ru/presentations/Dremov_CIKMCup2016_DCA.pdf (slide 9)
      ● Relative position of a node within a group
      ● Motivation: “local” within-group effect instead of global
      ● df_train.groupby('user_1')[feature].rank()
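
To make the pandas one-liner concrete, here is a dependency-free sketch of what a within-group rank computes. It assumes the pandas defaults (ascending, 1-based, ties receive the average rank); the row layout and the `score` feature name are hypothetical.

```python
from collections import defaultdict


def group_rank(rows, group_key, feature):
    """Average rank (1-based, ascending) of `feature` within each `group_key` group."""
    by_group = defaultdict(list)
    for index, row in enumerate(rows):
        by_group[row[group_key]].append(index)

    ranks = [0.0] * len(rows)
    for indices in by_group.values():
        ordered = sorted(indices, key=lambda i: rows[i][feature])
        pos = 0
        while pos < len(ordered):
            # Find the run of tied values starting at `pos`.
            end = pos
            while (end + 1 < len(ordered)
                   and rows[ordered[end + 1]][feature] == rows[ordered[pos]][feature]):
                end += 1
            average_rank = (pos + end) / 2 + 1
            for j in range(pos, end + 1):
                ranks[ordered[j]] = average_rank
            pos = end + 1
    return ranks


# Toy usage: the same score gets a different rank depending on its group.
rows = [
    {"user_1": "a", "score": 0.9},
    {"user_1": "a", "score": 0.2},
    {"user_1": "b", "score": 0.2},
    {"user_1": "a", "score": 0.9},
]
ranks = group_rank(rows, "user_1", "score")
```

The resulting rank is a "local" feature: it describes how a candidate compares with the other candidates of the same device rather than with all pairs globally.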
  • 20. Stacking (post competition)
      [diagram labels: all features, XGBoost, ET, best features, XGBoost, rank features]
      8th → 5th position
  • 21. Markov Clustering
      source: http://gh.mltrainings.ru/presentations/Bendyna_CIKMCup2016_DCA.pdf
      ● take a connected component
      ● add loops
      ● put into a Markov Matrix M
          ○ also called “Stochastic Matrix”
          ○ values in cols sum up to 1
      ● calculate M ** n
          ○ ~ n Random Walk steps
      ● for each element M.v = M.v ** p
          ○ makes weak links weaker
      ● re-normalize and repeat
      Animation: http://micans.org/mcl/ani/mcl-animation.html
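
The steps on this slide can be sketched in plain Python as follows. This is a minimal illustration for small dense matrices, not the implementation from the cited presentation; the default values n = 2, p = 2, the fixed iteration count, and the function names are assumptions.

```python
def normalize_columns(m):
    """Rescale so that the values in every column sum up to 1 (a Markov matrix)."""
    size = len(m)
    sums = [sum(m[i][j] for i in range(size)) for j in range(size)]
    return [[m[i][j] / sums[j] for j in range(size)] for i in range(size)]


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def expand(m, n_steps):
    """M ** n: roughly n random-walk steps."""
    result = m
    for _ in range(n_steps - 1):
        result = matmul(result, m)
    return result


def inflate(m, p):
    """Raise every element to the power p, then re-normalize; weak links get weaker."""
    return normalize_columns([[value ** p for value in row] for row in m])


def markov_clustering(adjacency, n_steps=2, p=2.0, iterations=20):
    """adjacency: square list of lists with non-negative edge weights."""
    size = len(adjacency)
    # Add loops so that every node can also stay in place.
    with_loops = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(size)]
                  for i in range(size)]
    m = normalize_columns(with_loops)
    for _ in range(iterations):
        m = inflate(expand(m, n_steps), p)
    return m


# Toy usage: two triangles joined by a single edge between nodes 2 and 3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
result = markov_clustering(adjacency)
```

After enough iterations most entries shrink towards 0, and the rows that keep non-zero values indicate which nodes ended up grouped together.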
  • 22. Links & Further Info
      ● Competition website: http://cikmcup.org/
      ● Competition platform: https://competitions.codalab.org/competitions/11171
      ● My solution: https://github.com/alexeygrigorev/cikm-cup-2016-cross-device
      ● Reports: http://cikmcup.org/workshop.html
      Self-promotion:
      ● http://alexeygrigorev.com/
      ● contact@alexeygrigorev.com