SlideShare a Scribd company logo
write your own data story!
Personalized Web Search 
Fri 11 Oct 2013 – Fri 10 Jan 2014 
194 Teams 
$9,000 cash prize 
Using Historical Logs of a search engine 
QUERIES 
RESULTS 
CLICKS 
and a set of new QUERIES and RESULTS 
rerank the RESULTS in order to optimize relevance 
34,573,630 Sessions with user id 
21,073,569 Queries 
64,693,054 Clicks 
~ 15GB
A METRIC FOR RELEVANCE RIGHT FROM THE LOG? 
ASSUMING WE SEARCH FOR "FRENCH NEWSPAPER", WE TAKE 
A LOOK AT THE LOGS.
DWELL TIME 
WE COMPUTE THE SO CALLED DWELL TIME OF A CLICK 
I.E. THE TIME ELAPSED BEFORE THE NEXT ACTION
DWELL TIME HAS BEEN SHOWN TO BE CORRELATED WITH 
THE RELEVANCE
GOOD WE HAVE A MEASURE OF RELEVANCE ! 
CAN WE GET AN OVERALL SCORE FOR OUR SEARCH ENGINE 
NOW?
Emphasis on relevant 
documents 
Discount per ranking 
Discount Cumulative Gain 
Normalized Discount Cumulative Gain 
Just Normalize Between 0 and 1
PERSONALIZED RERANKING 
IS ABOUT REORDERING THE N-BEST RESULTS BASED ON 
THE USER PAST SEARCH HISTORY 
Results Obtained in the contest: 
Original NCDG 0.79056 
ReRanked NCDG 0.80714 
Equivalent To 
~ Raising the rank of a relevant ( relevancy = 2) result 
from Rank #6 to Rank #5 on each query 
~ Raising the rank of a relevant ( relevancy = 2) result 
from Rank #6 to Rank #2 in 20% of the queries
No researcher. 
No experience in reranking. 
Not much experience in ML for most of us. 
Not exactly our job. No expectations. 
Kenji Lefevre 
37 
Algebraic Geometry 
Learning Python 
Christophe Bourguignat 
37 
Signal Processing Eng. 
Learning Scikit 
Mathieu Scordia 
24 
Data Scientist 
Paul Masurel 
33 
Soft. Engineer 
The Team
A-Team?
Data Hobbits
Understanding 
The Problem
53% OF THE COMPETITORS 
COULD NOT IMPROVE THE BASELINE 
Worse 
53% 
Better 
47%
IDEAL SETUP 
1. compute non-personalized rank 
2. select 10 best hits and serves them in order 
3. re-rank using log analysis. 
4. put new ranking algorithm in prod (yeah right!) 
5. compute NDCG on new logs 
6. … 
7. Profits !!
REAL SETUP 
1. compute non-personalized rank 
2. select 10 bests hits 
3. serve 10 bests hits ranked in random 
order 
4. re-rank using log analysis, including non-personalized 
rank as a feature 
5. compute score against the log with the 
former rank 
IDEAL 
rank 
serves them in order 
prod (yeah right!)
PROBLEM 
Users tend to click on the first few urls. 
User satisfaction metric is influenced by the display rank. 
Our score is not aligned with our goal. 
We cannot discriminate the effect of the signal 
of the non-personalized rank from effect of the display rank
PROMOTES 
OVER CONSERVATIVE RE-RANKING POLICY 
Even if we know for sure that the url with rank 9 would be clicked by the user if it was presented at 
rank 1, it would be probably a bad idea to rerank it to rank 1 in this contest. 
Average per session of the max position jump
Simple, point wise approach 
Session 1 Session 2 .... 
2 
1 
0 
For each (URL, Session) predict relevance (0,1 or 2)
Supervised Learning on History 
We split 27 days of the train dataset 24 (history) + 3 days (annotated). 
Stop randomly in the last 3 days at a “test" session (like Yandex) 
Train Set 
(24 history) 
Train Set 
(annotation) Test Set
How They Did It
Features Construction : 
Split Train & Validation 
Team Member work independantly
FEATURES 
The Existing Rank (base rank) 
Revisits (Query-(User)-URL) features and variants 
Query Features 
Cumulative Features 
User Click Habits Features 
Collaborative Filtering Features 
Seasonality Features
REVISITS 
In the past, when the user was displayed this url, with the exact same query 
what is the probability that : 
• satisfaction=2 
• satisfaction=1 
• satisfaction=0 
• miss (not-clicked) 
• skipped (after the last click) 
5 Conditional Probability Features 
1 An overall counter of display 
4 mean reciprocal rank 
(kind of the harmonic mean of the rank) 
1 snippet quality score 
(twisted formula used to compute 
snippet quality) 
11 Base Features
MANY VARIATIONS 
• (In the past|within the same sesssion), 
• (with this very query | whatever query | a subquery | a super query) 
• and was offered (this url/this domain) 
X2 
X 3 
X 2 
12 variants 
With the same user 
Without being the same user ( URL - query features) 
• Same Domain 
• Same URL 
• Same Query and Same URL 
3 variants 
15 Variants 
X 11 Base Features 
165 Features
Labelled 30 days data 
Features Construction : 
Team Member work independantly 
Learning : 
Team Member work independantly 
Split Train & Validation 
> 200 Potential Features 
on 30 days
Short Story 
Point Wise, Random Forest, 30 Features, 4th Place (*) 
Optimize & Train in ~ 1 hour (12 cores), 24 trees 
List Wise , LambdaMART, 90 Features, 1st Place (*) 
Trained in 2 days, 1135 Trees 
(*) A Yandex “PaceMaker" Team was also displaying results on the leaderboard and were 
at the first place during the whole competition even if not officially contestant
Lambda Mart 
Original Ranking Re Ranked 
13 errors 11 errors 
High Quality Hit 
Low Quality Hit 
Rank Net Gradient 
LambdaRank "Gradient" 
Gradient Boosted Trees 
with a special gradient called 
“Lambda Rank" 
From RankNet to LambdaRank to LambdaMART: An Overview 
Christopher J.C. Burges - Microsoft Research Technical Report MSR-TR-2010-82
Grid Search 
We are not doing typical classification here. It is extremely important to perform grid 
search directly against NDCG final score. 
NDCG “conservatism” end up with large “min samples per leaf” 
(between 40 and 80 )
Feature Selection 
Top-Down approach : Starting from 
a high number of features, 
iteratively removed subsets of 
features. This approach led to the 
subset of 90 features for the 
LambdaMart winning solutions 
(Similar strategy now implemented by 
sklearn.feature_selection.RFECV) 
Bottom-up approach : Starting from a low 
number of features, add the features that 
produce the best marginal improvement. 
Gave the 30 features that lead to the best 
solution with the point-wise approach.
Take Away 
Set up a Valid and Solid Cross Validation scheme 
Prototype with fast ML methods, optimize with boosting 
Be systematic in terms of feature selection 
Setup a reproductible workflows early on 
Split tasks when running as a team
Special Offer 
We offer a free server (with DSS) for 
teams running on Kaggle 
Competitions 
Conditions: 
- Be at least 3 people 
- Up to 3 three teams Max. sponsored 
per competition 
competitions@dataiku.com 
Florian DOUETTEAU 
florian.douetteau@daitaku.com
References 
Ranklib ( Implementation of LambdaMART) 
http://sourceforge.net/p/lemur/wiki/RankLib/ 
These Slides 
http://www.slideshare.net/Dataiku 
Blog Post About Additive Smoothing 
http://fumicoton.com/posts/bayesian_rating 
Blog Posts about the solution 
http://www.dataiku.com/blog/2014/01/14/winning-kaggle.html 
http://blog.kaggle.com/2014/02/06/winning-personalized-web-search-team-dataiku/ 
Contest Url 
https://www.kaggle.com/c/yandex-personalized-web-search-challenge 
Paper with Detailed Description 
http://research.microsoft.com/en-us/um/people/nickcr/wscd2014/papers/wscdchallenge2014dataiku.pdf 
Research Papers 
From RankNet to LambdaRank to LambdaMART: An Overview 
Christopher J.C. Burges - Microsoft Research Technical Report MSR-TR-2010-82 
Learning to rank using multiple classification and gradient boosting. 
P. Li, C. J. C. Burges, and Q. Wu. Mcrank - In NIPS, 2007

More Related Content

What's hot

Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013
Ophir Cohen
 
Question Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep LearningQuestion Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep Learning
Lucidworks
 
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 Click-through relevance ranking in solr &  lucid works enterprise - By Andrz... Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
lucenerevolution
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Databricks
 
Personalized Search at Sandia National Labs
Personalized Search at Sandia National LabsPersonalized Search at Sandia National Labs
Personalized Search at Sandia National Labs
Lucidworks
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
RTTS
 
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
sparktc
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why Care
Databricks
 
Reduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchReduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective Search
Lucidworks
 
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Spark Summit
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
RTTS
 
Improve the Health of Your Data
Improve the Health of Your DataImprove the Health of Your Data
Improve the Health of Your Data
RTTS
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Databricks
 
Karen's Favourite Features of SQL Server 2016
Karen's Favourite Features of  SQL Server 2016Karen's Favourite Features of  SQL Server 2016
Karen's Favourite Features of SQL Server 2016
Karen Lopez
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Databricks
 
QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solution
RTTS
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
RTTS
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database World
Karen Lopez
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
Sri Ambati
 

What's hot (20)

Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013Hadoop testing workshop - july 2013
Hadoop testing workshop - july 2013
 
Question Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep LearningQuestion Answering and Virtual Assistants with Deep Learning
Question Answering and Virtual Assistants with Deep Learning
 
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 Click-through relevance ranking in solr &  lucid works enterprise - By Andrz... Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
Click-through relevance ranking in solr &  lucid works enterprise - By Andrz...
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
 
Personalized Search at Sandia National Labs
Personalized Search at Sandia National LabsPersonalized Search at Sandia National Labs
Personalized Search at Sandia National Labs
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why Care
 
Reduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective SearchReduce Query Time Up to 60% with Selective Search
Reduce Query Time Up to 60% with Selective Search
 
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kur...
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
 
Improve the Health of Your Data
Improve the Health of Your DataImprove the Health of Your Data
Improve the Health of Your Data
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
 
Karen's Favourite Features of SQL Server 2016
Karen's Favourite Features of  SQL Server 2016Karen's Favourite Features of  SQL Server 2016
Karen's Favourite Features of SQL Server 2016
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
 
QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solution
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
 
How to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database WorldHow to Survive as a Data Architect in a Polyglot Database World
How to Survive as a Data Architect in a Polyglot Database World
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
 

Similar to Florian Douetteau @ Dataiku

Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough dataВладимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Mail.ru Group
 
Play with Kaggle
Play with KagglePlay with Kaggle
Play with Kaggle
Matthieu Scordia
 
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDBMongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
SigOpt
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
MLconf
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
RTTS
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
Abhishek M Shivalingaiah
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
Ganesh Venkataraman
 
Click Log Mining CS598
Click Log Mining CS598Click Log Mining CS598
Click Log Mining CS598Shih-Wen Huang
 
Big Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityBig Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data Quality
RTTS
 
Welcome Webinar Slides
Welcome Webinar SlidesWelcome Webinar Slides
Welcome Webinar Slides
Sumo Logic
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
Justin Basilico
 
Winning performance challenges in oracle standard editions
Winning performance challenges in oracle standard editionsWinning performance challenges in oracle standard editions
Winning performance challenges in oracle standard editions
Pini Dibask
 
Rokach-GomaxSlides (1).pptx
Rokach-GomaxSlides (1).pptxRokach-GomaxSlides (1).pptx
Rokach-GomaxSlides (1).pptx
Jadna Almeida
 
Rokach-GomaxSlides.pptx
Rokach-GomaxSlides.pptxRokach-GomaxSlides.pptx
Rokach-GomaxSlides.pptx
Jadna Almeida
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
Modern Data Stack France
 
dd presentation.pdf
dd presentation.pdfdd presentation.pdf
dd presentation.pdf
AnSHiKa187943
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems
MongoDB
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your Cluster
MongoDB
 

Similar to Florian Douetteau @ Dataiku (20)

Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough dataВладимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
 
Play with Kaggle
Play with KagglePlay with Kaggle
Play with Kaggle
 
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDBMongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
 
kdd2015
kdd2015kdd2015
kdd2015
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
 
Click Log Mining CS598
Click Log Mining CS598Click Log Mining CS598
Click Log Mining CS598
 
Big Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityBig Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data Quality
 
Welcome Webinar Slides
Welcome Webinar SlidesWelcome Webinar Slides
Welcome Webinar Slides
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Winning performance challenges in oracle standard editions
Winning performance challenges in oracle standard editionsWinning performance challenges in oracle standard editions
Winning performance challenges in oracle standard editions
 
Rokach-GomaxSlides (1).pptx
Rokach-GomaxSlides (1).pptxRokach-GomaxSlides (1).pptx
Rokach-GomaxSlides (1).pptx
 
Rokach-GomaxSlides.pptx
Rokach-GomaxSlides.pptxRokach-GomaxSlides.pptx
Rokach-GomaxSlides.pptx
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
dd presentation.pdf
dd presentation.pdfdd presentation.pdf
dd presentation.pdf
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your Cluster
 

More from PAPIs.io

Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
PAPIs.io
 
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
PAPIs.io
 
Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...
PAPIs.io
 
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
PAPIs.io
 
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
PAPIs.io
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
PAPIs.io
 
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
PAPIs.io
 
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
PAPIs.io
 
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
PAPIs.io
 
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
PAPIs.io
 
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs ConnectReal-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
PAPIs.io
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
PAPIs.io
 
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
PAPIs.io
 
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs ConnectDemystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
PAPIs.io
 
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs ConnectPredictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
PAPIs.io
 
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs ConnectMicrodecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
PAPIs.io
 
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
PAPIs.io
 
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
PAPIs.io
 
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs ConnectHow to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
PAPIs.io
 
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
PAPIs.io
 

More from PAPIs.io (20)

Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
Shortening the time from analysis to deployment with ml as-a-service — Luiz A...
 
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
Feature engineering — HJ Van Veen (Nubank) @@PAPIs Connect — São Paulo 2017
 
Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...Extracting information from images using deep learning and transfer learning ...
Extracting information from images using deep learning and transfer learning ...
 
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
Discovering the hidden treasure of data using graph analytic — Ana Paula Appe...
 
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
Deep learning for sentiment analysis — André Barbosa (elo7) @PAPIs Connect — ...
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...Building machine learning applications locally with Spark — Joel Pinho Lucas ...
Building machine learning applications locally with Spark — Joel Pinho Lucas ...
 
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
Battery log data mining — Ramon Oliveira (Datart) @PAPIs Connect — São Paulo ...
 
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
A tensorflow recommending system for news — Fabrício Vargas Matos (Hearst tv)...
 
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
Scaling machine learning as a service at Uber — Li Erran Li at #papis2016
 
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs ConnectReal-world applications of AI - Daniel Hulme @ PAPIs Connect
Real-world applications of AI - Daniel Hulme @ PAPIs Connect
 
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
Past, Present and Future of AI: a Fascinating Journey - Ramon Lopez de Mantar...
 
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
Revolutionizing Offline Retail Pricing & Promotions with ML - Daniel Guhl @ P...
 
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs ConnectDemystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
Demystifying Deep Learning - Roberto Paredes Palacios @ PAPIs Connect
 
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs ConnectPredictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
Predictive APIs: What about Banking? - Natalino Busa @ PAPIs Connect
 
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs ConnectMicrodecision making in financial services - Greg Lamp @ PAPIs Connect
Microdecision making in financial services - Greg Lamp @ PAPIs Connect
 
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
Engineering the Future of Our Choice with General AI - JoEllen Lukavec Koeste...
 
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
Distributed deep learning with spark on AWS - Vincent Van Steenbergen @ PAPIs...
 
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs ConnectHow to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
How to predict the future of shopping - Ulrich Kerzel @ PAPIs Connect
 
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs C...
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 

Florian Douetteau @ Dataiku

  • 1. write your own data story!
  • 2. Personalized Web Search Fri 11 Oct 2013 – Fri 10 Jan 2014 194 Teams $9,000 cash prize Using Historical Logs of a search engine QUERIES RESULTS CLICKS and a set of new QUERIES and RESULTS rerank the RESULTS in order to optimize relevance 34,573,630 Sessions with user id 21,073,569 Queries 64,693,054 Clicks ~ 15GB
  • 3. A METRIC FOR RELEVANCE RIGHT FROM THE LOG? ASSUMING WE SEARCH FOR "FRENCH NEWSPAPER", WE TAKE A LOOK AT THE LOGS.
  • 4. DWELL TIME WE COMPUTE THE SO CALLED DWELL TIME OF A CLICK I.E. THE TIME ELAPSED BEFORE THE NEXT ACTION
  • 5. DWELL TIME HAS BEEN SHOWN TO BE CORRELATED WITH THE RELEVANCE
  • 6. GOOD WE HAVE A MEASURE OF RELEVANCE ! CAN WE GET AN OVERALL SCORE FOR OUR SEARCH ENGINE NOW?
  • 7. Emphasis on relevant documents Discount per ranking Discount Cumulative Gain Normalized Discount Cumulative Gain Just Normalize Between 0 and 1
  • 8. PERSONALIZED RERANKING IS ABOUT REORDERING THE N-BEST RESULTS BASED ON THE USER PAST SEARCH HISTORY Results Obtained in the contest: Original NCDG 0.79056 ReRanked NCDG 0.80714 Equivalent To ~ Raising the rank of a relevant ( relevancy = 2) result from Rank #6 to Rank #5 on each query ~ Raising the rank of a relevant ( relevancy = 2) result from Rank #6 to Rank #2 in 20% of the queries
  • 9. No researcher. No experience in reranking. Not much experience in ML for most of us. Not exactly our job. No expectations. Kenji Lefevre 37 Algebraic Geometry Learning Python Christophe Bourguignat 37 Signal Processing Eng. Learning Scikit Mathieu Scordia 24 Data Scientist Paul Masurel 33 Soft. Engineer The Team
  • 13. 53% OF THE COMPETITORS COULD NOT IMPROVE THE BASELINE Worse 53% Better 47%
  • 14. IDEAL SETUP 1. compute non-personalized rank 2. select 10 best hits and serves them in order 3. re-rank using log analysis. 4. put new ranking algorithm in prod (yeah right!) 5. compute NDCG on new logs 6. … 7. Profits !!
  • 15. REAL SETUP 1. compute non-personalized rank 2. select 10 bests hits 3. serve 10 bests hits ranked in random order 4. re-rank using log analysis, including non-personalized rank as a feature 5. compute score against the log with the former rank IDEAL rank serves them in order prod (yeah right!)
  • 16. PROBLEM Users tend to click on the first few urls. User satisfaction metric is influenced by the display rank. Our score is not aligned with our goal. We cannot discriminate the effect of the signal of the non-personalized rank from effect of the display rank
  • 17. PROMOTES OVER CONSERVATIVE RE-RANKING POLICY Even if we know for sure that the url with rank 9 would be clicked by the user if it was presented at rank 1, it would be probably a bad idea to rerank it to rank 1 in this contest. Average per session of the max position jump
  • 18. Simple, point wise approach Session 1 Session 2 .... 2 1 0 For each (URL, Session) predict relevance (0,1 or 2)
  • 19. Supervised Learning on History We split 27 days of the train dataset 24 (history) + 3 days (annotated). Stop randomly in the last 3 days at a “test" session (like Yandex) Train Set (24 history) Train Set (annotation) Test Set
  • 21. Features Construction : Split Train & Validation Team Member work independantly
  • 22. FEATURES The Existing Rank (base rank) Revisits (Query-(User)-URL) features and variants Query Features Cumulative Features User Click Habits Features Collaborative Filtering Features Seasonality Features
  • 23. REVISITS In the past, when the user was displayed this url, with the exact same query what is the probability that : • satisfaction=2 • satisfaction=1 • satisfaction=0 • miss (not-clicked) • skipped (after the last click) 5 Conditional Probability Features 1 An overall counter of display 4 mean reciprocal rank (kind of the harmonic mean of the rank) 1 snippet quality score (twisted formula used to compute snippet quality) 11 Base Features
  • 24. MANY VARIATIONS • (In the past|within the same sesssion), • (with this very query | whatever query | a subquery | a super query) • and was offered (this url/this domain) X2 X 3 X 2 12 variants With the same user Without being the same user ( URL - query features) • Same Domain • Same URL • Same Query and Same URL 3 variants 15 Variants X 11 Base Features 165 Features
  • 25. Labelled 30 days data Features Construction : Team Member work independantly Learning : Team Member work independantly Split Train & Validation > 200 Potential Features on 30 days
  • 26. Short Story Point Wise, Random Forest, 30 Features, 4th Place (*) Optimize & Train in ~ 1 hour (12 cores), 24 trees List Wise , LambdaMART, 90 Features, 1st Place (*) Trained in 2 days, 1135 Trees (*) A Yandex “PaceMaker" Team was also displaying results on the leaderboard and were at the first place during the whole competition even if not officially contestant
  • 27. Lambda Mart Original Ranking Re Ranked 13 errors 11 errors High Quality Hit Low Quality Hit Rank Net Gradient LambdaRank "Gradient" Gradient Boosted Trees with a special gradient called “Lambda Rank" From RankNet to LambdaRank to LambdaMART: An Overview Christopher J.C. Burges - Microsoft Research Technical Report MSR-TR-2010-82
  • 28. Grid Search We are not doing typical classification here. It is extremely important to perform grid search directly against NDCG final score. NDCG “conservatism” end up with large “min samples per leaf” (between 40 and 80 )
  • 29. Feature Selection Top-Down approach : Starting from a high number of features, iteratively removed subsets of features. This approach led to the subset of 90 features for the LambdaMart winning solutions (Similar strategy now implemented by sklearn.feature_selection.RFECV) Bottom-up approach : Starting from a low number of features, add the features that produce the best marginal improvement. Gave the 30 features that lead to the best solution with the point-wise approach.
  • 30. Take Away Set up a Valid and Solid Cross Validation scheme Prototype with fast ML methods, optimize with boosting Be systematic in terms of feature selection Setup a reproductible workflows early on Split tasks when running as a team
  • 31. Special Offer We offer a free server (with DSS) for teams running on Kaggle Competitions Conditions: - Be at least 3 people - Up to 3 three teams Max. sponsored per competition competitions@dataiku.com Florian DOUETTEAU florian.douetteau@daitaku.com
  • 32. References Ranklib ( Implementation of LambdaMART) http://sourceforge.net/p/lemur/wiki/RankLib/ These Slides http://www.slideshare.net/Dataiku Blog Post About Additive Smoothing http://fumicoton.com/posts/bayesian_rating Blog Posts about the solution http://www.dataiku.com/blog/2014/01/14/winning-kaggle.html http://blog.kaggle.com/2014/02/06/winning-personalized-web-search-team-dataiku/ Contest Url https://www.kaggle.com/c/yandex-personalized-web-search-challenge Paper with Detailed Description http://research.microsoft.com/en-us/um/people/nickcr/wscd2014/papers/wscdchallenge2014dataiku.pdf Research Papers From RankNet to LambdaRank to LambdaMART: An Overview Christopher J.C. Burges - Microsoft Research Technical Report MSR-TR-2010-82 Learning to rank using multiple classification and gradient boosting. P. Li, C. J. C. Burges, and Q. Wu. Mcrank - In NIPS, 2007