Neural Models for Information Retrieval
Bhaskar Mitra, Nick Craswell
(Submitted on 3 May 2017)
Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Traditional learning-to-rank models employ machine learning techniques over hand-crafted IR features. By contrast, neural models learn representations of language from raw text that can bridge the gap between query and document vocabulary. Unlike classical IR models, these new machine-learning-based approaches are data-hungry, requiring large-scale training data before they can be deployed. This tutorial introduces basic concepts and intuitions behind neural IR models, and places them in the context of traditional retrieval models. We begin by introducing fundamental concepts of IR and different neural and non-neural approaches to learning vector representations of text. We then review shallow neural IR methods that employ pre-trained neural term embeddings without learning the IR task end-to-end. We introduce deep neural networks next, discussing popular deep architectures. Finally, we review the current deep neural network (DNN) models for information retrieval. We conclude with a discussion of potential future directions for neural IR.
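The vocabulary-gap point in the abstract can be made concrete: with pre-trained term embeddings, a query and a document can match even when they share no terms, because related words land near each other in the vector space. A minimal sketch of this idea (the toy vectors and vocabulary below are invented for illustration; a real system would load pre-trained word2vec or GloVe vectors):

```python
import math

# Toy pre-trained term embeddings (invented for illustration; a real system
# would load word2vec or GloVe vectors trained on a large corpus).
EMBEDDINGS = {
    "cheap":      [0.90, 0.10, 0.00],
    "affordable": [0.85, 0.20, 0.05],
    "hotel":      [0.10, 0.90, 0.10],
    "lodging":    [0.15, 0.85, 0.20],
}

def text_vector(tokens):
    """Average the embeddings of the in-vocabulary tokens."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        raise ValueError("no in-vocabulary tokens")
    return [sum(dims) / len(vecs) for dims in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = text_vector(["cheap", "hotel"])
doc = text_vector(["affordable", "lodging"])
print(cosine(query, doc))  # close to 1.0 even though the texts share no terms
```

An exact-match model such as BM25 would score this query-document pair zero; the embedding-based score is high, which is the "bridging the vocabulary gap" behavior the abstract describes.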
Slides from my ICDE 2014 trip report, presented at the 56th meeting of the ACM SIGMOD Japan Chapter. The deck has six parts, 190 pages in total:
・An overview of ICDE 2014 (from p. 5)
・A new idea for the big-data era: stop storing the data (from p. 32)
Keynote, Running with Scissors: Fast Queries on Just-in-Time Databases
・Cooperating with unseen partners: data aggregation in sensor networks (from p. 64)
10 Year Most Influential Paper, Approximate Aggregation Techniques for Sensor Databases
・What happens when a main-memory database uses hardware transactional memory... (from p. 96)
Best Paper, Exploiting Hardware Transactional Memory in Main-Memory Databases
・Reusing past results: pattern discovery in large graphs using views (from p. 126)
Best Paper Runner-up, Answering Graph Pattern Queries Using Views
・Solving it with raw algorithmics: finding exact similar pairs among large numbers of vectors (from p. 155)
Paper of interest, L2AP: Fast Cosine Similarity Search With Prefix L-2 Norm Bounds
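L2AP accelerates exact all-pairs cosine similarity search by bounding, for each candidate pair, the contribution of the dimensions not yet processed using prefix/suffix L2 norms. A minimal sketch of that pruning idea follows; it is a simplification under the Cauchy-Schwarz bound, not the paper's full inverted-index scheme:

```python
import math

def cosine_pairs(vectors, threshold):
    """Return all index pairs (i, j), i < j, with cosine similarity >= threshold.

    Sketch of the pruning idea behind L2AP: after unit-normalizing, cosine
    similarity is a plain dot product, and while accumulating it dimension by
    dimension the remaining contribution is bounded (by Cauchy-Schwarz) by the
    product of the two suffix L2 norms. Once that upper bound drops below the
    threshold, the pair is discarded early without affecting exactness.
    """
    # Unit-normalize every vector so cosine similarity equals the dot product.
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    d = len(unit[0])
    # suffix[i][k] = L2 norm of unit[i][k:], precomputed back-to-front.
    suffix = []
    for v in unit:
        s, acc = [0.0] * (d + 1), 0.0
        for k in range(d - 1, -1, -1):
            acc += v[k] * v[k]
            s[k] = math.sqrt(acc)
        suffix.append(s)
    pairs = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            partial, pruned = 0.0, False
            for k in range(d):
                # Upper bound on the final dot product given the dims left.
                if partial + suffix[i][k] * suffix[j][k] < threshold:
                    pruned = True
                    break
                partial += unit[i][k] * unit[j][k]
            if not pruned and partial >= threshold:
                pairs.append((i, j))
    return pairs
```

The search remains exact because pruning only happens when the upper bound already rules the pair out; L2AP's contribution is making such bounds cheap to apply at scale via prefix-filtered inverted indexes.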
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW 2015), Masumi Shirakawa
A deck of slides for "N-gram IDF: A Global Term Weighting Scheme Based on Information Distance" (Shirakawa et al.), presented at the 24th International World Wide Web Conference (WWW 2015).
2. Session 16: Content Analysis 2 – Topics: Papers
p527.pdf
A Time-Based Collective Factorization for Topic Discovery and Monitoring in News
Carmen Vaca (Politecnico di Milano & Escuela Superior Politecnica del Litoral), Amin Mantrach (Yahoo! Labs), Alejandro Jaimes (Yahoo! Labs), Marco Saerens (Université de Louvain)
p539.pdf
The Dual-Sparse Topic Model: Mining Focused Topics and Focused Terms in Short Text
Tianyi Lin (The Chinese University of Hong Kong), Wentao Tian (The Chinese University of Hong Kong), Qiaozhu Mei (University of Michigan), Hong Cheng (The Chinese University of Hong Kong)
p551.pdf
Acquisition of Open-Domain Classes via Intersective Semantics
Marius Paşca (Google Inc.)
13. Evaluation
Three datasets: DBLP, 20 Newsgroups, and Twitter (tweets aggregated per user).
Session 16: Content Analysis 2 – Topics. Presenter: Shirakawa (Osaka University)
The Dual-Sparse Topic Model: Mining Focused Topics and Focused Terms in Short Text
[Dataset table and evaluation-result figures reproduced from the paper]
・The proposed method (DsparseTM) performs well overall.
・It remains stable even when the 20 Newsgroups documents are shortened.
・On Twitter, almost every tweet covers a single topic, so Mixture of unigrams [Blei, JMLR03] performs best there, but the proposed method is still quite competitive.