Presented at the 32nd ACM Symposium on Applied Computing (SAC2017)
Paper: http://www.lsc.cs.titech.ac.jp/keyaki/paper/SAC2017_Keyaki_cameraReady.pdf
Data set: http://www.lsc.cs.titech.ac.jp/keyaki/en/data/
Web site: http://www.lsc.cs.titech.ac.jp/keyaki/en/
1. Part-of-speech Tagging for Web Search Queries Using a Large-scale Web Corpus
◯Atsushi Keyaki†, Jun Miyazaki†
†: Tokyo Institute of Technology, Japan
SAC2017 IAR
2. Objective
• Accurate part-of-speech (POS) tagging of Web queries
o POS tags are beneficial for accurate IR
• Different search strategies per POS tag [1]
• Identifying unnecessary data with POS tags [2]
o Example
• Query: "discovery channel" (a TV program: proper nouns)
• Doc: "Victim's discovery is broadcast by the channel" (common nouns)
• A POS tag mismatch may cause false positives
[1] Crestani et al.: "Short Queries, Natural Language and Spoken Document Retrieval: Experiments at Glasgow University", TREC-6, 1998.
[2] Chowdhury and McCabe: "Improving Information Retrieval Systems using Part of Speech Tagging", Univ. of Maryland, 1993.
3. Difficulty in query POS tagging
• Characteristics of Web queries
o Length is short (a few words)
o Capitalization is missing
o Word order is fairly free
• Solution of related work [3][4]
o Utilizing the results of sentence-level morphological analysis
• Sentences follow natural-language grammar
• Results of sentence-level morphological analysis are accurate
o The most frequently assigned POS tag is employed
• It is difficult to identify POS tags correctly with existing morphological
analysis tools, which are developed for natural language
o Sentence: "We stayed at Rif Carlton." → pronoun, verb, particle, proper noun
o Query: "rif carlton" → proper noun
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo-Relevance Feedback", CIKM 2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL 2012.
7. Our approach
• Related study
o Using sentence-level morphological analysis of
• search results [3]
• snippets from search logs [4]
o Considering only the frequency of assigned POS tags
o Drawbacks: only a small amount of highly relevant information is used,
and user feedback/search logs are not always available
• Our approach
o Taking global statistics from a large corpus into account
• Easily available, and covers the long tail
o Considering co-occurrence of query terms
April 5, 2017, SAC2017 IAR
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo-Relevance Feedback", CIKM 2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL 2012.
8. Preliminary investigation
• Morphological analysis of Web queries
o Queries
• TREC Web track topics (200 queries from 2009-2012)
o Oracle POS tags were annotated by three assessors
• High agreement (Kappa: 0.98)
• Referring to the description (information need)
o Morphological analysis tool
• Stanford Log-linear Part-Of-Speech Tagger [5]
o Models
• Default model
• Caseless model
o Does not consider capitalization information during training
o Tries to solve the "capitalization is missing" problem
[5] Toutanova et al.: "Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network", NAACL 2003.
9. Summary of error analysis
• Default model
o Only half of query terms were assigned correct POS tags
o Almost all proper nouns were NOT identified
• 72% of proper nouns were mistakenly tagged as common nouns
• Errors: "obama", "india", "ritz carlton", "discovery channel"
• Caseless model
o Around 75% of query terms were assigned correct POS tags
o Many proper nouns were identified
• Some common nouns are now mistakenly tagged as proper nouns
• Errors caused by partial grammatical rules
o "lower heart rate": "lower" (verb) mistagged as an adjective, since
adjectives come before common nouns
o "gs pay rate": "pay" (common noun) mistagged as a verb, since verbs
come after a subject
10. Proposed POS tagging
• Summary of the error analysis
o Proper nouns/common nouns cannot be identified
• Problem 1: capitalization is missing
o Grammatical rules are mistakenly applied
• Problem 2: word order is fairly free
• Related study
o Only a small amount of highly relevant information
• Problem 3: user feedback and search logs are not always available
• Approach
o Sol-P1: sentence-level morphological analysis
o Sol-P2: a POS tagging method not based on word order
o Sol-P3: a large-scale Web corpus (easily available)
o Building the term-POS database (TPDB)
• Morphological analysis is applied offline
11. Processing flow
• Offline
o Sentences from the large-scale Web corpus (S1, S2, ...) are morphologically
analyzed, e.g., S1: "tA tB tC" → tA/P1 tB/P2 tC/P3
o The tagged sentences are inserted into the term-POS database (TPDB)
• Online
o For a query, e.g., "tA tC", the entries containing the query terms are
retrieved from the TPDB and passed to the scoring method, e.g.,
tA/P1 tC/P3 and tA/P1 tC/P4
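The offline/online flow above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the morphological analyzer is left abstract (the `tag_sentence` stub is hypothetical), each TPDB entry is modeled as a list of term/POS pairs, and the TPDB itself as an in-memory inverted index from terms to entries.

```python
from collections import defaultdict

def tag_sentence(sentence):
    """Stand-in for a real morphological analyzer (hypothetical).
    Would return a list of (term, pos) pairs for one sentence."""
    raise NotImplementedError

def build_tpdb(tagged_sentences):
    """Offline stage: build a term-POS database as an inverted index
    mapping each term to the tagged sentences (entries) containing it."""
    tpdb = defaultdict(list)
    for entry in tagged_sentences:          # entry: list of (term, pos)
        for term in {t for t, _ in entry}:  # index each distinct term once
            tpdb[term].append(entry)
    return tpdb

# Tagged sentences S1-S3 from the processing-flow example
S1 = [("tA", "P1"), ("tB", "P2"), ("tC", "P3")]
S2 = [("tA", "P1"), ("tC", "P4"), ("tD", "P5")]
S3 = [("tC", "P3"), ("tE", "P1"), ("tA", "P2"), ("tF", "P1")]
tpdb = build_tpdb([S1, S2, S3])

# Online stage for query "tA tC": entries containing both query terms
hits = [e for e in tpdb["tA"] if e in tpdb["tC"]]
```

In a real system the TPDB would live in a database rather than memory (the paper's future work mentions schema design for fast tagging); the in-memory index here only illustrates the data flow.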
12. Scoring for POS tagging
• Design principles
o POS tags appearing frequently in the corpus are assigned to queries
o POS tags of a sentence are emphasized when the sentence contains more
kinds of query terms
• Co-occurrence of query terms is a useful clue
• Scoring steps
o Retrieve the entries containing query terms from the TPDB
o Break the query down into pairs of query terms
• Query "tA tB tC" → pairs {tA tB}, {tA tC}, {tB tC}
o Count entries per term-POS pair for each query term pair
• e.g., for pair {tA tB}: tA/P1 tB/P2 has freq. 5 and normalized freq.
0.33 (5/15); tA/P1 tB/P3: 3, 0.20 (3/15); tA/P2 tB/P4: 7, 0.47 (7/15)
(freq. = number of entries containing both tA/P1 and tB/P2)
o Score with the three proposed methods
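The pair-breaking and counting steps can be sketched as follows. This is a sketch, not the paper's code; for illustration it assumes each retrieved TPDB entry is represented as a dict mapping terms to POS tags.

```python
from collections import Counter
from itertools import combinations

def count_pairs(query_terms, entries):
    """For each pair of query terms, count how many entries contain both
    terms, keyed by the (term/POS, term/POS) combination, and compute
    normalized frequencies within each pair."""
    counts = {pair: Counter() for pair in combinations(query_terms, 2)}
    for entry in entries:  # entry: dict term -> POS
        for t1, t2 in counts:
            if t1 in entry and t2 in entry:
                key = ((t1, entry[t1]), (t2, entry[t2]))
                counts[(t1, t2)][key] += 1
    normalized = {}
    for pair, ctr in counts.items():
        total = sum(ctr.values())
        normalized[pair] = {k: v / total for k, v in ctr.items()} if total else {}
    return counts, normalized

# Synthetic entries reproducing the {tA tB} table on this slide
entries = (
    [{"tA": "P1", "tB": "P2"}] * 5
    + [{"tA": "P1", "tB": "P3"}] * 3
    + [{"tA": "P2", "tB": "P4"}] * 7
)
freq, norm = count_pairs(["tA", "tB"], entries)
# freq[("tA","tB")][(("tA","P1"),("tB","P2"))] is 5, normalized 5/15
```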
13. Three proposed methods
• MaxFreq
o The POS tag with the highest frequency is assigned
• MostLikelihood
o The POS tag with the highest normalized frequency is assigned
o MaxFreq may be affected by frequently appearing terms
• AllCombi
o The POS tag with the highest sum of term-POS frequencies is assigned
o MaxFreq and MostLikelihood focus only on the single POS tag with the
highest frequency/normalized frequency
o A more diversified context, including the long tail, can be considered
• Example: query "tA tB tC" (each row: term/POS pair, freq., normalized freq.)
o tA:tB  tA/P1 tB/P2 5 0.33; tA/P1 tB/P3 3 0.20; tA/P2 tB/P4 7 0.47
o tA:tC  tA/P1 tC/P2 3 0.43; tA/P3 tC/P3 4 0.57
o tB:tC  tB/P1 tC/P2 5 0.5;  tB/P2 tC/P2 5 0.5
• Result for tA: MaxFreq → tA/P2 (freq. 7); MostLikelihood → tA/P3 (0.57);
AllCombi → tA/P1 (sum of freq. 5 + 3 + 3 = 11)
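The three selection rules can be illustrated on the tA rows of the example tables above. This is a minimal sketch for a single target term; the row data is copied from the slide, and the function names are mine, not from the paper.

```python
from collections import defaultdict

# Rows involving tA from the pair-count tables for query "tA tB tC":
# (tA's POS tag, freq., normalized freq.)
ta_rows = [
    ("P1", 5, 0.33), ("P1", 3, 0.20), ("P2", 7, 0.47),  # pair tA:tB
    ("P1", 3, 0.43), ("P3", 4, 0.57),                   # pair tA:tC
]

def max_freq(rows):
    """MaxFreq: POS tag of the row with the highest raw frequency."""
    return max(rows, key=lambda r: r[1])[0]

def most_likelihood(rows):
    """MostLikelihood: POS tag of the row with the highest normalized freq."""
    return max(rows, key=lambda r: r[2])[0]

def all_combi(rows):
    """AllCombi: POS tag with the highest sum of frequencies over all rows,
    so long-tail rows also contribute to the winner."""
    sums = defaultdict(int)
    for pos, freq, _ in rows:
        sums[pos] += freq
    return max(sums, key=sums.get)

# MaxFreq -> P2 (freq. 7), MostLikelihood -> P3 (0.57),
# AllCombi -> P1 (5 + 3 + 3 = 11), matching the slide's worked example.
```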
19. Experiment
• Datasets
o TREC Web track topics
• 200 queries from 2009-2012
o MS-251
• Microsoft search log used in related studies [3][4]
• Results are skipped below because the trend is the same (slide 25)
• Large-scale Web corpus
o ClueWeb09 Category B
• 50 million Web documents
• Evaluated methods
o Proposed methods: MaxFreq, MostLikelihood, AllCombi
o Existing methods: Stanford, Caseless, SingleFreq (the most frequently
appearing POS tag is assigned)
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo-Relevance Feedback", CIKM 2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL 2012.
20. POS-tagged Web track topics
• AllCombi: the highest precision for all query terms, common nouns, and
proper nouns
o Good at judging nouns
o Considering a more diversified context is useful
• Global statistics from a large-scale Web corpus are useful
• MaxFreq and MostLikelihood: the highest precision for common nouns, verbs,
and adjectives
• Every proposed method significantly outperformed Caseless
Precision        All query terms  Common noun  Proper noun  Verb  Adjective  sign test vs. Caseless
MaxFreq          .814             .825         .833         .769  .647       p < 0.05
MostLikelihood   .814             .825         .833         .769  .647       p < 0.05
AllCombi         .821             .825         .860         .714  .629       p < 0.01
Caseless         .763             .789         .751         .733  .690
SingleFreq       .702             .775         .670         .533  .581
Stanford         .547             .550         1.0          .722  .451
21. Effect of the proposed method
• AllCombi correctly identified many query terms that Stanford missed
o Example queries compared: "obama", "india", "rif carlton"
• Some errors caused by partial grammatical rules still remain
o "lower heart rate", "gs pay rate"
• Negative effects of the proposed method
o "president" (as in "president united states") is often tagged as a
proper noun in the corpus
• Term weights need to be normalized
22. Conclusion
• POS tagging of Web queries
o Based on the results of sentence-level morphological analysis
o Using a large-scale Web corpus
o Proposed three scoring methods
• Experiments
o Considering a more diversified context is useful
o The best proposed method differs by POS tag
o The proposed methods outperformed existing tools and existing studies
• Future work
o Combining the proposed methods may improve accuracy
o Database schema design for fast POS tagging
23. Default model
POS tags          Precision  Recall
Common noun       .550       .985
Proper noun       1.0        .010
Verb              .722       .867
Adjective         .451       .958
All query terms   .547       .547
• Only about half of query terms were assigned correct POS tags
• Almost all proper nouns were not identified
o 72% of proper nouns were mistakenly tagged as common nouns
o Errors: "obama", "india", "ritz carlton", "discovery channel"
• Errors caused by partial grammatical rules
o "lower heart rate": "lower" (verb) mistagged as an adjective, since
adjectives come before common nouns
o "gs pay rate": "pay" (common noun) mistagged as a verb, since verbs
come after a subject
24. Caseless model
• Precision and recall improved overall
• Many proper nouns were identified
o 31% of proper nouns are still mistakenly tagged as common nouns
o Precision decreased: some common nouns are mistagged as proper nouns
• Harm from partial grammatical rules still exists
o "discovery channel store": a common noun mistagged as a proper noun
POS tags          Precision  Recall
Common noun       .789       .769
Proper noun       .751       .640
Verb              .733       .733
Adjective         .690       .833
All query terms   .763       .763
25. MS-251
• The trend of the proposed methods is the same
o The ratio of POS tags affected the ordering
• AllCombi: good at judging nouns
• MaxFreq, MostLikelihood: good at judging verbs and adjectives
o The proposed methods are better than [4]
Method                  Precision
MaxFreq                 .890
MostLikelihood          .895
AllCombi                .893
the best method in [4]  .858
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL 2012.