Context-based Web search has become an important research area, and many strategies have been proposed to reflect contextual information in search queries. Despite the success of some of these proposals, they still have serious limitations due to their inability to bridge the terminology gap between the user's context description and the vocabulary of the relevant documents. This paper presents a quantitative technique to learn vocabularies useful for describing the theme of a context under analysis. The enriched vocabulary allows the formulation of search queries that identify resources with higher precision than those identified using the initial vocabulary. Rigorous experimentation leads us to conclude that the proposed technique is superior to a baseline and to other well-known query reformulation techniques.
Learning Better Context Characterizations: An Intelligent Information Retriev..., by Carlos Lorenzetti
This paper proposes an incremental method that an intelligent system can use to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. From these terms, a set of queries is built and submitted to a search engine. New documents and terms are then used to refine the learned vocabulary. Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one when constructing queries to retrieve relevant material.
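The refinement loop described above can be sketched in a few lines. This is only an illustrative skeleton: the scoring scheme (plain term frequency over retrieved documents), the parameter names, and the `search` callable are stand-ins, not the paper's actual method.

```python
# Illustrative sketch of an incremental vocabulary-refinement loop.
# `search` is assumed to map a list of query terms to a list of
# documents, each given as a list of lowercase tokens.
from collections import Counter

def refine_vocabulary(seed_terms, search, rounds=3, query_size=3, vocab_size=10):
    """Iteratively grow a topic vocabulary from retrieved documents."""
    vocabulary = Counter({t: 1.0 for t in seed_terms})
    for _ in range(rounds):
        # Build a query from the currently highest-weighted terms.
        query = [t for t, _ in vocabulary.most_common(query_size)]
        # Fold the terms of the retrieved documents back into the vocabulary.
        for doc in search(query):
            vocabulary.update(doc)
        # Keep only the strongest terms for the next round.
        vocabulary = Counter(dict(vocabulary.most_common(vocab_size)))
    return [t for t, _ in vocabulary.most_common(vocab_size)]
```

With a toy corpus, seeding the loop with a single term pulls in co-occurring terms from the documents the seed retrieves, which is the intuition behind the paper's approach.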
K-repeating Substrings: a String-Algorithmic Approach to Privacy-Preserving P..., by Yusuke Matsubara
De-identifying textual data is an important task for publishing and sharing data among researchers while protecting the privacy of the individuals referenced therein. While supervised learning approaches have been applied successfully to this task in the clinical domain, existing methods are hard to transfer to other domains and languages because preparing the necessary linguistic resources takes considerable cost and time. This paper presents an efficient unsupervised algorithm to detect all substrings occurring fewer than $k$ times in the input string, based on the assumption that such rare sequences are likely to contain sensitive information, such as names of people or rare diseases, that may identify individuals. The proposed algorithm runs in time linear in the input size, both asymptotically and empirically, when $k$ is a constant. Empirical evaluation on the \emph{i2b2} (Informatics for Integrating Biology and the Bedside) dataset shows the effectiveness of the algorithm in comparison to baselines that use simple word frequencies.
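The notion of a "rare substring" can be illustrated with a brute-force version of the idea. The paper's algorithm achieves linear time with string-algorithmic techniques; this quadratic-time sketch only demonstrates what it means for a substring to occur fewer than $k$ times.

```python
# Naive sketch: find the shortest substrings occurring fewer than k
# times in the input. This is O(n^2) and only illustrates the concept;
# the paper's actual algorithm runs in linear time for constant k.
from collections import Counter

def rare_substrings(text, k):
    """Return the shortest substrings occurring fewer than k times."""
    n = len(text)
    for length in range(1, n + 1):
        counts = Counter(text[i:i + length] for i in range(n - length + 1))
        rare = sorted(s for s, c in counts.items() if c < k)
        if rare:
            return rare
    return []
```

In a de-identification setting, spans of text containing such rare substrings would be the candidates for redaction.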
THE EFFECT OF STICK FIGURE ON STUDENTS’ VOCABULARY ENRICHMENT, by Reny Eka Sari
This document discusses a study on the effect of using stick figures to teach vocabulary to elementary school students. It describes how most elementary students have difficulty remembering and translating words between their first and target languages. The study aims to see if using stick figures can help students better remember and understand word meanings, by focusing on job and hobby vocabulary. It will use a pre-test post-test control group design to test the null hypothesis that stick figures have no effect on vocabulary enrichment. The population is 2nd grade students at a school in Cibinong, with a sample of 20% randomly selected from each class.
Using Internet Resources and Digital Technology in Language Teaching, by Víctor González
AGIS presentation in Hannover, February 2009.
Web 2.0 is an innovative educational tool that is changing how teaching is done in the 21st century. Its potential is limitless, and so is its social and behavioral power. Creativity, participation and collaboration are only some of the key elements of Web 2.0; there are more.
This presentation will explore the possibilities that the internet offers a language teacher, from web quests to online exercises and other powerful resources. We will also look at some simple, useful digital video tools and podcasts that students can both use and create in class.
The document discusses various memory and study aids for vocabulary, including visual aids like drawings, diagrams, charts and graphic organizers; word associations; and flashcards. It recommends creating flashcards with the word, part of speech, definition, and example sentence on both sides. The document provides tips for effective use of flashcards and guidelines for studying vocabulary, such as studying in short sessions throughout the day and using active learning strategies like making up example sentences.
The document summarizes a study on the effects of small reading groups on student engagement and motivation, particularly for English as an Additional Language (EAL) students. It describes the school demographics, main setting of reading groups, literature review on relevant topics, focus students and data collection methods. Key findings indicate that classroom arrangement, technology, hands-on learning and understanding vocabulary can motivate students, while cultural and parental influences impact learning. The conclusion discusses implications for engaging students intrinsically and extrinsically in reading groups through diversity and parental involvement.
This document provides instructions for using Microsoft Excel. It explains how to open Excel, what spreadsheets are used for, how to enter and edit data in cells, common errors like disabled macros and how to fix them, and who to contact for help.
This document discusses vocabulary enrichment services that could be provided through the LoCloud project. It introduces web services and vocabulary standards like SKOS that could be used to build shared multilingual vocabularies. Examples of existing vocabulary management tools are also presented that could serve as a model for the experimental LoCloud application to enable local institutions to collaborate on vocabularies for local history and archaeology. The outcomes of a previous workshop are summarized, including suggestions for importing existing open vocabularies and guidelines for using the new shared vocabulary tool.
CTL (Contextual Teaching and Learning), by Sary Nieman
CTL is called a contextual approach because it is a concept of learning in which teachers associate lesson content with real-world situations and encourage students to connect the knowledge they hold with its application in their lives as members of the community.
Series 16 - Attachment 4 - Momin Chetamani - Gujarati, by Satpanth Dharm
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
This document discusses how the speed of light can be calculated from verses in the Quran. It describes a physicist's calculation showing that the speed of light (C) is equal to 299792.5 km/sec based on the distance traveled by the moon in one sidereal month. This value matches modern measurements of the speed of light, providing evidence that the Quran contained knowledge of modern physics centuries before its discovery.
This document discusses facts and myths about asthma. It begins by stating that asthma is not "all in the mind" but emotional triggers can cause flare-ups. It also notes that while asthma symptoms may become inactive in teenage years for some children, it cannot be outgrown. The document emphasizes that asthma cannot be cured but can be controlled with medical treatment and underscores the seriousness of the condition. It confirms several triggers of asthma attacks and notes that medications used to treat asthma are not habit-forming or addictive. Overall, the document provides information to distinguish true and false statements about the nature, causes, and treatment of asthma.
The document provides an executive summary of the Generation Peace Youth Network General Assembly held from February 23-25, 2012. It summarizes the activities and programs conducted by GenPeace from 2010-2011. It also outlines the members' assessment of GenPeace's work and recommendations for future activities and programs from 2012-2014. Key topics discussed at the assembly included updates on the Mindanao and NDF peace processes, presentations on institutional development and capacity building, and the planning of GenPeace's advocacy campaigns and projects for peace in the coming years.
The document presents a weekly schedule that includes the activities of getting up, having breakfast, playing, eating, having a snack, and sleeping for each day of the week, Monday through Saturday.
Find-A-Church.org is a directory that receives 300,000 monthly visitors seeking information on congregations. The site allows churches to welcome visitors, display ministries and activities, introduce members, and link to their website. Churches can easily update their profile by providing contact information, details about services and schedules, and descriptions of ministries to help potential congregants learn more about their community.
The document offers inspiration for university students to stay motivated in their studies even while busy running a business. It suggests changing one's mindset, finding a spiritual motivation, and being confident in one's creative ideas so that business and study can proceed side by side.
This document discusses the transition from Blackboard 8 to GCU Learn (Blackboard 9.1) at Glasgow Caledonian University. It highlights several key differences and new features in GCU Learn, including the inclusion of social media tools like blogs, wikis and podcasts. GCU Learn also allows for the integration of external media from sites like YouTube, Flickr and SlideShare. Additionally, GCU Learn introduces Communities which allow groups to share documents, participate in social media, and communicate online, as well as a content system to easily collect, share and discover learning objects within GCU Learn. The overall goal of the transition is to provide a richer learning environment for students and staff through these enhanced tools and resources.
The European Union is concerned about the environmental impact of disposable plastic and plans to ban items such as straws, cutlery, and plates by 2021. The ban aims to reduce plastic pollution in the oceans and to promote more sustainable alternatives. EU countries will have to find eco-friendly substitutes for these single-use plastic items.
Kevin Suttle is a channel developer for litl and has contributed to various tech publications and projects. He discusses the importance of considering context when developing applications, noting that context includes more than just screen size. Context refers to the user's surroundings, capabilities of their device, time sensitivity, social interactions and more. He advocates developing "contextual applications" or "contextual experiences" that account for these various contexts rather than focusing only on devices or screen sizes.
A presentation on my early work on the Mastro system. Some of this research is now part of the ontop system, some evolved into more optimised forms (also in ontop).
LLMs in Production: Tooling, Process, and Team Structure, by Aggregage
Join Dr. Greg Loughnane and Chris Alexiuk in this exciting webinar to learn all about the tooling, processes, and team structure you need to build and operate performant, reliable, and scalable production-grade LLM applications!
This document proposes improvements to the StackOverflow website by modeling question and answer data in Elasticsearch. It describes input question and answer data containing fields like post_id, tags, and user_id. It models this data in Elasticsearch indexes with types and documents corresponding to tables and rows. It shows how to query this data, such as calculating the probability a question is answered within 10 minutes based on its tags. It also models user-tag relationships and computes tag popularity. Finally, it describes a data pipeline to sample historical data, compute probabilities, and construct a graph of users and the types of questions they answer.
This document proposes improvements to the StackOverflow website by modeling question and tag data using Elasticsearch. It describes input data formats for questions and answers, and models the data with indexes, types and documents. It shows how to build tag graphs to calculate tag similarities and recommend related tags to users. Queries are proposed to calculate the probability of questions being answered within a time period based on tags, using stratified sampling. The document introduces the author and provides an overview of the data pipeline and modeling approach.
This document proposes using Elasticsearch to model and analyze data from StackOverflow. It describes modeling questions and answers as documents with fields like question_id, tags, answer_time. Tags would be indexed to analyze relationships between tags and recommend related tags to users. Graph structures and equations are proposed to calculate the similarity between tags based on the number of users who answered questions with both tags and the maximum weight of paths connecting tags. Stratifying sampling by month and tags is suggested to estimate the probability a question with a tag will be answered within 10 minutes.
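The stratified probability estimate described in the Elasticsearch summaries above can be sketched in plain Python. The field names (`tags`, `month`, `answered_in_10m`) are illustrative stand-ins, not the actual StackOverflow schema, and a real system would run this as an aggregation over the index rather than in memory.

```python
# Toy version of the estimate: P(answered within 10 minutes | tag),
# stratified by month so no single month dominates the average.
from collections import defaultdict

def answer_probability(questions, tag):
    """Average the per-month answer rates for questions carrying `tag`."""
    per_month = defaultdict(lambda: [0, 0])  # month -> [answered, total]
    for q in questions:
        if tag in q["tags"]:
            stats = per_month[q["month"]]
            stats[0] += q["answered_in_10m"]
            stats[1] += 1
    if not per_month:
        return 0.0
    rates = [answered / total for answered, total in per_month.values()]
    return sum(rates) / len(rates)
```

Averaging per-stratum rates, rather than pooling all questions, is the point of the stratified sampling mentioned above: it keeps high-volume months from swamping the estimate.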
Techniques For Deep Query Understanding, by Abhay Prakash
The document summarizes techniques for deep query understanding in search systems. It discusses query understanding, which involves understanding a user's information need from their query. This allows for query correction, suggestion, expansion, classification and semantic tagging. Query correction reformulates ill-formed queries. Query suggestion provides similar queries. Query expansion adds synonyms to broaden results. Query classification determines the intent or topic of the query. Semantic tagging identifies entities in the query. The document outlines various models for these techniques, including using contextual information and graph representations of search logs.
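The query-expansion step mentioned above can be illustrated with a minimal sketch. The synonym table here is a hand-made stand-in for what a real system would mine from search logs or a thesaurus.

```python
# Minimal query expansion: broaden a query by adding known synonyms
# for each term, preserving order and avoiding duplicates.
def expand_query(query, synonyms):
    """Return the query terms plus any synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

A production system would weight expanded terms lower than the user's original terms so that expansion broadens recall without distorting ranking.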
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente..., by Lucidworks
Trey Grainger gave a presentation about using Lucene/Solr as a self-learning data system through the concept of "reflected intelligence". The presentation covered topics like basic keyword search, taxonomies/entity extraction, query intent, and relevancy tuning. It proposed that by leveraging previous user data and interactions, new data and interactions could be better interpreted to continuously improve the system.
Reflected Intelligence: Lucene/Solr as a self-learning data system, by Trey Grainger
What if your search engine could automatically tune its own domain-specific relevancy model? What if it could learn the important phrases and topics within your domain, automatically identify alternate spellings (synonyms, acronyms, and related phrases) and disambiguate multiple meanings of those phrases, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain?
In this presentation, you’ll learn how to do just that: how to evolve Lucene/Solr implementations into self-learning data systems that accept user queries, deliver relevance-ranked results, and automatically learn from your users’ subsequent interactions to continually deliver a more relevant experience for each keyword, category, and group of users.
Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the relevance signals present in the collective feedback from every prior user interaction with the system. Come learn how to move beyond manual relevancy tuning and toward a closed-loop system leveraging both the embedded meaning within your content and the wisdom of the crowds to automatically generate search relevancy algorithms optimized for your domain.
This document discusses a demonstration of named entity recognition using a conditional random fields classifier trained on Wikipedia data. The goal is to develop an accurate NER classifier and demonstrate its utility for search. Key points made include that the CRF classifier was trained on word properties from Wikipedia data and performed similarly to the Stanford classifier on a validation set, with a small improvement on miscellaneous entities.
Dynamic Search Using Semantics & Statistics, by Paul Hofmann
This presentation shows 3 applications of successfully combining semantics and statistics for text mining and interactive search.
1) We predict the Lehman bankruptcy using statistical topic modeling, SAP Business Objects entity extraction and associative memories (powered by Saffron Technologies).
2) We semi-automatically handle service requests at Cisco using knowledge extraction and knowledge reuse.
3) We discover user intent for interactive retrieval. User intent is defined as a latent state, whose observations are the reformulated query sequence and the retrieved documents, together with the positive or negative feedback provided by the user. A demo shows how a user’s intent is recognized in health care search.
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb..., by Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will walk through some of the key building blocks necessary to turn a search engine into a dynamically-learning "intent engine", able to interpret and search on meaning, not just keywords. We will walk through CareerBuilder's semantic search architecture, including semantic autocomplete, query and document interpretation, probabilistic query parsing, automatic taxonomy discovery, keyword disambiguation, and personalization based upon user context/behavior. We will also see how to leverage an inverted index (Lucene/Solr) as a knowledge graph that can be used as a dynamic ontology to extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships.
As an example, most search engines completely miss the mark at parsing a query like (Senior Java Developer Portland, OR Hadoop). We will show how to dynamically understand that "senior" designates an experience level, that "java developer" is a job title related to "software engineering", that "portland, or" is a city with a specific geographical boundary (as opposed to a keyword followed by a boolean operator), and that "hadoop" is the skill "Apache Hadoop", which is also related to other terms like "hbase", "hive", and "map/reduce". We will discuss how to train the search engine to parse the query into this intended understanding and how to reflect this understanding to the end user to provide an insightful, augmented search experience.
Topics: Semantic Search, Apache Solr, Finite State Transducers, Probabilistic Query Parsing, Bayes Theorem, Augmented Search, Recommendations, Query Disambiguation, NLP, Knowledge Graphs
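The query-parsing example discussed above can be sketched with a greedy longest-match tagger. The tiny entity dictionary is made up for illustration; the talk describes a probabilistic parser backed by taxonomies and finite state transducers, not a hard-coded lookup.

```python
# Illustrative greedy tagger for the example query
# "Senior Java Developer Portland, OR Hadoop".
ENTITIES = {
    "senior": "experience_level",
    "java developer": "job_title",
    "portland, or": "city",
    "hadoop": "skill",
}

def parse_query(query):
    """Greedily tag the longest known phrase at each position."""
    tokens = query.lower().split()
    tagged, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try longest span first
            phrase = " ".join(tokens[i:j])
            if phrase in ENTITIES:
                tagged.append((phrase, ENTITIES[phrase]))
                i = j
                break
        else:
            tagged.append((tokens[i], "keyword"))
            i += 1
    return tagged
```

Note how longest-match-first resolves the ambiguity the talk highlights: "portland, or" is tagged as a city rather than a keyword followed by a boolean OR operator.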
This document proposes improvements to StackOverflow.com by helping users tag their questions. It presents a data modeling approach using Elasticsearch to model question and answer data from StackOverflow. Graphs are constructed from user tags to determine tag similarity and recommend related tags to users. Queries are defined to calculate the probability of questions being answered within a given time based on tags, and to determine tag similarity based on the relationships between tags in the user graph.
The Relevance of the Apache Solr Semantic Knowledge Graph - Trey Grainger
The Semantic Knowledge Graph is an Apache Solr plugin that can be used to discover and rank the relationships between any arbitrary queries or terms within the search index. It is a relevancy swiss army knife, able to discover related terms and concepts, disambiguate different meanings of terms given their context, cleanup noise in datasets, discover previously unknown relationships between entities across documents and fields, rank lists of keywords based upon conceptual cohesion to reduce noise, summarize documents by extracting their most significant terms, generate recommendations and personalized search, and power numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. This talk will walk you through how to setup and use this plugin in concert with other open source tools (probabilistic query parser, SolrTextTagger for entity extraction) to parse, interpret, and much more correctly model the true intent of user searches than traditional keyword-based search approaches.
The document discusses Modware, an object-oriented Perl interface for querying and updating the Chado database schema. It provides semantically sensible classes and methods that encapsulate Chado's business rules for easier and more efficient development. An example demonstrates storing a gene with exons in Chado using Modware and generating a web page to display the gene details.
This document provides an overview of the topics covered in a QTP (Quick Test Professional) training syllabus, including:
- QTP's recording and identification logic, object identification configuration, object repository, data tables, actions, environment variables, checkpoints, synchronization, debugging, recovery scenarios, parameterization, and VBScript basics.
It also covers working with web tables, databases, Microsoft Excel, Internet Explorer and Firefox, and creating automation frameworks using VBScript and a modular, data-driven, keyword-driven or hybrid approach.
This document provides an outline for a complete Java training course. It covers topics such as Java programming fundamentals, object-oriented programming, packages, exception handling, arrays, strings, collections, advanced Java topics like J2EE, servlets, JSP, databases, Hibernate, and Struts. It also includes contact information for the training provider.
Workflow Provenance: From Modelling to Reporting - Rayhan Ferdous
This document provides an overview of workflow provenance and proposes a programming model and system architecture for collecting and querying workflow provenance data at scale. It begins by defining provenance and its importance for big data analytics. It then classifies different types of provenance queries and proposes a taxonomy. The document outlines a programming model using object-oriented programming and domain-specific languages to automate provenance logging. It proposes parsing logs into a graph database to support fundamental provenance queries and data visualization. Finally, it discusses scaling the system and conducting further research through user studies and query optimization.
The document summarizes three papers from the SIGIR 2011 workshop on query representation and understanding.
The first paper analyzes temporal queries using web snippets and query logs to identify queries with implicit temporal intent. It finds that web snippets contain more temporal evidence than query logs.
The second paper analyzes the complex network structure of search queries, finding they exhibit a kernel-periphery structure like natural language with popular and rare query segments.
The third paper investigates query refinement through topic analysis and learning with personalization. It generates topics from query logs, builds user profiles, and uses the topics and profiles to score candidate query refinements.
The Semantic Web - This time... it's Personal - Mark Wilkinson
My presentation on SADI, SHARE, CardioSHARE, and the new iConsent project. Presented to the faculty and students at Stanford Medical Informatics, Palo Alto, USA. May 14th, 2010.
How do we make the semantic web, and medical research, more personal (both for the researcher and for the patient)? I present some ideas we're exploring.
Evgeniy Bobrov, "Powered by OSS. Scalable stream processing and analysis of b..." - Fwdays
Open-source technologies such as Microsoft Orleans and ElasticSearch are key elements of YouScan's architecture. In this talk I will describe how they help us cope with the constantly growing volumes of social network data, and the evolution of YouScan's architecture.
Similar to Tuning Topical Queries through Context Vocabulary Enrichment: A Corpus-Based Approach (20)
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Best 20 SEO Techniques To Improve Website Visibility In SERP - Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Infrastructure Challenges in Scaling RAG with Custom AI models - Zilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability as the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Tuning Topical Queries through Context Vocabulary Enrichment: A Corpus-Based Approach
1. Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based Approach. Carlos M. Lorenzetti [email_address], Ana G. Maguitman [email_address]. Universidad Nacional del Sur, Av. L.N. Alem 1253, Bahía Blanca, Argentina. Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento, Laboratorio de Investigación y Desarrollo en Inteligencia Artificial. CONICET, AGENCIA.
19. Topic Descriptors. Topic: Java Virtual Machine (initial context). Term descriptive power in a topic of a document, with occurrences in the initial context in parentheses: jdk (0) 0.014, jvm (0) 0.032, province (0) 0.040, island (0) 0.040, coffee (0) 0.055, programming (3) 0.064, language (1) 0.089, virtual (1) 0.124, machine (2) 0.158, java (4) 0.385.
20. Topic Discriminators. Topic: Java Virtual Machine (initial context). Term discriminating power in a topic of a document, with occurrences in the initial context in parentheses: province (0) 0.385, island (0) 0.385, coffee (0) 0.385, java (4) 0.493, language (1) 0.517, machine (2) 0.524, programming (3) 0.566, virtual (1) 0.566, jdk (0) 0.848, jvm (0) 0.848.
21. Proposed Algorithm (diagram): the context terms w1 ... wm are weighted in two lists, DESCRIPTORS and DISCRIMINATORS; a roulette selection over the weighted terms builds queries (query 01 ... query n), whose results (result 01 ... result n) feed new terms and weights back into the context.
24. Evaluation – N. Novelty-driven similarity per iteration (maximum, average and minimum curves) for the topic Top/Computers/Open_Source/Software during the query formulation and retrieval process and context updates. Mean N (95% CI): 0.0661 [0.0618; 0.0704] at the 1st iteration; 0.5970 [0.5866; 0.6073] at the best iteration.
25. Evaluation – N. Mean novelty-driven similarity (95% CI): Baseline 0.087 [0.0822; 0.0924]; Bo1-DFR 0.075 [0.0710; 0.0803]; Incremental 0.597 [0.5866; 0.6073].
29. Thank you! CONICET, AGENCIA. Laboratorio de Investigación y Desarrollo en Inteligencia Artificial, lidia.cs.uns.edu.ar. Universidad Nacional del Sur, Bahía Blanca, www.uns.edu.ar
Editor's Notes
Context-based search is the process of seeking information related to a user’s thematic context. Meaningful automatic context-based search can only be achieved if the semantics of the terms in the context under analysis is reflected in the search queries. For example, if a user is searching using their own words ...
He or she could find a lot of topics related to this search.
An information request is usually initiated or generated within a task. For example, if the user is editing or reading a document on a specific topic, they may wish to explore new material related to that topic. Topical queries can be formed using small sets of terms from the user's context.
And in this way we could disambiguate a query that belongs to more than one topic.
Query tuning is usually achieved by replacing or extending the terms of a query, or by adjusting the weights of a query vector. Relevance feedback is a query refinement mechanism used to tune queries based on the relevance assessments of the query's results. A driving hypothesis for relevance feedback methods is that it may be difficult to formulate a good query when the collection of documents isn't known in advance, but it's easy to judge particular documents, and so it makes sense to engage in an iterative query refinement process. A typical relevance feedback scenario involves the following steps: (1) a query is formulated; (2) the system returns an initial set of results; (3) a relevance assessment on the returned results is issued (the relevance feedback); (4) the system computes a better representation of the information needs based on this feedback; (5) the system returns a revised set of results. Depending on the level of automation of step 3 we can distinguish three forms of feedback. Supervised feedback requires explicit feedback, typically obtained from users who indicate the relevance of each retrieved document. Unsupervised feedback applies blind relevance feedback, typically assuming that the top k documents returned by a search process are relevant. In semi-supervised feedback, the relevance of a document is inferred by the system: a common approach is to monitor user behavior (e.g., which documents are selected for viewing, or the time spent viewing a document); provided that the information-seeking process is performed within a thematic context, another automatic way to infer the relevance of a document is by computing its similarity to the user's current context. We'll use the latter in this work.
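The semi-supervised loop described in this note can be sketched as follows. This is only an illustration: `search` stands in for a real search-engine call, and the 0.5 relevance threshold and the additive query update are assumptions for the sketch, not the paper's exact method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two {term: weight} bag-of-words vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def feedback_loop(context, search, rounds=3, threshold=0.5):
    query = dict(context)                    # step 1: query built from the context
    for _ in range(rounds):
        results = search(query)              # step 2: retrieve results
        relevant = [d for d in results       # step 3: infer relevance by
                    if cosine(context, d) >= threshold]  # similarity to the context
        if not relevant:
            break
        for doc in relevant:                 # step 4: refine the query with
            for term, w in doc.items():      # terms from relevant documents
                query[term] = query.get(term, 0.0) + w / len(relevant)
    return query                             # step 5: the revised query

# Toy run: the placeholder `search` returns two fixed result pages.
docs = [{"java": 2.0, "jvm": 1.0}, {"coffee": 3.0}]
learned = feedback_loop({"java": 1.0, "machine": 1.0}, lambda q: docs)
print(sorted(learned))  # → ['java', 'jvm', 'machine']
```

The off-topic page about coffee never crosses the similarity threshold, so only the on-topic page contributes new query terms.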
Much work has addressed the problem of computing the informativeness of a term across a corpus, and a good deal of research has focused on computing the descriptive and discriminating power of a term in a document with respect to a corpus. All this work, however, has been done on a predefined collection of documents and independently of a thematic context. In a previous work we proposed to study the descriptive and discriminating power of a term based on its distribution across the topics of the pages returned by a search engine. To distinguish between topic descriptors and discriminators, we argue that good topic descriptors can be found by looking for terms that occur often in documents related to the given topic. On the other hand, good topic discriminators can be found by looking for terms that occur only in documents related to the given topic. Both topic descriptors and discriminators are important as query terms.
Because topic descriptors occur often in relevant pages, using them as query terms may improve recall.
Similarly, good topic discriminators occur primarily in relevant pages, and therefore using them as query terms may improve precision.
Now we'll see a simple example to assess the potential of these kinds of terms.
For example, if we choose the topic Java Virtual Machine, we could take the following words in our context:
So, intuitively, and in line with the definition, we could say that good descriptors would be words such as Java, Machine or Virtual, and …
Good discriminators would be: JVM and JDK.
More precisely, we'll see a practical example. As this slide shows, we build a matrix of documents against terms. Our initial context is the first column of that matrix, and the next columns are the pages that we could obtain through a search engine by making queries with the initial context's terms. By definition, each cell of the matrix represents the occurrences of a term in a document. In this example we have four pages, two of them about the island and the coffee, and the rest about Java as a programming language.
We define the descriptive power of a term in a document with this expression, and we can see the values of the terms. Note that the values of the terms that don't belong to the initial context are zero.
We also define the discriminating power of a term in a document with this other expression, and see the results. As our objective is to learn the user's needs, instead of extracting the descriptors and discriminators of documents (like the user context) we need to find topic descriptors and discriminators of the user context. This term identification needs an incremental method that identifies which documents are similar to the user context. So, we need …
A document comparison criterion, and we choose cosine similarity. It is the simplest way to compare documents and the most common method in IR. We don't explain this method here, but with this criterion we define the notion of topic: a topic will be a group of documents with a high cosine similarity.
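The comparison criterion can be sketched in a few lines. The term weights below are raw counts, an assumption for illustration, since the slides do not fix a weighting scheme:

```python
import math

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two bag-of-words vectors given as {term: weight} dicts."""
    dot = sum(w * doc_b.get(t, 0.0) for t, w in doc_a.items())
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up context and page in the spirit of the Java example
context = {"java": 4, "virtual": 1, "machine": 2, "programming": 3, "language": 1}
page = {"java": 2, "jvm": 3, "jdk": 1, "machine": 1}
print(cosine_similarity(context, page))
```

Under the topic notion above, the pages whose pairwise cosine similarity is high would be grouped into one topic.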
Using the previous definitions, we define the term descriptive power in a topic of a document using this equation. We see again the weights reached by every term, and we note that java and machine are good topic descriptors, as we mentioned before.
We also define the notion of term discriminating power in a topic of a document, and we note one of the most important things: terms like JVM and JDK, which don't belong to the initial context, are excellent topic discriminators, as we anticipated. Incremental search methods are useful for collecting information from diverse information sources. The incremental identification of context-specific terms can guide the search process through huge repositories of potentially useful material, helping to filter irrelevant content.
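The exact expressions are elided in these notes, so the sketch below uses simple stand-in proxies consistent with the verbal definitions: descriptive power as a term's average relative frequency within the topic's documents, and discriminating power as the fraction of the term's corpus-wide occurrences that fall inside the topic. The toy corpus is hypothetical, in the spirit of the Java example:

```python
def descriptive_power(term, topic_docs):
    """Average relative frequency of `term` within the topic's documents
    (descriptors occur often in on-topic documents)."""
    total = 0.0
    for doc in topic_docs:
        length = sum(doc.values())
        if length:
            total += doc.get(term, 0) / length
    return total / len(topic_docs)

def discriminating_power(term, topic_docs, all_docs):
    """Fraction of `term`'s corpus-wide occurrences that fall inside the
    topic's documents (discriminators occur mostly there)."""
    in_topic = sum(d.get(term, 0) for d in topic_docs)
    overall = sum(d.get(term, 0) for d in all_docs)
    return in_topic / overall if overall else 0.0

# Hypothetical toy corpus: two pages about the programming language,
# two about the island and the coffee.
lang_docs = [
    {"java": 4, "programming": 3, "machine": 2, "virtual": 1, "jvm": 2, "jdk": 1},
    {"java": 3, "language": 2, "programming": 1, "jvm": 1},
]
other_docs = [
    {"java": 2, "island": 3, "province": 1},
    {"java": 1, "coffee": 4, "island": 1},
]
all_docs = lang_docs + other_docs

for t in ("java", "jvm", "island"):
    print(t,
          round(descriptive_power(t, lang_docs), 3),
          round(discriminating_power(t, lang_docs, all_docs), 3))
```

As in the slides, "java" comes out as the strongest descriptor of the language topic, while "jvm", which is absent from the initial context, is the sharper discriminator: all of its occurrences fall in on-topic pages.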
Our proposal is to approximate the terms' descriptive and discriminating power for the thematic context under analysis with the purpose of generating good queries. Our approach adapts the typical relevance feedback mechanism to account for a thematic context as follows. First, we extract terms from the user context. With these terms we build queries and the system returns an initial set of results. Simultaneously, the descriptor and discriminator lists are built from the obtained results and the context. These steps are repeated until no improvements are observed. Then, the context characterization is updated with the newly learned material and the process starts again. The system monitors the effectiveness achieved at each iteration, and we use novelty-driven similarity as an estimation of the retrieval effectiveness (we'll explain it later). If after a number of trials the retrieval effectiveness has not crossed a given threshold (that is, no significant improvements are observed after a certain number of trials), the system is forced to explore new potentially useful regions of the vocabulary landscape; this can be regarded as a vocabulary leap, which can be thought of as a significant transformation (typically an improvement) of the context characterization. Step 1: A query is formulated based on C. Step 2: The system returns an initial set of results. Step 3: Repeat for at least v iterations or until no improvements are registered. Step 3.1: A relevance assessment on the returned results is issued based on C. Step 3.2: After a certain number of trials, and depending on the relevance assessments, the system computes a better representation of the thematic context (phase change). Step 3.3: The system formulates new queries and returns a revised set of results. In order to learn better characterizations of the thematic context, the system undergoes a series of phases.
At the end of each phase, the context characterization is updated with the newly learned material. Each phase evolves through a sequence of trials, where each trial consists of the formulation of a set of queries, the analysis of the retrieved results, the adjustment of the terms' weights, and the discovery of new potentially useful terms. To form queries during phase i we implemented a roulette selection mechanism, where the probability of choosing a particular term t to form a query is proportional to (weight of the term at phase i). Roulette selection is a technique typically used by genetic algorithms to choose potentially useful solutions for recombination, where the fitness level is used to associate a probability of selection. This approach results in a non-deterministic exploration of the term space that favors the fittest terms. The system monitors the effectiveness achieved at each iteration; in our approach we use novelty-driven similarity as an estimation of the retrieval effectiveness (we'll explain it later). If after a number of trials the retrieval effectiveness has not crossed a given threshold (i.e., no significant improvements are observed after a certain number of trials), the system forces a phase change to explore new potentially useful regions of the vocabulary landscape. A phase change can be regarded as a vocabulary leap, which can be thought of as a significant transformation (typically an improvement) of the context characterization.
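The roulette selection step can be sketched as follows. The weights are illustrative (taken from the descriptor/discriminator example), and drawing distinct terms without replacement is an assumption of this sketch; the notes do not say whether a term may be picked twice for one query:

```python
import random

def roulette_select(weights, k):
    """Pick k distinct terms; each draw is proportional to the term's weight
    (fitness-proportionate selection, as in genetic algorithms)."""
    pool = dict(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(pool.values())
        spin = random.uniform(0, total)   # spin the roulette wheel
        acc = 0.0
        for term, w in pool.items():
            acc += w
            if acc >= spin:               # wheel stops on this term
                chosen.append(term)
                del pool[term]            # draw without replacement
                break
    return chosen

# Illustrative weights only (scores from the Java example)
weights = {"java": 0.385, "machine": 0.158, "virtual": 0.124,
           "jvm": 0.848, "jdk": 0.848, "coffee": 0.055}
print(roulette_select(weights, 3))
```

High-weight terms like "jvm" and "jdk" dominate the draws, but low-weight terms keep a small chance of being picked, which is what gives the exploration its non-deterministic character.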
Now, we'll see an evaluation of the proposed solution.
We compare the proposed method against two other methods. The first is a baseline that submits queries directly from the thematic context and doesn't apply any refinement mechanism. The second method used for comparison is the Bo1-DFR method, which is based on the well-known Rocchio method. To perform our tests we used 448 topics from the Open Directory Project (ODP), and a number of constraints were imposed on this selection with the purpose of ensuring the quality of our test set: the topics were selected from the third level of the ODP taxonomy, the minimum size for each selected topic was 100 pages, and the language was restricted to English. For each topic we collected all of its URLs as well as those in its subtopics; the total number of collected pages was more than 350,000. In our tests we used the ODP description of each selected topic to create an initial context description. In order to compare the implemented methods we used three measures of query performance. Novelty-driven similarity is based on cosine similarity but disregards the terms that form the query, overcoming the bias introduced by those terms and favoring the exploration of new material. Precision measures the fraction of retrieved documents which are known to be relevant; the relevant set for each analyzed topic was the collection of its URLs as well as those in its subtopics. And semantic precision is a measure that considers inter-topic similarity, because other topics in the ontology could be semantically similar (and therefore partially relevant) to the topic of the given context; to compute it we used a semantic similarity measure for generalized ontologies proposed in a previous work.
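Novelty-driven similarity, as described here, is plain cosine similarity computed after discarding the query terms from both vectors, so a result is not rewarded merely for echoing the query back. A small sketch, with made-up vectors and weights:

```python
import math

def cosine(a, b):
    """Cosine similarity between two {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def novelty_similarity(context, result_doc, query_terms):
    """Cosine similarity between context and result, disregarding the terms
    that formed the query."""
    c = {t: w for t, w in context.items() if t not in query_terms}
    d = {t: w for t, w in result_doc.items() if t not in query_terms}
    return cosine(c, d)

# Illustrative data: the result echoes the query term "java" heavily
ctx = {"java": 2.0, "virtual": 1.0, "machine": 1.0}
doc = {"java": 3.0, "jvm": 1.0, "machine": 1.0}
print(cosine(ctx, doc), novelty_similarity(ctx, doc, {"java"}))
```

In this example the plain cosine score is inflated by the shared query term, while the novelty-driven score reflects only the overlap in new material.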
As an example of the algorithm's behaviour, we show here its evolution on a representative topic. We see in this chart the evolution of the novelty-driven similarity and its behaviour at the different execution steps.
Now, in this chart we see the novelty-driven similarity again, but this time in a comparative chart. Each of the 448 topics is represented by a point. It's interesting to note that for all the tested cases the incremental method was superior to the other two methods.
Here, we see the comparison for the Precision metric. In this case the incremental method was strictly superior for 66.96% of the evaluated topics.
Finally, we see the Semantic Precision metric. Here the incremental method was superior in 65.18% of the topics.
The vocabulary problem is a main challenge in human-system communication. In this work we propose a solution to the semantic sensitivity problem, that is, the limitation that arises when documents with a similar context but a different term vocabulary are not associated, resulting in a false negative match. Our method operates by incrementally learning better vocabularies from a large external corpus such as the Web. We have shown that by implementing an incremental context refinement method we can perform better than a baseline method, which submits queries directly from the initial context, and than the Bo1-DFR method, which doesn't refine queries based on context. This points to the usefulness of simultaneously taking advantage of the terms in the current thematic context and an external corpus to learn better vocabularies and to automatically tune queries.