THE WATER FILLING MODEL AND
THE CUBE TEST:
Multi-Dimensional Evaluation
for Professional Search
Jiyun Luo1 Christopher Wing1 Grace Hui Yang1
Marti A. Hearst2
1Department of Computer Science
Georgetown University
Washington, DC, USA
{jl1749, cpw26}@georgetown.edu
huiyang@cs.georgetown.edu
CIKM 2013
2School of Information
University of California, Berkeley
Berkeley, CA, USA
hearst@berkeley.edu
INTRODUCTION
•  Complicated search has recently received much attention
•  Professional search activities are usually complicated search tasks
   -  Examples: medical record search, legal search, patent prior art search
•  Evaluation metrics need to reflect this complexity
   -  U-measure for whole-session evaluation [Sakai et al., SIGIR'13]
   -  Time-based gain [Smucker and Clarke, SIGIR'12]
   -  α-nDCG for diversity and novelty [Clarke et al., SIGIR'08]
   -  PRES for recall-oriented search tasks [Magdy and Jones, SIGIR'10]
PROFESSIONAL SEARCH
•  Rich information needs
   -  Multiple aspects or subtopics
•  Time-sensitive
   -  It is a misconception that professional searchers (e.g., lawyers) are happy to read irrelevant documents because they are paid by the hour and care only about recall
•  Novelty
   -  Once one relevant document has been examined, subsequent relevant documents are perceived as less relevant
•  Stopping criteria
   -  Once a sub-information-need has been fulfilled, further relevant documents about it contribute little
•  A mix of unranked and ranked retrieval
   -  Boolean search and proximity search are still popular
Patent Prior Art Search
Fenestration Segment Stent-Graft and Fenestration Method
US 20090259290 A1
ABSTRACT
A method includes deploying a fenestration
segment stent-graft into a main vessel such
that a fenestration section …
1. A fenestration segment stent-graft comprising : a proximal
section comprising a woven graft cloth; …
2. The fenestration segment stent-graft of claim 1 wherein said
proximal section comprises a proximal end and a distal end, …
3. The fenestration segment stent-graft of claim 2 wherein said
attachment means comprises stitching.
…
20. A fenestration segment stent-graft comprising : a
proximal section; a distal section; …
21. The fenestration segment stent-graft of claim 20 wherein said
fenestration section comprises : graft material comprising loose woven
fibers…
[Figure labels: Claims; Independent; Dependent]
•  Looking for published literature that can be used to "say no" to a patent application. A granted patent should be novel and non-trivial.
•  Time constraint: less than 6 hours
We draw an analogy between professional search and filling water into a cube.

Professional Search
•  Information need with multiple subtopics
•  Goal: fulfill the information need with relevant documents as soon as possible
•  A document can cover different subtopics
•  The searcher stops looking for more relevant documents for a subtopic, or for the entire information need

The Water-Filling Model
•  A cube with multiple segments
•  Goal: fill up the cube with water as soon as possible
•  "Document water" can flow into different segments
•  Once a segment reaches its cap, no more water can go there

How do we judge whether a search system is good?
•  We assume the searcher wants the subtopics of a task to be fulfilled as quickly and as fully as possible
The Task Cube
•  The Cube, with unit length, represents the entire information need
•  Each cuboid in the Cube represents a subtopic
•  The top of the Cube is the cap that limits the maximum amount of relevant information needed
   -  This acts as the stopping criterion
•  The bottom is segmented into different areas
   -  The size of each area indicates the importance of the corresponding subtopic
   -  E.g., in prior art search, independent claims are assigned more weight than dependent claims
[Figure: an empty task cube for a search task with 6 subtopics]
The Water Filling Model
•  Each new relevant document raises the water level in every subtopic it is relevant to
•  The height increment is the relevance gain from that document with regard to that subtopic
•  The total height of the water in one cuboid represents the accumulated relevance gain for a subtopic
•  The total volume of water in the task Cube is the total gain
The Cube Test
•  Based on the water-filling model, we design a new multi-dimensional evaluation metric for professional search: the Cube Test (CT)
•  CT measures the rate at which a search system fills the task cube as fully as possible
•  It is a speed function
The Gain Function
$Gain(Q, d_j) = \sum_i area_i \times height_{i,j} \times KeepFilling_i$
•  The gain of document $d_j$ is the volume of relevant "document water" it adds across all the subtopics it matches in the task cube.
•  A more concrete equation:
$Gain(Q, d_j) = \sum_i \Gamma \, \theta_i \, rel(d_j, c_i) \, I\left(\sum_{k=1}^{j-1} rel(d_k, c_i) < MaxHeight\right)$
where
   -  $\Gamma$ is a discounting factor for subtopic novelty, $\Gamma = \gamma^{nrel(c_i, j-1)}$, where $nrel(c_i, j-1)$ is the number of relevant documents for subtopic $c_i$ among the previously examined documents ($d_1$ to $d_{j-1}$),
   -  $\theta_i$ is the importance of the i-th subtopic, with $\sum_i \theta_i = 1$,
   -  $rel(d_j, c_i)$ is the water height, i.e., document $d_j$'s relevance grade towards subtopic $c_i$,
   -  $I$ is the indicator function,
   -  $MaxHeight$ is the cap for subtopic relevance (set to 1).
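A minimal Python sketch of the gain computation described by the definitions above. It assumes relevance grades in [0, 1], that a document counts as relevant to a subtopic when its grade is positive, and that the indicator stays on while the accumulated (undiscounted) grades for a subtopic are below MaxHeight. The function name `cube_gains` and the dictionary-based inputs are hypothetical choices made for illustration, not part of the original work.

```python
from typing import Dict, List


def cube_gains(
    docs: List[Dict[str, float]],
    theta: Dict[str, float],
    gamma: float = 0.5,
    max_height: float = 1.0,
) -> List[float]:
    """Return Gain(Q, d_j) for each document, in the order it was examined.

    docs  : one dict per document, mapping subtopic id -> relevance grade
    theta : subtopic id -> importance (the base area), expected to sum to 1
    """
    nrel = {c: 0 for c in theta}      # relevant documents seen so far, per subtopic
    height = {c: 0.0 for c in theta}  # accumulated water height, per subtopic
    gains = []
    for rels in docs:
        g = 0.0
        for c, rel in rels.items():
            # KeepFilling: assumed to hold while the subtopic is below its cap
            if rel > 0 and height[c] < max_height:
                g += (gamma ** nrel[c]) * theta[c] * rel
        # Update the state only after this document's gain has been computed
        for c, rel in rels.items():
            if rel > 0:
                nrel[c] += 1
                height[c] += rel
        gains.append(g)
    return gains


theta = {"c1": 0.5, "c2": 0.5}
docs = [{"c1": 1.0}, {"c1": 1.0, "c2": 0.5}, {"c2": 0.5}]
print(cube_gains(docs, theta, gamma=0.5))  # [0.5, 0.25, 0.125]
```

In this toy run the second document earns nothing from c1 because that subtopic is already full, and the third document is discounted by γ because c2 already has one relevant document.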
The Total Gain Function
•  The total gain is accumulated over the list of documents that have been examined
•  Note that it does not assume any traversal order
•  It does not even assume ranked retrieval
•  This allows us to support ranked retrieval, unranked retrieval, or a mix of the two
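As a sketch of how the total gain could be written, assuming it is simply the sum of the per-document gains over the documents examined so far (the symbol $D$ for that sequence and the name $TotalGain$ are illustrative assumptions, not notation confirmed by the slides):

```latex
\[
TotalGain(Q, D) \;=\; \sum_{j=1}^{|D|} Gain(Q, d_j)
\]
```

Under this reading the index $j$ only records the sequence in which documents were examined, so the same expression applies to a ranked list and to an unranked result set read in whatever order the searcher chooses.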
The Cube Test - Recap
•  It is a speed function
•  The time function is the amount of time taken from the beginning up to the t-th document. It can be
   -  the actual reading time,
   -  a formulation similar to TBG [Smucker & Clarke, SIGIR'12] that takes document length into account: $\sum_{j=1}^{t} \left( 4.4 + r_i \times (0.018\, l_j + 7.8) \right)$,
   -  or simply the number of documents examined so far
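The "speed function" idea can be sketched in Python as total gain divided by elapsed time. This is an illustration under stated assumptions: the slide does not define r, so it is treated here as a per-document factor supplied by the caller; document length l_j is taken to be a word count; and the function names `time_tbg`, `time_doc_count`, and `cube_test` are hypothetical.

```python
from typing import Sequence


def time_tbg(lengths: Sequence[float], r: Sequence[float]) -> float:
    """TBG-style time in seconds: sum over documents of 4.4 + r_j * (0.018 * l_j + 7.8)."""
    return sum(4.4 + rj * (0.018 * lj + 7.8) for lj, rj in zip(lengths, r))


def time_doc_count(lengths: Sequence[float]) -> float:
    """Simplest time function: the number of documents examined so far."""
    return float(len(lengths))


def cube_test(total_gain: float, elapsed: float) -> float:
    """CT as a speed: accumulated gain per unit of time."""
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_gain / elapsed


lengths = [1000, 500, 2000]   # hypothetical document lengths
r = [1.0, 1.0, 1.0]           # hypothetical per-document factors
total_gain = 0.875            # hypothetical accumulated gain

print(cube_test(total_gain, time_doc_count(lengths)))
print(cube_test(total_gain, time_tbg(lengths, r)))
```

Swapping the time function changes the unit of CT (gain per document versus gain per second) but not the gain side of the ratio.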
EXPERIMENTS
Datasets
USPTO
•  Three million US patent applications and publications from 2001 to 2013, in XML with images removed.
•  We created 33 runs for 49 prior art finding tasks.
•  Office actions written by US patent examiners are parsed, and the ground truth is extracted automatically from them (PublicPair).
CLEF-IP 2012
•  XML patent documents from the European Patent Office (EPO) prior to 2002 and 400,000+ documents published by the World Intellectual Property Organization (WIPO).
•  We evaluate the 31 official runs from 5 teams that participated in CLEF-IP 2012.
Discriminative Power
•  We compare the new metric with a few well-known metrics:
   -  Recall
   -  I-rec [Sakai et al., EVIA'10]
   -  nDCG
   -  α-nDCG [Clarke et al., SIGIR'08]
   -  PRES [Magdy and Jones, SIGIR'10]
   -  MAP
   -  TBG [Smucker & Clarke, SIGIR'12]
   -  nERR-IA [Sakai & Song, SIGIR'11]
•  We evaluate the evaluation metrics by their discriminative power [Sakai, SIGIR'06]
•  We test a few variations of CT
•  On the CLEF-IP dataset, all CT metrics show high discriminative power.
•  On the USPTO dataset, Recall and I-rec show the best discriminative power; CT metrics show good discriminative power.
Tradeoff between coverage and single relevance
•  CT is able to adjust its bias between recall-oriented tasks and precision-oriented tasks
•  We create two artificial runs
   -  Coverage run: arranges the relevant documents for each subtopic in a round-robin fashion.
   -  Single relevance run: puts all relevant documents for one subtopic first, ordered by rel(d, c_i), then those for the next subtopic.
[Figures: CT vs. γ for the coverage run; CT vs. γ for the single relevance run]
•  The novelty discount base γ ranges over [0.1, 0.9].
•  When γ is small, CT applies a large novelty discount, is biased towards coverage, and rewards runs that spread relevant documents across different subtopics.
•  When γ is large, CT is biased towards precision and rewards runs that produce highly relevant documents early.
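The effect of γ can be illustrated with a toy Python comparison of the two kinds of run. Everything concrete here is an assumption made for the sketch: two equally weighted subtopics, documents that each cover a single subtopic, made-up relevance grades, document count as the time function, and CT read off after the first two documents. The helper name `ct_at` is hypothetical, and the numbers are not the experimental results.

```python
from typing import Dict, List, Tuple

Run = List[Tuple[str, float]]  # (subtopic id, relevance grade) per document


def ct_at(run: Run, theta: Dict[str, float], gamma: float,
          cutoff: int, max_height: float = 1.0) -> float:
    """CT after `cutoff` documents, using the document count as time."""
    nrel = {c: 0 for c in theta}      # relevant documents seen per subtopic
    height = {c: 0.0 for c in theta}  # accumulated water height per subtopic
    total = 0.0
    for subtopic, rel in run[:cutoff]:
        if height[subtopic] < max_height:  # subtopic not yet full
            total += (gamma ** nrel[subtopic]) * theta[subtopic] * rel
        nrel[subtopic] += 1
        height[subtopic] += rel
    return total / cutoff


theta = {"c1": 0.5, "c2": 0.5}
# Single relevance run: all of c1 by descending grade, then all of c2
single_relevance_run = [("c1", 0.5), ("c1", 0.4), ("c2", 0.2), ("c2", 0.1)]
# Coverage run: the same documents, alternating subtopics round-robin
coverage_run = [("c1", 0.5), ("c2", 0.2), ("c1", 0.4), ("c2", 0.1)]

for gamma in (0.1, 0.5, 0.9):
    print(gamma,
          ct_at(coverage_run, theta, gamma, cutoff=2),
          ct_at(single_relevance_run, theta, gamma, cutoff=2))
```

With these toy grades the coverage run scores higher at γ = 0.1 and the single relevance run scores higher at γ = 0.9, which matches the direction of the bias described above.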
Conclusions
•  This paper presents a novel evaluation metric (the Cube Test), based on a novel utility model (the water filling model)
•  It addresses several important dimensions of professional search, and of complicated search in general
   -  Covers different aspects or subtopics
   -  Subtopics need not be equally important
   -  Allows a single document to cover several subtopics
   -  Is time-sensitive
   -  Handles the stopping criterion: adding more relevant documents to a subtopic that has reached its cap does not improve the overall gain
   -  Expresses the tradeoff between time, quality of documents, and diverse coverage of subtopics
Acknowledgments: Portions of this work were conducted to explore
new concepts under the umbrella of a larger project at the US Patent
and Trademark Office.
THANK YOU
CT Variations
The Water Filling Model and The Cube Test: Multi-Dimensional Evaluation for Professional Search (CIKM 2013)

  • 1. THE WATER FILLING MODEL AND THE CUBE TEST: Multi-Dimensional Evaluation for Professional Search Jiyun Luo1 Christopher Wing1 Grace Hui Yang1 Marti A. Hearst2 1Department of Computer Science Georgetown University Washington, DC, USA {jl1749, cpw26}@georgetown.edu huiyang@cs.georgetown.edu CIKM 2013 2School of Information University of California, Berkeley Berkeley, CA, USA hearst@berkeley.edu 1
  • 2. INTRODUCTION ¢  Complicated search has recently received much attention ¢  Professional search activities are usually complicated search tasks —  Examples: Medical record search, Legal search, Patent prior art search ¢  Evaluation metrics need to reflect this complexity —  U-measure for whole session evaluation [Sakai et al. sigir’13] —  Time-based gain [Smucker and Clarke sigir’12] —  α-nDCG for diversity and novelty [Clarke et al. sigir’08] —  PRES for recall-orientated search tasks [Magdy and Jones, sigir’10] 2
  • 3. PROFESSIONAL SEARCH ¢  Rich information needs —  Multiple aspects or subtopics ¢  Time-sensitive —  It is not true that professional searchers, e.g., lawyers, are evil and would like to read irrelevant documents since they are paid by time and only care about recall ¢  Novelty —  Once examined one relevant document, subsequent relevant documents are perceived as less relevant ¢  Stopping criteria —  Once a sub-information-need has been fulfilled, relevant documents about it will contribute not much any more ¢  A mix of unranked and ranked retrieval —  Boolean search and proximity search are still popular 3
  • 4. Fenestration Segment Stent- Graft and Fenestration Method US 20090259290 A1 Patent Prior Art Search ABSTRACT A method includes deploying a fenestration segment stent-graft into a main vessel such that a fenestration section … 1. A fenestration segment stent-graft comprising : a proximal section comprising a woven graft cloth; … 2. The fenestration segment stent-graft of claim 1 wherein said proximal section comprises a proximal end and a distal end, … 3. The fenestration segment stent-graft of claim 2 wherein said attachment means comprises stitching. … 20. A fenestration segment stent-graft comprising : a proximal section; a distal section; … 21. The fenestration segment stent-graft of claim 20 wherein said fenestration section comprises : graft material comprising loose woven fibers… Claims 4 Looking for published literature that can be used to `say no’ to a patent application. A granted patent should be novel and non- trivial. Ø  Time constraint: less than 6 hours Independent DependentDependentDependent
  • 5. 5 ¢  Information need with multiple subtopics ¢  Goal: fulfill the info need with relevant documents as soon as possible ¢  A document can cover different subtopics ¢  Stop finding more relevant documents for a subtopic or for the entire information need ¢  A cube with multiple segments ¢  Goal: fill up the cube with water as soon as possible ¢  “document water” can flow in different segments ¢  Reaching a cap in a segment and no more water can go there Professional Search The Water-filling Model We draw an analogy between Professional Search and Filling Water into a Cube How to judge a search system is good? Ø  We assume the searcher wants the multi-subtopics of a task to be fulfilled as quickly as possible & as much as possible
  • 6. The Task Cube Ø  The Cube with unit length represents the entire information need Ø  Each cuboid in the Cube represents a subtopic Ø  The top of the Cube is the cap that limits the maximum amount of relevant information needed Ø  Stopping criterion Ø  The bottom is segmented into different areas. Ø  The area size indicates the importance of each subtopic. Ø  E.g. in prior art search, independent claims are assigned more weights than dependent claims 6 An empty task cube for a search task with 6 subtopics
  • 7. The Water Filling Model
    - A newly examined relevant document raises the water level in all of its relevant subtopics
    - The height increment is the relevance gain from that document with regard to that subtopic
    - The total height of the water in one cuboid represents the accumulated relevance gain for a subtopic
    - The total volume of water in the task Cube is the total Gain
  • 8. The Cube Test
    - Based on the water-filling model, we design a new multi-dimensional evaluation metric for professional search: the Cube Test (CT)
    - CT calculates the rate at which a search system can fill up the task cube as much as possible
    - It is a speed function
  • 9. The Gain Function
    $\mathrm{Gain}(Q, d_j) = \sum_i \mathrm{area}_i \times \mathrm{height}_{i,j} \times \mathrm{KeepFilling}_i$
    - Document $d_j$'s gain is calculated as the volume of relevant "document water" that matches all subtopics in the task cube.
    - A more concrete equation uses the following terms:
      - $\Gamma$ is a discounting factor for subtopic novelty, $\Gamma = \gamma^{\mathrm{nrel}(c_i, j-1)}$, where $\mathrm{nrel}(c_i, j-1)$ is the number of relevant documents for subtopic $c_i$ among the previously examined documents ($d_1$ to $d_{j-1}$).
      - $\theta_i$ is the importance of the $i$-th subtopic, with $\sum_i \theta_i = 1$.
      - $\mathrm{rel}(d_j, c_i)$ is the water height, i.e., document $d_j$'s relevance grade towards subtopic $c_i$.
      - $I$ is the indicator function.
      - MaxHeight is the cap for subtopic relevance (set to 1).
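The terms listed on this slide suggest the following concrete form of the gain. It is a reconstruction under assumptions, not a quotation of the slide: in particular, the argument of the indicator $I$ (the relevance accumulated from earlier documents staying below MaxHeight) is assumed, since the slide only names the function.

```latex
\mathrm{Gain}(Q, d_j)
  = \sum_i \Gamma \,\theta_i \,\mathrm{rel}(d_j, c_i)\,
    I\!\left( \sum_{k=1}^{j-1} \mathrm{rel}(d_k, c_i) < \mathrm{MaxHeight} \right),
\qquad
\Gamma = \gamma^{\,\mathrm{nrel}(c_i,\, j-1)}
```

Read against the water-filling picture, $\theta_i$ plays the role of the base area, $\mathrm{rel}(d_j, c_i)$ discounted by $\Gamma$ plays the role of the height increment, and the indicator plays the role of KeepFilling.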
  • 10. The Total Gain Function
    - Total Gain for a list of documents that have been examined
    - Note that it does not assume any traversal order
    - It does not even assume ranked retrieval
    - This allows us to support ranked retrieval, unranked retrieval, or a mix of the two
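The accumulation of gain over examined documents can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names, the dictionary-based document representation, and the default `gamma=0.5` are illustrative assumptions. It also assumes that the indicator on the previous slide tests whether the relevance accumulated for a subtopic by earlier documents is still below MaxHeight.

```python
from typing import Dict, List


def per_document_gains(
    docs: List[Dict[str, float]],   # each doc maps subtopic -> relevance grade
    theta: Dict[str, float],        # subtopic importance, expected to sum to 1
    gamma: float = 0.5,             # novelty discount base (illustrative default)
    max_height: float = 1.0,        # cap on relevance per subtopic
) -> List[float]:
    """Gain of each document, in whatever order the documents were examined."""
    height = {c: 0.0 for c in theta}  # relevance accumulated per subtopic
    nrel = {c: 0 for c in theta}      # relevant documents seen per subtopic
    gains = []
    for rel in docs:
        gain = 0.0
        for c, grade in rel.items():
            if grade <= 0.0:
                continue  # document is not relevant to this subtopic
            # Assumed indicator: only fill while the subtopic is below its cap.
            if height[c] < max_height:
                gain += (gamma ** nrel[c]) * theta[c] * grade
            height[c] += grade
            nrel[c] += 1
        gains.append(gain)
    return gains


def total_gain(docs, theta, gamma=0.5, max_height=1.0) -> float:
    """Total gain for the list of documents examined so far."""
    return sum(per_document_gains(docs, theta, gamma, max_height))


# Hypothetical task with two subtopics and three examined documents.
example_theta = {"a": 0.6, "b": 0.4}
example_docs = [{"a": 0.5}, {"a": 0.5, "b": 1.0}, {"a": 1.0}]
print(per_document_gains(example_docs, example_theta))
```

The input is just the sequence of documents as the searcher examined them, so the same function applies whether that sequence came from a ranked list, a Boolean result set, or a mix. In the example, the third document adds nothing because subtopic `a` has already reached its cap.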
  • 11. The Cube Test - Recap
    - It is a speed function
    - The time function is the amount of time taken from the beginning up to the $t$-th document. It can be
      - actual reading time
      - a formulation similar to TBG [Smucker & Clarke, sigir'12], taking into account document length: $\sum_{j=1}^{t} \left( 4.4 + r_i \times (0.018\, l_j + 7.8) \right)$
      - or simply the number of documents that have been examined so far
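A small Python sketch of the time options and of CT as a speed follows. Several points are assumptions rather than statements from the slides: CT is read here as total gain divided by elapsed time, the factor `r` (which the slide does not define) is treated as a per-document weight such as the probability that the document is read in full, and the function names are hypothetical.

```python
from typing import Sequence


def tbg_style_time(doc_lengths: Sequence[float], read_weights: Sequence[float]) -> float:
    """Time to examine documents 1..t, using the constants shown on the slide.

    doc_lengths[j] is l_j; read_weights[j] is the assumed per-document factor r.
    """
    if len(doc_lengths) != len(read_weights):
        raise ValueError("doc_lengths and read_weights must have the same length")
    return sum(
        4.4 + r * (0.018 * length + 7.8)
        for length, r in zip(doc_lengths, read_weights)
    )


def document_count_time(num_examined: int) -> float:
    """Simplest time function: the number of documents examined so far."""
    return float(num_examined)


def cube_test(total_gain: float, elapsed: float) -> float:
    """CT read as a speed: gain accumulated per unit of time (assumed form)."""
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_gain / elapsed


# Hypothetical usage: two examined documents, the first read in full.
elapsed = tbg_style_time([1000, 500], [1.0, 0.0])
print(cube_test(0.85, elapsed))
print(cube_test(0.85, document_count_time(2)))
```

Swapping the time function changes only the denominator, which is why the same gain computation can be paired with reading time, a length-based estimate, or a plain document count.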
  • 12. EXPERIMENTS: Datasets
    USPTO
    - It consists of three million US patent applications and publications from 2001 to 2013, in XML with images removed.
    - We created 33 runs for 49 prior art finding tasks.
    - Office actions written by US Patent Examiners are parsed, and the ground truth is extracted automatically from them (PublicPair).
    CLEF-IP 2012
    - XML patent documents from the European Patent Office (EPO) prior to 2002 and 400,000+ documents published by the World Intellectual Property Organization (WIPO).
    - We evaluate the 31 official runs from the 5 teams who participated in CLEF-IP 2012.
  • 13. Discriminative Power
    - We compare the new metric with a few well-known metrics:
      - Recall
      - I-rec [Sakai et al. EVIA'10]
      - nDCG
      - α-nDCG [Clarke et al. sigir'08]
      - PRES [Magdy and Jones, sigir'10]
      - MAP
      - TBG [Smucker & Clarke, sigir'12]
      - nERR-IA [Sakai & Song, sigir'11]
    - We evaluate the evaluation metrics by their discriminative power [Sakai, sigir'06]
    - We test a few variations of CT
    - In the CLEF-IP dataset, all CT metrics show high discriminative power.
    - For the USPTO dataset, Recall and I-rec show the best discriminative power. CT metrics show good discriminative power.
  • 14. Tradeoff between coverage and single relevance
    - CT is able to adjust its bias between recall-oriented tasks and precision-oriented tasks
    - We create two artificial runs
      - Coverage run: it arranges the relevant documents for each subtopic in a round-robin fashion.
      - Single relevance run: it puts all relevant documents for one subtopic first, ordered by rel(d, ci), then those for the next subtopic.
    Figures: CT vs. γ for the coverage run; CT vs. γ for the single relevance run
    - The novelty discount base γ ranges in [0.1, 0.9].
    - When γ is small, CT has a big novelty discount, is biased towards coverage, and rewards runs that spread relevant documents across different subtopics.
    - When γ is big, CT is biased towards precision and rewards runs that produce highly relevant documents early.
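The two artificial runs can be sketched in Python as follows. This is a minimal sketch under assumptions: the input format (a mapping from subtopic to judged `(doc_id, grade)` pairs), the function names, and the choice to keep only the first occurrence of a document that is relevant to several subtopics are all illustrative and not taken from the slides.

```python
from itertools import zip_longest
from typing import Dict, Iterable, List, Tuple

# Hypothetical judgment format: subtopic -> [(doc_id, relevance grade), ...]
Judged = Dict[str, List[Tuple[str, float]]]


def _dedupe(doc_ids: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each document id, preserving order."""
    seen = set()
    out = []
    for doc_id in doc_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            out.append(doc_id)
    return out


def coverage_run(judged: Judged) -> List[str]:
    """Round-robin over subtopics: one relevant document from each in turn."""
    columns = list(judged.values())
    interleaved = [
        item[0]
        for tier in zip_longest(*columns)  # pads shorter subtopics with None
        for item in tier
        if item is not None
    ]
    return _dedupe(interleaved)


def single_relevance_run(judged: Judged) -> List[str]:
    """All documents of one subtopic, best grade first, then the next subtopic."""
    ordered: List[str] = []
    for docs in judged.values():
        ranked = sorted(docs, key=lambda d: d[1], reverse=True)
        ordered.extend(doc_id for doc_id, _ in ranked)
    return _dedupe(ordered)


# Hypothetical judgments for two subtopics.
example = {
    "c1": [("d1", 0.9), ("d2", 0.4)],
    "c2": [("d3", 0.7), ("d4", 1.0)],
}
print(coverage_run(example))
print(single_relevance_run(example))
```

Scoring both orderings with CT at several values of γ would reproduce the kind of comparison the two figures describe: the coverage ordering touches every subtopic early, while the single relevance ordering exhausts one subtopic before moving to the next.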
  • 15. Conclusions
    - This paper presents a novel evaluation metric (the Cube Test), based on a novel utility model (the water filling model)
    - It addresses several important dimensions in professional search, and in complicated search in general
      - Covers different aspects or subtopics
      - Subtopics need not be equally important
      - Allows a single document to cover several subtopics
      - Is time-sensitive
      - Handles the stopping criterion
        - Adding more relevant documents to a subtopic that has already been fulfilled does not improve the overall gain
      - Expresses the tradeoff between time, quality of documents, and diverse coverage of subtopics
    Acknowledgments: Portions of this work were conducted to explore new concepts under the umbrella of a larger project at the US Patent and Trademark Office.