Dr Max L. Wilson (http://cs.nott.ac.uk/~mlw/)
Extended Searching Sessions
and Evaluating Success
Dr Max L. Wilson
Mixed Reality Lab
University of Nottingham, UK
Friday, 10 May 13
Studying Extended Search Success In Observable Natural Sessions (SESSIONS)
Extended Searching Sessions and Evaluating Sensemaking Success
About Me
Study 1: The Real Nature of Sessions
Study 2: Evaluating Sensemaking Success
About me
MEng & PhD in Southampton
Taught in Swansea for 3 years
Moved to Nottingham April 2012
About Me
UIST 2008, JCDL 2008
My PhD
Bates, M. J. (1979a). Idea tactics. Journal of
the American Society for Information
Science, 30(5):280–289.
Bates, M. J. (1979b). Information search
tactics. Journal of the American Society for
Information Science, 30(4):205–214.
Belkin, N. J., Marchetti, P. G., and Cool, C.
(1993). Braque: design of an interface to support
user interaction in information retrieval.
Information Processing and Management, 29(3):
325–344.
My PhD
Wilson, M. L., schraefel, m. c., and White, R. W. (2009). Evaluating advanced
search interfaces using established information-seeking models. Journal of the
American Society for Information Science and Technology, 60(7):1407–1422.
Search User Interface Design
My Team
Horia Maior, Matthew Pike, Jon Hurlock, Paul Brindley, Zenah Alkubaisy
Chaoyu (Kelvin) Ye (Study 1)
Mathew Wilson (Study 2)
Extended Searching Sessions and Evaluating Sensemaking Success
About Me
Study 1: The Real Nature of Sessions
Study 2: Evaluating Sensemaking Success
People Searching the Web
Elsweiler, D., Wilson, M. L. and Kirkegaard-Lunn, B. (2011) Understanding Casual-leisure Information Behaviour. In Spink, A. and Heinstrom, J. (Eds) New Directions in Information Behaviour. Emerald Group Publishing Limited, pp 211-241.
The Search Communities
Ingwersen, P., Jarvelin, K., 2005. The Turn: Integration of Information Seeking and Retrieval in Context. Springer, Berlin, Germany.
The IR Community
• Focused on Accuracy
• Are these results relevant?
• How many are relevant?
• Did we get all the relevant ones?
The Search Communities
The IS Community
• Focused on Success
• Did they find the right result?
• How long did they take?
• How many interactions?
Ingwersen, P., Jarvelin, K., 2005. The Turn: Integration of Information Seeking and Retrieval in Context. Springer, Berlin, Germany.
The Search Communities
The IB Community
• Focused on Quality
• Did they do a good job?
• How did the UI affect the task?
• Was the higher-level motivating task achieved more successfully?
Ingwersen, P., Jarvelin, K., 2005. The Turn: Integration of Information Seeking and Retrieval in Context. Springer, Berlin, Germany.
The Search Communities
“Relatively” well known
“Naively estimated”
- Study 1
“Simplistically” measured
- Study 2
Work Tasks
• Work tasks: typically considered work-led, information-intensive activities that lead to searching
• Can be out-of-work, like planning holidays or buying a car
• We’ve begun looking at motivating ‘tasks’ outside of work
Casual Leisure Work Tasks
behaviours documented so far.
4.1 Need-less browsing
Much like the desire to pass time at the television, we saw
many examples (some shown in Table 3) of people passing
time typically associated with the ‘browsing’ keyword.
1) ... I’m not even *doing* anything useful... just browsing
eBay aimlessly...
2) to do list today: browse the Internet until fasting break
time..
3) ... just got done eating dinner and my family is watching the football. Rather browse on the laptop
4) I’m at the dolphin mall. Just browsing.
Table 3: Example tweets where the browsing activity is need-less.
From the collected tweets it is clear that often the information-need in these situations is not only fuzzy, but typically absent. The aim appears to be focused on the activity, where the measure of success would be in how much they ...
Wilson, M. L. and Elsweiler, D. (2010) Casual-leisure Searching: the Exploratory Search scenarios that break our current models. In: 4th HCIR Workshop, Aug 22 2010, pp 28-31.
People Searching the Web
Elsweiler, D., Wilson, M. L. and Kirkegaard-Lunn, B. (2011) Understanding Casual-leisure Information Behaviour. In Spink, A. and Heinstrom, J. (Eds) New Directions in Information Behaviour. Emerald Group Publishing Limited, pp 211-241.
Sessions
• Traditionally examined by analysing logs for stats
• In the 90s, it was suggested sessions be split at gaps of ~25mins
- More recently at ~5mins (a minimal splitting sketch follows below)
• BUT evidence shows web use typically interleaves tasks
- AND tabs make this all much harder
• Has become a big focus of Dagstuhls/workshops
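To make the time-gap approach concrete, here is a minimal sketch (Python, not from any of the cited work) that splits a time-ordered query log into sessions at a configurable gap; the log format and the example queries are assumptions, and only the ~25-minute and ~5-minute thresholds come from this slide.

```python
from datetime import datetime, timedelta

def split_sessions(log, gap=timedelta(minutes=5)):
    """Split a time-ordered list of (timestamp, query) pairs into sessions,
    starting a new session whenever the gap between consecutive entries
    exceeds `gap` (~25 minutes in 1990s work, ~5 minutes more recently)."""
    sessions, current = [], []
    for entry in log:
        if current and entry[0] - current[-1][0] > gap:
            sessions.append(current)  # gap exceeded: close the current session
            current = []
        current.append(entry)
    if current:
        sessions.append(current)
    return sessions

# hypothetical log: the third query follows a >5 minute gap, so it starts a new session
log = [
    (datetime(2013, 5, 10, 9, 0), "session definition"),
    (datetime(2013, 5, 10, 9, 2), "search session boundaries"),
    (datetime(2013, 5, 10, 10, 30), "holiday flights"),
]
print(len(split_sessions(log, gap=timedelta(minutes=5))))  # -> 2
```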
Search Trails
• Aimed at finding common end locations for queries
• An interesting step towards sessions though
• Most involved some trail features (not just query+click)
White, Ryen W. and Steven M. Drucker. "Investigating behavioral variability in web search." In Proc. WWW 2007. ACM.
Top Sessions
as Seen by Bing
Bailey et al, User task understanding: a web search engine perspective, NII Shonan, 8 Oct 2012
Top Sessions
as Seen by Bing
Bailey et al, User task understanding: a web search engine perspective, NII Shonan, 8 Oct 2012
Top Sessions
as Seen by Bing
Bailey et al, User task understanding: a web search engine perspective, NII Shonan, 8 Oct 2012
Study 1: Investigating Extended Sessions
What on earth is happening here?
Study 1: Interview Method
Send & Preprocess History → a history artefact of approx 300 items (a hypothetical preprocessing sketch follows below)
“How would you define a session?” (10mins)
Mark out history into sessions, starting recently + create ‘cards’ of varying types of ‘sessions’ (20-30mins; 15-20 cards)
Open Card Sort + Closed Card Sort (30-50mins)
Data collected: interview recording, cards, card sorts, marked history file, log data
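Purely to illustrate the "Send & Preprocess History" step above, the sketch below shows one way a browser-history export could be trimmed into a ~300-item artefact; the CSV column names (visit_time, url, title) and the exact cut-off are assumptions for the example, not details reported in the study.

```python
import csv
from datetime import datetime

def load_history(path, limit=300):
    """Read a hypothetical browser-history CSV export and return the most
    recent `limit` visits as dictionaries (column names are assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = [
            {
                "time": datetime.fromisoformat(r["visit_time"]),
                "url": r["url"],
                "title": r["title"],
            }
            for r in csv.DictReader(f)
        ]
    rows.sort(key=lambda r: r["time"], reverse=True)  # most recent first
    return rows[:limit]  # trim to roughly the artefact size used in the interviews
```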
Study 1: Data
• Rich discussion of ~20 Sessions per participant
• Currently: 7 participants and ~120 sessions
- richly described and compared
• Aiming for: 12 participants and 200+ sessions at first
Study 1: Questions for Sessions
1) Where was this done (e.g. work vs home vs mobile)
2) With whom (collaborative?)
3) For whom (shared task?)
4) Devices involved (whether devices affect things)
5) Length of the session (how do they define long?)
6) Successful or not (for future measurement insights)
At some point we tried to learn these for each session
Study 1: A Card
Study 1: A Card
Study 1: Card Sorting
• We aimed first to let them define the dimensions
- this lets us see how they define things
- how do they self-categorise different sessions
• We then had some targeted card sorts
- for whom, duration, difficulty, importance, location
- what’s short vs long?
- what’s important vs not?
- how do people divide work vs home, etc.
Study 1: Example Card Sorts
Study 1: Preliminary Findings
• avg 21 cards per person, inc. ~8 sessions of 5+ mins
- ~4 work & ~4 leisure
• 18.6% of those extended sessions involved task switches
• avg length: 17.5 mins; avg #queries: 3.55
• short: a third said <30s, a third said <1min, a third said <30mins
• long: a third said >1 hour, a third said >5mins
Study 1: Preliminary Findings
• longest sessions: entertainment, work prep, news, shopping
• longest leisure: 22-76mins YouTube, 28mins news
• most important: work, money, urgent shopping
• least important: leisure, entertainment, free time
• most difficult: technical work prep
Study 1: Preliminary Findings
• Huge divide over where sessions start or stop
- many people considered a session to span a large break
- paused and left in tabs
• One person divided a single topical episode by phases
- and phases were sessions
- e.g. broadening/confused stage vs successful focus stage
• One person divided a single topical episode by major sources
- moved from web searching to video searching on same topic
What is a session?
Implications for where/when to measure success
Study 1: What is a session?
Single topic - changing purpose
Study 1: What is a session?
Single topic - pausing sessions
Study 1: What is a session?
Low-query extended sessions
Study 1: Other observations
• Seeing an informal relationship between who tasks are for
- and skewed importance
- including for another person, or for a group
- and slow sequential interactions (as they talk to others)
• Seeing a strong low-query correlation with entertainment
- seeing serious-leisure more similar to work tasks
• Hard tasks have high query loads
- and are related to rare or new areas
Study 1: Summary
• We’re beginning to get some real insight into real sessions
• Already identifying examples where time-splitting isn’t sufficient
- but intention changing is common
• We’re seeing possible common patterns of overlapping sessions
• We haven’t finished!
Study 2: Evaluating Sensemaking
“Simplistically” measured
- Study 2
Wilson, M. J. and Wilson, M. L. (2012) A Comparison of Techniques for Measuring Sensemaking and Learning within Participant-Generated Summaries. In: JASIST (accepted).
Study 2: “Simplistically” measured
• If learning is closed: then a quiz
- “closed” determines WHAT should be learned
- can measure recall, but also recognition if cued by the question
• If learning is open:
a) sub-topic count (integer) & topic quality (judged Likert)
b) simple count of facts (integer) and statements (integer)
• These do not measure how “good” the learning was
Study 2: Measuring “Depth” of Learning
• A theory from Education
• As learning improves
you progress up the diagram
• You begin to ‘understand’
- then critically ‘analyze’
- then ‘evaluate’ information
etc.
Image from: http://www.nwlink.com/~donclark/hrd/bloom.html
Study 2: Developed 3 Scales
• 12 participants performed 3 learning tasks
- mix of high and low prior knowledge
• 1) Write summary of knowledge, 2) Learn, 3) Write summary
• 36 pairs of pre/post summaries
- 18 high prior knowledge
- 18 low prior knowledge
Study 2: Developed 3 Scales
• Inductive Grounded Theory analysis
• 3 rounds of 6 high and 6 low pairs analysed by 2 researchers
• Validated by an external judge
• Until high Fleiss kappa scores, i.e. ‘substantial agreement’ (formula sketched below)
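For readers unfamiliar with the statistic, Fleiss' kappa can be computed from an item-by-category count matrix as in this minimal sketch (Python); the small ratings matrix is invented for illustration and is not data from the study.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who assigned item i to category j;
    every item must be rated by the same number of raters."""
    N = len(counts)                 # number of items
    n = sum(counts[0])              # raters per item
    k = len(counts[0])              # number of categories
    # proportion of all assignments that fall in each category
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # observed per-item agreement, then its mean
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    P_e = sum(pj * pj for pj in p)  # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# three hypothetical raters scoring five summaries on a 0-3 scale
ratings = [[0, 0, 3, 0], [0, 2, 1, 0], [3, 0, 0, 0], [0, 0, 0, 3], [0, 1, 2, 0]]
print(round(fleiss_kappa(ratings), 2))  # -> 0.63 for this made-up example
```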
Study 2: Measure 1: D-Qual
scale ranging from irrelevant or useless facts (0 points) to facts that showed a level of
technical understanding (3 points). The emphasis of usefulness in this measure meant that it
was closer to the “understanding” level of Bloom’s revised taxonomy, rather than simply
“remembering”. It was important to differentiate between the two levels as many poor
summaries, as determined by the authors during the coding session, simply listed many
redundantly obvious facts (“A labrador is a dog”) rather than describing them in sentences
and summaries. For D-Qual, the judges achieved a Fleiss kappa of 0.64.
Rating Description
0 Facts are irrelevant to the subject; Facts hold no useful information or advice.
1 Facts are generalised to the overall subject matter; Facts hold little useful information or
advice.
2 Facts fulfil the required information need and are useful.
3 A level of technical detail is given via at least one key term associated with the technology
of the subject; Statistics are given.
Table 1: Quality of Facts (D-Qual).
Many of the better summaries interpreted facts into more intelligent statements. To
identify this, D-Intrp (Table 2) measured summaries in how they synthesised facts and
statements to draw conclusions and deductions (Bloom’s “analysing”) using a 3-point scale.
Measure understanding rather than remembering
Study 2: Measure 2: D-Intrp
Rating Description
0 Facts contained within one statement with no association.
1 Association of two useful or detailed facts: ‘A -> B’
2 Association of multiple useful or detailed facts: ‘A+B->C’; ‘A->B->C’; ‘A->B∴C’
Table 2: Interpretation of data into statements (D-Intrp).
D-Crit reflected Bloom’s concept of “evaluating” by identifying statements that
compared facts, or used facts to raise questions about other statements. The measurement for
D-Crit was either true (1 point) or false (0 points), as shown in Table 3. A Fleiss kappa of
0.74 was achieved.
Measure analysing capabilities
Study 2: Measure 3: D-Crit
Measure evaluating capabilities
Table 2: Interpretation of data into statements (D-Intrp).
D-Crit reflected Bloom’s concept of “evaluating” by identifying statements that
compared facts, or used facts to raise questions about other statements. The measurement for
D-Crit was either true (1 point) or false (0 points), as shown in Table 3. A Fleiss kappa of
0.74 was achieved.
Rating Description
0 Facts are listed with no further thought or analysis.
1 Both advantages and disadvantages listed; Comparisons drawn between items;
Participant deduced his or her own questions.
Table 3: Use of critique (D-Crit).
We did not produce a scale for level three of Anderson’s revised version of Bloom’s
taxonomy, “applying”, since the act of writing a summary would not involve the participant
to carry out a procedure that has been learned. This level of learning was thus not identifiable
in our corpus of summaries. Similarly, the highest level, “creating”, also goes beyond writing
Study 2: Evaluating these measures
Compare against Counting & Topic measures (a toy sketch of the Table 4 coding scheme follows below)
measure depth (‘T-Depth’), each topic was measured on a 4-point scale ranging from not
covered (0 points) to detailed focused coverage (3 points) and averaged.
As the process of learning is primarily internal it is difficult to measure it objectively.
For this reason our measures of learning focused on the difference between pre- and post-task
knowledge held by the participant.
Code Measurement Scale
D-Qual Recall of facts 0 – 3 points
D-Intrp Interpretation of data into statements 0 – 2 points
D-Crit Critique 0 – 1 point
F-Fact Number of facts Count
F-State Number of statements Count
F-Ratio Ratio of facts per statement Average
T-Count Number of topics covered (breadth of knowledge) Count
T-Depth Level of topic focus (depth of knowledge) 0 – 3 points, averaged
Table 4: Outline of coding scheme used for analysis.
5 Results
Before beginning, the data from two participants were removed from the analysis. A
first-pass sanity check over the collected summaries revealed that they had misunderstood the
tasks set. One chose to describe their own feelings and history relating to the task topic, rather
than trying to answer the task. Another described what they intended to search for in their
• Can you differentiate pre- & post- task summaries?
• Can you differentiate high & low prior knowledge?
• How long do summaries need to be?
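As a toy illustration (not the authors' analysis code), the Table 4 coding scheme could be represented and pre/post learning gains computed per participant-task pair as below; the measure names come from Table 4, but the helper function and the example scores are hypothetical.

```python
# Measures from Table 4: three depth scales (D-*), three simple counts (F-*),
# and two topic measures (T-*). Judged scores come from human coders.
MEASURES = ["D-Qual", "D-Intrp", "D-Crit", "F-Fact",
            "F-State", "F-Ratio", "T-Count", "T-Depth"]

def learning_gain(pre, post):
    """Difference between post- and pre-task summary scores for each measure."""
    return {m: post[m] - pre[m] for m in MEASURES}

# invented scores for one participant-task pair
pre  = {"D-Qual": 1, "D-Intrp": 0, "D-Crit": 0, "F-Fact": 4,
        "F-State": 2, "F-Ratio": 2.0, "T-Count": 2, "T-Depth": 1.0}
post = {"D-Qual": 2, "D-Intrp": 1, "D-Crit": 1, "F-Fact": 9,
        "F-State": 5, "F-Ratio": 1.8, "T-Count": 4, "T-Depth": 2.0}
print(learning_gain(pre, post))
```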
Study 2: Analysing summaries
Pre-task example
Study 2: Analysing summaries
Post-task example
Study 2: Results
knowledge, especially for pre-task summaries, which can possibly be explained by the fact that participants who wrote shorter summaries based on high prior knowledge are more likely to concentrate on a single topic.
All Pre-task Post-task
D-Qual U(68) = 537.5, p = 0.32 U(34) = 125, p = 0.28 U(34) = 148, p = 0.46
D-Intrp U(68) = 642, p = 0.21 U(34) = 145, p = 0.47 U(34) = 174, p = 0.16
D-Crit U(68) = 570, p = 0.47 U(34) = 140, p = 0.47 U(34) = 144.5, p = 0.49
F-Fact t(66) = -0.4, p = 0.35 t(32) = -0.75, p = 0.23 t(32) = -0.25, p = 0.4
F-State t(66) = -0.21, p = 0.42 t(32) = -0.4, p = 0.35 t(32) = -0.17, p = 0.43
F-Ratio t(66) = 0.2, p = 0.42 t(32) = 0.31, p = 0.38 t(32) = -0.04, p = 0.48
T-Count t(66) = -0.35, p = 0.36 t(32) = 0.43, p = 0.34 t(32) = -1.01, p = 0.16
T-Depth U(68) = 721, p = 0.04 * U(34) = 194.5, p = 0.04 * U(34) = 168, p = 0.21
Table 12: Comparing high and low prior knowledge in shorter summaries. * Indicates significant results.
All Pre-task Post-task
D-Qual U(68) = 390, p = 0.01 * U(34) = 89.5, p = 0.03 * U(34) = 113.5, p = 0.18
D-Intrp U(68) = 497.5, p = 0.16 U(34) = 158.5, p = 0.29 U(34) = 95, p = 0.06
D-Crit U(68) = 693.5, p = 0.08 U(34) = 189, p = 0.05 * U(34) = 154, p = 0.32
F-Fact t(66) = 1.62, p = 0.06 t(32) = 0.64, p = 0.26 t(32) = 1, p = 0.16
F-State t(66) = 1, p = 0.16 t(32) = 0.29, p = 0.39 t(32) = 0.79, p = 0.22
F-Ratio t(66) = 0.86, p = 0.2 t(32) = 0.31, p = 0.38 t(32) = 0.21, p = 0.42
T-Count t(66) = 3.44, p = 0.0005 * t(32) = 1.92, p = 0.03 * t(32) = 2.82, p = 0.004 *
T-Depth U(68) = 572, p = 0.48 U(34) = 163, p = 0.25 U(34) = 142, p = 0.48
Table 13: Comparing high and low prior knowledge in longer summaries. * Indicates significant results.
Pretty obvious - as you can see
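Tables 12 and 13 report Mann-Whitney U tests (for the ordinal judged measures) and t-tests (for the count-based measures) comparing high and low prior knowledge; below is a minimal sketch of that style of comparison using scipy, with invented score lists, and the two-sided alternative is an assumption rather than the paper's exact configuration.

```python
from scipy.stats import mannwhitneyu, ttest_ind

# Illustrative only: per-summary scores for two groups (high vs low prior knowledge).
d_qual_high = [2, 3, 2, 1, 3, 2]   # ordinal judged measure -> Mann-Whitney U
d_qual_low  = [1, 1, 2, 0, 1, 2]
u, p = mannwhitneyu(d_qual_high, d_qual_low, alternative="two-sided")
print(f"D-Qual: U = {u}, p = {p:.3f}")

f_fact_high = [9, 7, 11, 8, 10, 6]  # count measure -> independent-samples t-test
f_fact_low  = [5, 6, 8, 4, 7, 5]
t, p = ttest_ind(f_fact_high, f_fact_low)
print(f"F-Fact: t = {t:.2f}, p = {p:.3f}")
```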
Study 2: Results
• 1) Most measures could identify learning (between pre-post)
- more robust with longer summaries
the summaries and the prior knowledge held by the participant should be taken into consideration. Table 14 provides an overview of the strengths and weaknesses of each measure and recommendations are made below. While serving as a guide, readers should refer back to the full text in our results section for more detail before using them in a study.
Table 14: Overview of measure suitability (the check-mark cells are not legible in the slide image). Rows: D-Qual, D-Intrp, D-Crit, F-Fact, F-State, F-Ratio, T-Count, T-Depth. Column groups: Identifies Learning (High, Low, Short, Long, Pre, Post), Identifies Prior Knowledge (Short, Long, Pre, Post), Ignores Length.
If participants have written shorter summaries (here averaged to around 90 words) then learning is only really noticeable if those participants began with low prior knowledge, where measures such as the quality of facts (D-Qual), simple fact and statement counting (F-Fact, F-State) and topic coverage (T-Count) can be used to determine an increase of knowledge. If short summaries are written based on high prior knowledge then only simple fact and
Study 2: Results
• 2) Only some were good at identifying prior knowledge
- these required long pre-task summaries to be written
Study 2: Results
• 3) Our measures were the most robust to length of summary
- others require pushing participants beyond 200 words
Study 2: Conclusions
• We proposed a new measure based on depth of learning
- demonstrating higher levels of thinking
• This was more robust to size of written summary,
- good at long and short, while measuring learning
- able to determine if someone has existing high knowledge
• All measures did surprisingly well, for measuring learning
• Ours was most robust for determining prior knowledge level
• Future work: behaviour between good vs bad learners
Talk Summary
• Search communities are trying to move beyond simple tasks
- more than result quality, and time to target
• Currently focusing on understanding sessions
- which has primarily been splitting logs by time gaps
• Our work
1) moving beyond assumptions about sessions
2) introducing new methods to evaluate sensemaking
Talk Summary
• There’s a long way to go before search engines know what
we’re doing beyond a query (and immediate refinements)
- there’s a long way before we do
• Also - we still need to measure:
- success in decision making (like online shopping)
- success in entertainment sessions
Friday, 10 May 13

More Related Content

Similar to Understanding & Evaluating Search Sessions

Academic Book Review Format. Book Review Exam
Academic Book Review Format. Book Review ExamAcademic Book Review Format. Book Review Exam
Academic Book Review Format. Book Review Exam
Julie Gonzalez
 
Individual Analysis and Testing
Individual Analysis and TestingIndividual Analysis and Testing
Individual Analysis and Testing
Swilley Library
 
Nature and Functions of Research part 3
Nature and Functions of Research part 3Nature and Functions of Research part 3
Nature and Functions of Research part 3
MJezza Ledesma
 
EBP : what do we expect of 1st years in HE
EBP : what do we expect of 1st years in HE EBP : what do we expect of 1st years in HE
EBP : what do we expect of 1st years in HE
Academic and Research Libraries Group Yorkshire & Humberside
 
Eblip7 keynote pdf
Eblip7 keynote pdfEblip7 keynote pdf
Eblip7 keynote pdf
Denise Koufogiannakis
 
The twin purposes of guided inquiry final
The twin purposes of guided inquiry finalThe twin purposes of guided inquiry final
The twin purposes of guided inquiry final
Leonne FitzGerald
 
Introduction to information literacy part 1
Introduction to information literacy part 1Introduction to information literacy part 1
Introduction to information literacy part 1
mhayes2006
 
Pick College Essay Writing Services With Care - Research Master Essays
Pick College Essay Writing Services With Care - Research Master EssaysPick College Essay Writing Services With Care - Research Master Essays
Pick College Essay Writing Services With Care - Research Master Essays
Carla Bennington
 
Reverse instruction inquiry 2
Reverse instruction inquiry 2Reverse instruction inquiry 2
Reverse instruction inquiry 2
George Phillip
 
DeBari and Julin slides
DeBari and Julin slidesDeBari and Julin slides
DeBari and Julin slides
SERC at Carleton College
 
Stat 1040, Recitation packet 11. A 1999 study claimed that.docx
Stat 1040, Recitation packet 11. A 1999 study claimed that.docxStat 1040, Recitation packet 11. A 1999 study claimed that.docx
Stat 1040, Recitation packet 11. A 1999 study claimed that.docx
dessiechisomjj4
 
Reverse instruction inquiry
Reverse instruction inquiryReverse instruction inquiry
Reverse instruction inquiry
George Phillip
 
Reverse instruction inquiry
Reverse instruction inquiryReverse instruction inquiry
Reverse instruction inquiry
George Phillip
 
Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...
Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...
Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...
The National Society of High School Scholars (NSHSS)
 
Michael Pocock: Citizen Science Project Design
Michael Pocock: Citizen Science Project DesignMichael Pocock: Citizen Science Project Design
Michael Pocock: Citizen Science Project Design
Alice Sheppard
 
NS_Jul_Aug_2016_Final
NS_Jul_Aug_2016_FinalNS_Jul_Aug_2016_Final
NS_Jul_Aug_2016_Final
Fernando Bejarano
 
Experiments: The Good, the Bad, and the Beautiful
Experiments: The Good, the Bad, and the BeautifulExperiments: The Good, the Bad, and the Beautiful
Experiments: The Good, the Bad, and the Beautiful
TechWell
 
The Revelation of the Father - Week 10
The Revelation of the Father - Week 10The Revelation of the Father - Week 10
The Revelation of the Father - Week 10
PDEI
 
Is ‘Open Science’ a solution or a threat?
Is ‘Open Science’ a solution or a threat?Is ‘Open Science’ a solution or a threat?
Is ‘Open Science’ a solution or a threat?
Danny Kingsley
 
Future of Data Sharing
Future of Data SharingFuture of Data Sharing
Future of Data Sharing
CTSciNet .org
 

Similar to Understanding & Evaluating Search Sessions (20)

Academic Book Review Format. Book Review Exam
Academic Book Review Format. Book Review ExamAcademic Book Review Format. Book Review Exam
Academic Book Review Format. Book Review Exam
 
Individual Analysis and Testing
Individual Analysis and TestingIndividual Analysis and Testing
Individual Analysis and Testing
 
Nature and Functions of Research part 3
Nature and Functions of Research part 3Nature and Functions of Research part 3
Nature and Functions of Research part 3
 
EBP : what do we expect of 1st years in HE
EBP : what do we expect of 1st years in HE EBP : what do we expect of 1st years in HE
EBP : what do we expect of 1st years in HE
 
Eblip7 keynote pdf
Eblip7 keynote pdfEblip7 keynote pdf
Eblip7 keynote pdf
 
The twin purposes of guided inquiry final
The twin purposes of guided inquiry finalThe twin purposes of guided inquiry final
The twin purposes of guided inquiry final
 
Introduction to information literacy part 1
Introduction to information literacy part 1Introduction to information literacy part 1
Introduction to information literacy part 1
 
Pick College Essay Writing Services With Care - Research Master Essays
Pick College Essay Writing Services With Care - Research Master EssaysPick College Essay Writing Services With Care - Research Master Essays
Pick College Essay Writing Services With Care - Research Master Essays
 
Reverse instruction inquiry 2
Reverse instruction inquiry 2Reverse instruction inquiry 2
Reverse instruction inquiry 2
 
DeBari and Julin slides
DeBari and Julin slidesDeBari and Julin slides
DeBari and Julin slides
 
Stat 1040, Recitation packet 11. A 1999 study claimed that.docx
Stat 1040, Recitation packet 11. A 1999 study claimed that.docxStat 1040, Recitation packet 11. A 1999 study claimed that.docx
Stat 1040, Recitation packet 11. A 1999 study claimed that.docx
 
Reverse instruction inquiry
Reverse instruction inquiryReverse instruction inquiry
Reverse instruction inquiry
 
Reverse instruction inquiry
Reverse instruction inquiryReverse instruction inquiry
Reverse instruction inquiry
 
Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...
Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...
Ziang Wang Q&A: NSHSS 2015 Earth Day Award Recipient- National Society of Hig...
 
Michael Pocock: Citizen Science Project Design
Michael Pocock: Citizen Science Project DesignMichael Pocock: Citizen Science Project Design
Michael Pocock: Citizen Science Project Design
 
NS_Jul_Aug_2016_Final
NS_Jul_Aug_2016_FinalNS_Jul_Aug_2016_Final
NS_Jul_Aug_2016_Final
 
Experiments: The Good, the Bad, and the Beautiful
Experiments: The Good, the Bad, and the BeautifulExperiments: The Good, the Bad, and the Beautiful
Experiments: The Good, the Bad, and the Beautiful
 
The Revelation of the Father - Week 10
The Revelation of the Father - Week 10The Revelation of the Father - Week 10
The Revelation of the Father - Week 10
 
Is ‘Open Science’ a solution or a threat?
Is ‘Open Science’ a solution or a threat?Is ‘Open Science’ a solution or a threat?
Is ‘Open Science’ a solution or a threat?
 
Future of Data Sharing
Future of Data SharingFuture of Data Sharing
Future of Data Sharing
 

More from Max L. Wilson

Brain Data as Cognitive Personal Informatics - UCL 2022
Brain Data as Cognitive Personal Informatics - UCL 2022Brain Data as Cognitive Personal Informatics - UCL 2022
Brain Data as Cognitive Personal Informatics - UCL 2022
Max L. Wilson
 
Brain Data as Cognitive Personal Informatics - Bell Labs 2022
Brain Data as Cognitive Personal Informatics - Bell Labs 2022Brain Data as Cognitive Personal Informatics - Bell Labs 2022
Brain Data as Cognitive Personal Informatics - Bell Labs 2022
Max L. Wilson
 
Physiological indicators of task demand, fatigue, and cognition during Work T...
Physiological indicators of task demand, fatigue, and cognition during Work T...Physiological indicators of task demand, fatigue, and cognition during Work T...
Physiological indicators of task demand, fatigue, and cognition during Work T...
Max L. Wilson
 
Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019
Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019
Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019
Max L. Wilson
 
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...
Max L. Wilson
 
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...
Max L. Wilson
 
Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017
Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017
Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017
Max L. Wilson
 
CHIIR2017 - Tetris Model of Resolving Information Needs
CHIIR2017 - Tetris Model of Resolving Information NeedsCHIIR2017 - Tetris Model of Resolving Information Needs
CHIIR2017 - Tetris Model of Resolving Information Needs
Max L. Wilson
 
The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)
Max L. Wilson
 
Why People Favourite Tweets (and a bit about usefulness and style) - Content ...
Why People Favourite Tweets (and a bit about usefulness and style) - Content ...Why People Favourite Tweets (and a bit about usefulness and style) - Content ...
Why People Favourite Tweets (and a bit about usefulness and style) - Content ...
Max L. Wilson
 
Fun information Interaction #Seaching4fun
Fun information Interaction #Seaching4funFun information Interaction #Seaching4fun
Fun information Interaction #Seaching4fun
Max L. Wilson
 
RepliCHI - 8 Challenges in Replicating a Study
RepliCHI - 8 Challenges in Replicating a StudyRepliCHI - 8 Challenges in Replicating a Study
RepliCHI - 8 Challenges in Replicating a Study
Max L. Wilson
 
Search User Interface Design
Search User Interface DesignSearch User Interface Design
Search User Interface Design
Max L. Wilson
 
ASIST2010 - The Revisit Rack - Group Web Search Thumbnails
ASIST2010 - The Revisit Rack - Group Web Search ThumbnailsASIST2010 - The Revisit Rack - Group Web Search Thumbnails
ASIST2010 - The Revisit Rack - Group Web Search Thumbnails
Max L. Wilson
 
Investigating Alternative Forms of Search
Investigating Alternative Forms of SearchInvestigating Alternative Forms of Search
Investigating Alternative Forms of Search
Max L. Wilson
 
Hcir2010 - Casual-Leisure Search
Hcir2010 - Casual-Leisure SearchHcir2010 - Casual-Leisure Search
Hcir2010 - Casual-Leisure Search
Max L. Wilson
 

More from Max L. Wilson (16)

Brain Data as Cognitive Personal Informatics - UCL 2022
Brain Data as Cognitive Personal Informatics - UCL 2022Brain Data as Cognitive Personal Informatics - UCL 2022
Brain Data as Cognitive Personal Informatics - UCL 2022
 
Brain Data as Cognitive Personal Informatics - Bell Labs 2022
Brain Data as Cognitive Personal Informatics - Bell Labs 2022Brain Data as Cognitive Personal Informatics - Bell Labs 2022
Brain Data as Cognitive Personal Informatics - Bell Labs 2022
 
Physiological indicators of task demand, fatigue, and cognition during Work T...
Physiological indicators of task demand, fatigue, and cognition during Work T...Physiological indicators of task demand, fatigue, and cognition during Work T...
Physiological indicators of task demand, fatigue, and cognition during Work T...
 
Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019
Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019
Brain-based HCI - What brain data can tell us about HCI - St Andrews, 2019
 
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Lei...
 
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni...
 
Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017
Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017
Measuring & Reflecting on Mental Workload - Birmingham Uni, May 2017
 
CHIIR2017 - Tetris Model of Resolving Information Needs
CHIIR2017 - Tetris Model of Resolving Information NeedsCHIIR2017 - Tetris Model of Resolving Information Needs
CHIIR2017 - Tetris Model of Resolving Information Needs
 
The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)The HCI Perspective on IR (DIR2016 Keynote)
The HCI Perspective on IR (DIR2016 Keynote)
 
Why People Favourite Tweets (and a bit about usefulness and style) - Content ...
Why People Favourite Tweets (and a bit about usefulness and style) - Content ...Why People Favourite Tweets (and a bit about usefulness and style) - Content ...
Why People Favourite Tweets (and a bit about usefulness and style) - Content ...
 
Fun information Interaction #Seaching4fun
Fun information Interaction #Seaching4funFun information Interaction #Seaching4fun
Fun information Interaction #Seaching4fun
 
RepliCHI - 8 Challenges in Replicating a Study
RepliCHI - 8 Challenges in Replicating a StudyRepliCHI - 8 Challenges in Replicating a Study
RepliCHI - 8 Challenges in Replicating a Study
 
Search User Interface Design
Search User Interface DesignSearch User Interface Design
Search User Interface Design
 
ASIST2010 - The Revisit Rack - Group Web Search Thumbnails
ASIST2010 - The Revisit Rack - Group Web Search ThumbnailsASIST2010 - The Revisit Rack - Group Web Search Thumbnails
ASIST2010 - The Revisit Rack - Group Web Search Thumbnails
 
Investigating Alternative Forms of Search
Investigating Alternative Forms of SearchInvestigating Alternative Forms of Search
Investigating Alternative Forms of Search
 
Hcir2010 - Casual-Leisure Search
Hcir2010 - Casual-Leisure SearchHcir2010 - Casual-Leisure Search
Hcir2010 - Casual-Leisure Search
 

Recently uploaded

20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 

Recently uploaded (20)

20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 

Understanding & Evaluating Search Sessions

  • 1. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Extended Searching Sessions and Evaluating Success Dr Max L.Wilson Mixed Reality Lab University of Nottingham, UK Friday, 10 May 13
  • 2. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Studying Extended Search Success In Observable Natural Sessions SESSIONS Friday, 10 May 13
  • 3. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Extended Searching Sessions and Evaluating Sensemaking Success About Me Study 1:The Real Nature of Sessions Study 2: Evaluating Sensemaking Success Friday, 10 May 13
  • 4. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ About me MEng & Phd in Southampton Taught in Swansea for 3 years Moved to Nottingham April 2012 Friday, 10 May 13
  • 5. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ About Me Friday, 10 May 13
  • 6. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Friday, 10 May 13
  • 7. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ UIST 2008 JCDL 2008 Friday, 10 May 13
  • 8. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ My PhD Bates, M. J. (1979a). Idea tactics. Journal of the American Society for Information Science, 30(5):280–289. Bates, M. J. (1979b). Information search tactics. Journal of the American Society for Information Science, 30(4):205–214. Belkin, N. J., Marchetti, P. G., and Cool, C. (1993). Braque: design of an interface to support user interaction in information retrieval. Information Processing and Management, 29(3): 325–344. Friday, 10 May 13
  • 9. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ My PhD Wilson, M. L., schraefel, m. c., and White, R. W. (2009). Evaluating advanced search interfaces using established information-seeking models. Journal of the American Society for Information Science and Technology, 60(7):1407–1422. Friday, 10 May 13
  • 10. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Search User Interface Design Friday, 10 May 13
  • 11. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ MyTeam Horia Maior Matthew Pike Jon Hurlock Paul BrindleyZenah Alkubaisy Chaoyu (Kelvin)Ye (Study 1) Mathew Wilson (Study 2) Friday, 10 May 13
  • 12. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Extended Searching Sessions and Evaluating Sensemaking Success About Me Study 1:The Real Nature of Sessions Study 2: Evaluating Sensemaking Success Friday, 10 May 13
  • 13. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ People Searching the Web Elsweiler, D.,Wilson M. L. and Kirkegaard-Lunn, B. (2011) Understanding Casual-leisure Information Behaviour. In Spink,A. and Heinstrom, J. (Eds) New Directions in Information Behaviour. Emerald Group Publishing Limited, pp 211-241. Friday, 10 May 13
  • 14. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ The Search Communities Ingwersen, P., Jarvelin, K., 2005.The turn: integration of information seeking and retrieval in context. Springer, Berlin, Germany. The IR Community •Focused on Accuracy •Are these results relevant? •How many are relevant? •Did we get all the relevant ones? Friday, 10 May 13
  • 15. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ The Search Communities The IS Community •Focused on Success •Did they find the right result? •How long did they take •How many interactions? Ingwersen, P., Jarvelin, K., 2005.The turn: integration of information seeking and retrieval in context. Springer, Berlin, Germany. Friday, 10 May 13
  • 16. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ The Search Communities The IB Community •Focused on Quality •Did they do a good job? •How did the UI affect the task? •Was the higher level motivating task achieved more successfully? Ingwersen, P., Jarvelin, K., 2005.The turn: integration of information seeking and retrieval in context. Springer, Berlin, Germany. Friday, 10 May 13
  • 17. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ The Search Communities “Relatively” well known “Naively estimated” - Study 1 “Simplistically” measured - Study 2 Friday, 10 May 13
  • 18. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ WorkTasks • Work tasks - typically considered work-led information- intensive activities the lead to searching • Can be out-of-work - like planning holidays, or buying a car • We’ve begun looking at motivating ‘tasks’ outside of work Friday, 10 May 13
  • 19. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Casual Leisure WorkTasks behaviours documented so far. 4.1 Need-less browsing Much like the desire to pass time at the television, we saw many examples (some shown in Table 3) of people passing time typically associated with the ‘browsing’ keyword. 1) ... I’m not even *doing* anything useful... just browsing eBay aimlessly... 2) to do list today: browse the Internet until fasting break time.. 3) ... just got done eating dinner and my family is watch- ing the football. Rather browse on the laptop 4) I’m at the dolphin mall. Just browsing. Table 3: Example tweets where the browsing activ- ity is need-less. From the collected tweets it is clear that often the inform- ation-need in these situations are not only fuzzy, but typi- cally absent. The aim appears to be focused on the activity, where the measure of success would be in how much they D d a 5 h f o S i b a d f t s W t Wilson, M. L. and Elsweiler, D. (2010) Casual-leisure Searching: the Exploratory Search scenarios that break our current models. In: 4th HCIR Workshop ,Aug 22 2010. pp 28-31. Friday, 10 May 13
  • 20. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ People Searching the Web Elsweiler, D.,Wilson M. L. and Kirkegaard-Lunn, B. (2011) Understanding Casual-leisure Information Behaviour. In Spink,A. and Heinstrom, J. (Eds) New Directions in Information Behaviour. Emerald Group Publishing Limited, pp 211-241. Friday, 10 May 13
  • 21. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ • Traditionally examined by analysing logs for stats • In the 90s, suggested they are broken by ~25mins - More recently by ~5mins • BUT evidence shows web use typically interleaves tasks - AND tabs make this all much harder • Become a big focus as Dagstuhls/workshops Sessions Friday, 10 May 13
  • 22. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ SearchTrails • Aimed at finding common end locations for queries • An interesting step towards sessions though • most involved some trail features (not query+click) White, Ryen W., and Steven M. Drucker. "Investigating behavioral variability in web search." in Proc WWW 2007 .ACM Friday, 10 May 13
  • 23. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Top Sessions as Seen by Bing Bailey et al, User task understanding: a web search engine perspective, NII Shonan, 8 Oct 2012 Friday, 10 May 13
  • 24. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Top Sessions as Seen by Bing Bailey et al, User task understanding: a web search engine perspective, NII Shonan, 8 Oct 2012 Friday, 10 May 13
  • 25. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Top Sessions as Seen by Bing Bailey et al, User task understanding: a web search engine perspective, NII Shonan, 8 Oct 2012 Friday, 10 May 13
  • 26. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Investigating Extended Sessions What on earth is happening here? Friday, 10 May 13
  • 27. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Interview Method
Send & preprocess history (a history artefact - approx 300 items), then interview:
- How would you define a session?
- Mark out history into sessions, starting recently + create ‘cards’ of varying types of ‘sessions’
- Open card sort + closed card sort
Stage timings: 10mins / 20-30mins / 30-50mins; 15-20 cards
Outputs: recording, cards, card sorts, marked history file, log data
Friday, 10 May 13
  • 28. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Data • Rich discussion of ~20 Sessions per participant • Currently: 7 participants and ~120 sessions - richly described and compared • Aiming for: 12 participants and 200+ sessions at first Friday, 10 May 13
  • 29. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Questions for Sessions 1) Where was this done (e.g. work vs home vs mobile) 2) With whom (collaborative?) 3) For whom (shared task?) 4) Devices involved (whether devices affect things) 5) Length of the session (how do they define long?) 6) Successful or not (for future measurement insights) At some point: tried to learn these for each session Friday, 10 May 13
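For concreteness, the six questions above can be imagined as a per-session annotation record. The field names and types below are purely my own illustration of that shape, not the study's actual coding sheet.

from dataclasses import dataclass
from typing import Optional, List

# Hypothetical per-session annotation capturing the six interview questions.
# All names and types are illustrative assumptions.
@dataclass
class SessionAnnotation:
    location: str               # 1) where: e.g. "work", "home", "mobile"
    with_whom: Optional[str]    # 2) collaborative partner(s), if any
    for_whom: str               # 3) self, another person, or a group
    devices: List[str]          # 4) devices involved
    duration_minutes: float     # 5) participant's estimate of length
    successful: Optional[bool]  # 6) participant's judgement of success

example = SessionAnnotation(
    location="home", with_whom=None, for_whom="family",
    devices=["laptop"], duration_minutes=25.0, successful=True)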
  • 30. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: A Card Friday, 10 May 13
  • 31. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: A Card Friday, 10 May 13
  • 32. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Card Sorting • We aimed first to let them define the dimensions - this lets us see how they define things - how do they self-categorise different sessions • We then had some targeted card sorts - For whom, duration, difficulty, importance, location - what’s short vs long? - what’s important vs not? - how do people divide work vs home etc. Friday, 10 May 13
  • 33. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Example Card Sorts Friday, 10 May 13
  • 34. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Friday, 10 May 13
  • 35. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Preliminary Findings • avg 21 cards per person, inc. ~8 sessions of 5+ mins - ~4 work & ~4 leisure • 18.6% of those extended sessions involved task switches • avg length: 17.5mins avg #queries: 3.55 • short: a third said <30s, a third said <1m, a third said <30m • long: a third said >1hour, a third said >5mins Friday, 10 May 13
  • 36. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Preliminary Findings • longest sessions: entertainment, work prep, news, shopping • longest leisure: 22-76mins youtube, 28mins news • most important: work, money, urgent shopping • least important: leisure, entertainment, free time • most difficult: technical work prep Friday, 10 May 13
  • 37. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Preliminary Findings • Huge divide over where sessions start or stop - many people considered a session to span a large break - paused and left in tabs • One person divided a single topical episode into phases - and phases were sessions - e.g. broadening/confused stage vs successful focus stage • One person divided a single topical episode by major sources - moved from web searching to video searching on the same topic What is a session? Implications for where/when to measure success Friday, 10 May 13
  • 38. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: What is a session? Single topic - changing purpose Friday, 10 May 13
  • 39. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: What is a session? Single topic - pausing sessions Friday, 10 May 13
  • 40. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: What is a session? Low-query extended sessions Friday, 10 May 13
  • 41. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Other observations • Seeing an informal relationship between whom tasks are for - and skewed importance - including for another person, or for a group - and slow sequential interactions (as they talk to others) • Seeing a strong low-query correlation with entertainment - seeing serious-leisure more similar to work tasks • Hard tasks have high query loads - and are related to rare or new areas Friday, 10 May 13
  • 42. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 1: Summary • We’re beginning to get some real insight into real sessions • Already identifying examples where time-splitting isn’t sufficient - but intention changing is common • We’re seeing possible common patterns of overlapping sessions • We haven’t finished! Friday, 10 May 13
  • 43. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Evaluating Sensemaking “Simplistically” measured - Study 2 Wilson, M. J. and Wilson, M. L. (2012) A Comparison of Techniques for Measuring Sensemaking and Learning within Participant-Generated Summaries. In: JASIST (accepted). Friday, 10 May 13
  • 44. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: “Simplistically” measured • If learning is closed: then a quiz - “closed” determines WHAT should be learned - can measure recall, but also recognition if cued by the question • If learning is open: a) sub-topic count (integer) & topic quality (judged likert) b) simple count of facts (integer) and statements (integer) • These do not measure how “good” the learning was Friday, 10 May 13
  • 45. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Measuring “Depth” of Learning • A theory from Education • As learning improves you progress up the diagram • You begin to ‘understand’ - then critically ‘analyze’ - then ‘evaluate’ information etc. Image from: http://www.nwlink.com/~donclark/hrd/bloom.html Friday, 10 May 13
  • 46. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Developed 3 Scales • 12 participants performed 3 learning tasks - mix of high and low prior knowledge • 1) Write summary of knowledge, 2) Learn, 3) Write summary • 36 pairs of pre/post summaries - 18 high prior knowledge - 18 low prior knowledge Friday, 10 May 13
  • 47. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Developed 3 Scales • Inductive Grounded Theory analysis • 3 rounds of 6 high and 6 low pairs analysed by 2 researchers • Validated by an external judge • Until high Fleiss Kappa scores, i.e. ‘substantial agreement’ Friday, 10 May 13
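For readers unfamiliar with the agreement statistic mentioned above, here is a minimal, self-contained sketch of Fleiss' kappa for multiple coders assigning ordinal ratings. The rating matrix in the example is invented purely to show the calculation and is not data from the study (the statsmodels package also provides a fleiss_kappa function if a library implementation is preferred).

import numpy as np

# Fleiss' kappa from a subjects x categories matrix of counts, where each
# cell holds how many coders placed that summary in that rating category.
def fleiss_kappa(ratings):
    ratings = np.asarray(ratings, dtype=float)
    n_subjects = ratings.shape[0]
    n_raters = ratings[0].sum()
    p_j = ratings.sum(axis=0) / (n_subjects * n_raters)   # category proportions
    P_i = (np.square(ratings).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Example (made up): 3 coders rating 5 summaries on a 0-3 D-Qual-style scale.
table = [[3, 0, 0, 0],
         [0, 2, 1, 0],
         [0, 0, 3, 0],
         [0, 0, 1, 2],
         [0, 3, 0, 0]]
print(round(fleiss_kappa(table), 2))  # ~0.63 for this made-up matrix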
  • 48. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Measure 1: D-Qual
...scale ranging from irrelevant or useless facts (0 points) to facts that showed a level of technical understanding (3 points). The emphasis of usefulness in this measure meant that it was closer to the “understanding” level of Bloom’s revised taxonomy, rather than simply “remembering”. It was important to differentiate between the two levels as many poor summaries, as determined by the authors during the coding session, simply listed many redundantly obvious facts (“A labrador is a dog”) rather than describing them in sentences and summaries. For D-Qual, the judges achieved a Fleiss kappa of 0.64.
Table 1: Quality of Facts (D-Qual)
  0 - Facts are irrelevant to the subject; Facts hold no useful information or advice.
  1 - Facts are generalised to the overall subject matter; Facts hold little useful information or advice.
  2 - Facts fulfil the required information need and are useful.
  3 - A level of technical detail is given via at least one key term associated with the technology of the subject; Statistics are given.
Many of the better summaries interpreted facts into more intelligent statements. To identify this, D-Intrp (Table 2) measured summaries in how they synthesised facts and statements to draw conclusions and deductions (Bloom’s “analysing”) using a 3-point scale.
Measure understanding rather than remembering Friday, 10 May 13
  • 49. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Measure 2: D-Intrp
Table 2: Interpretation of data into statements (D-Intrp)
  0 - Facts contained within one statement with no association.
  1 - Association of two useful or detailed facts: ‘A -> B’
  2 - Association of multiple useful or detailed facts: ‘A+B->C’; ‘A->B->C’; ‘A->B∴C’
Measure analysing capabilities Friday, 10 May 13
  • 50. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Measure 3: D-Crit
Measure evaluating capabilities
D-Crit reflected Bloom’s concept of “evaluating” by identifying statements that compared facts, or used facts to raise questions about other statements. The measurement for D-Crit was either true (1 point) or false (0 points), as shown in Table 3. A Fleiss kappa of 0.74 was achieved.
Table 3: Use of critique (D-Crit)
  0 - Facts are listed with no further thought or analysis.
  1 - Both advantages and disadvantages listed; Comparisons drawn between items; Participant deduced his or her own questions.
We did not produce a scale for level three of Anderson’s revised version of Bloom’s taxonomy, “applying”, since the act of writing a summary would not involve the participant to carry out a procedure that has been learned. This level of learning was thus not identifiable in our corpus of summaries. Similarly, the highest level, “creating”, also goes beyond writing [...]
Friday, 10 May 13
  • 51. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Evaluating these measures
Compare against Counting & Topic measures
• Can you differentiate pre- & post-task summaries?
• Can you differentiate high & low prior knowledge?
• How long do summaries need to be?
...measure depth (‘T-Depth’), each topic was measured on a 4-point scale ranging from not covered (0 points) to detailed focused coverage (3 points) and averaged. As the process of learning is primarily internal it is difficult to measure it objectively. For this reason our measures of learning focused on the difference between pre- and post-task knowledge held by the participant.
Table 4: Outline of coding scheme used for analysis
  D-Qual | Recall of facts | 0 – 3 points
  D-Intrp | Interpretation of data into statements | 0 – 2 points
  D-Crit | Critique | 0 – 1 point
  F-Fact | Number of facts | Count
  F-State | Number of statements | Count
  F-Ratio | Ratio of facts per statement | Average
  T-Count | Number of topics covered (breadth of knowledge) | Count
  T-Depth | Level of topic focus (depth of knowledge) | 0 – 3 points, averaged
5 Results: Before beginning, the data from two participants were removed from the analysis. A first-pass sanity check over the collected summaries revealed that they had misunderstood the tasks set. One chose to describe their own feelings and history relating to the task topic, rather than trying to answer the task. Another described what they intended to search for in their [...]
Friday, 10 May 13
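As a concrete illustration of how the coding scheme in Table 4 could be applied, the sketch below stores the codes for one pre-task and one post-task summary and reads learning off as the per-measure difference. The class, field names, and judged values are my own invented example, not the paper's data or tooling.

from dataclasses import dataclass

# Hypothetical container for the Table 4 codes assigned to one summary.
@dataclass
class SummaryCodes:
    d_qual: int     # quality of facts, 0-3
    d_intrp: int    # interpretation of facts into statements, 0-2
    d_crit: int     # presence of critique, 0-1
    f_fact: int     # number of facts
    f_state: int    # number of statements
    t_count: int    # number of topics covered (breadth)
    t_depth: float  # depth of topic focus, 0-3, averaged over topics

    @property
    def f_ratio(self):
        # facts per statement (guarding against a summary with no statements)
        return self.f_fact / self.f_state if self.f_state else 0.0

def learning_delta(pre, post):
    """Per-measure difference between post- and pre-task summaries."""
    fields = ("d_qual", "d_intrp", "d_crit", "f_fact", "f_state", "t_count", "t_depth")
    return {name: getattr(post, name) - getattr(pre, name) for name in fields}

# Invented example: a low-prior-knowledge participant before and after searching.
pre = SummaryCodes(d_qual=1, d_intrp=0, d_crit=0, f_fact=4, f_state=2, t_count=2, t_depth=1.0)
post = SummaryCodes(d_qual=3, d_intrp=2, d_crit=1, f_fact=9, f_state=5, t_count=4, t_depth=2.5)
print(learning_delta(pre, post))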
  • 52. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Analysing summaries Pre-task example Friday, 10 May 13
  • 53. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Analysing summaries Post-task example Friday, 10 May 13
  • 54. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Results
...knowledge, especially for pre-task summaries, which can possibly be explained that the participants who wrote shorter summaries based on high prior knowledge are more likely to concentrate on a single topic.
Table 12: Comparing high and low prior knowledge in shorter summaries (* indicates significant results)
  Measure | All | Pre-task | Post-task
  D-Qual | U(68) = 537.5, p = 0.32 | U(34) = 125, p = 0.28 | U(34) = 148, p = 0.46
  D-Intrp | U(68) = 642, p = 0.21 | U(34) = 145, p = 0.47 | U(34) = 174, p = 0.16
  D-Crit | U(68) = 570, p = 0.47 | U(34) = 140, p = 0.47 | U(34) = 144.5, p = 0.49
  F-Fact | t(66) = -0.4, p = 0.35 | t(32) = -0.75, p = 0.23 | t(32) = -0.25, p = 0.4
  F-State | t(66) = -0.21, p = 0.42 | t(32) = -0.4, p = 0.35 | t(32) = -0.17, p = 0.43
  F-Ratio | t(66) = 0.2, p = 0.42 | t(32) = 0.31, p = 0.38 | t(32) = -0.04, p = 0.48
  T-Count | t(66) = -0.35, p = 0.36 | t(32) = 0.43, p = 0.34 | t(32) = -1.01, p = 0.16
  T-Depth | U(68) = 721, p = 0.04 * | U(34) = 194.5, p = 0.04 * | U(34) = 168, p = 0.21
Table 13: Comparing high and low prior knowledge in longer summaries (* indicates significant results)
  Measure | All | Pre-task | Post-task
  D-Qual | U(68) = 390, p = 0.01 * | U(34) = 89.5, p = 0.03 * | U(34) = 113.5, p = 0.18
  D-Intrp | U(68) = 497.5, p = 0.16 | U(34) = 158.5, p = 0.29 | U(34) = 95, p = 0.06
  D-Crit | U(68) = 693.5, p = 0.08 | U(34) = 189, p = 0.05 * | U(34) = 154, p = 0.32
  F-Fact | t(66) = 1.62, p = 0.06 | t(32) = 0.64, p = 0.26 | t(32) = 1, p = 0.16
  F-State | t(66) = 1, p = 0.16 | t(32) = 0.29, p = 0.39 | t(32) = 0.79, p = 0.22
  F-Ratio | t(66) = 0.86, p = 0.2 | t(32) = 0.31, p = 0.38 | t(32) = 0.21, p = 0.42
  T-Count | t(66) = 3.44, p = 0.0005 * | t(32) = 1.92, p = 0.03 * | t(32) = 2.82, p = 0.004 *
  T-Depth | U(68) = 572, p = 0.48 | U(34) = 163, p = 0.25 | U(34) = 142, p = 0.48
Pretty obvious - as you can see Friday, 10 May 13
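The comparisons in Tables 12 and 13 pair Mann-Whitney U tests for the judged, ordinal measures (D-Qual, D-Intrp, D-Crit, T-Depth) with independent-samples t-tests for the count-based ones (F-Fact, F-State, F-Ratio, T-Count). A minimal sketch of that analysis pattern is below; the score lists are placeholder data, not the study's summaries.

from scipy.stats import mannwhitneyu, ttest_ind

# Placeholder scores for two groups of participants (high vs low prior knowledge).
high_prior_dqual = [3, 2, 3, 2, 3, 2, 1, 3]   # ordinal 0-3 judgements
low_prior_dqual = [1, 0, 2, 1, 1, 0, 2, 1]

# Ordinal, judged measure -> Mann-Whitney U
u, p = mannwhitneyu(high_prior_dqual, low_prior_dqual, alternative="two-sided")
print(f"D-Qual: U = {u}, p = {p:.3f}")

high_prior_ffact = [9, 7, 11, 8, 10, 6, 9, 12]  # simple fact counts
low_prior_ffact = [5, 6, 4, 7, 5, 8, 6, 5]

# Count-based measure -> independent-samples t-test
t, p = ttest_ind(high_prior_ffact, low_prior_ffact)
print(f"F-Fact: t = {t:.2f}, p = {p:.3f}")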
  • 55. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Results • 1) Most measures could identify learning (between pre-post) - more robust with longer summaries
...the summaries and the prior knowledge held by the participant should be taken in to consideration. Table 14 provides an overview of the strengths and weaknesses of each measure and recommendations are made below. While serving as a guide readers should refer back to the full text in our results section for more detail before using them in a study.
[Table 14: Overview of measure suitability - a checkmark grid scoring each measure (D-Qual, D-Intrp, D-Crit, F-Fact, F-State, F-Ratio, T-Count, T-Depth) on whether it identifies learning, identifies prior knowledge, and ignores summary length, split by high/low prior knowledge, short/long summaries and pre/post task]
If participants have written shorter summaries (here averaged to around 90 words) then learning is only really noticeable if those participants began with low prior knowledge, where measures such as the quality of facts (D-Qual), simple fact and statement counting (F-Fact, F-State) and topic coverage (T-Count) can be used to determine an increase of knowledge. If short summaries are written based on high prior knowledge then only simple fact and [...]
Friday, 10 May 13
  • 56. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Results • 2) Only some were good at identifying prior knowledge - these required long pre-task summaries to be written (Table 14: Overview of measure suitability, as on the previous slide) Friday, 10 May 13
  • 57. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Results • 3) Our measures were the most robust to length of summary - others require pushing participants beyond 200 words (Table 14: Overview of measure suitability, as on the previous slides) Friday, 10 May 13
  • 58. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Study 2: Conclusions • We proposed a new measure based on depth of learning - demonstrating higher levels of thinking • This was more robust to size of written summary, - good at long and short, while measuring learning - able to determine if someone has existing high knowledge • All measures did surprisingly well, for measuring learning • Ours was most robust for determining prior knowledge level • Future work: behaviour between good vs bad learners Friday, 10 May 13
  • 59. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Talk Summary • Search communities are trying to move beyond simple tasks - more than result quality, and time to target • Currently focusing on understanding sessions - which has primarily been splitting logs by time gaps • Our work: 1) moving beyond assumptions about sessions 2) introducing new methods to evaluate sensemaking Friday, 10 May 13
  • 60. Dr Max L.Wilson http://cs.nott.ac.uk/~mlw/ Talk Summary • There’s a long way to go before search engines know what we’re doing beyond a query (and immediate refinements) - there’s a long way before we do • Also - we still need to measure: - success in decision making (like online shopping) - success in entertainment sessions Friday, 10 May 13