Transcript of "Log Analysis to Understand Medical Professionals' Image Searching Behaviour"
Log Analysis to Understand Medical Professionals' Image Searching Behaviour
Theodora Tsikrika, Henning Müller, Charles E. Kahn
Overview
• Medical image retrieval
• Motivation of our work
• Methods
• Log file analysis
• Search strategies
• Frequent information needs
  • Use as topics for a retrieval benchmark
• Conclusions
Medical image retrieval
• Medical professionals frequently and increasingly search for visual information (images, videos)
• Radiologists in particular often search for images
• Internet search increasingly replaces searching in reference books and discussions with colleagues
• Images are important for differential diagnosis and for finding explanations for unclear visual patterns
• Different types of image search systems
  • Text-based search for images
Motivation
• Knowing the search tasks, goals and query formulations of user groups is important for information retrieval (IR)
  • To build new IR systems or to benchmark existing ones
• Several surveys have been performed
• Log file analyses have been done as well
  • MedLine log files, but these are not focused on images
  • HONmedia search, which is less focused, as its users are not radiologists but the general public and health professionals
• Internet image search by radiologists has increased strongly
Log file analysis
• Session level, query level, term level
• Search logs have received much attention as a way to learn more about user behavior
• Bad example: the release of the AOL search log, with its privacy problems
• The amount of information differs (IP addresses, time stamps)
• The session level is interesting, as much can be learned about behavior, query modifications and even satisfaction
  • Are terms added, removed or changed?
• Query and term level often focus on
Methods
• ARRS GoldMiner made a log file available
  • 25’000 consecutive searches by medical professionals
  • The search system is very popular with radiologists
  • It allows search terms and the selection of gender, age and modality
• Search term normalization
  • All lower case; special characters and quotes removed
  • Manual work: “xray”, “x-ray” and “x ray” are all mapped to “xray”
• Removal of identical consecutive queries
• No time stamps or IP addresses available
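The preprocessing steps on the Methods slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, a single regular expression stands in for what the slide describes as manual work on the x-ray spellings, and keeping "/" (so that terms such as "PET/CT" survive) is an assumption. In this sketch, identical consecutive queries are removed after normalization.

```python
import re
from itertools import groupby

# Assumption: one regex stands in for the manual spelling normalization;
# after lower-casing it maps "x-ray", "x ray" and "xray" to "xray".
XRAY_VARIANTS = re.compile(r"\bx[\s-]?ray\b")
# Assumption: keep ASCII letters, digits, whitespace and "/" (for PET/CT);
# every other character (quotes, punctuation) becomes a space.
SPECIAL_CHARS = re.compile(r"[^a-z0-9/\s]")


def normalize(query: str) -> str:
    """Lower-case a query, unify x-ray spellings, drop special characters."""
    q = query.lower()
    q = XRAY_VARIANTS.sub("xray", q)
    q = SPECIAL_CHARS.sub(" ", q)
    return " ".join(q.split())  # collapse repeated whitespace


def preprocess(raw_queries: list[str]) -> list[str]:
    """Normalize all queries, then drop empty and identical consecutive ones."""
    normalized = [n for n in (normalize(q) for q in raw_queries) if n]
    # groupby clusters runs of equal neighbours; keep one query per run.
    return [q for q, _ in groupby(normalized)]


if __name__ == "__main__":
    raw = ["X-Ray hand", "x ray  hand", '"Pneumothorax"', "MRI knee"]
    print(preprocess(raw))  # ['xray hand', 'pneumothorax', 'mri knee']
```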
Results of the analysis
• 23’033 queries after preprocessing, 14’413 (63%) of them unique
• Mean query length: 2.24 words (2.46 for unique queries)
  • Similar to web search, one term fewer than MedLine queries
• Imaging modalities:
  • MRI (586), CT (425), ultrasound (199), xray (139), PET (34), PET/CT (13), angiography (13), echo (11), radiography (10), tomography (6), fMRI (3), PET/MRI (1)
  • Modalities are searched for in the query text despite the possibility to filter for them
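The headline figures can be recomputed from a list of preprocessed queries with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, query length is taken as the number of whitespace-separated terms, and a modality is counted once per query that contains it as a whole term, which may differ from how the authors counted. As a sanity check on the slide, 14’413 / 23’033 ≈ 0.626, i.e. the reported 63%.

```python
from collections import Counter
from statistics import mean


def summarize(queries: list[str]) -> dict[str, float]:
    """Query counts, share of unique queries and mean length in terms."""
    if not queries:
        raise ValueError("need at least one query")
    unique = set(queries)
    return {
        "queries": len(queries),
        "unique": len(unique),
        "unique_share": len(unique) / len(queries),
        "mean_terms": mean(len(q.split()) for q in queries),
        "mean_terms_unique": mean(len(q.split()) for q in unique),
    }


def modality_counts(queries: list[str], modalities: list[str]) -> Counter:
    """Count the queries that contain each modality as a whole term."""
    counts: Counter = Counter()
    for q in queries:
        terms = set(q.split())
        # Whole-term match: "pet" does not match the single term "pet/ct".
        counts.update(m for m in modalities if m in terms)
    return counts


if __name__ == "__main__":
    qs = ["mri knee", "ct chest nodule", "mri knee", "pet/ct lymphoma"]
    print(summarize(qs))
    print(modality_counts(qs, ["mri", "ct", "pet", "pet/ct"]))
```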
Query modification
• 5713 consecutive query pairs share at least one term; each such pair is assumed to belong to a single session
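Without time stamps or user identifiers, sessions can only be approximated from query order. A minimal Python sketch, with hypothetical function names, pairs neighbouring queries that share at least one term and labels each pair by whether terms were added, removed or changed, the questions raised on the log file analysis slide. The labels, and the treatment of a pair with the same set of terms, are assumptions rather than the authors' scheme.

```python
from collections import Counter
from collections.abc import Iterator, Sequence


def related_pairs(queries: Sequence[str]) -> Iterator[tuple[str, str]]:
    """Yield consecutive (previous, next) queries sharing at least one term."""
    for prev, nxt in zip(queries, queries[1:]):
        if set(prev.split()) & set(nxt.split()):
            yield prev, nxt


def modification(prev: str, nxt: str) -> str:
    """Label how the set of terms changed from one query to the next."""
    before, after = set(prev.split()), set(nxt.split())
    added, removed = after - before, before - after
    if added and removed:
        return "changed"    # some terms replaced by others
    if added:
        return "added"      # terms only added
    if removed:
        return "removed"    # terms only removed
    return "reordered"      # same set of terms, e.g. only the order differs


if __name__ == "__main__":
    log = ["mri knee", "mri knee meniscus", "ct chest", "ct abdomen"]
    print(Counter(modification(p, n) for p, n in related_pairs(log)))
    # Counter({'added': 1, 'changed': 1})
```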
Use of terms for topics in ImageCLEF
• ImageCLEF, an image retrieval benchmark
  • Uses images and text as queries; 17 groups participated in 2012
• The most frequent searches with at least two terms were taken
• A radiologist ranked these searches by their usefulness in radiology
• The most useful ones were checked to find whether documents in PubMed Central fulfill the information need
• Result: the 30 most useful and most frequent searches for which relevant documents are available
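Only the first, frequency-based step of this topic selection is mechanical; the radiologist's ranking and the check against PubMed Central were manual. That first step could look like the Python sketch below. The function name and the cut-off parameter `top_n` are hypothetical, since the slide does not say how many frequent searches were shortlisted before ranking.

```python
from collections import Counter


def topic_candidates(queries: list[str], top_n: int) -> list[tuple[str, int]]:
    """Return the top_n most frequent queries that have at least two terms."""
    multi_term = Counter(q for q in queries if len(q.split()) >= 2)
    # most_common sorts by count, highest first.
    return multi_term.most_common(top_n)


if __name__ == "__main__":
    log = ["mri knee", "ct", "mri knee", "pet/ct lymphoma", "ct", "ct",
           "mri knee", "pet/ct lymphoma", "liver"]
    print(topic_candidates(log, 5))
    # [('mri knee', 3), ('pet/ct lymphoma', 2)]
```

The single-term query "ct" is frequent in this toy log but is excluded by the two-term rule.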
Conclusions
• Analysis of log files can help understand user behavior
  • It helps build better systems based on user models and analyze current approaches, including their shortcomings
• Time stamps and user identification are important for query session analysis
  • We used implicit knowledge for this
• People do not know all details of the systems
  • They search for modalities both in the query text and through filters
• Users change terms depending on the results
Questions?
• More information can be found at
  • http://www.khresmoi.eu/
  • http://medgift.hevs.ch/
  • http://publications.hevs.ch/