Slides from TPDL2019.
Abstract - The websites of Cultural Heritage institutions attract the full range of users, from professionals to novices, for a variety of tasks. However, many institutions are reporting high bounce rates and therefore seeking ways to better engage users. The analysis of transaction logs can provide insights into users’ searching and navigational behaviours and support engagement strategies. In this paper we present the results from a transaction log analysis of web server logs representing user-system interactions from the seven websites of National Museums Liverpool (NML). In addition, we undertake an exploratory cluster analysis of users to identify potential user groups that emerge from the data. We compare this with previous studies of NML website users.
Link to paper:
https://link.springer.com/chapter/10.1007/978-3-030-30760-8_7
Analysis of transaction logs from National Museums Liverpool
1. Analysis of Transaction Logs from National Museums Liverpool
David Walsh, Mark M Hall, Paul Clough,
Frank Hopfgartner and Jonathan Foster
Edge Hill University & Martin-Luther-University Halle-Wittenberg
& Sheffield University & Peak Indicators
TPDL 2019, Oslo
2. Context
What to Search for ????
General Public 49.7%
Non-Professionals / Hobbyists 26.9%
Students 6.5%
Others 5.1%
Teachers 4.9%
Academics 4.9%
Museum Staff 2%
Walsh, D., Hall, M., Clough, P., Foster, J.: The ghost in the museum website: investigating the general public’s interactions with museum websites. In: International Conference on Theory and Practice of Digital Libraries, Springer (2017)
3. This study
Aim - Investigate how representative of the full website audience the survey
respondents are.
RO1 - Conduct transaction log analysis.
RO2 - Cluster the web log data.
RO3 - Identify whether the clusters represent any of the known user groups.
5. Experiment Overview
● Server logs extracted for Jan-Mar 2017
● User-based (multi-session) clustering of log data
● Transaction log analysis conducted
● Georeferenced the logs
● Log files cleaned
6. TLA Findings
● 586,868 page requests.
○ 321,174 unique users (multi-session groups)
Day       Mon    Tue    Wed    Thur   Fri    Sat   Sun    Total
Requests  81k    100k   101k   97k    85k    55k   66k    586,868
%         13.88  17.09  17.26  16.58  14.59  9.37  11.23  100
7. TLA Findings
Museum Request
ISM 97,686
Other Pages 92,433
WML 86,516
Walker 73,194
Maritime 68,912
Events 58,273
MOL 54,697
Ladylever 24,607
Shop 21,740
Sudley 8,810
Total 586,868
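The per-site counts above can be checked against the stated total and turned into percentage shares. This is a small sketch using only the figures from the table; the variable names are illustrative.

```python
# Request counts per site section, copied from the table above.
requests_by_site = {
    "ISM": 97_686,
    "Other Pages": 92_433,
    "WML": 86_516,
    "Walker": 73_194,
    "Maritime": 68_912,
    "Events": 58_273,
    "MOL": 54_697,
    "Ladylever": 24_607,
    "Shop": 21_740,
    "Sudley": 8_810,
}

# The rows should add up to the total number of page requests.
total = sum(requests_by_site.values())

# Share of all requests per site, as a percentage rounded to two decimals.
shares = {site: round(100 * n / total, 2) for site, n in requests_by_site.items()}

for site, pct in shares.items():
    print(f"{site:12s} {pct:6.2f}%")
```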
8. TLA Findings
Requests by page type.
Country       Requests  Queries
UK            307,347   181,903
US            120,584   43,062
Denmark       32,012    9,098
Germany       16,878    7,846
Australia     15,805    4,012
UK City/Town  Requests  Queries
Manchester    40,992    20,696
Liverpool     37,804    23,014
London        32,012    9,098
Runcorn       16,878    7,846
Sheffield     15,805    4,012
...           ...       ...
Total         307,347   181,903
213 countries
9. Sessionisation method
He, D., Göker, A.: Detecting session boundaries from web user logs. In: Proceedings of the BCS-IRSG 22nd Annual Colloquium on Information Retrieval Research (2000) 57-66
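The cited work concerns detecting session boundaries from the time gaps between a user's requests. A minimal sketch of that general idea follows, assuming a fixed inactivity timeout. The 30-minute value, the `sessionise` name and the sample timestamps are illustrative assumptions, not details taken from the slides.

```python
from datetime import datetime, timedelta

def sessionise(timestamps, timeout=timedelta(minutes=30)):
    """Split one user's request times into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` (an assumed threshold, not the one used in the study).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions

# Hypothetical request times for a single user.
requests = [
    datetime(2017, 1, 9, 10, 0),
    datetime(2017, 1, 9, 10, 5),
    datetime(2017, 1, 9, 14, 30),
    datetime(2017, 1, 10, 9, 0),
]

sessions = sessionise(requests)
print([len(s) for s in sessions])
```

Changing `timeout` changes how aggressively requests are merged, so the threshold is a parameter worth testing against the data.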
14. Clustering Methodology
1. Cluster users not sessions.
(26 columns of data including: IP; User Agent; Location details; Total counts
for requests: session, page types visited, and query counts.)
2. Run elbow curve
3. Scale data
4. Cluster (by page type and queries counts by user)
Attempted clustering methods:
● K-means
● K-modes (k-prototypes)
● DBScan
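The scaling, elbow curve and k-means steps listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the study's implementation: the two-column user rows (page requests, queries) are made up, the data is scaled before the elbow values are computed, and `scale`, `kmeans` and `sq_dist` are hypothetical helper names.

```python
import random
from statistics import mean, pstdev

def scale(rows):
    """Z-score each column so counts on different ranges are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(rows, centres):
    """Index of the nearest centre for every row."""
    return [min(range(len(centres)), key=lambda j: sq_dist(r, centres[j]))
            for r in rows]

def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, inertia)."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        labels = assign(rows, centres)
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [mean(c) for c in zip(*members)]
    labels = assign(rows, centres)
    inertia = sum(sq_dist(r, centres[lab]) for r, lab in zip(rows, labels))
    return labels, inertia

# Hypothetical per-user rows: [page requests, queries].
users = [[1, 0], [1, 0], [1, 1], [2, 0], [30, 12],
         [28, 15], [35, 10], [6, 1], [7, 0], [5, 1]]

scaled = scale(users)

# Elbow curve: within-cluster sum of squares for each candidate k.
inertias = {k: kmeans(scaled, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 3))

labels, _ = kmeans(scaled, 3)
```

The elbow is read off as the value of k after which the inertia stops dropping sharply; with real log data the choice usually needs a visual check.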
15. Cluster Classification Principles
User group characteristic   Log data
Motivation                  Starting level page (first page URI in session)
Domain / CH Knowledge       Page type and queries
Task                        Page type and possibly queries
Location                    IP (reversed) identifying country, region and city
Frequency of visits         Repeat visits (sessions), queries, length of session
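The mapping above implies that request-level log records are aggregated into one row per user before clustering. A rough sketch of such an aggregation follows. The record layout (user, session, page type, query flag) and the `user_features` name are hypothetical, and the sketch keeps only each user's first page overall, assuming chronological input, rather than the first page of every session.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned log records: (user_id, session_id, page_type, is_query).
records = [
    ("u1", "s1", "home", False),
    ("u1", "s1", "collection", True),
    ("u1", "s2", "event", False),
    ("u2", "s3", "event", False),
]

def user_features(records):
    """Aggregate request-level records into one feature row per user."""
    features = defaultdict(lambda: {
        "sessions": set(),         # distinct sessions (repeat visits)
        "page_types": Counter(),   # requests per page type
        "queries": 0,              # number of search queries
        "first_page_type": None,   # starting-level page
    })
    for user, session, page_type, is_query in records:
        row = features[user]
        if row["first_page_type"] is None:
            row["first_page_type"] = page_type
        row["sessions"].add(session)
        row["page_types"][page_type] += 1
        row["queries"] += int(is_query)
    return dict(features)

feats = user_features(records)
for user, row in feats.items():
    print(user, len(row["sessions"]), dict(row["page_types"]), row["queries"])
```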
16. Findings from preliminary clustering
[Figure: the seven preliminary clusters: Single page viewer; High all round searcher; Event visitor; Single query general page visitor; Deep level browser; General museum visitor; Known item searcher. Figure values: 1.0, 50, 3.5, 1.2, 6, 17.5, 20.]
17. Potential mapping to known user groups
Cluster  # Users  Cluster label                       Potential user group
1        172,692  Single page viewers                 Currently undocumented user group called “Bouncers”
2        46       High all round searchers            Non-Professionals (Hobbyists)
3        4,162    Event visitors                      Teachers / General Public
4        45,282   Single query general page visitors  General Public (Pre-Visit) / Teachers
5        292      Deep level browsers                 Museum Staff
6        290      General museum visitors             General Public / Students
7        2,966    Known item searchers                Academics (Experts) / Non-Professionals (Hobbyists)
18. Conclusion
Cluster analysis indicates that the earlier survey study is representative.
Cluster analysis extends the survey results with a behavioural dimension.
19. Future work
● Explore the behaviours of the clustered groups in more detail and enhance the known user definitions.
● Extend clustering to look at other data such as location and museum/gallery accessed.
● Explore clustering only those users we think are General Public (GP) and see if sub-groups emerge.
20. Thank you for your attention
Link to the full paper:
https://link.springer.com/chapter/10.1007/978-3-030-30760-8_7