ScotLex-1, Edinburgh, 08.04.2016
Carolin Müller-Spitzer & Sascha Wolfer
A QUANTITATIVE VIEW ON DICTIONARY USE:
POTENTIALS AND LIMITATIONS OF LOG FILE ANALYSES
• Lew (2015a): “Until fairly recently, dictionary users were not usually of
central concern in the process of dictionary making […].”
• Advantages of focusing on the user:
 Discover the challenges users face when accessing and using dictionaries
 user instruction, usability
 Learn how users are working with the dictionary
 Discover what users are interested in the most/least
 Test preconceptions of the lexicographer about the users
 User studies enable us to make better dictionaries.
RESEARCH INTO DICTIONARY USE
04.04.2016 ScotLex-1 - Müller-Spitzer & Wolfer - Log file analyses 2
Lew, R. (2015a). Dictionaries and their users. In P. Hanks & G.-M. De Schryver (Eds.), International Handbook of Modern
Lexis and Lexicography (pp. 1–9). Berlin/Heidelberg: Springer.
• Main aim: Collect empirical data to gain insights into dictionary usage
• Multiple methods of data collection:
(Web) questionnaires, eye tracking studies, usability studies,
log file analyses, …
• Choice of method depends on the research question we want to
address.
 Lew, R. (2015b). Opportunities and limitations of user studies. In C. Tiberius & C. Müller-Spitzer (Eds.),
Research into dictionary use / Wörterbuchbenutzungsforschung. 5. Arbeitsbericht des wissenschaftlichen
Netzwerks „Internetlexikografie“ (Vol. 2/2015, pp. 6–16). Mannheim: Institut für Deutsche Sprache.
Retrieved from http://pub.ids-mannheim.de/laufend/opal/pdf/opal15-2.pdf
 Müller-Spitzer, C. (2014). Using Online Dictionaries. Berlin, New York: De Gruyter.
RESEARCH INTO DICTIONARY USE
• Log files: records of search requests or article
look-ups.
• Varying amount of information:
 Minimum: Article ID, Timestamp
 User information, article history, technical information (e.g.,
browser, device), ...
 Some log files are already aggregated (e.g., per hour).
• Mind the legal framework of your country: what
kind of information are you allowed to use without
explicit user consent?
LOG FILE ANALYSES
• Bergenholtz, H., & Johnsen, M. (2005). Log Files as a Tool for Improving Internet Dictionaries. Hermes, 34,
117–141.
• Bergenholtz, H., & Johnsen, M. (2007). Log files can and should be prepared for a functionalistic approach.
Lexikos, 17, 1–21.
• Verlinde, S., & Binon, J. (2010). Monitoring Dictionary Use in the Electronic Age. In A. Dykstra & T. Schoonheim
(Eds.), Proceedings of the XIV Euralex International Congress (pp. 1144–1151). Ljouwert: Afûk.
• Hult, A.-K. (2012). Old and New User Study Methods Combined ‒ Linking Web Questionnaires with Log Files
from the Swedish Lexin Dictionary. In J. M. Torjusen & R. V. Fjeld (Eds.), Proceedings of the 15th EURALEX
International Congress 2012 (pp. 922–928). Oslo: Universitetet i Oslo, Institutt for lingvistiske og nordiske studier.
Retrieved from http://www.euralex.org/elx_proceedings/Euralex2012/pp922-928%20Hult.pdf
• Schoonheim, T., Tiberius, C., Niestadt, J., & Tempelaars, R. (2012). Dictionary Use and Language Games:
Getting to Know the Dictionary as Part of the Game. In R. Vatvedt Fjeld & J. M. Torjusen (Eds.), Proceedings of
the 15th EURALEX International Congress. 7-11 August 2012 (pp. 974–979). Oslo: Department of Linguistics and
Scandinavian Studies, University of Oslo.
• De Schryver, G.-M., Joffe, D., Joffe, P., & Hillewaert, S. (2006). Do dictionary users really look up frequent
words?—on the overestimation of the value of corpus-based lexicography. Lexikos, 16, 67–83.
• Koplenig, A., Meyer, P., & Müller-Spitzer, C. (2014). Dictionary users do look up frequent words. A log file
analysis. In C. Müller-Spitzer (Ed.), Using Online Dictionaries (pp. 229–250). Berlin, Boston: de Gruyter.
LOG FILE ANALYSES: PREVIOUS
RESEARCH
• The Wikimedia Foundation provides log files for all
its sites, including all the different language editions
of Wiktionary.
 https://dumps.wikimedia.org/other/pagecounts-raw/
STUDIES USING WIKTIONARY LOG FILES
• One file per hour with all projects.
• Approx. 66 GB (gzipped) per month.
DATA PREPARATION
Downloaded files → Relevant rows (e.g. “de.d”) → Daily aggregates → Weekly aggregates → Yearly aggregates
Additional information (some extracted from Wiktionary):
• part-of-speech
• # of senses
• headword frequency
• ...
DATA PREPARATION
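The preparation steps on this slide (download the hourly files, keep the relevant rows, aggregate per day and beyond) can be sketched roughly as follows. This is a minimal sketch and not the authors' pipeline. It assumes that each row holds four space-separated fields (project code, page title, view count, byte count) and that file names follow the pattern pagecounts-YYYYMMDD-HHMMSS.gz; the function names relevant_rows, daily_aggregates and merge are hypothetical.

```python
import gzip
import re
from collections import Counter
from pathlib import Path

# Assumed file-name pattern: pagecounts-YYYYMMDD-HHMMSS.gz
NAME_RE = re.compile(r"pagecounts-(\d{8})-\d{6}")


def relevant_rows(lines, project="de.d"):
    """Yield (page_title, views) for rows of one project code."""
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        # Assumed row layout: project code, page title, views, bytes
        if len(parts) != 4 or parts[0] != project:
            continue
        try:
            yield parts[1], int(parts[2])
        except ValueError:
            continue  # skip rows whose view count is not a number


def daily_aggregates(paths, project="de.d"):
    """Sum hourly files into {"YYYYMMDD": Counter({page_title: views})}."""
    days = {}
    for path in paths:
        match = NAME_RE.search(Path(path).name)
        if match is None:
            continue  # not an hourly pagecounts file
        counts = days.setdefault(match.group(1), Counter())
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for page, views in relevant_rows(fh, project):
                counts[page] += views
    return days


def merge(aggregates):
    """Merge several Counters (e.g. daily ones) into a weekly or yearly total."""
    total = Counter()
    for counts in aggregates:
        total.update(counts)
    return total
```

Filtering while streaming keeps memory use low, which matters at roughly 66 GB of gzipped input per month; only the rows of one project are ever held in a Counter.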
Page          POS        Frequency  Visits 2013  Visits per 1 million visits
Tribüne       Noun          11,072      230,720                      1,723.3
fakultativ    Adjective        497      133,381                        996.3
Tribunal      Noun          11,072       61,728                        461.1
Grandezza     Noun           1,222       20,475                        153.0
reflektieren  Verb           7,961       19,736                        147.4
...           ...              ...          ...                          ...
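The last column of the table normalises raw visits to a rate per one million visits. A minimal sketch of that arithmetic follows. The total number of visits is not stated on the slide, so total_visits is a parameter here; the figure of roughly 134 million is only back-calculated from the first table row and is an assumption, and the function name is hypothetical.

```python
def visits_per_million(visits, total_visits):
    """Scale a raw visit count to visits per one million total visits."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return visits / total_visits * 1_000_000


# Back-calculation from the first row (230,720 visits, 1,723.3 per million).
# This total is implied by the table's ratio, not given in the slides.
implied_total = 230_720 / 1_723.3 * 1_000_000
print(round(implied_total / 1_000_000, 1), "million visits (implied)")
```

Normalising in this way makes counts comparable across years or language editions whose overall traffic differs.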
• Are more frequent words visited more frequently?
• Are polysemic words visited more frequently than
monosemic words?
• How can we investigate temporal effects on visiting
frequency?
• What portions of Wiktionary stay “in the dark”
(i.e., are visited never or only very seldom)?
• Database: German-language edition of Wiktionary
RESEARCH QUESTIONS
• If we compile a general dictionary from scratch, does it make
sense to include more frequent words first?
• Analyses of Wiktionary and DWDS log files suggest:
yes, words that occur more frequently in everyday language are
also visited more frequently.
CORPUS AND LOOK-UP FREQUENCY
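One simple way to check whether corpus frequency and look-up frequency go together is a rank correlation. The slides do not say which statistic was used, so the Spearman coefficient below is a swapped-in illustration, written with the standard library only; all function names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(corpus_freq, lookups):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(corpus_freq), ranks(lookups))
```

Working on ranks avoids the problem that both frequency distributions are heavily skewed, so a handful of very frequent words does not dominate the result.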
• Corpus frequency still matters even when the most frequent words
are excluded.
CORPUS AND LOOK-UP FREQUENCY
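[Figure: successful searches after setting aside the 10,000 most frequent words. Conditions A and B: 10,000 words randomly sampled from the rest, and the 10,000 most frequent words from the rest. Values shown: 34% and 56%. Image not reproduced.]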
• Are polysemic words visited more often than monosemic
words?
• Challenge: Polysemic words are also more frequent. So, we have
to control for the effect of frequency just shown.
POLYSEMIC AND MONOSEMIC WORDS
[Figure comparing monosemic and polysemic words; image not reproduced.]
POLYSEMIC AND MONOSEMIC WORDS
• Effect of frequency still visible.
• Effect of polysemy
• Interaction effect: Polysemy
contrast tends to be more
pronounced in higher frequency
bands (especially in the highest
decile)
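Controlling for frequency when comparing polysemic and monosemic words can be sketched by splitting headwords into frequency bands and comparing the two groups within each band. This is a minimal sketch under assumptions: bands are equal-sized rank-based groups (deciles by default), "polysemic" means more than one sense, and the contrast is a plain difference of mean visits. The function names are hypothetical and this is not necessarily the model the authors fitted.

```python
from collections import defaultdict
from statistics import mean


def frequency_band(rank, n, bands=10):
    """Map a 0-based frequency rank to a band 1..bands (1 = least frequent)."""
    return rank * bands // n + 1


def polysemy_contrast(entries, bands=10):
    """entries: iterable of (corpus_frequency, n_senses, visits).

    Returns {band: mean visits of polysemic minus mean visits of monosemic}
    for every band that contains both kinds of headword.
    """
    ordered = sorted(entries, key=lambda e: e[0])
    n = len(ordered)
    grouped = defaultdict(lambda: {"mono": [], "poly": []})
    for rank, (_, senses, visits) in enumerate(ordered):
        key = "poly" if senses > 1 else "mono"
        grouped[frequency_band(rank, n, bands)][key].append(visits)
    result = {}
    for band, groups in sorted(grouped.items()):
        if groups["mono"] and groups["poly"]:
            result[band] = mean(groups["poly"]) - mean(groups["mono"])
    return result
```

An interaction of the kind described above would show up as a contrast that grows towards the highest bands.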
• If we want to extract temporary effects, we have to
take time into consideration.
 Interactive visualisation (German Wiktionary, more to come):
http://www.owid.de/plus/wikivi2015/
• We employed a trend-residualisation technique.
 Calculate the current trend of visitation frequency.
 Calculate the deviations from this trend (“residuals”) at
specific points in time.
TEMPORARY EFFECTS ON LOOK-UP
FREQUENCY
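The two steps of the trend-residualisation idea (estimate a trend, then look at deviations from it) can be sketched as follows. The slides do not specify how the trend is estimated, so the centred moving average used here is an assumption, as are the window size and the function names.

```python
from statistics import mean


def moving_trend(series, window=7):
    """Centred moving average; the window is truncated at both ends."""
    half = window // 2
    return [
        mean(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]


def residuals(series, window=7):
    """Deviation of each observation from the local trend."""
    trend = moving_trend(series, window)
    return [value - t for value, t in zip(series, trend)]


# Hypothetical daily visit counts with one short-lived spike
visits = [10, 10, 10, 40, 10, 10, 10]
res = residuals(visits, window=3)
peak_day = res.index(max(res))
```

A large positive residual marks a day on which a headword was looked up far more often than its own recent history would suggest. A moving median would be less affected by the spike itself than the mean used in this sketch.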
TEMPORARY EFFECTS: EXAMPLES
[Four example slides; images not reproduced.]
TEMPORARY EFFECTS: ‚LARMOYANT‘
„Der ist jetzt aber richtig sauer. Das passt dem gar
nicht. Und wenn ich das richtig deute, blickt er da eher
Richtung Toni Kroos. Das ist ihm ein bisschen zu
larmoyant... und ... der ist vielleicht noch eher im
Freundschaftsspielmodus …“
“He is really peeved now. That really doesn’t suit him. And
if I interpret this correctly, he is looking in the direction
of Toni Kroos. That’s a little too lachrymose for him.
And ... maybe he’s more in exhibition mode …”
• How many and which articles are not visited at all?
 We consider the years 2013, 2014 and 2015.
 Account for the fact that the number of articles is rising.
THE DARK SIDE OF WIKTIONARY
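Finding unvisited articles while accounting for a growing dictionary can be sketched like this. It is a minimal sketch under assumptions: each article has a known creation year, each year has a set of visited titles, and an article only counts towards a year's denominator once it exists. The data structures and function names are hypothetical.

```python
def dark_share_by_year(created, visited_by_year):
    """created: {title: year_created}; visited_by_year: {year: set of titles}.

    Returns {year: share of the articles existing in that year that
    received no visit in that year}.
    """
    shares = {}
    for year, visited in sorted(visited_by_year.items()):
        existing = {title for title, y in created.items() if y <= year}
        if existing:
            shares[year] = len(existing - visited) / len(existing)
    return shares


def never_visited(created, visited_by_year):
    """Titles that do not appear in any year's set of visited pages."""
    seen = set().union(*visited_by_year.values())
    return sorted(title for title in created if title not in seen)
```

Computing the share against the articles that existed in each year avoids counting a newly created article as "dark" for years in which it could not have been visited.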
• Approx. 25,000 articles were not visited
during 2013, 2014 and 2015.
 Mostly newer
 Mostly non-German
 German idioms
 Inflected forms
THE DARK SIDE OF WIKTIONARY
• Log files are well suited to investigate effects on the
“macro user” level:
 Corpus frequency and look-up frequency
 Polysemy and look-up frequency
 Temporary effects
 “Dark side” of dictionaries
 Collaborative dictionaries: Look-up and revision frequency
 …
SUMMARY
• Lew (2015b: 11–12): “[…] we need to be aware of the limitations of
the approach.
 One such limitation is that server log files will rarely tell us what the context
of dictionary use is:
 what activity the user is involved in,
 what particular problem they are trying to solve,
 and the levels of success and satisfaction achieved in the consultation.
 Nothing is known about the user, either, such as their age, languages spoken,
proficiency in them, or professional background. […]
 Issues of data privacy can also be a limiting factor in log file analysis.”
OUTLOOK / LIMITATIONS
• Little can be inferred from a small number of log file
events.
 Research based on individual cases is virtually impossible.
 Log file analyses work best if many cases are available for
longer periods.
 Quantitative methods
• Log files might be integrated with other methodologies
to gain an even broader insight into dictionary usage.
 Test hypotheses generated by log file analyses with methods
that assess individual performances or preferences.
OUTLOOK / LIMITATIONS
THANK YOU.
BONUS SLIDE: REVISIONS
[Figure: English Wiktionary and German Wiktionary; image not reproduced.]

More Related Content

Similar to Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary use: Potentials and limitations of log file analyses

Presentation - First International Library Staff Exchange Week, Zagreb
Presentation - First International Library Staff Exchange Week, ZagrebPresentation - First International Library Staff Exchange Week, Zagreb
Presentation - First International Library Staff Exchange Week, ZagrebIva Vrkic
 
Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...Jenn Riley
 
How Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community WebinarHow Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community WebinarCrossref
 
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...roelandordelman.nl
 
The more things change, the more they stay the same...”: Why digital journals...
The more things change, the more they stay the same...”: Why digital journals...The more things change, the more they stay the same...”: Why digital journals...
The more things change, the more they stay the same...”: Why digital journals...Pratt_Symposium
 
Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Alannah Fitzgerald
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Alannah Fitzgerald
 
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...Franck Michel
 
A journey into academic journals and databases: services, policies, standards...
A journey into academic journals and databases: services, policies, standards...A journey into academic journals and databases: services, policies, standards...
A journey into academic journals and databases: services, policies, standards...Mansour Esmaeil Zaei
 
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Daniel Vila Suero
 
Open Science, Open Data: towards a new transparent and reproducible ecosystem
Open Science, Open Data:   towards a new transparent and reproducible ecosystemOpen Science, Open Data:   towards a new transparent and reproducible ecosystem
Open Science, Open Data: towards a new transparent and reproducible ecosystemLIBER Europe
 
Stronger together: community initiatives in journal management
Stronger together: community initiatives in journal managementStronger together: community initiatives in journal management
Stronger together: community initiatives in journal managementJisc
 
Transportation spring-2012
Transportation spring-2012Transportation spring-2012
Transportation spring-2012Bruce Slutsky
 
Dynamics of Web: Analysis and Implications from Search Perspective
Dynamics of Web: Analysis and Implications from Search  PerspectiveDynamics of Web: Analysis and Implications from Search  Perspective
Dynamics of Web: Analysis and Implications from Search PerspectiveNattiya Kanhabua
 
Oa and academic integrity for ph d students 2016
Oa and academic integrity for ph d students   2016Oa and academic integrity for ph d students   2016
Oa and academic integrity for ph d students 2016Lars Figenschou
 
Annotated Bibliography Of Language Documentation
Annotated Bibliography Of Language DocumentationAnnotated Bibliography Of Language Documentation
Annotated Bibliography Of Language DocumentationSarah Marie
 
Access to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of SloveniaAccess to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of SloveniaArhiv družboslovnih podatkov
 
Access to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of SloveniaAccess to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of SloveniaArhiv družboslovnih podatkov
 

Similar to Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary use: Potentials and limitations of log file analyses (20)

Presentation - First International Library Staff Exchange Week, Zagreb
Presentation - First International Library Staff Exchange Week, ZagrebPresentation - First International Library Staff Exchange Week, Zagreb
Presentation - First International Library Staff Exchange Week, Zagreb
 
Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...
 
How Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community WebinarHow Libraries Use Publisher Metadata - Crossref Community Webinar
How Libraries Use Publisher Metadata - Crossref Community Webinar
 
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
 
Research Data Lifecycle: Role of Data Services
Research Data Lifecycle: Role of Data ServicesResearch Data Lifecycle: Role of Data Services
Research Data Lifecycle: Role of Data Services
 
The more things change, the more they stay the same...”: Why digital journals...
The more things change, the more they stay the same...”: Why digital journals...The more things change, the more they stay the same...”: Why digital journals...
The more things change, the more they stay the same...”: Why digital journals...
 
Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP
 
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes wit...
 
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
 
A journey into academic journals and databases: services, policies, standards...
A journey into academic journals and databases: services, policies, standards...A journey into academic journals and databases: services, policies, standards...
A journey into academic journals and databases: services, policies, standards...
 
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
 
Open Science, Open Data: towards a new transparent and reproducible ecosystem
Open Science, Open Data:   towards a new transparent and reproducible ecosystemOpen Science, Open Data:   towards a new transparent and reproducible ecosystem
Open Science, Open Data: towards a new transparent and reproducible ecosystem
 
Stronger together: community initiatives in journal management
Stronger together: community initiatives in journal managementStronger together: community initiatives in journal management
Stronger together: community initiatives in journal management
 
Transportation spring-2012
Transportation spring-2012Transportation spring-2012
Transportation spring-2012
 
Dynamics of Web: Analysis and Implications from Search Perspective
Dynamics of Web: Analysis and Implications from Search  PerspectiveDynamics of Web: Analysis and Implications from Search  Perspective
Dynamics of Web: Analysis and Implications from Search Perspective
 
Oa and academic integrity for ph d students 2016
Oa and academic integrity for ph d students   2016Oa and academic integrity for ph d students   2016
Oa and academic integrity for ph d students 2016
 
Oct 15 NISO Webinar: 21st Century Resource Sharing: Which Inter-Library Loan ...
Oct 15 NISO Webinar: 21st Century Resource Sharing: Which Inter-Library Loan ...Oct 15 NISO Webinar: 21st Century Resource Sharing: Which Inter-Library Loan ...
Oct 15 NISO Webinar: 21st Century Resource Sharing: Which Inter-Library Loan ...
 
Annotated Bibliography Of Language Documentation
Annotated Bibliography Of Language DocumentationAnnotated Bibliography Of Language Documentation
Annotated Bibliography Of Language Documentation
 
Access to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of SloveniaAccess to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of Slovenia
 
Access to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of SloveniaAccess to and specifics of detailed national LFS data – the case of Slovenia
Access to and specifics of detailed national LFS data – the case of Slovenia
 

Recently uploaded

If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfSkillCertProExams
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxraffaeleoman
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalFabian de Rijk
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Delhi Call girls
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Pooja Nehwal
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfSenaatti-kiinteistöt
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Baileyhlharris
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...amilabibi1
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedDelhi Call girls
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCamilleBoulbin1
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIINhPhngng3
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 

Recently uploaded (18)

If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 

Carolin Müller-Spitzer & Sascha Wolfer - A quantitative view on dictionary use: Potentials and limitations of log file analyses

  • 1. ScotLex-1, Edinburgh, 08.04.2016 Carolin Müller-Spitzer & Sascha Wolfer A QUANTITATIVE VIEW ON DICTIONARY USE: POTENTIALS AND LIMITATIONS OF LOG FILE ANALYSES
  • 2. • Lew (2015a): „Until fairly recently, dictionary users were not usually of central concern in the process of dictionary making […].” • Advantages of focusing on the user:  Discover the challenges users face when accessing and using dictionaries  user instruction, usability  Learn how users are working with the dictionary  Discover what users are interested in the most/least  Test preconceptions of the lexicographer about the users  User studies enable us to make better dictionaries. RESEARCH INTO DICTIONARY USE 04.04.2016 ScotLex-1 - Müller-Spitzer & Wolfer - Log file analyses 2 Lew, R. (2015a). Dictionaries and their users. In P. Hanks & G.-M. De Schryver (Ed.), International Handbook of Modern Lexis and Lexicography (1–9). Berlin/Heidelberg: Springer.
  • 3. • Main aim: Collect empirical data to gain insights into dictionary usage • Multiple methods of data collection: (Web) questionnaires, eye tracking studies, usability studies, log file analyses, … • Choice of method depends on the research question we want to address.  Lew, R. (2015b). Opportunities and limitations of user studies. In C. Tiberius & C. Müller- Spitzer (Hrsg.), Research into dictionary use / Wörterbuchbenutzungsforschung. 5. Arbeitsbericht des wissenschaftlichen Netzwerks „Internetlexikografie“ (Bd. 2/2015, S. 6– 16). Mannheim: Institut für Deutsche Sprache. Abgerufen von http://pub.ids- mannheim.de/laufend/opal/pdf/opal15-2.pdf  Müller-Spitzer, C. (2014). Using Online Dictionaries. Berlin, New York: De Gruyter. RESEARCH INTO DICTIONARY USE 04.04.2016 ScotLex-1 - Müller-Spitzer & Wolfer - Log file analyses 3
  • 4. LOG FILE ANALYSES
      • Log files: Protocols of search requests or article look-ups.
      • Varying amount of information:
          ▪ Minimum: Article ID, Timestamp
          ▪ User information, article history, technical information (e.g., browser, device), ...
          ▪ Some log files are already aggregated (e.g., per hour).
      • Check the legal framework of your country: What kind of information are you allowed to use without explicit user consent?
  • 5. LOG FILE ANALYSES: PREVIOUS RESEARCH
      • Bergenholtz, H., & Johnsen, M. (2005). Log Files as a Tool for Improving Internet Dictionaries. Hermes, 34, 117–141.
      • Bergenholtz, H., & Johnsen, M. (2007). Log files can and should be prepared for a functionalistic approach. Lexikos, 17, 1–21.
      • Verlinde, S., & Binon, J. (2010). Monitoring Dictionary Use in the Electronic Age. In A. Dykstra & T. Schoonheim (Eds.), Proceedings of the XIV Euralex International Congress (pp. 1144–1151). Ljouwert: Afûk.
      • Hult, A.-K. (2012). Old and New User Study Methods Combined – Linking Web Questionnaires with Log Files from the Swedish Lexin Dictionary. In J. M. Torjusen & R. V. Fjeld (Eds.), Proceedings of the 15th EURALEX International Congress 2012 (pp. 922–928). Oslo: Universitetet i Oslo, Institutt for lingvistiske og nordiske studier. Retrieved from http://www.euralex.org/elx_proceedings/Euralex2012/pp922-928%20Hult.pdf
      • Schoonheim, T., Tiberius, C., Niestadt, J., & Tempelaars, R. (2012). Dictionary Use and Language Games: Getting to Know the Dictionary as Part of the Game. In R. Vatvedt Fjeld & J. M. Torjusen (Eds.), Proceedings of the 15th EURALEX International Congress. 7-11 August 2012 (pp. 974–979). Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo.
      • De Schryver, G.-M., Joffe, D., Joffe, P., & Hillewaert, S. (2006). Do dictionary users really look up frequent words? On the overestimation of the value of corpus-based lexicography. Lexikos, 16, 67–83.
      • Koplenig, A., Meyer, P., & Müller-Spitzer, C. (2014). Dictionary users do look up frequent words. A log file analysis. In C. Müller-Spitzer (Ed.), Using Online Dictionaries (pp. 229–250). Berlin, Boston: de Gruyter.
  • 6. STUDIES USING WIKTIONARY LOG FILES
      • The Wikimedia Foundation provides log files for all its sites, including all the different language editions of Wiktionary.
        → https://dumps.wikimedia.org/other/pagecounts-raw/
      • One file per hour with all projects.
      • Approx. 66 GB (gzipped) per month.
  • 7. DATA PREPARATION
      [Diagram: Downloaded files → Relevant rows (e.g. "de.d") → Daily aggregates → Weekly aggregates → Yearly aggregates]
      Additional information (some extracted from Wiktionary):
      • part-of-speech
      • # of senses
      • headword frequency
      • ...
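The first two steps of this preparation (keeping only the relevant rows and summing hourly counts into a daily aggregate) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each row of an hourly file has four whitespace-separated fields (project code, page title, view count, response size), and the sample rows and the function name `aggregate_pagecounts` are made up for the example.

```python
from collections import Counter


def aggregate_pagecounts(lines, project="de.d"):
    """Sum view counts per page title for rows of one project code.

    Assumes rows of the form '<project> <title> <views> <bytes>'.
    """
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip rows that do not match the assumed layout
        code, title, views, _size = parts
        if code == project:
            totals[title] += int(views)
    return totals


# Two hypothetical hourly files (invented rows, for illustration only).
hour_a = ["de.d Tribüne 3 45000", "en Main_Page 42 50043", "de.d fakultativ 1 12000"]
hour_b = ["de.d Tribüne 2 30000", "de.d Tribunal 1 9000"]

# Adding Counters merges the hourly counts into one daily aggregate.
daily = aggregate_pagecounts(hour_a) + aggregate_pagecounts(hour_b)
```

Weekly and yearly aggregates can be built the same way by adding up the daily `Counter` objects.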
  • 8. DATA PREPARATION
      Page          POS        Frequency  Visits 2013  Visits per 1 million visits
      Tribüne       Noun       11,072     230,720      1,723.3
      fakultativ    Adjective  497        133,381      996.3
      Tribunal      Noun       11,072     61,728       461.1
      Grandezza     Noun       1,222      20,475       153.0
      reflektieren  Verb       7,961      19,736       147.4
      ...           ...        ...        ...          ...
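The normalised column is presumably the raw visit count divided by the total number of visits in the same period, scaled to one million. The table implies a total of roughly 134 million visits for 2013 (230,720 / 1,723.3 × 1,000,000), but that total is not stated on the slide. A minimal sketch of the normalisation, with an invented helper name and toy numbers:

```python
def visits_per_million(visits, total_visits):
    """Scale a raw visit count to visits per one million total visits."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    return visits / total_visits * 1_000_000


# Toy counts (not taken from the slides): two pages, 2,000 visits overall.
toy_counts = {"page_a": 500, "page_b": 1500}
toy_total = sum(toy_counts.values())
normalised = {page: visits_per_million(n, toy_total) for page, n in toy_counts.items()}
```

Normalising this way makes counts comparable across years even when overall traffic changes.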
  • 9. RESEARCH QUESTIONS
      • Are more frequent words visited more frequently?
      • Are polysemic words visited more frequently than monosemic words?
      • How can we investigate temporal effects on visiting frequency?
      • What portions of Wiktionary stay "in the dark" (i.e., are visited not at all or very seldom)?
      • Data basis: German language edition of Wiktionary
  • 10. CORPUS AND LOOK-UP FREQUENCY
      • If we compile a general dictionary from scratch, does it make sense to include more frequent words first?
      • Analyses of Wiktionary and DWDS log files suggest: Yes, words that occur more frequently in everyday language are also visited more frequently.
  • 11. CORPUS AND LOOK-UP FREQUENCY
      • Corpus frequency still matters if most frequent words are excluded.
      [Figure: two word lists, A and B, each starting from the 10,000 most frequent words; additions: 10,000 words randomly sampled from rest, 10,000 most frequent words from rest; successful searches: 34%, 56%]
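One way to read "successful searches" is as the share of all look-ups that hit a headword contained in a given word list. Under that assumption (the slide does not spell out the definition), a comparison of two candidate lists could be sketched as below; the function name and the toy data are hypothetical.

```python
def share_successful(lookups, headwords):
    """Fraction of look-up events whose target is in the headword list."""
    total = sum(lookups.values())
    if total == 0:
        return 0.0
    hits = sum(count for word, count in lookups.items() if word in headwords)
    return hits / total


# Toy look-up counts per word (invented for illustration).
toy_lookups = {"a": 6, "b": 3, "c": 1}

# Two hypothetical candidate headword lists.
list_a = {"a", "c"}
list_b = {"b"}

share_a = share_successful(toy_lookups, list_a)
share_b = share_successful(toy_lookups, list_b)
```

Counting look-up events rather than distinct words is a design choice here: it weights each word by how often users actually asked for it.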
  • 12. POLYSEMIC AND MONOSEMIC WORDS
      • Are polysemic words visited more often than monosemic words?
      • Challenge: Polysemic words are also more frequent. So, we have to control for the effect of frequency just shown.
      [Figure labels: monosemic, polysemic]
  • 13. POLYSEMIC AND MONOSEMIC WORDS
      • The effect of frequency is still visible.
      • There is an effect of polysemy.
      • Interaction effect: The polysemy contrast tends to be more pronounced in higher frequency bands (especially in the highest decile).
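A simple way to control for frequency when comparing the two groups is to sort headwords into corpus-frequency deciles and compare mean visits of monosemic and polysemic words within each decile. The sketch below illustrates that idea only; it is not the statistical model behind the slide, and the entry layout and function names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def decile_of(rank, n):
    """Map a 0-based rank among n items to a decile from 1 to 10."""
    return rank * 10 // n + 1


def mean_visits_by_band(entries):
    """entries: iterable of (headword, corpus_freq, n_senses, visits).

    Returns mean visits keyed by (frequency decile, 'monosemic' | 'polysemic').
    """
    ranked = sorted(entries, key=lambda e: e[1])  # ascending corpus frequency
    n = len(ranked)
    groups = defaultdict(list)
    for rank, (_word, _freq, senses, visits) in enumerate(ranked):
        label = "polysemic" if senses > 1 else "monosemic"
        groups[(decile_of(rank, n), label)].append(visits)
    return {key: mean(values) for key, values in groups.items()}


# Toy data: 20 invented headwords with rising frequency and visits.
toy_entries = [(f"w{i}", i + 1, 1 + i % 2, 10 * (i + 1)) for i in range(20)]
band_means = mean_visits_by_band(toy_entries)
```

Comparing `band_means[(d, "polysemic")]` with `band_means[(d, "monosemic")]` for each decile `d` shows whether the polysemy contrast changes across frequency bands.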
  • 14. TEMPORARY EFFECTS ON LOOK-UP FREQUENCY
      • If we want to extract temporary effects, we have to take time into consideration.
        → Interactive visualisation (German Wiktionary, more to come): http://www.owid.de/plus/wikivi2015/
      • We employed a trend-residualisation technique.
          ▪ Calculate the current trend of visitation frequency.
          ▪ Calculate the deviations from this trend ("residuals") at specific points in time.
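The slide does not say how the trend is estimated. As a minimal sketch of the two steps, assume the "current trend" at each point is a trailing moving average over the preceding observations, and the residual is the observed value minus that trend. The window size, function name, and series are illustrative assumptions.

```python
def trend_residuals(series, window=4):
    """Residuals of each value against a trailing moving-average trend.

    The trend at position i is the mean of up to `window` preceding values;
    the first position has no history, so its residual is None.
    """
    residuals = []
    for i, value in enumerate(series):
        history = series[max(0, i - window):i]
        if not history:
            residuals.append(None)
            continue
        trend = sum(history) / len(history)
        residuals.append(value - trend)
    return residuals


# Invented weekly visit counts with one short-lived spike.
weekly_visits = [10, 10, 10, 10, 50, 10]
residuals = trend_residuals(weekly_visits)
```

A large positive residual (here at the spike) marks a point in time where an article was visited far more often than its recent trend would suggest.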
  • 15. TEMPORARY EFFECTS: EXAMPLE
  • 16. TEMPORARY EFFECTS: EXAMPLES
  • 17. TEMPORARY EFFECTS: EXAMPLES
  • 18. TEMPORARY EFFECTS: EXAMPLES
  • 19. TEMPORARY EFFECTS: 'LARMOYANT'
      "Der ist jetzt aber richtig sauer. Das passt dem gar nicht. Und wenn ich das richtig deute, blickt er da eher Richtung Toni Kroos. Das ist ihm ein bisschen zu larmoyant... und ... der ist vielleicht noch eher im Freundschaftsspielmodus …"
      "He is really peeved now. That really doesn't suit him. And if I interpret this correctly, he is looking in the direction of Toni Kroos. That's a little too lachrymose for him. And ... maybe he's more in exhibition mode …"
  • 20. THE DARK SIDE OF WIKTIONARY
      • How many and which articles are not visited at all?
          ▪ We consider the years 2013, 2014 and 2015.
          ▪ Account for the fact that the number of articles is rising.
  • 21. THE DARK SIDE OF WIKTIONARY ?
  • 22. THE DARK SIDE OF WIKTIONARY
      • Approx. 25,000 articles were not visited during 2013, 2014 and 2015.
          ▪ Mostly newer
          ▪ Mostly non-German
          ▪ German idioms
          ▪ Inflected forms
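Finding the unvisited articles amounts to a set difference between the article inventory and everything that appears in the logs. One possible way to account for the growing number of articles, assumed here for illustration and not confirmed by the slides, is to consider only articles that already existed at the start of the period. All names and data below are hypothetical.

```python
def never_visited(articles_at_start, visits_by_year):
    """Articles from the starting inventory with no visit in any year.

    visits_by_year maps a year to a dict of {article title: visit count}.
    """
    visited = set()
    for counts in visits_by_year.values():
        visited.update(title for title, n in counts.items() if n > 0)
    return set(articles_at_start) - visited


# Invented inventory and yearly counts.
inventory_2013 = ["Tribüne", "Grandezza", "some idiom", "inflected form"]
toy_visits = {
    2013: {"Tribüne": 12, "some idiom": 0},
    2014: {"Tribüne": 9, "Grandezza": 1},
    2015: {"Tribüne": 7, "newer article": 3},
}
dark = never_visited(inventory_2013, toy_visits)
```

Restricting the inventory to the start of the period avoids counting articles as unvisited merely because they did not yet exist for most of it.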
  • 23. SUMMARY
      • Log files are well suited to investigate effects on the "macro user" level:
          ▪ Corpus frequency and look-up frequency
          ▪ Polysemy and look-up frequency
          ▪ Temporary effects
          ▪ "Dark side" of dictionaries
          ▪ Collaborative dictionaries: Look-up and revision frequency
          ▪ …
  • 24. OUTLOOK / LIMITATIONS
      • Lew (2015b: 11–12): "[…] we need to be aware of the limitations of the approach.
          ▪ One such limitation is that server log files will rarely tell us what the context of dictionary use is:
              ▪ what activity the user is involved in,
              ▪ what particular problem they are trying to solve,
              ▪ and the levels of success and satisfaction achieved in the consultation.
          ▪ Nothing is known about the user, either, such as their age, languages spoken, proficiency in them, or professional background. […]
          ▪ Issues of data privacy can also be a limiting factor in log file analysis."
  • 25. OUTLOOK / LIMITATIONS
      • Little can be inferred from a small number of log file events.
        → Research based on individual cases is virtually impossible.
        → Log file analyses work best if many cases are available for longer periods.
        → Quantitative methods
      • Log files might be integrated with other methodologies to gain an even broader insight into dictionary usage.
        → Test hypotheses generated by log file analyses with methods that assess individual performances or preferences.
  • 26. THANK YOU.
  • 27. BONUS SLIDE: REVISIONS
      [Figure panels: English Wiktionary, German Wiktionary]