The document discusses applying word sense disambiguation (WSD) techniques to Old English texts. It provides background on Old English and available digital resources. Experiments are conducted on 10 Old English words using Naive Bayes and Maximum Entropy classifiers with bag-of-words and collocation features. Results show accuracies from 75-84%, comparable to human performance on modern English WSD tasks. While challenges exist in applying modern NLP to historical languages, this work demonstrates the potential of WSD on Old English.
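The Naive Bayes approach with bag-of-words features mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (train, classify), the add-one smoothing, and the toy training data are all assumptions made for the example, and the toy contexts are invented modern English rather than Old English.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (context words around the target, sense label).
# Invented for illustration; not taken from any Old English corpus.
TOY_EXAMPLES = [
    (["river", "water", "shore"], "riverbank"),
    (["water", "fish", "shore"], "riverbank"),
    (["money", "loan", "account"], "finance"),
    (["money", "deposit", "account"], "finance"),
]

def train(examples):
    """Count senses and per-sense context words (bag of words)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        for tok in tokens:
            word_counts[sense][tok] += 1
            vocab.add(tok)
    return sense_counts, word_counts, vocab

def classify(tokens, model):
    """Return the sense with the highest log-probability under Naive Bayes."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                # add-one (Laplace) smoothing, an assumption of this sketch
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(TOY_EXAMPLES)
print(classify(["fish", "river"], model))
print(classify(["loan", "money"], model))
```

Collocation features could be added in the same framework by treating, for example, the word immediately to the left or right of the target as an additional feature with its own counts.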
Towards multilingual cultural lexicography. The Russian Dialect Dictionary as... Eveline Wandl-Vogt
The presentation introduces a new collaboration between the Russian Academy of Sciences and the Austrian Academy of Sciences. The project described is the Russian Dialect Dictionary. The presentation introduces, for the first time, the main project ideas concerning research infrastructures, interoperability, and accessibility. The transformation process is discussed against the background of open science, citizen science, and open innovation.
The added value of the collaboration and first steps towards cultural lexicography are discussed using the example of the collaboration with the Natural History Museum in Vienna and the reusability of common names for cultural heritage institutions such as Europeana.
What does it mean to tell stories without narratives? Do the formal effects of poetry - at perhaps as basic a level as lineation - alter the content of the poem? What about the formal effects of media - literally, the means by which a poem or story is transmitted? Does the technology used to record and transmit a thought have an impact on the thought itself, in production or reception? Can the very same thoughts be expressed and understood via cuneiform tablets, papyrus scrolls, printed paper codices (bound books), and even web pages? Or does something else change as the material of communication changes?
This lecture considers the Anglo-Saxon poetry found in early medieval manuscripts - only four of which are known to exist - and asks us to consider the categories we typically apply to poetry, such as Epic, Lyric, and Elegiac, in the historical and material contexts of the Anglo-Saxon world. Do these categories still apply?
This document discusses the evolution of the English language over time through influences like invasions, cultural borrowing, and social/historical forces. Key aspects that changed include pronunciation, spelling, grammar, and vocabulary. The document traces the development of Old English and its Germanic roots, as well as the influence of peoples such as the Celts, Vikings, and Normans. It also covers how Christianity impacted English by reintroducing Latin and fostering monastic learning environments.
The document discusses the various foreign influences on Old English, including the Celtic, Roman, Scandinavian, and Latin influences. It describes how the Celtic language influenced place names and natural features. Latin was introduced through both direct Roman influence and later through Christianity. Many words relating to religion, education, and literature entered Old English through Latin. Scandinavian influence increased after Danish settlers began arriving in England in the 8th century. Their language introduced many everyday words and had a significant impact on grammar and syntax. Overall, the document examines the periods of contact between Old English and other languages and analyzes the extent and types of words borrowed into the English vocabulary from these foreign sources over time.
This document discusses English word stress. It defines stressed syllables as those that are longer, louder, and higher in pitch. Stress placement in English words derives from the language's history, with words of Germanic origin typically stressed on the first syllable and words from Latin and Greek often stressed differently. Prefixes, suffixes, compound words, numbers and other grammatical categories can affect stress placement in predictable ways. The document provides numerous examples to illustrate these stress patterns.
Elizaveta Kuzmenko - Morphological Analysis for Russian: Integration and Com...AIST
This document discusses morphological analysis and part-of-speech tagging for Russian. It compares the performance of four different taggers - Pymorphy2, Freeling, TreeTagger, and Mystem. The taggers were evaluated on their ability to identify lemmas, parts-of-speech, and full morphological tags against a reference standard from the Russian National Corpus. TreeTagger demonstrated the highest precision rates overall, while Pymorphy2 and Freeling achieved higher recall. The document concludes that combining the strengths of these individual taggers could help build an improved morphological tagger for Russian.
Mikhail Korobov - Morphological Analyzer and Generator for Russian and Ukrain...AIST
This document summarizes pymorphy2, a morphological analyzer and generator for Russian and Ukrainian. Pymorphy2 is a Python library that performs morphological analysis and generation tasks like lemmatization and inflection. It uses dictionaries containing over 5 million Russian word forms and their tags. Pymorphy2 encodes this data efficiently using DAFSA to allow fast lookups with low memory usage. It also handles out-of-vocabulary words and provides probabilistic part-of-speech tagging estimates. An evaluation found that pymorphy2 and a competing tool each had error rates below 1% on test data. Future work may include improving support for special cases and additional languages.
Alenka Šauperl: Abstracts for scientific papers ÚISK FF UK
The document discusses abstracts for scientific papers. It defines an abstract as a brief, objective representation of a document's contents without added interpretation or criticism. The document outlines best practices for writing abstracts, including following a structure of purpose, method, results, and conclusions. It also notes that journal editors may have their own guidelines for abstract structure. The document analyzes abstracts from information science and materials science journals, finding that information science abstracts average 6 sentences while materials science average 8 sentences. It recommends a 6 sentence structure for information science that includes background, purpose, method, results, and conclusions.
This document summarizes several demonstrations presented at a linguistics conference. It describes projects integrating historical lexical databases through linked open data, allowing tracing of word meanings and concepts over time. It also summarizes demonstrations of tools for searching treebanks and researching morphosyntactic dialects in historical Dutch texts. Finally, it provides brief updates on the status of treebank search applications GrETEL and PaQU, and plans to upgrade the morphological research tool MIMORE.
Getting ready for the unexpected!
Disasters do not care about borders or organizational barriers.
Discuss with us in Kristiansand, Norway, May 24-27 2015
More Info: http://iscram2015.uia.no/
Johannes Hercher Developer Linking Data presentation Fusepool Fusepool SME project
1) The document discusses linking library data to broader web data using Fusepool and identifiers like GND identifiers.
2) It provides an example of searching a library catalog using GND identifiers to expand search beyond local holdings.
3) The document demonstrates how Fusepool can be used to import GND subject headings, match documents to those concepts to annotate documents, and then build services on top of the annotated data.
1) The document discusses linking library data to broader web data using Fusepool and identifiers like GND identifiers.
2) It provides an example of searching library subject indexes enhanced with GND identifiers to search beyond local holdings.
3) The author demonstrates how Fusepool can be used to import GND subject headings, configure an SMA dictionary, match documents to concepts to enrich metadata, and review and build services on the results.
Arthur N. Olsen is applying for a promotion to the position of chief librarian at the University of Agder Library. In his application letter, he outlines his qualifications based on the criteria in the regulations, including: publications in peer-reviewed journals; leadership and participation in research projects; development of advanced documentation and information systems; teaching experience including bibliographic instruction; and documented library development and experimental work. He provides details and references for each of these areas to support his application for the promotion.
Finding and managing process engineering information Thomas Hapke
The document discusses various strategies and resources for finding and managing process engineering information. It begins by outlining some common information challenges in academic research, such as ensuring comprehensive searching and coping with information overload. It then provides details on searching subject-specific databases, using reference management software, consulting encyclopedias and other reference works, and searching for substance property data. The document emphasizes the importance of orientation before searching, using multiple information sources, and thinking about how found information will be further processed. It also introduces resources available at the TUHH library, such as databases, subject gateways, and reference management software to help address these information challenges.
This document discusses the analysis of Celtic "Princely Sites" using Geographic Information Systems (GIS). It begins by defining "Princely Sites" and providing examples, like the Heuneburg and Glauberg sites. The data and methodology used in the GIS analyses are described. The document then discusses rethinking the concept of "Princely Sites" through GIS analyses, looking at aspects like viewsheds, routes, and relationships between sites and their hinterlands. It explores how GIS can provide new insights into questions around centralization of power and the roles of these sites. The summary concludes by mentioning the project aims to better understand settlement patterns and social organization through these spatial analyses.
Finding and managing engineering information Thomas Hapke
This document provides guidance on finding and managing engineering information. It discusses several strategies and resources for conducting research, including subject gateways, library catalogs, reference databases, and search techniques. It emphasizes the importance of orientation before searching, using multiple information sources, and carefully selecting search terms. The document also covers accessing full texts through interlibrary loan, publishers' portals, and link resolvers. Overall, it aims to help researchers cope with information challenges and find relevant information and documents for their academic work.
This document summarizes a talk given by Elena Simukovic on dealing with research data for PhD students. The talk discussed the university's expectations for handling research data according to good scientific practice, including making data accessible for 10 years. It provided information on publishing dissertations and research data through the university's repositories. A survey of PhD students found most had not deposited their data in repositories before. The talk recommended PhD students publish their dissertation and research data with support available through the research data management coordinator.
The document summarizes the work of the Science Signs Project in Scotland to develop new British Sign Language (BSL) signs for science vocabulary. The project has created over 700 new signs covering topics up to Intermediate 2 level science. It aims to increase deaf students' access to science education and exams by translating questions and concepts into BSL. The project involves a team of scientists, teachers and linguists and has launched a website to disseminate the new signs.
This document describes work to identify terms from the Environment Ontology (EnvO) in text descriptions within the Encyclopedia of Life (EOL) and annotate EOL pages with those terms. Researchers developed a dictionary-based tagger using EnvO terms to automatically annotate over 234,000 EOL taxa pages with over 1.9 million EnvO tags. The annotations link habitat and other environmental descriptions in EOL pages to standardized EnvO terms, helping integrate biodiversity data across sources.
SHEBANQ project (half-way) as a use case in querying language resources. The corpus is the text of the Hebrew Bible with linguistic features, packaged in a special text database and converted to LAF.
Scholar voices 1 - international scholars perspective of UK libraries nmjb
The document discusses the journey and experiences of Dr. Jarka Glassey, a Slovak chemical engineer who became a senior lecturer in the UK. It describes her educational background in Slovakia, including studying chemical engineering and completing internships. It then details her path to obtaining a PhD from Newcastle University in the UK and taking on roles as a lecturer and director. The document also contrasts the library resources and culture she experienced as a student in Slovakia in the late 1980s to the current state of the library at her alma mater in Slovakia.
We introduce four research questions that can be addressed using log files of online dictionaries:
(1) Are words that occur more frequently in everyday language also looked up more frequently in a dictionary? (2) Are polysemic words visited more frequently than monosemic words? (3) How can we investigate temporal effects on visiting frequency? (4) What portions of Wiktionary stay “in the dark” (i.e., are not visited at all or very infrequently)? For almost all analyses of log file data, additional information is necessary, such as the corpus frequency of headwords or information that can be extracted from the dictionary article itself (e.g., part-of-speech of the headword or number of senses). We will focus on the methodological side of the analyses, proposing a quantitative view of the data. We will also discuss the limitations we face when dealing with log file data.
This document summarizes the history and initiatives of the Nordic Network for Interdisciplinary Environmental Studies (NIES). NIES began in 2007 with a dozen researchers from three Nordic countries and has since grown to over 100 researchers from five Nordic countries. NIES supports research at the intersection of the humanities and environment. It has hosted several international conferences and workshops and helped develop new research projects and publications. Current initiatives include a journal on environmental humanities and developing a Nordic masters program and European cooperation in the field.
Finding and managing engineering information Thomas Hapke
The document discusses finding and managing engineering information. It provides guidance on systematically searching for information using subject-specific databases and reference management software to cope with information overload. The document also discusses using encyclopedias, standards, and other reference works for orientation and exploring subject gateways and databases to find full texts of articles and books. It emphasizes the importance of orientation before searching, using multiple information sources, and considering how information will be further processed when researching.
This document describes the development of the first Norwegian Academic Wordlist (AKA list) for the Norwegian Bokmål variety. The researchers created a 100-million word academic corpus from the University of Oslo archive and tested two methods (Gothenburg method and Gardner & Davies method) to identify academic vocabulary. They compared wordlists produced from each method by measuring coverage in test corpora. The resulting wordlist identifies vocabulary that frequently appears across academic disciplines and will help Norwegian students engage with academic texts.
1. EXPERT Winter School Partner Introductions RIILP
The document provides information about the University of Wolverhampton's Research Group in Computational Linguistics and Statistical Cybermetrics Research Group. It discusses the groups' expertise in various areas of natural language processing and information retrieval. Key personnel are mentioned, including Ruslan Mitkov, Constantin Orasan, and Mike Thelwall. Ongoing and past projects funded by sources like the EC and NBME are summarized.
A Short Review Of The Application Of 3D Documentation Methods On Selected UW ...Lisa Brewer
This document describes a conference on underwater archaeology held in Kiel, Germany from November 21-23, 2014. The conference was organized by students, graduates, and doctoral candidates and brought together international presenters to share research on topics related to underwater archaeology. It provides background information on the university hosting the conference, including its strengths in marine sciences and archaeology. It also welcomes participants and wishes them successful scientific exchanges.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
Similar to Word Sense Disambiguation in Old English (20)
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfSelcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Word Sense Disambiguation in Old English
1. GSCL, Essen – 2015-09-30 WSD in Old English – Martin Wunderlich, Alexander Fraser, Paul Sander Langeslag
Additional material: http://www.cis.uni-muenchen.de/~martinw/
1 / 25
"God Wat þæt Ic Eom God"
Word Sense Disambiguation in Old English
Bamberg, Staatsbibliothek, Msc.Nat.1 (9th century)
Martin Wunderlich and Alexander Fraser (LMU München)
Paul Sander Langeslag (University of Göttingen)
Can we apply WSD techniques to a historical language like Old English, and what are the specific challenges?
Overview
● Background on the Old English language
● NLP and historical languages – problems and opportunities
● Old English digital resources
● WSD methodologies applied here
● Experiments and results
● Summary and discussion
Background on the OE language 1
● Spoken ca. 450 – 1100 AD
● A Germanic language:
"God Wat þæt Ic Eom God"
→ "Gott weiß, dass ich gut bin"
("God knows I'm good" - David Bowie)
● 5 cases, 3 genders, 3 numbers (singular, dual, plural)
An example:
– "Seo cwen geseah þone guman." *
– "Se guma geseah þa cwen." **
(from Crystal, 2010)
* "The woman saw the man." ** "The man saw the woman."
Background on the OE language 2
● Initially a runic alphabet known as "futhorc" (after its first letters: ᚠᚢᚦᚩᚱᚳ)
● ...keeping Thorn ᚦ and Wynn ƿ and adding Latin
● 24-letter alphabet:
a æ b c d ð e f ᵹ/g h i l m n o p r s/ſ t þ u ƿ/w x y
● Introduced around 600 AD
Background on the OE language 3
Migrations and settlements:
https://www.uni-due.de/SHE/Germanic_Migration_to_Britain.gif
(site maintained by Prof. Raymond Hickey, Chair of Linguistics)
NLP & historical languages: problems
Largely lacking for historical languages:
● Stopword lists
● POS taggers
● Word and sentence tokenizers
● Standard tools and libraries
● Shared tasks with prepared training data
● Existing research … well, a bit ...
NLP & historical languages: related work
● Annotation projection in Germanic languages with parallel Bible texts (Sukhareva and Chiarcos, 2014)
● Application of existing NLP tools to ancient Italian (Pennacchiotti and Zanzotto, 2008)
● Tagging Old East Slavonic texts (Meyer, 2011)
● POS tagging Early Modern German texts (Bollmann, 2013)
● Projection of tags from contemporary English to Middle English (Moon and Baldridge, 2007)
NLP & historical languages: opportunities
1. Digital corpora & dictionaries/lexicons do exist
(incl. OE Wikipedia: https://ang.wikipedia.org/wiki/H%C4%93afodtramet)
2. Static corpus
3. Few existing NLP applications → lots to explore
Old English digital resources: corpora
● York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE); ca. 1.5 million words
● York-Toronto-Helsinki Parsed Corpus of Old English Poetry (YCOEP); 71,490 words
● Dictionary of Old English Corpus in Electronic Form (DOEC); ca. 3.8 million words
→ all available through the University of Oxford Text Archive (http://www.ota.ahds.ac.uk/)
Old English digital resources: dictionary
Dictionary of Old English (DOE) corpus stats:
Number of HTML documents     3,037
Token count                  3,786,753
Type count                   343,135
Token count / type count     ca. 11
Total number of sentences    234,113
Average sentence length      5.5
Minimum sentence length      1
Maximum sentence length      263
Compare to the Brown corpus: ca. 1 million tokens and ca. 50,000 types (token/type ratio = 20)
Spelling variations, e.g. "wundarlic", "wundorlic", "wunderlic"
12,568 DOE entries for the letters A to G (http://tapor.library.utoronto.ca/doe/)
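The token/type figures quoted for the DOE and Brown corpora can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the toy token list is invented purely to show how spelling variants such as "wundarlic" / "wundorlic" / "wunderlic" each count as a separate type and so lower the ratio.

```python
from collections import Counter


def ratio_from_counts(token_count: int, type_count: int) -> float:
    """Tokens per type, computed from pre-counted corpus totals."""
    return token_count / type_count


def ratio_from_tokens(tokens: list[str]) -> float:
    """Tokens per type, computed directly from a token list."""
    return len(tokens) / len(Counter(tokens))


# Totals as reported on the slide.
doe_ratio = ratio_from_counts(3_786_753, 343_135)
brown_ratio = ratio_from_counts(1_000_000, 50_000)
print(f"DOE: {doe_ratio:.1f} tokens per type, Brown: {brown_ratio:.1f}")

# Invented toy example: three spellings of one word are three types.
toy = ["wundarlic", "wundorlic", "wunderlic", "boc", "boc"]
print(ratio_from_tokens(toy))
```

Without spelling normalization, every variant form dilutes the evidence available per type, which is one plausible reason the DOE ratio is roughly half that of Brown despite the larger token count.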
WSD methodologies 1
Criteria for selecting the target terms:
➔ minimum count 200, minimum length 3 characters
➔ non-Latin (i.e. no "dictum", "confundantur", "magister"...)
➔ common nouns
➔ no proper nouns (e.g. no "Egypta", "Micel", "Iulianus"...)
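The selection criteria above can be expressed as a simple filter over corpus frequencies. This is a sketch under stated assumptions: the slides do not say how Latin words, proper nouns or common nouns were identified, so the three word sets passed in here are hypothetical hand-curated lists, and `candidate_terms` is an illustrative name.

```python
from collections import Counter

MIN_COUNT = 200   # minimum corpus frequency, from the slide
MIN_LENGTH = 3    # minimum word length in characters, from the slide


def candidate_terms(tokens, latin_words, proper_nouns, common_nouns):
    """Return the sorted words that satisfy all four selection criteria.

    latin_words, proper_nouns and common_nouns are assumed to be
    hand-curated sets of lowercase word forms (hypothetical inputs).
    """
    counts = Counter(t.lower() for t in tokens)
    return sorted(
        word
        for word, count in counts.items()
        if count >= MIN_COUNT
        and len(word) >= MIN_LENGTH
        and word not in latin_words
        and word not in proper_nouns
        and word in common_nouns
    )


# Invented toy corpus, only to exercise each criterion once.
toy_tokens = (
    ["boc"] * 200        # passes every test
    + ["dictum"] * 250   # Latin
    + ["Egypta"] * 300   # proper noun
    + ["se"] * 500       # too short
    + ["ban"] * 199      # too rare
)
print(candidate_terms(toy_tokens, {"dictum"}, {"egypta"}, {"boc", "ban"}))
```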
WSD methodologies 2
Target terms (100 concordance matches each, random selection):
Target term   Token count in DOE corpus   Basic translation
Anweald       242                         Power, realm, order of angels
Fultum        574                         Help, aid, remedy
Fæder         416                         Father, lord (relig.)
For           955                         Movement, journey...
Eadigan       263                         To bless, to make happy
Boc           567                         Book, volume, legal doc
Ban           314                         Bone, ivory
Are           308                         Honour, mercy, property
Andlang       1743                        Continuous, upright
Dryhten       261                         Lord (worldly & relig.), chief
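Drawing a fixed number of random concordance matches per target term could look like the sketch below. It assumes exact matching on the lowercased surface form and whitespace tokenization, neither of which is confirmed by the slides; `sample_concordance` and the fixed seed are illustrative choices.

```python
import random


def sample_concordance(sentences, target, k=100, seed=0):
    """Pick up to k random sentences that contain the target word form.

    Matching is on the exact lowercased token (an assumption: no
    lemmatization or spelling normalization is applied here).
    """
    matches = [s for s in sentences if target in s.lower().split()]
    if len(matches) <= k:
        return matches
    return random.Random(seed).sample(matches, k)


# Invented toy corpus: 567 sentences containing "boc", 10 without.
corpus = [f"he þa boc unfeold {i}" for i in range(567)]
corpus += ["se guma geseah þa cwen"] * 10
sample = sample_concordance(corpus, "boc")
print(len(sample))
```

A fixed seed keeps the sample reproducible, which matters when the sampled sentences are then annotated by hand.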
WSD methodologies 3
Selected word senses of "bōc":
(http://tapor.library.utoronto.ca/doe/dict/indices/headwordsd.html#E03007)
A. book
A.1. in general, without particular reference to form or content
Lk (WSCp) 4.17: he þa boc unfeold
B. major division of a larger work
JnArgGl (Li) 3: ðis uutedlice godspell aurat in ðær meigð æfter ðon in Pathma ealond þæt boc ðæra sighðana eac awrat.
D. legal document
Birch 862: Þis is ðæs landes boc æt Duntune ðe Eadred cyng edniwon gæbocodæ sanctæ trinitate & Sanctæ Pætræ & Sanctæ Paule into ealdan mynstræ.
WSD methodologies 4
From corpus to feature vectors – bag-of-words model with fixed-size token window
[Figure: example from Ch 540 (Birch 862)]
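To make the windowing step concrete, here is a minimal sketch that builds a bag-of-words feature set from the tokens around a target word, using the opening of the Birch 862 sentence quoted on the word-senses slide. The window size of 3 is an assumption (the slides leave the best window size open), and `window_features` is a hypothetical helper, not part of Mallet.

```python
from collections import Counter


def window_features(tokens, target_index, window=3):
    """Count the lowercased tokens within `window` positions of the target.

    The target token itself is excluded, and word order inside the
    window is discarded (bag-of-words).
    """
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return Counter(t.lower() for t in left + right)


sentence = "Þis is ðæs landes boc æt Duntune ðe Eadred cyng".split()
features = window_features(sentence, sentence.index("boc"), window=3)
print(sorted(features))
```

With a window of 3, the features are the three tokens on each side of "boc"; widening the window adds more context at the cost of more noise.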
Implementation
● Libraries used:
– Mallet (NLP and ML library)
– Jsoup (HTML processing)
● Own implementation:
– Parsing of corpus and dictionary data
– Feature extraction and instance creation
– Pipes for baseline classifiers (Mallet additions)
– Metrics, summarization and output of results
...and much more...
Experiments and results 1
● Baseline 1: most frequent class.
– Accuracy: 0.67
● Baseline 2: random class.
– Accuracy: 0.44
Human annotators' lower and upper bounds: 0.75 – 0.97
(Gale et al., 1992)
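The two baselines can be sketched in a few lines. This is an illustration in Python rather than the Mallet-based implementation used in the project; in particular, drawing the random class uniformly from the observed classes is an assumption, since the slides do not say whether class priors were used. The labels in the usage example are invented.

```python
import random
from collections import Counter


def most_frequent_baseline(train_labels, test_labels):
    """Accuracy of always predicting the majority class of the training set."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == majority) / len(test_labels)


def random_baseline(train_labels, test_labels, seed=0):
    """Accuracy of predicting a uniformly random class (assumed scheme)."""
    rng = random.Random(seed)
    classes = sorted(set(train_labels))
    hits = sum(1 for y in test_labels if rng.choice(classes) == y)
    return hits / len(test_labels)


# Invented toy labels for one target term.
train = ["A"] * 6 + ["notA"] * 4
test = ["A", "A", "notA", "A", "notA"]
print(most_frequent_baseline(train, test))
print(random_baseline(train, test))
```

Any trained classifier is only interesting to the extent that it beats the most-frequent-class figure, which is the harder of the two baselines whenever one sense dominates.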
Experiments and results 2
One-vs-all classification
[Figure: two plots titled "A vs. notA - Naive Bayes"; x-axis 0 to 20, y-axis 0 to 1; series: Accuracy, Avg Precision, Avg Recall, Avg F1; the second plot adds a linear regression trend line]
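The one-vs-all setup ("A vs. notA") and the reported metrics can be illustrated with a small, self-contained sketch. It stands in for the Mallet classifiers used in the experiments and is not their implementation: the multinomial Naive Bayes below uses add-one smoothing (an assumption), all names are hypothetical, and the four training instances are invented from words in the quoted example sentences.

```python
import math
from collections import Counter, defaultdict


def one_vs_all(labels, positive):
    """Collapse multi-sense labels into `positive` versus 'not' + positive."""
    return [y if y == positive else "not" + positive for y in labels]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words dicts, add-one smoothing."""

    def fit(self, bags, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for bag, y in zip(bags, labels):
            self.word_counts[y].update(bag)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, bag):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y, n in self.class_counts.items():
            denom = sum(self.word_counts[y].values()) + len(self.vocab)
            score = math.log(n / total)
            for w, k in bag.items():
                if w in self.vocab:  # unseen words are ignored
                    score += k * math.log((self.word_counts[y][w] + 1) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy training data: context bags labelled with DOE sense letters.
bags = [{"he": 1, "unfeold": 1}, {"godspell": 1, "awrat": 1},
        {"landes": 1, "cyng": 1}, {"landes": 1, "æt": 1}]
senses = ["A", "B", "D", "D"]
model = NaiveBayes().fit(bags, one_vs_all(senses, "A"))
print(model.predict({"he": 1, "unfeold": 1}), model.predict({"landes": 1}))
```

Relabelling turns each sense into its own binary problem, so precision, recall and F1 can be averaged over the per-sense classifiers, as in the "Avg" series of the plots.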
Experiments and results 3
One-vs-all classification
[Figure: two plots titled "A vs. notA - MaxEnt"; x-axis 0 to 20, y-axis 0 to 1; series: Accuracy, Avg Precision, Avg Recall, Avg F1; the second plot adds a linear regression trend line]
Summary
● Historical languages: interesting, rewarding and difficult to work with
● WSD does give satisfactory results even without stemming etc.
● Best WSD performance: NB (F1), one vs. all, window size: ??
● Annotated data set (available on website)
● Baseline classifiers as contributions to MALLET
● Possible extensions:
– More advanced vector representations
– Bootstrapping
– Train classifiers based on other corpora
– Distributional thesaurus (DT)?
● Acknowledgements:
Winfried Rudolf, Göttingen & Juan Carmona Ramirez, Jena
Thanks a lot for your attention!
Any questions?
Paul S. Langeslag, Göttingen
New book: Seasons in the Literatures of the Medieval North
Alexander Fraser, München
References
● Mark Stevenson. Word sense disambiguation: the case for combinations of knowledge sources. CSLI studies in computational linguistics. CSLI Publ., Stanford, Calif., 2003.
● D. Yarowsky. Word sense disambiguation. In Alexander Clark, editor, The handbook of computational linguistics and natural language processing, Blackwell handbooks in linguistics. Wiley-Blackwell, Oxford [u.a.], 1. publ. edition, 2010.
● D. Crystal. The Cambridge Encyclopedia of Language. Cambridge University Press, 2010.
● Clara Cabezas, Philip Resnik, and Jessica Stevens. Supervised sense tagging using support vector machines. In The Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems, SENSEVAL '01, pages 59–62, Stroudsburg, PA, USA, 2001. Association for Computational Linguistics.
● Andrew Kachites McCallum. Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu, 2002.
● Marcel Bollmann. POS tagging for historical texts with sparse training data. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability in Discourse, pages 11–18, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
● Taesun Moon and Jason Baldridge. Part-of-speech tagging for Middle English through alignment and projection of parallel diachronic texts. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 390–399, Prague, Czech Republic, June 2007. Association for Computational Linguistics.
● Roland Meyer. New wine in old wineskins? - Tagging Old Russian via annotation projection from modern translations. Russian Linguistics, 35(2):267–281, 2011.
● Marco Pennacchiotti and Fabio Massimo Zanzotto. Natural Language Processing across time: an empirical investigation on Italian, volume 5221, pages 371–382. Springer, 2008.