Lemon-aid: using Lemon to aid quantitative historical linguistic analysis


  1. Lemon-aid: using lemon to aid quantitative historical linguistic analysis. Steven Moran & Martin Brümmer, University of Zurich & University of Leipzig. Sections: Data description & conversion; Application and examples. Running footer: Steven Moran & Martin Brümmer, Lemon-aid: using lemon for QuantHistLing.
  2. Goals: Convert dictionary and wordlist data into the Lexicon Model for Ontologies, aka lemon. Leverage Linked Data (LD) to combine disparate lexical resources (50+) from the QuantHistLing research unit. The resulting LD resources provide researchers with (a) more linguistic data in the Linguistic Linked Open Data Cloud (LLOD) and (b) a translation graph for querying across the underlying lexicons and dictionaries to extract semantically aligned wordlists.
  3. Talk map: Data description & conversion; Ontological model; Application and examples.
  4. QuantHistLing project: The Quantitative Historical Linguistics (QuantHistLing) research unit aims to uncover and clarify phylogenetic relationships between native South American languages using quantitative methods (http://quanthistling.info/). The project has two main objectives: the digitization of lexical resources on South American languages, and the development of computer-assisted methods and algorithms to quantitatively analyze the digitized data.
  5. QuantHistLing data: QuantHistLing aims to digitize around 500 works, most of which are currently available only in print and many of which are the only resources available for the languages that they describe (http://quanthistling.info/index.php?id=resources).
  6. QuantHistLing data: A simple data output format that contains metadata (prefixed with “@”) and tab-delimited lexical output.
  7. QuantHistLing data format: The first row following the metadata contains the data header with the fields QLCID, HEAD, HEAD DOCULECT, TRANSLATION, TRANSLATION DOCULECT. These correspond, respectively, to the internal QLC unique identifier, the headword in the dictionary, the doculect of the headword (in other words, the language that this particular document describes), the translation of the given headword, and the doculect in which the translation is given. For each resource, the project provides a data dump in this same format.
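
The format described on slide 7 could be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the project's own reader: the `@key: value` shape of the metadata lines, the underscore spelling of the column names, and every value in `SAMPLE` are hypothetical. The column names are taken from whatever header row the file contains.

```python
import csv
import io

def parse_qlc(text):
    """Split a QLC-style dump into a metadata dict and a list of row dicts."""
    metadata = {}
    data_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):
            # Assumption: metadata lines look like "@key: value".
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        else:
            data_lines.append(line)
    # The first non-metadata line is the header; the rest are entries.
    reader = csv.DictReader(
        io.StringIO("\n".join(data_lines)),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # dictionary text may contain stray quotes
    )
    return metadata, list(reader)

# Hypothetical sample; identifiers and forms are invented placeholders.
SAMPLE = (
    "@source: hypothetical1999\n"
    "@date: 2013-01-01\n"
    "QLCID\tHEAD\tHEAD_DOCULECT\tTRANSLATION\tTRANSLATION_DOCULECT\n"
    "hypothetical1999/1\ttapa\tlang-a\tcasa\tspa\n"
    "hypothetical1999/2\tsuma\tlang-a\tagua\tspa\n"
)

metadata, rows = parse_qlc(SAMPLE)
print(metadata)
print(rows[0]["HEAD"], "->", rows[0]["TRANSLATION"])
```

Reading the header from the file rather than hard-coding it keeps the sketch independent of the exact column spelling used in the real dumps.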
  8. Conversion from QHL to LD: We convert the QLC data into Linked Data that conforms to the Lemon model with a simple Python script. Lemon is an ontological model for lexicons and machine-readable dictionaries, designed for linking them to the Semantic Web and the Linked Data cloud (http://lemon-model.net/). The Lemon developers are also active in the W3C Ontology-Lexica Community Group, whose goal is to “develop models for the representation of lexica (and machine readable dictionaries) relative to ontologies” (http://www.w3.org/community/ontolex/).
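
As a rough illustration of what such a conversion script might do, the sketch below turns one row into lemon-style triples and prints them as N-Triples. It is not the authors' script. The base URI, the URI layout, and the sample row are hypothetical, and the property names (`LexicalEntry`, `language`, `canonicalForm`, `writtenRep`, `sense`) are used as I understand the published lemon vocabulary; they should be checked against the specification before use.

```python
from collections import namedtuple
from urllib.parse import quote

LEMON = "http://lemon-model.net/lemon#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/qhl/"  # hypothetical base URI, not the project's

Literal = namedtuple("Literal", "value")  # marks plain string literals

def row_to_triples(row):
    """Build entry, form and sense nodes for one dictionary row."""
    entry = BASE + "entry/" + quote(row["QLCID"], safe="")
    form = entry + "/form"
    sense = entry + "/sense"
    return [
        (entry, RDF_TYPE, LEMON + "LexicalEntry"),
        (entry, LEMON + "language", Literal(row["HEAD_DOCULECT"])),
        (entry, LEMON + "canonicalForm", form),
        (form, LEMON + "writtenRep", Literal(row["HEAD"])),
        (entry, LEMON + "sense", sense),
    ]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    def term(t):
        if isinstance(t, Literal):
            escaped = (t.value.replace("\\", "\\\\")
                              .replace('"', '\\"')
                              .replace("\n", "\\n"))
            return '"' + escaped + '"'
        return "<" + t + ">"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

# Hypothetical row with invented placeholder values.
row = {"QLCID": "hypothetical1999/1", "HEAD": "tapa", "HEAD_DOCULECT": "lang-a",
       "TRANSLATION": "casa", "TRANSLATION_DOCULECT": "spa"}
triples = row_to_triples(row)
nt = to_ntriples(triples)
print(nt)
```

How the translation is attached to the sense node is left out here, because the slides that showed the actual modeling are not preserved in this transcript.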
  9-13. Implementation of QHL data in Lemon (five figure-only slides; their content is not captured in this transcript)
  14. Implementation of QHL data in Lemon: According to the Lemon model, senses should link to an ontological entity such as a DBpedia resource. However, we only have the strings representing the word forms, which is not enough data to link reliably to other knowledge bases. We argue that it is linguistically correct to link the senses of the entries (instead of their word forms) to their respective translations, so the ‘sense’ resources serve a purpose even if they do not link to the meaning of the entry.
  15. Application: A major goal in historical-comparative linguistics is the identification of cognates, i.e. sets of words in genealogically related languages that have been derived from a common word or root (e.g. English ‘is’, German ‘ist’, Latin ‘est’, from Indo-European ‘esti’). Modeling dictionaries and lexicons in a pivot ontology using overlaps in translations is one way to merge several resources into one RDF graph for querying and extracting semantically aligned wordlists, which can then be used as input to computational historical linguistics tools such as LingPy (List and Moran, 2013).
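
The idea of pivoting on overlapping translations can be shown without any RDF tooling. The sketch below groups headwords from different doculects under a shared translation string. It assumes row dictionaries with underscore-style column names, and the forms and doculect labels are invented placeholders rather than attested data.

```python
from collections import defaultdict

def align_by_translation(rows):
    """Group headwords from different doculects under a shared translation."""
    aligned = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # The pivot is the translation string within its own doculect.
        pivot = (row["TRANSLATION_DOCULECT"], row["TRANSLATION"].strip().lower())
        aligned[pivot][row["HEAD_DOCULECT"]].append(row["HEAD"])
    # Keep only pivots attested in at least two doculects.
    return {p: dict(d) for p, d in aligned.items() if len(d) >= 2}

# Invented placeholder rows.
ROWS = [
    {"HEAD": "tapa", "HEAD_DOCULECT": "lang-a",
     "TRANSLATION": "casa", "TRANSLATION_DOCULECT": "spa"},
    {"HEAD": "kiro", "HEAD_DOCULECT": "lang-b",
     "TRANSLATION": "Casa", "TRANSLATION_DOCULECT": "spa"},
    {"HEAD": "suma", "HEAD_DOCULECT": "lang-a",
     "TRANSLATION": "agua", "TRANSLATION_DOCULECT": "spa"},
]

wordlist = align_by_translation(ROWS)
for (lang, concept), forms in wordlist.items():
    print(concept, forms)
```

Exact string matching after lowercasing is the simplest possible pivot; it misses translations that differ only in wording.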
  16. Application: As a first step, we have converted the QHL data into RDF and made it available online through a SPARQL endpoint: http://linked-data.org/sparql/ (preliminary) and http://quanthistlist.info/lod/ (coming soon). A dump is available at http://linked-data.org/datasets/. Querying the combined dictionaries and lexicons is straightforward.
  17. Return all triples: the query returns over 3.8 million triples.
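
The query shown on this slide is not preserved in the transcript. A generic "all triples" query, sent to an endpoint through the standard SPARQL protocol `query` parameter, might be prepared as sketched below. The `LIMIT` clause is my addition so that a test request stays small, the endpoint URL is the preliminary one named in the talk and may no longer respond, and the sketch only builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Preliminary endpoint named in the talk; availability is not guaranteed.
ENDPOINT = "http://linked-data.org/sparql/"

# Generic pattern matching every triple; LIMIT added for illustration only.
ALL_TRIPLES = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, ALL_TRIPLES)
print(url)
```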
  18. Pairs of languages in the translation graph that contain written forms for the lexical sense “casa”
  19. Implementation of QHL data in Lemon (figure-only slide; its content is not captured in this transcript)
  20. Languages in the translation graph that contain written forms for the lexical sense “casa”: 66 language pairs. The marked entry shows that word forms are not normalized and contain data that can be analyzed further.
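
The "casa" query can be mimicked on plain row dictionaries: collect the doculects that have a headword translated as the given sense, then enumerate unordered pairs of them. This is a sketch over invented placeholder rows, not the SPARQL query from the talk; the column names are assumed.

```python
from itertools import combinations

def language_pairs(rows, sense):
    """Return forms per doculect and unordered doculect pairs for `sense`."""
    forms = {}
    for row in rows:
        if row["TRANSLATION"].strip().lower() == sense:
            forms.setdefault(row["HEAD_DOCULECT"], []).append(row["HEAD"])
    pairs = list(combinations(sorted(forms), 2))
    return forms, pairs

# Invented placeholder rows; "kiro (pl.)" imitates an unnormalized form.
ROWS = [
    {"HEAD": "tapa", "HEAD_DOCULECT": "lang-a", "TRANSLATION": "casa"},
    {"HEAD": "kiro", "HEAD_DOCULECT": "lang-b", "TRANSLATION": "casa"},
    {"HEAD": "kiro (pl.)", "HEAD_DOCULECT": "lang-c", "TRANSLATION": "casa"},
    {"HEAD": "suma", "HEAD_DOCULECT": "lang-a", "TRANSLATION": "agua"},
]

forms, pairs = language_pairs(ROWS, "casa")
for a, b in pairs:
    print(a, forms[a], "|", b, forms[b])
```

With n doculects this yields n(n-1)/2 unordered pairs. A count of 66 pairs would be consistent with 12 doculects if pairs are unordered and distinct, although the slide does not say how the pairs were counted.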
  21. Queries: Queries can easily be extended to incorporate entire wordlists, such as the Swadesh list (Swadesh, 1952) or the Leipzig-Jakarta list (Tadmor et al., 2010). Combining disparate data from many dictionaries and lexicons is a first step in a computational historical linguistics pipeline. Results are given in the source documents’ orthographic representations, so they must be normalized into an interlingual pivot, such as the International Phonetic Alphabet, if phonetic or phonemic analysis is to be applied to the data. This normalization is the next step before producing phonetic alignments and cognate judgements based on metrics and algorithms for calculating lexical similarity.
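
One common way to normalize source orthography into a pivot such as IPA is a per-source mapping table applied with greedy longest-match segmentation. The sketch below illustrates that technique only; the talk does not say which method the project uses, and the `PROFILE` mapping is a hypothetical example rather than a real orthography description.

```python
def normalize(form, profile):
    """Rewrite an orthographic form by greedy longest-match over a profile."""
    keys = sorted((k for k in profile if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(form):
        for key in keys:
            if form.startswith(key, i):
                out.append(profile[key])
                i += len(key)
                break
        else:
            # Unknown grapheme: keep it, flagged, so gaps stay visible.
            out.append("<" + form[i] + ">")
            i += 1
    return " ".join(out)

# Hypothetical grapheme-to-IPA mapping for one imaginary source.
PROFILE = {"ch": "tʃ", "c": "k", "a": "a", "s": "s", "h": "h"}

print(normalize("casa", PROFILE))
print(normalize("chasa", PROFILE))
print(normalize("caxa", PROFILE))
```

Flagging unmapped graphemes instead of dropping them makes it easy to spot where a source's profile is incomplete.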
  22. Conclusion: From data that is being digitized and extracted from print resources, we are creating machine-readable lexicons that are interoperable both with each other (we link semantic senses using the Lemon ontology model) and with other linguistic sources. We also interlink the resulting dictionary resources with other language resources in the LLOD via ISO 639-3 codes.
  23. Future work - simplify! Build algorithms that identify semantically similar translation pairs from terse translations. For example, identify that doculect translations such as “coarsely grind”, “grind up, crush well”, “grind lightly (chili pepper, millet for a quick snack)” and “grind lightly (groundnuts) with stones”, given for different languages, can be mapped to a simpler form such as “to crush/grind” for initial comparative analysis.
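
A deliberately naive heuristic shows what such a simplification step might look like: drop parenthetical material, then pick the word or words shared by the most glosses. This is only a sketch of one possible approach, applied to the four glosses quoted on the slide; the talk lists the actual algorithm as future work.

```python
import re
from collections import Counter

# The four example translations quoted on the slide.
GLOSSES = [
    "coarsely grind",
    "grind up, crush well",
    "grind lightly (chili pepper, millet for a quick snack)",
    "grind lightly (groundnuts) with stones",
]

def content_words(gloss):
    """Lowercased word set of a gloss, ignoring parenthetical remarks."""
    gloss = re.sub(r"\([^)]*\)", " ", gloss)
    return set(re.findall(r"[a-z]+", gloss.lower()))

def shared_core(glosses):
    """Return the word(s) that occur in the largest number of glosses."""
    counts = Counter()
    for gloss in glosses:
        counts.update(content_words(gloss))
    top = max(counts.values())
    return sorted(word for word, n in counts.items() if n == top)

core = shared_core(GLOSSES)
print(core)
```

On these four glosses the heuristic returns only "grind"; it would not recover "crush", which appears in a single gloss, so a real solution would need lexical-semantic knowledge beyond word overlap.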
  24. Future work - annotate! Use the NLP Interchange Format (NIF; Hellmann et al., 2012) to keep track of where information in the dictionaries comes from. In other words, use NIF combined with Lemon to annotate the QHL data sources for provenance.
  25. Future work - link! Link to further resources that contain linguistic and non-linguistic information, such as typological data and geographic variables that may provide useful information for determining the genealogical and geographical relatedness of languages.
  26. Many thanks! Organizers and participants of LDL-2013; QuantHistLing Research Unit, University of Marburg (Michael Cysouw, PI); University of Zurich, University of Marburg and University of Leipzig.
