
2016 05-20-clariah-wp3


Overview of the current status of WP3


  1. WP3: Linguistics – Text
     Sjef Barbiers
     Partners: INT, MI, RU, RUG, UU, VU
     Today's demos:
     •  Integrating Diachronous Conceptual Lexicons through Linked Open Data – VU
     •  GrETEL, PaQU: Searching Tree Banks – RUG / UU / Leuven
     •  MIMORE in Nederlab: Morphosyntactic dialect research – Meertens
  2. Integrating Diachronous Conceptual Lexicons through Linked Open Data
     Isa Maks1, Marieke van Erp1, Piek Vossen1, Rinke Hoekstra, Nicoline van der Sijs
     1: Faculty of Humanities, Vrije Universiteit Amsterdam (WP3)
     2: Computer Science Department, Vrije Universiteit Amsterdam (WP4)
     3: Meertens Instituut Amsterdam (WP2)
     Integration and enrichment of several existing historical conceptual lexicons, matching the ontologies, using linked open data principles. Enables:
     •  tracing changes in word meanings and concepts over time
     •  query expansion
     •  natural language processing of historical Dutch texts
  3. Modelling the lexicons as linked open data
     [Diagram of the data model. Classes and datatypes shown: ontolex:LexicalEntry, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalConcept, lemon:SenseDefinition, lemon-cltl:Usage, lemon-cltl:SpatioTemporalScope, lexinfo:Register, skos:Concept, penn:Tag, dbo:Place, dbo:Thing, xsd:string, xsd:date. Properties shown: rdfs:label, olia:hasTag, ontolex:sense, ontolex:isSenseOf, ontolex:canonicalForm, ontolex:otherForm, ontolex:definition, ontolex:usage, ontolex:reference, lemon-cltl:periodStart, lemon-cltl:periodEnd, lemon-cltl:geographicArea, lemon-cltl:scope, lexinfo:register, skos:prefLabel, skos:related, dct:subject. Note in the diagram: a skos:Concept is an ontolex:reference.]
     Prefixes:
     •  ontolex: http://www.w3.org/ns/lemon/ontolex#
     •  lexinfo: http://www.lexinfo.net/ontology/2.0/lexinfo#
     •  penn: http://purl.org/olia/penn.owl#
     •  olia: http://purl.org/olia/olia.owl#
     •  xsd: http://www.w3.org/2001/XMLSchema#
     •  skos: http://www.w3.org/2004/02/skos/core#
     •  dct: http://purl.org/dc/terms/
     •  dbo: http://dbpedia.org/ontology/
     •  lemon-cltl: additional modelling (in progress)
     Questions the model answers:
     •  Ontology or classification: Which concept? Is it a plant, an occupation, an emotion, etc.?
     •  Words: Which words can express these concepts? Part of speech, form variants, spelling variants?
     •  When: In which period are these words used?
     •  Where: In which part of the Netherlands or Belgium is this word used?
     •  Provenance: Which source provided the information?
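The modelling on slide 3 can be illustrated with a minimal sketch that writes one lexical entry as Turtle, using only classes and properties named on the slide. It is not the project's actual pipeline. The `ex:` namespace, the identifier scheme, and the helper names `entry_triples` and `to_turtle` are assumptions made for illustration; `skos:` uses the standard W3C SKOS core namespace.

```python
# Minimal sketch: one lexical entry linked to a concept, serialized as Turtle.
# The "ex:" namespace and all identifiers below are hypothetical.
PREFIXES = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/lexicon/",  # placeholder, not a project URI
}


def entry_triples(lemma, concept_id, concept_label):
    """Return (subject, predicate, object) triples for one entry."""
    entry = f"ex:{lemma}"
    form = f"ex:{lemma}_form"
    sense = f"ex:{lemma}_sense"
    concept = f"ex:{concept_id}"
    return [
        (entry, "a", "ontolex:LexicalEntry"),
        (entry, "rdfs:label", f'"{lemma}"'),
        (entry, "ontolex:canonicalForm", form),
        (form, "a", "ontolex:Form"),
        (entry, "ontolex:sense", sense),
        (sense, "a", "ontolex:LexicalSense"),
        (sense, "ontolex:reference", concept),
        (concept, "a", "skos:Concept"),
        (concept, "skos:prefLabel", f'"{concept_label}"@en'),
    ]


def to_turtle(triples):
    """Serialize the triples with prefix declarations, one statement per line."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(entry_triples("keuterboer", "occupation-65111", "small farming")))
```

The period and region information (lemon-cltl:periodStart, lemon-cltl:geographicArea) is left out because the slide marks that vocabulary as still in progress and gives no namespace for it.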
  4. Resources
     •  1600  EmbodiedEmotions, https://www.esciencecenter.nl/project/from-sentiment-mining-to-mining-embodied-emotions (domain: emotions)
     •  1650  Meijers, Meijers Woordenschat (1669) (all domains)
     •  1800  HISCO, http://historyofwork.iisg.nl/ (domain: occupation)
     •  1850  Brouwers, Brouwers Thesaurus (1987) (all domains)
     •  1885  Pland, https://www.meertens.knaw.nl/pland/ (domain: plants)
     •  1950  ODWN, http://www.cltl.nl/results/demos/open-source-dutch-wordnet/ (all domains)
     Other resources will be added in the future.
  5. Query expansion
     Finding occupations in historic texts: 'small farmers'
     "En van de schamelheid zijner plaggen had er de heikeuter nog eerst den langen weg te gaan tot de burgers van Venlo, eer hij de winst van zijn arbeid ingeruild zag tegen 't noodige voor een schraal bestaan." (Felix Rutten, 1918, Ons mooie Limburg, DBNL)
     •  HISCO [occupation-65111-small farming]: kleinboer, kleinlandbouwer, keuterboer, …
     •  Brouwers [concept?]: keuterboer, heikeuter, landbouwer, …
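The query expansion on slide 5 can be sketched as follows: a seed term is expanded with every word that shares a concept with it in any lexicon, and the expanded set is then matched against the text. This is a minimal sketch, not the project's implementation. The word lists come from the slide; the dictionary layout, the placeholder key for the unnamed Brouwers concept, and the function names `expand` and `find_matches` are assumptions.

```python
import re

# Word lists copied from the slide; the structure is hypothetical.
LEXICONS = {
    "HISCO": {
        "occupation-65111-small farming": ["kleinboer", "kleinlandbouwer", "keuterboer"],
    },
    "Brouwers": {
        # The slide does not name this concept; the key is a placeholder.
        "unnamed-concept": ["keuterboer", "heikeuter", "landbouwer"],
    },
}


def expand(seed):
    """Collect all words reachable from the seed through shared concepts."""
    expanded = {seed}
    changed = True
    while changed:
        changed = False
        for concepts in LEXICONS.values():
            for words in concepts.values():
                group = set(words)
                # A concept that shares a word with the set contributes all its words.
                if expanded & group and not group <= expanded:
                    expanded |= group
                    changed = True
    return expanded


def find_matches(text, terms):
    """Return the tokens of the text that occur in the expanded term set."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in terms]


if __name__ == "__main__":
    text = ("En van de schamelheid zijner plaggen had er de heikeuter nog eerst "
            "den langen weg te gaan tot de burgers van Venlo")
    print(find_matches(text, expand("kleinboer")))
```

Starting from the HISCO term kleinboer, the shared word keuterboer links the two lexicons, so heikeuter from Brouwers ends up in the expanded set and is found in the text. Expanding transitively like this can also pull in broader terms such as landbouwer, so a real system would probably limit or weight the expansion.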
  6. GrETEL, PaQU
     Gertjan van Noord (RUG) – Jan Odijk (UU)
     •  Web applications: search in treebanks
        –  Treebank = text in which each sentence has a syntactic parse
     •  With interfaces designed for linguists
     •  Enables syntactic research
     •  Applications are language-independent but need language-specific components
        –  PaQu, GrETEL: Dutch only
        –  PolyGrETEL: multiple languages
  7. Development Plan & Status
                           GrETEL       PaQU
     Base                  CLARIN-NL    CLARIN-NL
     Own Corpus            CLARIAH      CLARIN-NL
     Metadata              CLARIAH      CLARIAH
     Analysis Component    CLARIAH      CLARIN-NL + CLARIAH
     More formats          CLARIAH      CLARIAH
     Interface             CLARIAH      CLARIN-NL
     More Corpora          CLARIAH      CLARIAH
     Legend: GREEN = done, ORANGE = partial, RED = to do
  8. Research Done
     •  PhD on verb clustering in Dutch: Augustinus (2015)
     •  Acquisition of the words zeer, heel, erg ('very'): Odijk (2015, 2016)
     •  Normative and non-normative variants of 12 Dutch constructions: Odijk (2015), van Noord & Odijk (2016)
     •  Agreement in copular constructions: Van Eynde et al. (2016)
  9. Hun – Zij/Ze as subject
     Per million words    Written    Spoken
     hun                        0        20
     zij (plural)             343       360
     ze (plural)             1481      4107
     •  Hun very rare, only in spoken corpus, only in NL, only in unprepared speech (a-i)
  10. Hem/'m – Hij/ie as subject
     Per million words    Written    Spoken
     hem/'m                     0       101
     hij                     2703      2686
     ie                        55      1919
     •  'm rare, only in spoken corpus, only in Flanders, only in unprepared speech (a-h)
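The tables on slides 9 and 10 report frequencies per million words, which makes corpora of different sizes comparable. The normalization is simple arithmetic; the sketch below shows it with made-up numbers, since the slides give only the normalized figures and not the raw counts or corpus sizes. The function name `per_million` is an assumption.

```python
def per_million(count, corpus_size):
    """Normalize a raw hit count to a frequency per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    # Hypothetical figures: 45 hits in a corpus of 9 million words.
    print(per_million(45, 9_000_000))
```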
  11. References
     •  Augustinus, L. (2015). Complement Raising and Cluster Formation in Dutch: A treebank-supported investigation. PhD thesis, KU Leuven, Belgium.
     •  Odijk, J. (2015). 'Linguistic Research with PaQU'. Computational Linguistics in the Netherlands Journal 5, p. 3-14.
     •  Odijk, J. (2016). 'A Use Case for Linguistic Research on Dutch with CLARIN', in K. De Smedt (ed.), Selected Papers from the CLARIN Annual Conference 2015, 45-61.
     •  Odijk, J. (2015). 'Zoeken naar Constructies', presentation and poster held at the DRONGO Language Festival, Utrecht, 26 September 2015.
     •  Noord, G. van & J. Odijk (2016). 'Goed of Fout: Wat gebruikt men feitelijk?', presentation at the 'Grote Taaldag' (TIN), Utrecht, 6 February 2016.
     •  Van Eynde, F., L. Augustinus & V. Vandeghinste (2016). 'Number agreement in copular constructions: A treebank-based investigation'. To appear in Lingua. doi:10.1016/j.lingua.2016.02.001
  12. Upgrade MIMORE
     Marc Kemps Snijders - Sjef Barbiers (Meertens)
     •  Morphosyntactic research tool for three Dutch dialect databases (CLARIN)
     •  Integration into Nederlab: Interface - MTAS
     •  Integration of the SAND maps from SAND Volumes I and II
     •  Workspace with operations on virtual collections
  13. Search for subject doubling in the Syntactic Atlas of the Dutch Dialects (SAND)
  14. Show the data underlying the map of subject doubling, second person singular. Save as corpus.
  15. Search for a potentially correlating property in a different database (DIDDD): article-demonstrative sequences
  16. POS tag specification of the search: D(art,def) followed by D(dem,def)
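The search on slide 16, a definite article tag immediately followed by a definite demonstrative tag, can be sketched as a scan over a POS-tagged token list, with a simple keyword-in-context (KWIC) view of each hit. This is an illustration of the idea, not how MIMORE implements it. The sample sentence, the tags other than D(art,def) and D(dem,def), and the function names are assumptions.

```python
def find_sequences(tagged, pattern):
    """Return start indices where the tag sequence `pattern` occurs."""
    n = len(pattern)
    hits = []
    for i in range(len(tagged) - n + 1):
        window_tags = [tag for _, tag in tagged[i:i + n]]
        if window_tags == list(pattern):
            hits.append(i)
    return hits


def kwic(tagged, start, length, window=2):
    """Return (left context, match, right context) as strings."""
    words = [word for word, _ in tagged]
    left = " ".join(words[max(0, start - window):start])
    match = " ".join(words[start:start + length])
    right = " ".join(words[start + length:start + length + window])
    return left, match, right


if __name__ == "__main__":
    # Hypothetical tagged sentence; PRON, V and N are placeholder tags.
    sentence = [
        ("ik", "PRON"),
        ("zie", "V"),
        ("de", "D(art,def)"),
        ("die", "D(dem,def)"),
        ("man", "N"),
    ]
    pattern = ("D(art,def)", "D(dem,def)")
    for start in find_sequences(sentence, pattern):
        print(kwic(sentence, start, len(pattern)))
```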
  17. Result list with, among other things, KWIC concordance and POS tags. Save as corpus.
  18. Saved data sets in workspace
  19. Geographic distribution of the two phenomena
