SPARQL – optimizations for table joins
Example query: all soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants: http://tinyurl.com/2uhuow9
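The example query is essentially a chain of joins over triple patterns. The following is a minimal sketch in plain Python, run against a hand-made toy dataset rather than DBpedia. All resource and property names (`player:A`, `stadiumCapacity`, and so on) are hypothetical and do not reflect DBpedia's actual vocabulary, and the join strategy shown is only one simple option, not necessarily what the triple store does.

```python
# Toy triples (subject, predicate, object); every name here is made up.
TRIPLES = [
    ("player:A", "position", "Goalkeeper"),
    ("player:A", "team", "club:X"),
    ("player:A", "birthCountry", "country:P"),
    ("player:B", "position", "Goalkeeper"),
    ("player:B", "team", "club:Y"),
    ("player:B", "birthCountry", "country:Q"),
    ("player:C", "position", "Striker"),
    ("player:C", "team", "club:X"),
    ("player:C", "birthCountry", "country:P"),
    ("club:X", "stadiumCapacity", 50000),
    ("club:Y", "stadiumCapacity", 20000),
    ("country:P", "population", 80000000),
    ("country:Q", "population", 5000000),
]


def index(triples, predicate):
    """Build a subject -> [objects] hash index for one predicate."""
    table = {}
    for s, p, o in triples:
        if p == predicate:
            table.setdefault(s, []).append(o)
    return table


def goalkeepers(triples, min_seats=40000, min_population=10000000):
    """Hash-join the triple patterns, applying numeric filters before joining."""
    position = index(triples, "position")
    team = index(triples, "team")
    birth = index(triples, "birthCountry")
    # Evaluate the filters first so the join only probes small sets.
    big_clubs = {s for s, caps in index(triples, "stadiumCapacity").items()
                 if any(c > min_seats for c in caps)}
    big_countries = {s for s, pops in index(triples, "population").items()
                     if any(n > min_population for n in pops)}
    result = []
    for player, positions in position.items():
        if "Goalkeeper" not in positions:
            continue
        if not any(c in big_clubs for c in team.get(player, [])):
            continue
        if not any(c in big_countries for c in birth.get(player, [])):
            continue
        result.append(player)
    return sorted(result)


print(goalkeepers(TRIPLES))  # ['player:A']
```

Only `player:A` satisfies all three conditions in the toy data. Filtering clubs and countries before the join is the same idea as pushing `FILTER` conditions down in a query plan.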
DBpedia Extraction Framework is very mature (5 years, 15 developers)
Configuration over code; templates will allow Wiktionarians to update parsers
Creation of datasets: Wortschatz
Converted in 2009: Matthias Quasthoff, Sebastian Hellmann and Konrad Höffner: Standardized Multilingual Language Resources for the Web of Data: http://corpora.uni-leipzig.de/rdf
3rd prize at the LOD Triplification Challenge, Graz, 2009
What was missing?
Other datasets to link to!
Wikipedia is not suited as a linking partner
Wiktionary, Wortschatz, and OLiA can become the crystallization point for a Linguistic Linked Data Web
Four major types:
Interlinking Wortschatz: Research and Use Case
Iterated co-occurrences can be computed with SPARQL
Wiktionary and Wortschatz can be loaded into the same database
Interesting questions:
Can we build tools that help Wiktionary editors (suggestions)?
Wiktionary links words across languages. Are there any similar patterns?
Can we validate the Wiktionary RDF dump with Wortschatz?
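As a rough illustration of iterated co-occurrences, the sketch below first computes sentence-level co-occurrences and then treats each word's co-occurrence set as a new unit for a second pass. It is written in plain Python rather than SPARQL, and it is only one simple reading of the idea: it is not Wortschatz's actual procedure and uses no significance measure. The sample sentences and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrences(units):
    """Map each word to the set of words sharing at least one unit with it."""
    result = defaultdict(set)
    for unit in units:
        for a, b in combinations(sorted(set(unit)), 2):
            result[a].add(b)
            result[b].add(a)
    return dict(result)


def iterate(cooc):
    """Treat every co-occurrence set as a new unit and compute again."""
    return cooccurrences(cooc.values())


# Hypothetical mini corpus: "doctor" and "nurse" never share a sentence.
sentences = [
    ["doctor", "hospital"],
    ["nurse", "hospital"],
    ["doctor", "patient"],
    ["nurse", "patient"],
]

first = cooccurrences(sentences)
second = iterate(first)

print(first["doctor"])
print(second["doctor"])
```

In the first pass, "doctor" co-occurs only with "hospital" and "patient". In the second pass it is linked to "nurse", because both words appear in the same first-order co-occurrence sets.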
Open Licences – Focus of LOD2 and OKFN
http://ckan.net/
CKAN is an open registry of data and content packages. Harnessing the CKAN software, this site makes it easy to find, share and reuse content and data, especially in ways that are machine automatable.
Working Group on Open Data in Linguistics: http://wiki.okfn.org/wg/linguistics
Founded in November 2010
Membership open, please join
Standardized Formats: Part 1 – Corpora
http://www.sfb632.uni-potsdam.de/~d1/paula/doc/
PAULA XML is the Potsdamer Austauschformat für linguistische Annotation ("Potsdam Interchange Format for Linguistic Annotation"). It is an XML-based standoff representation format, which has been designed to represent data with heterogeneous annotation layers produced by different tools. For visualization and querying of PAULA XML data, the database ANNIS can be used.
Christian Chiarcos at work: PAULA will become POWLA and will be used for the representation of corpus annotations.
Standardized Formats: Part 2 – the Web
Bottom layer of the NLP2RDF stack can be reused: an ontology to represent Strings (formerly the SSO).
In his latest book, Wikinomics, Don Tapscott explains deep changes in technology, demographics and business.
URIs to represent Strings, e.g. http://nlp2rdf.org/example/Don_Tapscott
Relations between Strings: previous, next, sub, super
http://nlp2rdf.org/example/Don is a subString of the above
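The String URIs and relations can be sketched in a few lines of Python. This is a minimal illustration, not the NLP2RDF ontology itself. The URI scheme (spaces replaced by underscores) is inferred from the single example above, and the relation labels `next`, `previous`, `sub` and `super`, along with their directions, are assumptions standing in for whatever properties the ontology actually defines.

```python
from urllib.parse import quote

# Namespace taken from the example URI on the slide.
PREFIX = "http://nlp2rdf.org/example/"


def string_uri(text):
    """Mint a URI for a string by replacing spaces with underscores."""
    return PREFIX + quote(text.replace(" ", "_"))


def string_relations(tokens, phrase):
    """Return (subject, relation, object) triples for tokens and one phrase."""
    triples = []
    # Adjacent tokens are linked in both directions.
    for left, right in zip(tokens, tokens[1:]):
        triples.append((string_uri(left), "next", string_uri(right)))
        triples.append((string_uri(right), "previous", string_uri(left)))
    # Each whitespace-separated part of the phrase is a sub-string of it.
    for part in phrase.split():
        triples.append((string_uri(part), "sub", string_uri(phrase)))
        triples.append((string_uri(phrase), "super", string_uri(part)))
    return triples


triples = string_relations(["Don", "Tapscott", "explains"], "Don Tapscott")
for triple in triples:
    print(triple)
```

A real implementation would probably need offsets or context in the URI to distinguish repeated occurrences of the same string. The sketch ignores that.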