DBpedia as Gaeilge
Chapter Meeting, 9th January 2015
@en LOD Cloud: DBpedia
@ga LOD Cloud: ??????
Creating DBpedia as Gaeilge Chapter
Wikipedia Infoboxes → DBpedia Mappings → DBpedia Triples → Applications
DBpedia as Gaeilge
Chapter Workflow
Wikipedia Structure & Infoboxes
How Data is Structured in Wikipedia
SOURCE: http://stats.wikimedia.org/EN/SummaryGA.htm
SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett
Terry Pratchett Vicipéid Page
SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett
Infobox
(Bosca Sonraí Scríbhneoir)
Terry Pratchett
Vicipéid Infobox Template
Editing the Infobox: Visual UI
SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett
From Wikipedia to DBpedia…
•  What are DBpedia mappings?
•  How to create those mappings?
•  What is the current status?
What is the DBpedia Ontology?
Terry Pratchett is an Artist of subclass …
Writer subclass has label:
@en: writer
@ga: scríbhneoir
@fr: écrivain
SOURCE: http://mappings.dbpedia.org/server/ontology/classes/
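The per-language labels on the Writer class can be pictured as a small lookup keyed by language tag. The following is a minimal sketch only: the dictionary and the `label_for` helper are illustrative names, not part of DBpedia, and the fallback to English is an assumption about how an application might behave when a translation is missing.

```python
from typing import Dict, Optional

# Labels for the Writer class, exactly as listed on the slide.
WRITER_LABELS: Dict[str, str] = {
    "en": "writer",
    "ga": "scríbhneoir",
    "fr": "écrivain",
}


def label_for(labels: Dict[str, str], lang: str, fallback: str = "en") -> Optional[str]:
    """Return the label for `lang`, or the `fallback` label if none exists.

    The fallback behaviour is an assumption made for this sketch.
    """
    return labels.get(lang, labels.get(fallback))


print(label_for(WRITER_LABELS, "ga"))  # scríbhneoir
print(label_for(WRITER_LABELS, "de"))  # writer (no German label in this sketch)
```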
Ontology Labels & Comments
•  Current ontology: 685 classes, 2795 properties
•  Starting point: Label & comment translations for the classes?
•  These labels and comments are available to all DBpedia queries, not just those from ga.dbpedia
Mapping Process
?
SOURCE: http://mappings.dbpedia.org/server/statistics/ga/
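Conceptually, a mapping ties an infobox template to an ontology class and ties each infobox field to an ontology property. The sketch below illustrates that idea in Python; it is not the syntax used on the mappings wiki. The Irish field names (`ainm`, `dáta_breithe`) and the placeholder values are hypothetical and were not taken from the actual Bosca Sonraí Scríbhneoir template.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The ontology class this (hypothetical) template mapping points to.
TEMPLATE_CLASS = "http://dbpedia.org/ontology/Writer"

# Hypothetical infobox field names mapped to ontology properties.
PROPERTY_MAP: Dict[str, str] = {
    "ainm": "http://xmlns.com/foaf/0.1/name",
    "dáta_breithe": "http://dbpedia.org/ontology/birthDate",
}


def map_infobox(subject: str, infobox: Dict[str, str]) -> List[Triple]:
    """Turn infobox fields into triples, skipping fields without a mapping."""
    triples: List[Triple] = [(subject, RDF_TYPE, TEMPLATE_CLASS)]
    for field, value in infobox.items():
        prop = PROPERTY_MAP.get(field)
        if prop is not None:
            triples.append((subject, prop, value))
    return triples


# Placeholder values; real values would come from the Vicipéid page.
example = map_infobox(
    "http://ga.dbpedia.org/resource/Terry_Pratchett",
    {"ainm": "<name>", "dáta_breithe": "<date>", "gan_mhapáil": "<ignored>"},
)
for triple in example:
    print(triple)
```

Fields with no entry in the mapping are dropped here, which is why unmapped infoboxes contribute nothing to the ontology-based extraction.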
•  DBpedia Extraction Framework
•  SPARQL Endpoint
Extracting Triples
Extracted Terry Pratchett Triples
SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett
SOURCE: http://ga.dbpedia.org/sparql
Raw Extractions
(not from Ontology Mappings)
SOURCE: http://ga.dbpedia.org/sparql
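One way to tell mapping-based triples from raw extractions is by the namespace of the predicate. The sketch below assumes that mapped properties live under `http://dbpedia.org/ontology/` and that raw infobox properties for the Irish chapter live under `http://ga.dbpedia.org/property/`; the second namespace and the sample predicate `ainm` are assumptions for illustration and should be checked against the endpoint.

```python
ONTOLOGY_NS = "http://dbpedia.org/ontology/"
# Assumed namespace for raw infobox properties on the Irish chapter.
RAW_NS = "http://ga.dbpedia.org/property/"


def triple_source(predicate: str) -> str:
    """Classify a predicate IRI as 'mapped', 'raw', or 'other' by its namespace."""
    if predicate.startswith(ONTOLOGY_NS):
        return "mapped"
    if predicate.startswith(RAW_NS):
        return "raw"
    return "other"


# Sample predicates; "ainm" is a hypothetical raw property name.
predicates = [
    ONTOLOGY_NS + "birthDate",
    RAW_NS + "ainm",
    "http://www.w3.org/2000/01/rdf-schema#label",
]
for p in predicates:
    print(triple_source(p), p)
```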
SPARQL Endpoint Interface
SOURCE: http://ga.dbpedia.org/sparql
Query Result
SPARQL Endpoint
SOURCE: http://ga.dbpedia.org/sparql
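A query like the one shown in the endpoint interface can also be sent programmatically. The sketch below only builds the request URL and does not contact the server. The query text and the helper names are illustrative, and the `format` parameter is an assumption based on how Virtuoso-style endpoints commonly accept a result format; it may not apply to this endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://ga.dbpedia.org/sparql"
RESOURCE = "http://ga.dbpedia.org/resource/Terry_Pratchett"


def describe_query(resource: str, limit: int = 50) -> str:
    """Build a SPARQL query listing predicates and objects of one resource."""
    return f"SELECT ?p ?o WHERE {{ <{resource}> ?p ?o }} LIMIT {limit}"


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    params = {
        "query": query,
        # Assumed parameter name; the SPARQL protocol itself relies on
        # the HTTP Accept header to choose a result format.
        "format": "application/sparql-results+json",
    }
    return f"{endpoint}?{urlencode(params)}"


query = describe_query(RESOURCE)
print(query)
print(request_url(ENDPOINT, query))
```

The resulting URL could be opened with any HTTP client; keeping URL construction separate from the network call makes the query easy to inspect first.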
DBpedia as Gaeilge Use Cases
A Linked Data proof-of-concept
SOURCE: http://apps.dri.ie/locationLODer/
Video: http://www.bbc.com/news/entertainment-arts-25324655
Article: http://www.irishtimes.com/culture/books/david-butler-great-writers-enrich-experience-even-the-mundane-1.2082759
Image: http://tvtropes.org/pmwiki/pmwiki.php/Creator/TerryPratchett
http://ga.dbpedia.org/resource/Terry_Pratchett
Archives
What can the Chapter do?
•  Adding Infoboxes to Vicipéid
•  Translating DBpedia Ontology (Labels & Properties)
•  Creating Wikipedia Infobox to Ontology Mappings
•  Create ga.dbpedia.org instance (Insight)
•  Creating applications
What can we do?
Who can help?
•  Linked Data specialists
•  Irish-speaking software developers
•  Vicipéid community, students, translators-in-training
•  Translators, editors, linguists, cultural scholars
Groups & Possible Tasks
[Diagram labels: Cumas Gaeilge (Irish-language ability), Linked Data Knowledge]
[Groups shown: Editors, Translators, Cultural Scholars, Linguists, Translators in training, Students, Vicipéid Editors, Irish-speaking Software Developers, Data Scientists, Linked Data Academics]
Initial Work on DBpedia Gaeilge
✓ Instance ga.dbpedia created
✓ Chapter website created
✓ Experiments with automatic translations of ontology class labels
SMT (statistical machine translation) system: English → Irish

English Labels | Irish Labels | Comments
canadian football team | foireann sacair cheanada | Correct grammatical case, football translated as soccer
database | bunachar sonraí | Correct
television personality | pearsantacht teilifíse | Correct grammatical case for 2 nouns together
broadcast network | craoladh líonra | Specific term required here, 'líonra craolacháin' on focal.ie, but stáisiún in general use. English word order.
government agency | rialtas an ghníomhaireacht | English word order, changes meaning entirely, incorrect case used
military structure | struchtúr míleata | Correct word order but ambiguous description
radio program | raidió ríomhchlár | English word order, domain computing instead of media for program
Next Steps & Discussion?
Contacts
http://ga.dbpedia.org
Bianca Pereira
bianca.pereira@insight-centre.org
Caoilfhionn Lane
caoilfhionn.lane@insight-centre.org

DBpedia as Gaeilge Chapter