Elena González-Blanco (egonzalezblanco@flog.uned.es)
Laboratorio de Innovación en Humanidades Digitales, UNED, Spain
Principal Investigator ERC project Poetry Standardization and Linked Open
Data - POSTDATA
Mara Mănăilescu (mmanailescu@linhd.uned.es)
Laboratorio de Innovación en Humanidades Digitales, UNED, Spain
Salvador Ros (sros@scc.uned.es)
Escuela Técnica Superior de Ingeniería Informática, UNED, Spain
From syllables, lines and stanzas to
linked open data: standardization, interoperability
and multilingual challenges for
digital humanities
Who we are
• LINHD – a pioneering Digital Humanities centre in the
Spanish language, since 2014
• We are part of CenterNet (ADHO)
• Completely self-sustained
• Collaboration in diversity
• LINHD: a bridge between different fields of
knowledge
  - Philologists
  - Software Developers
  - Natural Language Processing Experts
  - Ontologists & LOD technologists
Our researchers
The DH paradigm applied to poetry
Traditional Research | Digital Humanities
Counting syllables on one's fingers | Automatic syllable counter
Comparing 2-3 poems at the same time | Comparing thousands of poems in parallel
Looking for similar evidence by reading all papers and books | Finding multiple pieces of evidence through one search engine
Individual research | Collaborative, interdisciplinary research
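A minimal sketch of what an "automatic syllable counter" could look like for Spanish. This is not the POSTDATA tool: it is a naive, rule-based illustration that counts vowel nuclei word by word. It treats two adjacent strong vowels, or an accented í/ú next to another vowel, as a hiatus, and everything else as a diphthong. It ignores synalepha between words, final-stress adjustments and "y" used as a vowel. The function names are hypothetical.

```python
import re

STRONG = set("aeoáéó")          # strong vowels: two in a row form a hiatus
ACCENTED_WEAK = set("íú")       # an accented weak vowel breaks a diphthong
VOWELS = set("aeiouáéíóúü")


def count_syllables(word: str) -> int:
    """Naive syllable count for one Spanish word (assumption-laden sketch)."""
    count = 0
    prev = None  # previous vowel in the current run of vowels, if any
    for ch in word.lower():
        if ch not in VOWELS:
            prev = None
            continue
        if prev is None:
            count += 1  # first vowel of a run always opens a syllable
        elif (prev in STRONG and ch in STRONG) or prev in ACCENTED_WEAK or ch in ACCENTED_WEAK:
            count += 1  # hiatus: the vowels belong to different syllables
        prev = ch
    return count


def count_line(line: str) -> int:
    """Sum of per-word counts; synalepha is deliberately not modelled."""
    words = re.findall(r"[^\W\d_]+", line)
    return sum(count_syllables(w) for w in words)
```

With these rules, count_syllables("poesía") gives 4 (po-e-sí-a) and count_syllables("cielo") gives 2 (cie-lo). A real metrical scanner would also need to model synalepha and stress, which is where most of the difficulty lies.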
Poetry Modelling. Why?
POSTDATA is structured in three
axes
• Philology
Abstract model of
shared philological
elements
• Technology
Ontology for Poetry
as Linked Open Data
• Society
Platform for Poetry
storage and analysis
Historical developments
• Middle Ages and Renaissance - the powerful
influence of Latin applied to Romance languages
• The musical analysis system was followed as a
valid system
• The international context is richer, especially in
English, with two prominent schools: 1) A
traditional approach based on stress and classical
feet; and 2) A generative approach based on the
terminology and concepts shown through text
grids that take into account word boundaries
Philological standardization is
the starting point
How to get interoperability
A common data model for all
existing resources
Latin poetry
Spanish poetry
Galician poetry
Italian poetry
French poetry
Occitan poetry
Analysing Data Models
Relational Database | Conceptual Model
XML Schema Model | Conceptual Model
Perl Script structure | Conceptual Model
How to overcome these differences?
• Development of a Metadata Application Profile
for the European Poetry community
• LOD technology
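As a rough illustration of mapping differently modelled sources onto one common model, the sketch below normalizes a relational row and an XML record into the same record type. Everything here is an assumption made for the example: the PoemRecord fields, the source field names and the two mapping tables are hypothetical and do not come from the POSTDATA application profile.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class PoemRecord:
    """Hypothetical common model shared by all sources."""
    title: str
    author: str
    language: str
    stanza_form: str


# Hypothetical per-source vocabularies mapped to the common field names.
SQL_MAPPING = {"titulo": "title", "autor": "author",
               "lengua": "language", "estrofa": "stanza_form"}
XML_MAPPING = {"title": "title", "poet": "author",
               "lang": "language", "strophe": "stanza_form"}


def from_row(row: dict) -> PoemRecord:
    """Build a common record from a relational row; unmapped columns are ignored."""
    fields = {SQL_MAPPING[k]: v for k, v in row.items() if k in SQL_MAPPING}
    return PoemRecord(**fields)


def from_xml(xml_text: str) -> PoemRecord:
    """Build a common record from a flat XML element; unmapped children are ignored."""
    root = ET.fromstring(xml_text)
    fields = {XML_MAPPING[child.tag]: (child.text or "").strip()
              for child in root if child.tag in XML_MAPPING}
    return PoemRecord(**fields)
```

The sources keep their internal structure; only the mapping tables know about both vocabularies. A record missing one of the mapped fields raises a TypeError here, which a real pipeline would need to handle explicitly.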
The POSTDATA platform is a virtual research
environment for researchers and other
users
Online
User-friendly
Different profiles
• Repository: virtual research environment for
scholarly editing of poetry
• Lab: combination of tools specifically oriented
to poetry analysis
• Community: a social platform to find new projects,
partners and ideas
Manage textual editions
• A selection of text encoding options based on
TEI-XML tags to build DTDs or schemas
• A complex query system, like XQuery and
• A user interface to visualize and publish the
text edition in different formats, with search
capabilities, using XSLT and other technologies
(such as JavaScript, PHP and CSS)
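To make the TEI-based encoding more concrete, here is a small sketch that reads a verse fragment and reports the number of lines in each stanza. The elements lg (line group) and l (verse line) and the TEI namespace are standard TEI. The sample document is a simplified placeholder fragment, not a complete valid TEI file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified placeholder fragment: a real TEI file also needs a teiHeader.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <lg type="stanza">
      <l>first line</l>
      <l>second line</l>
    </lg>
    <lg type="stanza">
      <l>third line</l>
    </lg>
  </body></text>
</TEI>"""


def lines_per_stanza(tei_xml: str) -> list[int]:
    """Return the number of <l> children for every <lg> in document order."""
    root = ET.fromstring(tei_xml)
    return [len(lg.findall("tei:l", TEI_NS))
            for lg in root.iterfind(".//tei:lg", TEI_NS)]
```

For the sample above, lines_per_stanza(SAMPLE) returns [2, 1]. The same question could be asked in XQuery against an XML database; Python is used here only to keep the example self-contained.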
User-friendly environment
• To design a publication environment suitable for
different types of users
• To train academic researchers and professionals
in digital humanities through webinars, tutorials,
and learning guides
• To embed all the content available in this virtual
environment into a linked-data framework
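One way to picture "embedding content into a linked-data framework" is to serialize a poem's metadata as RDF triples. The sketch below writes N-Triples by hand using Dublin Core Terms properties. The base IRI http://example.org/poem/ and the function names are assumptions made for the example, and the poem identifier is assumed to be IRI-safe; a production system would more likely use an RDF library and the project's own ontology.

```python
DCT = "http://purl.org/dc/terms/"       # Dublin Core Terms namespace
BASE = "http://example.org/poem/"       # hypothetical base IRI for poems


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(poem_id: str, metadata: dict[str, str]) -> list[str]:
    """Return one N-Triples line per metadata entry, sorted by property name."""
    subject = f"<{BASE}{poem_id}>"
    return [f'{subject} <{DCT}{prop}> "{escape_literal(value)}" .'
            for prop, value in sorted(metadata.items())]
```

Calling to_ntriples("42", {"title": "T", "creator": "A"}) yields two lines, one for dcterms:creator and one for dcterms:title, which any RDF store can load without changing the source database.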
User Interface
Develop an easy-to-use interface:
  - a powerful LOD search engine
  - a digital edition platform
  - a poetry lab to play with apps and tools
  - a visualization layer: geolocation,
    timelines, graphs.
Conclusions
• Many Digital Humanities and culture-based projects
share the same problems
• At the conceptualization level it is necessary to map and
translate philological concepts and make information
exchangeable
• At a technological level there is a need for
interoperability and standard solutions
• Interoperability is achieved by using open data formats, APIs,
exchange languages and standardized metadata
egonzalezblanco@flog.uned.es
& LINHD Team
http://postdata.linhd.es/
http://linhd.uned.es
@linhduned

Linked open data: standardization, interoperability and multilingual challenges for digital humanities (TEEM 2016)


Editor's Notes

  • #2 From syllables, lines and stanzas to linked open data: standardization, interoperability and multilingual challenges for digital humanities
  • #3 We received a first grant from the university, but we are now self-sustained through projects. Collaboration in diversity: a multicultural team that combines technical and philological knowledge, with members from Spain, Portugal, Romania and Italy and collaborators from France.
  • #5 The basic idea of POSTDATA is to transform the way research is done in philology. The vision is to turn research in the humanities into a digital humanities ecosystem in which syllable-counting is an automated process, thousands of poems can be compared at the same time, information can be found at a click through a search engine, and research becomes more collaborative.
  • #6 The POSTDATA project aims to narrow the digital gap between poetry and technology, looking for interoperability solutions and interdisciplinary work with innovative results beyond the current state of the art. It is based on the application of semantic web technologies to link and publish poetic datasets in a structured way in the linked data cloud.
  • #8 1 - regardless of their different linguistic traits and verse structures, which led to complex situations such as the creation of conceptual genres that exist only in some traditions. End - The models described are just one example of the idiosyncrasies that can be observed in each literary tradition.
  • #9 All literary works share a set of common characteristics or metadata (such as author, title, date of composition or language). There is also a more limited set of properties that are specific to poetry. The problem, however, is that each literary tradition has evolved in its own way, creating different names and conceptualization systems for similar phenomena. It is therefore necessary, as a first stage, to standardize metadata and map vocabularies and terminology at the philological level. From the philological point of view, the main objective of the POSTDATA proposal is to develop a conceptual standardization in order to build an abstract model for poetry representation, based on existing philological concepts taken from projects, handbooks and corpora from the different traditions.
  • #10 As you can see in this timeline, which represents the evolution of IT and the internet, the first poetic repertoires or indexes were written on paper at the end of the 19th century, but they were soon transformed into computer data when the PC era arrived. Some years later, databases became the best solution for classifying and analyzing poem data. It is difficult to link data across poetic repertoires because standardization is lacking at two levels: philological and technological. The semantic web has proven to be an effective technology, as it enables us to make information interoperable just by adding a semantic layer of data, without altering the sources' internal structure.
  • #11 Every digital repertoire or database has a model whose parts are matched to the different elements of the jointly designed common ontology. Adding a "semantic" layer of data to the existing databases and digital resources devoted to poetry boosts interoperability among them and lets scholars carry out innovative comparative studies that were not possible before. 21 repertoires.
  • #12 During the process of reverse engineering we standardize, i.e. we call the same concepts by the same name (working together with the philologists) and try to name tables or properties after classes or terms that already exist.
  • #15 By providing a Metadata Application Profile (MAP), the poetry community will be able to publish data in the Linked Open Data ecosystem and be interoperable with many other communities (e.g. arts libraries, geographic resources and others yet to be established).
  • #16 Chronological approach: Phase 1) creation of a common ontological model for repertoires and poetry databases; 2) creation of a digital scholarly edition environment for working with poetry texts; 3) creation of a poetry lab using the latest AI (artificial intelligence) and machine learning technologies to build apps and tools for analyzing poetry; 4) creation of a common shared interface to retrieve all the information gathered inside the project as linked data and make it browsable and searchable, with different ways of visualization.
  • #17 Target users: researchers with academic purposes who want to work on critical digital editions, non-expert users who want to read, share and learn more about poetic traditions, and companies that will use this resource for applications in fields such as education, psychology, tourism or culture. The lab integrates existing tools developed by other research teams in previous projects and develops new ones. The innovation lies in the application context for this combination of tools, which are specifically oriented to poetry analysis. Great ideas will emerge from collaboration; our purpose is to share our experience and passion with researchers all over the world.
  • #18 Intro - We will create a resource that lets users make their own choices following a three-step process.
  • #19 1. Academic users, social users (users with literary interests but no academic purpose), and professional users (such as teachers, musicians, tour operators, app developers, etc.). 3. To let users perform faceted searches with different levels of complexity, based on SPARQL queries, with multiple visualization possibilities.
  • #20 The final product of the POSTDATA project is an easy-to-use interface containing the components listed on the slide. The visualization layer will be included at the end, with different visualization possibilities.
  • #21 These common problems are not limited to poetry; they also affect other areas of digital humanities, so we hope the experience and results of this project will inform others and at least provide them with some food for thought. End - POSTDATA does not propose a new method for analyzing poetry, but an abstract model based on a working methodology supported by a double standardization system, at both the philological and technological levels.