SlideShare a Scribd company logo
Sustainable operability:
Keeping complex
linguistic resources alive

Menzo Windhouwer
Alexis Dimitriadis
Sustainability of text resources
   All electronic resources are accessed with the mediation
    of appropriate software.
   Text editors, and web browsers, are generic: they can
    be used, with satisfactory results, with any document in a
    supported format.
   Documents are (more or less) portable across software
    that supports their storage format.

   Sustainability can be safeguarded by relying on
    documented, standard formats for their encoding.



                                               CLIN 20 - February 5, 2010
The problem with databases
   Databases are not portable in the sense that text
    documents are:
   The data and relational structure of databases can be
    stored in (semi-)standard SQL format, or exported to
    other formats.
   But databases are typically accessed through a custom-
    made user interface. Preserving the data, therefore,
    does not preserve the complete resource.
   In this talk, we focus on (typological) databases.




                                               CLIN 20 - February 5, 2010
Operability of complex resources
   The general problem: Complex resources depend on
    custom software. Without the software, the resource is
    not usable and hence not truly preserved.
   We will call a resource operable if suitable access or
    management software (operating software) exists for it.
   While all electronic resources depend on software for
    their operability, complex resources are particularly
    vulnerable because they lack an economy of scale.




                                              CLIN 20 - February 5, 2010
Outline

   The problem of sustainable operability

   Sustainable operability of typological databases

   The IDDF architecture

   The Typological Database System

   The TDS Curator project




                                               CLIN 20 - February 5, 2010
Next

   The problem of sustainable operability

   Sustainable operability of typological databases

   The Typological Database System

   The IDDF architecture

   The TDS Curator project




                                               CLIN 20 - February 5, 2010
Typological databases
   Contain high-level, summary information about selected
    phenomena in a large number of languages.
   May include example sentences with interlinear gloss
    annotations.
   Are implemented on a variety of software platforms
    (Filemaker, MS Access, MySQL, 4th Dimension, Excel
    spreadsheets, custom software), and may or may not
    have a web interface.




                                             CLIN 20 - February 5, 2010
Databases are diverse:
   Original schema snippet from TDN database
     Field   Values        Metadata
     V105    0, 1, 9, 99   ATTRIBUTIVE ADJECTIVES ARE RELATIVE CLAUSES
     V168    0, 1, 9, 99   PRED LOC = ZERO + LOC PP
     V204    0, 1, 9, 99   PRED ADJ = COP VS. PRED LOC = VERB (NONCOP)


   Phoneme inventory (SPIN database)




                                                                 CLIN 20 - February 5, 2010
Databases are diverse:
   User interface snippet for the Anatyp database




                                                     CLIN 20 - February 5, 2010
Typological databases – their fate
Completed databases are subject to the usual perils:

   Gradual obsolescence of db software, OS, or hardware.
   Sudden disappearance due to incompatible software
    updates, retirement of legacy servers, or hardware
    failure.
   Gradual fall into unusability, with the dissipation of the
    insider knowledge needed to utilize a poorly documented
    database.



                                               CLIN 20 - February 5, 2010
A data dump is insufficient
   Why not just export a database’s tables in some
    standard format (tab-separated Unicode text, or even a
    dump in “standard” SQL)?

   This would still be deficient in
    1. Completeness of content and documentation
    2. Operability




                                             CLIN 20 - February 5, 2010
Completeness
   The meaning of table contents, and their
    interrelationships, are not explicitly given in the data
    tables.
      Completeness example: TDN database
   DocumentationMetadata normally stored in the relational
      Field Values   is not
    schema 1, 9, 99 ATTRIBUTIVE ADJECTIVES ARE RELATIVE CLAUSES
      V105 0,

     V168 0, 1, 9, 99 PRED LOC = ZERO + LOC PP
      Often, documentation is embedded in the     user interface. Some is only in
       V204heads9, 99 creators. = COP VS. PRED LOC = VERB (NONCOP)
       the 0, 1, of its PRED ADJ

   In the OAIS terminology, data tables alone are rarely
    “independently understandable”.




                                                             CLIN 20 - February 5, 2010
Operability
       Presentation example: Anatyp database
   The presentation (user interface) of a database
    suggests how its content is to be interpreted.
   The navigation structure and application logic also
    encode information about the data.
     While there are general-purpose browsers for relational database
      tables, they are a very poor way to view a complex database:
     They allow viewing one table at a time, but do not automatically
      select, project or join tables appropriately into views.
     Even with foreign key information, there is no way to determine which
      joins or projections are “appropriate”.




                                                           CLIN 20 - February 5, 2010
Our approach to sustainable operability
1.   Map resources to a sufficiently rich format at time of
     archiving.
2.   Maintain generic software that can provide browsing
     and query access to all archived resources in an
     application domain.




                                               CLIN 20 - February 5, 2010
Next

   The problem of sustainable operability

   Sustainable operability of typological databases

   The IDDF architecture

   The Typological Database System

   The TDS Curator project




                                               CLIN 20 - February 5, 2010
The Integrated Data and Documentation Format

   Data, structuring information and documentation are
    combined into an integrated, XML-based standardized
    format, the Integrated Data and Document Format
    (IDDF).
   Software is provided that can manage IDDF-encoded
    resources in a generic way, just as a text editor or
    corpus tool can manage arbitrary conforming resources.
   New generations of management software can be
    provided in the future, utilizing the self-describing nature
    of the IDDF and an economy of scale.


                                                  CLIN 20 - February 5, 2010
IDDF structure
   Two major sections:
    1.   Metadata section:
             provides the (loose) data schema
             documents the elements in the schema
    2.   Data section:
             contains the actual data

   Hierarchical, semi-structured data model
   Network of hierarchical units, a.k.a. semantic contexts




                                                     CLIN 20 - February 5, 2010
IDDF: metadata
   For each data element:
     A label and a description
     One or more links
        to other elements

        to external resources, e.g., a knowledge base

     Data types:
        A semantic data type for the element, e.g. UPPC

        A semantic (key) value data type, e.g. interlinear glossed text tier

     An (partial) enumeration of possible values:
        The literal (key) value

        A label and a description

        One or more links

             to other elements

             to external resources


                                                             CLIN 20 - February 5, 2010
IDDF: system architecture




                              User




                                                         Tool
       renderer   User interface




                                                                generic
                     Application Programming Interface




                                            Knowledge
                         IDDF                 base




                   Resource creation



                   domain specific

                                                                   CLIN 20 - February 5, 2010
Next

   The problem of sustainability operability

   Sustainable operability of typological databases

   The IDDF architecture

   The Typological Database System

   The TDS Curator project




                                                CLIN 20 - February 5, 2010
The Typological Database System
   The Typological Database System (TDS) provides
    integrated access to multiple, independently created
    typological databases.
     Provide an interface that will help users find relevant data.
     Allow users to interpret the data they are presented with.

   The system behaves, as much as possible, as a single
    database.
    Various differences between the component databases
    must be dealt with.




                                                             CLIN 20 - February 5, 2010
TDS: system architecture
              1060 & BB
    online

                                       Navigating and searching

                                       Querying                   Reasoning




                                                                                         TDS Workbench
                                                   TDS data
                                                                           Global
                                                                          domain
                                                                          ontology
                                       Enriching

                                       Merging                             Local
                          DTL engine




                                                                            DTL
                                       Transforming                     specifications
    offline




                                       Importing

                                                                        meta data

                                          component databases




                                                                                                         CLIN 20 - February 5, 2010
Next

   The problem of sustainable operability

   Sustainable operability of typological databases

   The IDDF architecture

   The Typological Database System

   The TDS Curator project




                                               CLIN 20 - February 5, 2010
The TDS-Curator project
   CLARIN-NL Call 1 project
   “TDS Curator will make the TDS into a sustainable service that conforms to
    CLARIN infrastructural requirements.”
   Partners:
      Utrecht University
      DANS
      Max-Planck-Institute for Psycholinguistics
   May – November, 2010




                                                            CLIN 20 - February 5, 2010
Work packages




                             User




                                                        Tool
      renderer   User interface




                                                               generic
                    Application Programming Interface




                                           Knowledge
                        IDDF                 base




                  Resource creation
                    TDS import



                  domain specific

                                                                  CLIN 20 - February 5, 2010
IDDF-based sustainable operability




                                      languagelink.let.uu.nl   DANS
Figure thanks to Dirk Roorda (DANS)                                   CLIN 20 - February 5, 2010
Summary
   Sustainable operability is a challenge for complex resources like
    typological databases
   A corner stone of a solution is a generic format that is rich enough to
    allow operability by generic tools
   The Typological Database System provides a promising
    architecture, which has been applied to more then a dozen of
    typological databases
   The propriety data model of the TDS can be turned into an open
    format, i.e. the Integrated Data and Documentation Format
   In the CLARIN-NL TDS Curator project this more generic setup will
    be realized

   It will be interesting to also use this generic system outside the
    domain of linguistic typology or even linguistics
                                                          CLIN 20 - February 5, 2010
http://languagelink.let.uu.nl/tds/




       Thanks for your attention!




                                 CLIN 20 - February 5, 2010
IDDF: top-level structure
<iddf:warehouse xmlns:iddf="http://.../ns/iddf">
  <iddf:meta>
       <iddf:scope id="tds" type="warehouse">
             ...
      </iddf:scope>
      <iddf:notion id="n1" name="language“ scope="tds“
                           type="root" key-datatype="enum">
             <iddf:label>Language</iddf:label>
             <iddf:description>
                    One of the world’s languages
             </iddf:description>
             ...
      </iddf:notion>
      ...
  </iddf:meta>
  <iddf:data xmlns:tds="..." ...>
      <tds:language iddf:notion="n1" key="...">
             ...
      </tds:language>
      ...

  </iddf:data>
                                            CLIN 20 - February 5, 2010
</iddf:warehouse>
IDDF: metadata example
<iddf:notion id="n7" name="vowel" scope="SyllTyp">
   <iddf:label>Vowel</iddf:label>
   <iddf:description>
       Is the segment a vowel?
   </iddf:description>
   <iddf:link type="concept" rel="as“ href="...owl#vowel"/>
   <iddf:link type="concept" rel="to“
                          href="...owl#vocalicFeatureNode"/>
   <iddf:values datatype="enum">
       <iddf:value>
              <iddf:literal>+</iddf:literal>
              <iddf:description>
                     The segment is a vowel.
              </iddf:description>
       </iddf:value>
       …
   </iddf:values>
</iddf:notion>
                                             CLIN 20 - February 5, 2010
IDDF: data example
<iddf:data xmlns:tds="…/ns/iddf/tds" … >
   <tds:language key="l-iso-tba"
             iddf:notion="n1" iddf:sources="SyllTyp UPSID">
       <tds:identification
             iddf:notion="n2" iddf:sources="SyllTyp UPSID">
              <tds:name
             iddf:notion="n3" iddf:sources="SyllTyp UPSID">
                     <iddf:value srcs="SyllTyp">
                            Wari’ (Tubar&#227;o)
                     </iddf:value>
                     <iddf:value srcs="UPSID">
                            Huari
                     </iddf:value>
              </tds:name>
               …
      </tds:identification>
      …
  </tds:language>
  …
</iddf:data>                                 CLIN 20 - February 5, 2010
CLIN 20 - February 5, 2010

More Related Content

What's hot

Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)sones GmbH
 
Syllabus for screening test 10+2 lecturer in computer sciences..
Syllabus for screening test 10+2 lecturer in computer sciences..Syllabus for screening test 10+2 lecturer in computer sciences..
Syllabus for screening test 10+2 lecturer in computer sciences..Ashish Sharma
 
From Provider to Portal - a chain of interoperability
From Provider to Portal - a chain of interoperabilityFrom Provider to Portal - a chain of interoperability
From Provider to Portal - a chain of interoperabilityAndy Powell
 
Designing and developing vocabularies in RDF
Designing and developing vocabularies in RDFDesigning and developing vocabularies in RDF
Designing and developing vocabularies in RDFOpen Data Support
 
Data Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factorData Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factorMartin Donnelly
 
ORDBMS Comparative Report
ORDBMS Comparative ReportORDBMS Comparative Report
ORDBMS Comparative Reporterawat
 
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...European Data Forum
 

What's hot (8)

Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)
 
Syllabus for screening test 10+2 lecturer in computer sciences..
Syllabus for screening test 10+2 lecturer in computer sciences..Syllabus for screening test 10+2 lecturer in computer sciences..
Syllabus for screening test 10+2 lecturer in computer sciences..
 
From Provider to Portal - a chain of interoperability
From Provider to Portal - a chain of interoperabilityFrom Provider to Portal - a chain of interoperability
From Provider to Portal - a chain of interoperability
 
Database Concepts
Database ConceptsDatabase Concepts
Database Concepts
 
Designing and developing vocabularies in RDF
Designing and developing vocabularies in RDFDesigning and developing vocabularies in RDF
Designing and developing vocabularies in RDF
 
Data Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factorData Management Planning at the DCC: a human factor
Data Management Planning at the DCC: a human factor
 
ORDBMS Comparative Report
ORDBMS Comparative ReportORDBMS Comparative Report
ORDBMS Comparative Report
 
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
 

Viewers also liked

LDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data CategoriesLDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data CategoriesMenzo Windhouwer
 
What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?Menzo Windhouwer
 
Microsoft: Software + Servicios
Microsoft: Software + ServiciosMicrosoft: Software + Servicios
Microsoft: Software + Serviciosacabralinfogroup
 
Za co jesteś wdzięczny?
Za co jesteś wdzięczny?Za co jesteś wdzięczny?
Za co jesteś wdzięczny?Gazeta Pomorska
 
Vip10750行程报价单诺维奇出发3天2晚0802010
Vip10750行程报价单诺维奇出发3天2晚0802010Vip10750行程报价单诺维奇出发3天2晚0802010
Vip10750行程报价单诺维奇出发3天2晚0802010guest55ae8d4
 
On the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categoriesOn the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categoriesMenzo Windhouwer
 
A CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web ServicesA CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web ServicesMenzo Windhouwer
 
Plains Indian Teepee
Plains Indian TeepeePlains Indian Teepee
Plains Indian Teepeeguest59aa42b
 
Test dla przedszkolaków
Test dla przedszkolakówTest dla przedszkolaków
Test dla przedszkolakówGazeta Pomorska
 
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...Menzo Windhouwer
 

Viewers also liked (18)

LDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data CategoriesLDL 2012 - Linking to ISOcat Data Categories
LDL 2012 - Linking to ISOcat Data Categories
 
What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?What do cats have to do with explicit semantics?
What do cats have to do with explicit semantics?
 
The ISO-DCR
The ISO-DCRThe ISO-DCR
The ISO-DCR
 
Królowie szos
Królowie szosKrólowie szos
Królowie szos
 
Narzekasz?
Narzekasz?Narzekasz?
Narzekasz?
 
Microsoft: Software + Servicios
Microsoft: Software + ServiciosMicrosoft: Software + Servicios
Microsoft: Software + Servicios
 
Za co jesteś wdzięczny?
Za co jesteś wdzięczny?Za co jesteś wdzięczny?
Za co jesteś wdzięczny?
 
Use of ISOcat within CMDI
Use of ISOcat within CMDIUse of ISOcat within CMDI
Use of ISOcat within CMDI
 
Definicje Alternatywne
Definicje AlternatywneDefinicje Alternatywne
Definicje Alternatywne
 
Vip10750行程报价单诺维奇出发3天2晚0802010
Vip10750行程报价单诺维奇出发3天2晚0802010Vip10750行程报价单诺维奇出发3天2晚0802010
Vip10750行程报价单诺维奇出发3天2晚0802010
 
On the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categoriesOn the way to a Relation Registry for ISOcat data categories
On the way to a Relation Registry for ISOcat data categories
 
Znani i popularni
Znani i popularniZnani i popularni
Znani i popularni
 
A CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web ServicesA CMD Core Model for CLARIN Web Services
A CMD Core Model for CLARIN Web Services
 
ISOcat to LMF to TEI
ISOcat to LMF to TEIISOcat to LMF to TEI
ISOcat to LMF to TEI
 
Plains Indian Teepee
Plains Indian TeepeePlains Indian Teepee
Plains Indian Teepee
 
Nie Tylko Dla Kobiet
Nie Tylko Dla KobietNie Tylko Dla Kobiet
Nie Tylko Dla Kobiet
 
Test dla przedszkolaków
Test dla przedszkolakówTest dla przedszkolaków
Test dla przedszkolaków
 
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...
Collaboratively Defining Widely Accepted Linguistic Data Categories in the IS...
 

Similar to Sustainable operability: Keeping complex linguistic resources alive.

Soeren okfn greece meetup
Soeren okfn greece meetupSoeren okfn greece meetup
Soeren okfn greece meetupOKFN-GR
 
Presentation on BCS Database Products January 2011
Presentation on BCS Database Products   January 2011Presentation on BCS Database Products   January 2011
Presentation on BCS Database Products January 2011Ian Barrodale
 
xldb2012_wed_0950_TimFrazier
xldb2012_wed_0950_TimFrazierxldb2012_wed_0950_TimFrazier
xldb2012_wed_0950_TimFrazierTim Frazier
 
Open Data Conference - Sören Auer - Linked Open Data
Open Data Conference - Sören Auer - Linked Open DataOpen Data Conference - Sören Auer - Linked Open Data
Open Data Conference - Sören Auer - Linked Open DataOpening-up.eu
 
KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)
KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)
KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)AI4BD GmbH
 
dsnotify presentation at www2010
dsnotify presentation at www2010 dsnotify presentation at www2010
dsnotify presentation at www2010 Niko Popitsch
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Survey of Object Oriented Database
Survey of Object Oriented DatabaseSurvey of Object Oriented Database
Survey of Object Oriented DatabaseEditor IJMTER
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefRobert Grossman
 
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)EUDAT
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data ApplicationsEUCLID project
 

Similar to Sustainable operability: Keeping complex linguistic resources alive. (20)

LOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
LOD2: State of Play WP1: Requirements, Design & LOD2 Stack PrototypeLOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
LOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
 
Soeren okfn greece meetup
Soeren okfn greece meetupSoeren okfn greece meetup
Soeren okfn greece meetup
 
LOD2 - Creating Knowledge out of Interlinked Data - General Presentation
LOD2 - Creating Knowledge out of Interlinked Data - General PresentationLOD2 - Creating Knowledge out of Interlinked Data - General Presentation
LOD2 - Creating Knowledge out of Interlinked Data - General Presentation
 
Database Management System 1
Database Management System 1Database Management System 1
Database Management System 1
 
Presentation on BCS Database Products January 2011
Presentation on BCS Database Products   January 2011Presentation on BCS Database Products   January 2011
Presentation on BCS Database Products January 2011
 
Project seminar
Project seminarProject seminar
Project seminar
 
27 fcs157al1 (1)
27 fcs157al1 (1)27 fcs157al1 (1)
27 fcs157al1 (1)
 
xldb2012_wed_0950_TimFrazier
xldb2012_wed_0950_TimFrazierxldb2012_wed_0950_TimFrazier
xldb2012_wed_0950_TimFrazier
 
The necessity of metadata for linked open data and its contribution to policy...
The necessity of metadata for linked open data and its contribution to policy...The necessity of metadata for linked open data and its contribution to policy...
The necessity of metadata for linked open data and its contribution to policy...
 
Open Data Conference - Sören Auer - Linked Open Data
Open Data Conference - Sören Auer - Linked Open DataOpen Data Conference - Sören Auer - Linked Open Data
Open Data Conference - Sören Auer - Linked Open Data
 
LOD2 General Presentation 2012
LOD2 General Presentation 2012LOD2 General Presentation 2012
LOD2 General Presentation 2012
 
KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)
KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)
KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)
 
dsnotify presentation at www2010
dsnotify presentation at www2010 dsnotify presentation at www2010
dsnotify presentation at www2010
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Survey of Object Oriented Database
Survey of Object Oriented DatabaseSurvey of Object Oriented Database
Survey of Object Oriented Database
 
LOD2 Webinar Series: LIMES
LOD2 Webinar Series: LIMESLOD2 Webinar Series: LIMES
LOD2 Webinar Series: LIMES
 
Limes webinar
Limes webinarLimes webinar
Limes webinar
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 

Recently uploaded

Event Report - IBM Think 2024 - It is all about AI and hybrid
Event Report - IBM Think 2024 - It is all about AI and hybridEvent Report - IBM Think 2024 - It is all about AI and hybrid
Event Report - IBM Think 2024 - It is all about AI and hybridHolger Mueller
 
Pitch Deck Teardown: Terra One's $7.5m Seed deck
Pitch Deck Teardown: Terra One's $7.5m Seed deckPitch Deck Teardown: Terra One's $7.5m Seed deck
Pitch Deck Teardown: Terra One's $7.5m Seed deckHajeJanKamps
 
Raising Seed Capital by Steve Schlafman at RRE Ventures
Raising Seed Capital by Steve Schlafman at RRE VenturesRaising Seed Capital by Steve Schlafman at RRE Ventures
Raising Seed Capital by Steve Schlafman at RRE VenturesAlejandro Cremades
 
A Brief Introduction About Jacob Badgett
A Brief Introduction About Jacob BadgettA Brief Introduction About Jacob Badgett
A Brief Introduction About Jacob BadgettJacobBadgett
 
Unveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptx
Unveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptxUnveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptx
Unveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptxmy Pandit
 
Copyright: What Creators and Users of Art Need to Know
Copyright: What Creators and Users of Art Need to KnowCopyright: What Creators and Users of Art Need to Know
Copyright: What Creators and Users of Art Need to KnowMiriam Robeson
 
Falcon Invoice Discounting Setup for Small Businesses
Falcon Invoice Discounting Setup for Small BusinessesFalcon Invoice Discounting Setup for Small Businesses
Falcon Invoice Discounting Setup for Small BusinessesFalcon investment
 
Revolutionizing Industries: The Power of Carbon Components
Revolutionizing Industries: The Power of Carbon ComponentsRevolutionizing Industries: The Power of Carbon Components
Revolutionizing Industries: The Power of Carbon ComponentsConnova AG
 
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdfInnomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdfInnomantra
 
How Do Venture Capitalists Make Decisions?
How Do Venture Capitalists Make Decisions?How Do Venture Capitalists Make Decisions?
How Do Venture Capitalists Make Decisions?Alejandro Cremades
 
Cracking the Change Management Code Main New.pptx
Cracking the Change Management Code Main New.pptxCracking the Change Management Code Main New.pptx
Cracking the Change Management Code Main New.pptxWorkforce Group
 
FEXLE- Salesforce Field Service Lightning
FEXLE- Salesforce Field Service LightningFEXLE- Salesforce Field Service Lightning
FEXLE- Salesforce Field Service LightningFEXLE
 
The Ultimate Guide to IPTV App Development Process_ Step-By-Step Instructions
The Ultimate Guide to IPTV App Development Process_ Step-By-Step InstructionsThe Ultimate Guide to IPTV App Development Process_ Step-By-Step Instructions
The Ultimate Guide to IPTV App Development Process_ Step-By-Step InstructionsWHMCS Smarters
 
Hyundai capital 2024 1q Earnings release
Hyundai capital 2024 1q Earnings releaseHyundai capital 2024 1q Earnings release
Hyundai capital 2024 1q Earnings releaseirhcs
 
The Inspiring Personality To Watch In 2024.pdf
The Inspiring Personality To Watch In 2024.pdfThe Inspiring Personality To Watch In 2024.pdf
The Inspiring Personality To Watch In 2024.pdfinsightssuccess2
 
HR and Employment law update: May 2024.
HR and Employment law update:  May 2024.HR and Employment law update:  May 2024.
HR and Employment law update: May 2024.FelixPerez547899
 
Equinox Gold Corporate Deck May 24th 2024
Equinox Gold Corporate Deck May 24th 2024Equinox Gold Corporate Deck May 24th 2024
Equinox Gold Corporate Deck May 24th 2024Equinox Gold Corp.
 
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdfSOFTTECHHUB
 
Global Interconnection Group Joint Venture[960] (1).pdf
Global Interconnection Group Joint Venture[960] (1).pdfGlobal Interconnection Group Joint Venture[960] (1).pdf
Global Interconnection Group Joint Venture[960] (1).pdfHenry Tapper
 
Stages of Startup Funding - An Explainer
Stages of Startup Funding - An ExplainerStages of Startup Funding - An Explainer
Stages of Startup Funding - An ExplainerAlejandro Cremades
 

Recently uploaded (20)

Event Report - IBM Think 2024 - It is all about AI and hybrid
Event Report - IBM Think 2024 - It is all about AI and hybridEvent Report - IBM Think 2024 - It is all about AI and hybrid
Event Report - IBM Think 2024 - It is all about AI and hybrid
 
Pitch Deck Teardown: Terra One's $7.5m Seed deck
Pitch Deck Teardown: Terra One's $7.5m Seed deckPitch Deck Teardown: Terra One's $7.5m Seed deck
Pitch Deck Teardown: Terra One's $7.5m Seed deck
 
Raising Seed Capital by Steve Schlafman at RRE Ventures
Raising Seed Capital by Steve Schlafman at RRE VenturesRaising Seed Capital by Steve Schlafman at RRE Ventures
Raising Seed Capital by Steve Schlafman at RRE Ventures
 
A Brief Introduction About Jacob Badgett
A Brief Introduction About Jacob BadgettA Brief Introduction About Jacob Badgett
A Brief Introduction About Jacob Badgett
 
Unveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptx
Unveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptxUnveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptx
Unveiling the Dynamic Gemini_ Personality Traits and Sign Dates.pptx
 
Copyright: What Creators and Users of Art Need to Know
Copyright: What Creators and Users of Art Need to KnowCopyright: What Creators and Users of Art Need to Know
Copyright: What Creators and Users of Art Need to Know
 
Falcon Invoice Discounting Setup for Small Businesses
Falcon Invoice Discounting Setup for Small BusinessesFalcon Invoice Discounting Setup for Small Businesses
Falcon Invoice Discounting Setup for Small Businesses
 
Revolutionizing Industries: The Power of Carbon Components
Revolutionizing Industries: The Power of Carbon ComponentsRevolutionizing Industries: The Power of Carbon Components
Revolutionizing Industries: The Power of Carbon Components
 
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdfInnomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
 
How Do Venture Capitalists Make Decisions?
How Do Venture Capitalists Make Decisions?How Do Venture Capitalists Make Decisions?
How Do Venture Capitalists Make Decisions?
 
Cracking the Change Management Code Main New.pptx
Cracking the Change Management Code Main New.pptxCracking the Change Management Code Main New.pptx
Cracking the Change Management Code Main New.pptx
 
FEXLE- Salesforce Field Service Lightning
FEXLE- Salesforce Field Service LightningFEXLE- Salesforce Field Service Lightning
FEXLE- Salesforce Field Service Lightning
 
The Ultimate Guide to IPTV App Development Process_ Step-By-Step Instructions
The Ultimate Guide to IPTV App Development Process_ Step-By-Step InstructionsThe Ultimate Guide to IPTV App Development Process_ Step-By-Step Instructions
The Ultimate Guide to IPTV App Development Process_ Step-By-Step Instructions
 
Hyundai capital 2024 1q Earnings release
Hyundai capital 2024 1q Earnings releaseHyundai capital 2024 1q Earnings release
Hyundai capital 2024 1q Earnings release
 
The Inspiring Personality To Watch In 2024.pdf
The Inspiring Personality To Watch In 2024.pdfThe Inspiring Personality To Watch In 2024.pdf
The Inspiring Personality To Watch In 2024.pdf
 
HR and Employment law update: May 2024.
HR and Employment law update:  May 2024.HR and Employment law update:  May 2024.
HR and Employment law update: May 2024.
 
Equinox Gold Corporate Deck May 24th 2024
Equinox Gold Corporate Deck May 24th 2024Equinox Gold Corporate Deck May 24th 2024
Equinox Gold Corporate Deck May 24th 2024
 
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
12 Conversion Rate Optimization Strategies for Ecommerce Websites.pdf
 
Global Interconnection Group Joint Venture[960] (1).pdf
Global Interconnection Group Joint Venture[960] (1).pdfGlobal Interconnection Group Joint Venture[960] (1).pdf
Global Interconnection Group Joint Venture[960] (1).pdf
 
Stages of Startup Funding - An Explainer
Stages of Startup Funding - An ExplainerStages of Startup Funding - An Explainer
Stages of Startup Funding - An Explainer
 

Sustainable operability: Keeping complex linguistic resources alive.

  • 1. Sustainable operability: Keeping complex linguistic resources alive Menzo Windhouwer Alexis Dimitriadis
  • 2. Sustainability of text resources  All electronic resources are accessed with the mediation of appropriate software.  Text editors, and web browsers, are generic: they can be used, with satisfactory results, with any document in a supported format.  Documents are (more or less) portable across software that supports their storage format.  Sustainability can be safeguarded by relying on documented, standard formats for their encoding. CLIN 20 - February 5, 2010
  • 3. The problem with databases  Databases are not portable in the sense that text documents are:  The data and relational structure of databases can be stored in (semi-)standard SQL format, or exported to other formats.  But databases are typically accessed through a custom- made user interface. Preserving the data, therefore, does not preserve the complete resource.  In this talk, we focus on (typological) databases. CLIN 20 - February 5, 2010
  • 4. Operability of complex resources  The general problem: Complex resources depend on custom software. Without the software, the resource is not usable and hence not truly preserved.  We will call a resource operable if suitable access or management software (operating software) exists for it.  While all electronic resources depend on software for their operability, complex resources are particularly vulnerable because they lack an economy of scale. CLIN 20 - February 5, 2010
  • 5. Outline  The problem of sustainable operability  Sustainable operability of typological databases  The IDDF architecture  The Typological Database System  The TDS Curator project CLIN 20 - February 5, 2010
  • 6. Next  The problem of sustainable operability  Sustainable operability of typological databases  The Typological Database System  The IDDF architecture  The TDS Curator project CLIN 20 - February 5, 2010
  • 7. Typological databases  Contain high-level, summary information about selected phenomena in a large number of languages.  May include example sentences with interlinear gloss annotations.  Are implemented on a variety of software platforms (Filemaker, MS Access, MySQL, 4th Dimension, Excel spreadsheets, custom software), and may or may not have a web interface. CLIN 20 - February 5, 2010
  • 8. Databases are diverse:  Original schema snippet from TDN database Field Values Metadata V105 0, 1, 9, 99 ATTRIBUTIVE ADJECTIVES ARE RELATIVE CLAUSES V168 0, 1, 9, 99 PRED LOC = ZERO + LOC PP V204 0, 1, 9, 99 PRED ADJ = COP VS. PRED LOC = VERB (NONCOP)  Phoneme inventory (SPIN database) CLIN 20 - February 5, 2010
  • 9. Databases are diverse:  User interface snippet for the Anatyp database CLIN 20 - February 5, 2010
  • 10. Typological databases – their fate Completed databases are subject to the usual perils:  Gradual obsolescence of db software, OS, or hardware.  Sudden disappearance due to incompatible software updates, retirement of legacy servers, or hardware failure.  Gradual fall into unusability, with the dissipation of the insider knowledge needed to utilize a poorly documented database. CLIN 20 - February 5, 2010
  • 11. A data dump is insufficient  Why not just export a database’s tables in some standard format (tab-separated Unicode text, or even a dump in “standard” SQL)?  This would still be deficient in 1. Completeness of content and documentation 2. Operability CLIN 20 - February 5, 2010
  • 12. Completeness  The meaning of table contents, and their interrelationships, are not explicitly given in the data tables. Completeness example: TDN database  DocumentationMetadata normally stored in the relational Field Values is not schema 1, 9, 99 ATTRIBUTIVE ADJECTIVES ARE RELATIVE CLAUSES V105 0,  V168 0, 1, 9, 99 PRED LOC = ZERO + LOC PP Often, documentation is embedded in the user interface. Some is only in V204heads9, 99 creators. = COP VS. PRED LOC = VERB (NONCOP) the 0, 1, of its PRED ADJ  In the OAIS terminology, data tables alone are rarely “independently understandable”. CLIN 20 - February 5, 2010
  • 13. Operability Presentation example: Anatyp database  The presentation (user interface) of a database suggests how its content is to be interpreted.  The navigation structure and application logic also encode information about the data.  While there are general-purpose browsers for relational database tables, they are a very poor way to view a complex database:  They allow viewing one table at a time, but do not automatically select, project or join tables appropriately into views.  Even with foreign key information, there is no way to determine which joins or projections are “appropriate”. CLIN 20 - February 5, 2010
  • 14. Our approach to sustainable operability 1. Map resources to a sufficiently rich format at time of archiving. 2. Maintain generic software that can provide browsing and query access to all archived resources in an application domain. CLIN 20 - February 5, 2010
  • 15. Next  The problem of sustainable operability  Sustainable operability of typological databases  The IDDF architecture  The Typological Database System  The TDS Curator project CLIN 20 - February 5, 2010
  • 16. The Integrated Data and Documentation Format  Data, structuring information and documentation are combined into an integrated, XML-based standardized format, the Integrated Data and Document Format (IDDF).  Software is provided that can manage IDDF-encoded resources in a generic way, just as a text editor or corpus tool can manage arbitrary conforming resources.  New generations of management software can be provided in the future, utilizing the self-describing nature of the IDDF and an economy of scale. CLIN 20 - February 5, 2010
  • 17. IDDF structure  Two major sections: 1. Metadata section:  provides the (loose) data schema  documents the elements in the schema 2. Data section:  contains the actual data  Hierarchical, semi-structured data model  Network of hierarchical units, a.k.a. semantic contexts CLIN 20 - February 5, 2010
  • 18. IDDF: metadata  For each data element:  A label and a description  One or more links  to other elements  to external resources, e.g., a knowledge base  Data types:  A semantic data type for the element, e.g. UPPC  A semantic (key) value data type, e.g. interlinear glossed text tier  An (partial) enumeration of possible values:  The literal (key) value  A label and a description  One or more links  to other elements  to external resources CLIN 20 - February 5, 2010
  • 19. IDDF: system architecture User Tool renderer User interface generic Application Programming Interface Knowledge IDDF base Resource creation domain specific CLIN 20 - February 5, 2010
  • 20. Next  The problem of sustainability operability  Sustainable operability of typological databases  The IDDF architecture  The Typological Database System  The TDS Curator project CLIN 20 - February 5, 2010
  • 21. The Typological Database System  The Typological Database System (TDS) provides integrated access to multiple, independently created typological databases.  Provide an interface that will help users find relevant data.  Allow users to interpret the data they are presented with.  The system behaves, as much as possible, as a single database. Various differences between the component databases must be dealt with. CLIN 20 - February 5, 2010
  • 22. TDS: system architecture 1060 & BB online Navigating and searching Querying Reasoning TDS Workbench TDS data Global domain ontology Enriching Merging Local DTL engine DTL Transforming specifications offline Importing meta data component databases CLIN 20 - February 5, 2010
  • 23. Next  The problem of sustainable operability  Sustainable operability of typological databases  The IDDF architecture  The Typological Database System  The TDS Curator project CLIN 20 - February 5, 2010
  • 24. The TDS-Curator project  CLARIN-NL Call 1 project  “TDS Curator will make the TDS into a sustainable service that conforms to CLARIN infrastructural requirements.”  Partners:  Utrecht University  DANS  Max-Planck-Institute for Psycholinguistics  May – November, 2010 CLIN 20 - February 5, 2010
  • 25. Work packages User Tool renderer User interface generic Application Programming Interface Knowledge IDDF base Resource creation TDS import domain specific CLIN 20 - February 5, 2010
  • 26. IDDF-based sustainable operability languagelink.let.uu.nl DANS Figure thanks to Dirk Roorda (DANS) CLIN 20 - February 5, 2010
  • 27. Summary  Sustainable operability is a challenge for complex resources like typological databases  A corner stone of a solution is a generic format that is rich enough to allow operability by generic tools  The Typological Database System provides a promising architecture, which has been applied to more then a dozen of typological databases  The propriety data model of the TDS can be turned into an open format, i.e. the Integrated Data and Documentation Format  In the CLARIN-NL TDS Curator project this more generic setup will be realized  It will be interesting to also use this generic system outside the domain of linguistic typology or even linguistics CLIN 20 - February 5, 2010
  • 28. http://languagelink.let.uu.nl/tds/ Thanks for your attention! CLIN 20 - February 5, 2010
  • 29. IDDF: top-level structure <iddf:warehouse xmlns:iddf="http://.../ns/iddf"> <iddf:meta> <iddf:scope id="tds" type="warehouse"> ... </iddf:scope> <iddf:notion id="n1" name="language“ scope="tds“ type="root" key-datatype="enum"> <iddf:label>Language</iddf:label> <iddf:description> One of the world’s languages </iddf:description> ... </iddf:notion> ... </iddf:meta> <iddf:data xmlns:tds="..." ...> <tds:language iddf:notion="n1" key="..."> ... </tds:language> ... </iddf:data> CLIN 20 - February 5, 2010 </iddf:warehouse>
  • 30. IDDF: metadata example <iddf:notion id="n7" name="vowel" scope="SyllTyp"> <iddf:label>Vowel</iddf:label> <iddf:description> Is the segment a vowel? </iddf:description> <iddf:link type="concept" rel="as“ href="...owl#vowel"/> <iddf:link type="concept" rel="to“ href="...owl#vocalicFeatureNode"/> <iddf:values datatype="enum"> <iddf:value> <iddf:literal>+</iddf:literal> <iddf:description> The segment is a vowel. </iddf:description> </iddf:value> … </iddf:values> </iddf:notion> CLIN 20 - February 5, 2010
  • 31. IDDF: data example <iddf:data xmlns:tds="…/ns/iddf/tds" … > <tds:language key="l-iso-tba" iddf:notion="n1" iddf:sources="SyllTyp UPSID"> <tds:identification iddf:notion="n2" iddf:sources="SyllTyp UPSID"> <tds:name iddf:notion="n3" iddf:sources="SyllTyp UPSID"> <iddf:value srcs="SyllTyp"> Wari’ (Tubar&#227;o) </iddf:value> <iddf:value srcs="UPSID"> Huari </iddf:value> </tds:name> … </tds:identification> … </tds:language> … </iddf:data> CLIN 20 - February 5, 2010
  • 32. CLIN 20 - February 5, 2010