SlideShare a Scribd company logo
Jakob Voß
Revealing digital documents
 Concealed structures in data
     http://arxiv.org/abs/1105.5832
          http://aboutdata.org


          International Conference on Theory
          and Practice in Digital Libraries (TPDL)
          Doctoral Consortium, Berlin 2011-09-25
question




           how are (digital) documents
            structured and described?



Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th     http://aboutdata.org
what is a document?

          “[...] any physical or symbolic sign, preserved
                or recorded, intended to represent, to
            reconstruct, or to demonstrate a physical or
             conceptual phenomenon” – Suzanne Briet

       “[...] consists of anything that someone wishes
       to store. A document is something designated
      by a person to be a document [...]“ – Ted Nelson



Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
scope




                digital documents
            somehow recorded (stable),
           eventually as sequence of bits



Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
CR2, AAF, AAT, ADL, AES Core Audio, AES Process History, AGLS, Alleg
SCII, ASN.1, Atom, BIBO, BibTeX, BISAC, BPEL, BPMN, BSON, CanCor
 CO, CDR, CDWA, CDWA Lite, CIDOC/CRM, CQL, CSDGM, CSV, DACS
ata Committee Content Standard, DC, DCAM, DDC, DDI, DDL, DFDL, DI
 G35, DjVU, DOM, DTD, Dublin Core, DwC, EAC, EAC-CPF, EAD, ebXM
    ECN, Ediakt, EDIFAKT, eduPerson, EML, ERM, Etch, EXIF, Federal
eographic, FOAF, FRAD, FRBR, FRSAD, FRSAR, GEM, GILS, GKD, GM
ssian, HTML, HTTP, ID3, IDL, IEEE/LOM, indecs, inetOrgPerson, INI, IPT
I, ISAAR(CPF), ISAD(G), ISBD, ISBN, ISO 19115, ISO 19119, JSON, KM
               there is not one
LCC, LCSH, LDAP, Linked Data, LMER, MAB2, MADS, MARC, MARC21
 RC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rig
           single document format
MFC, MGraph, MIX, MO, MODS, MOTS, MPEG-21 , MPEG-7, MSchema
seumDat, MusicXML, MXF, NewsML, NFC, NFD, NFKC, NFKD, NIAM, O
OAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media, OODBMS
OpenDocument, OpenSearch, OpenURL, ORM, OWL, PB Core, PDF, PI
ca+, Pica3, PND, PREMIS, PRISM, Proto, QDC, RAD, RAK, RDA, RDBM
DF, RDFS, RDF/XML, Relax NG, RELAX NG, Resource, RIS, RSS, RSW
 Schematron, SCORM, SDXF, Seel, S-EXP, SGML, SIOC, SKOS, SMIL,
PECTRUM, SQL, SRU/SRW, SWAP, SWB, TEI, TEX, TextMD, TGM I, TG
 TGN, Thrift, Topic Maps, UCS, ULAN, UML, unAPI, UNIMARC, URI, UTF
 ard, Vorbis Comment, VRA, VSO Data Model, XDR, XMetaDiss, XML, XM
thesis



       but there are common patterns
          on all levels of description,
               independent from
            particular technologies


Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
examples of particular technologies
     XML                                                        relational databases
      ●   Unicode                                                ●   Relational Model
      ●   XML Infoset                                            ●   SQL
      ●   XML Schema                                             ●   Entity-Relationship-
      ●   Xpath                                                      Diagrams



                      families of related standards


Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
method


                   not statistical
           this would limit my research to
             one level and technology of
                     description


Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th    http://aboutdata.org
method




              phenomenological
      data description in all of its forms
       as it appears in our experience



Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th    http://aboutdata.org
phenomenological method

                                                                data description analyzed
                                                                as phenomena:
                                                                1. critical intuiting
                                                                   (experience)
                                                                2. analyzing structures,
      Hegel                                                        free of known
                      Husserl                                      categories
                                     Merleau-Ponty*
                                                                3. describing the essence



  * Image CC-BY Pierre-Alain Gouanvic

Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
results
      1) Categorization
         of data structuring methods
      2) Collection
         of data structuring paradigms
      3) Pattern language
         of data patterns




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th    http://aboutdata.org
result 1: categorization of methods
      ●   encodings express data
          (UTF-8 Unicode, IEEE floating point, Base64…)
      ●   file and database systems store data
      ●   identifiers and query languages refer to data
      ●   data structuring and markup languages
          structure data
      ●   schema languages constrain and validate data
      ●   conceptual models describe data

    ¡Concrete methods appear as combinations of categories!

Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
result 2: paradigms
      ●   Document- or Object-oriented approach
            ●   Document-oriented (e.g. ordered tree with
                tagged character strings: XML, Relax NG…)
                ⇒ descriptive data description
            ●   Object-oriented (objects with properties and
                defined value spaces: XML Schema, UML…)
                ⇒ prescriptive data description




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
result 2: paradigms
      ●   Entities and connections

              Jakob                    1979


                                      born
               Jakob                                          1979



               Jakob                   Birth                  1979


Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
result 2: paradigms
      ●   Layers of abstraction
      ●   Standards and rules
      ●   Collections and types
      ●   Granularity




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
result 3: patterns
      ●   patterns as systematic tool for describing good design
          practice, introduced by Christopher Alexander:
          “Each pattern describes a problem which occurs over and
            over again in our environment, and then describes the
                   core of the solution to that problem […]”
      ●   Adopted as design patterns in software engineering
      ●   Collected in a pattern language with meaningful
          connections between patterns (network of patterns).




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
result 3: patterns
                                            collection

          separator                                                              known size


                                            sequence




       position                           ordered set                                  array



Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th     http://aboutdata.org
applications
      ●   data archeology
            ●   In 200 years someone finds snapshots and
                archives of Wikipedia in different forms
                (SQL, XML, Wikitext, DBPedia, HTML…)
            ●   What are significant parts?
                How relate parts to each other?




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
… another document




                               to give a simple example…




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
… another document

                                   sequence with delimiter




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
… another document

                                   sequence with delimiter



                     grouping of sequences with delimiter




Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th   http://aboutdata.org
… another document

                                   sequence with delimiter



                     grouping of sequences with delimiter



                                   encoding (morse code)
 D           A        T        A                   P              A        T T E             R       N          S
Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th       http://aboutdata.org

More Related Content

What's hot

Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
butest
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
Herbert Van de Sompel
 
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of DataNERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
Giuseppe Rizzo
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustration
Herbert Van de Sompel
 
The aDORe Federation Architecture
The aDORe Federation ArchitectureThe aDORe Federation Architecture
The aDORe Federation Architecture
Herbert Van de Sompel
 
Augmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesAugmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositories
Herbert Van de Sompel
 
Applying NLP (natural language processing) to the patent genre
Applying NLP (natural language processing) to the patent genreApplying NLP (natural language processing) to the patent genre
Applying NLP (natural language processing) to the patent genre
Uny Cao
 
NERD: an open source platform for extracting and disambiguating named entitie...
NERD: an open source platform for extracting and disambiguating named entitie...NERD: an open source platform for extracting and disambiguating named entitie...
NERD: an open source platform for extracting and disambiguating named entitie...
Raphael Troncy
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall Forum
Robert Sanderson
 
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
kevig
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from text
Jennifer D'Souza
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: Introduction
Kent State University
 
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Jennifer D'Souza
 
Lean ontology development
Lean ontology developmentLean ontology development
Lean ontology development
Lieke Verhelst
 
Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versio...
Graph Databases Lifecycle Methodology  and Tool to Support Index/Store Versio...Graph Databases Lifecycle Methodology  and Tool to Support Index/Store Versio...
Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versio...
Paolo Nesi
 

What's hot (15)

Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of DataNERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustration
 
The aDORe Federation Architecture
The aDORe Federation ArchitectureThe aDORe Federation Architecture
The aDORe Federation Architecture
 
Augmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesAugmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositories
 
Applying NLP (natural language processing) to the patent genre
Applying NLP (natural language processing) to the patent genreApplying NLP (natural language processing) to the patent genre
Applying NLP (natural language processing) to the patent genre
 
NERD: an open source platform for extracting and disambiguating named entitie...
NERD: an open source platform for extracting and disambiguating named entitie...NERD: an open source platform for extracting and disambiguating named entitie...
NERD: an open source platform for extracting and disambiguating named entitie...
 
OAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall ForumOAC Presentation at CNI 09 Fall Forum
OAC Presentation at CNI 09 Fall Forum
 
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from text
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: Introduction
 
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
 
Lean ontology development
Lean ontology developmentLean ontology development
Lean ontology development
 
Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versio...
Graph Databases Lifecycle Methodology  and Tool to Support Index/Store Versio...Graph Databases Lifecycle Methodology  and Tool to Support Index/Store Versio...
Graph Databases Lifecycle Methodology and Tool to Support Index/Store Versio...
 

Similar to Revealing digital documents - concealed structures in data

Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
Laura Po
 
Open data and reuse of public information
Open data and reuse of public informationOpen data and reuse of public information
Open data and reuse of public information
Vestforsk.no
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
Enrico Daga
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
PretaLLOD
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
Carole Goble
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
Giorgia Lodi
 
Building intelligent systems (that can explain)
Building intelligent systems (that can explain)Building intelligent systems (that can explain)
Building intelligent systems (that can explain)
Vrije Universiteit Amsterdam
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Ontotext
 
Graphic Editor For Multilingual Ontologies
Graphic Editor For Multilingual OntologiesGraphic Editor For Multilingual Ontologies
Graphic Editor For Multilingual Ontologies
Sociedade da Informação
 
Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...
IAESIJAI
 
Ontology Engineering
Ontology EngineeringOntology Engineering
Ontology Engineering
Alessandro Adamou
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
Alessandro Adamou
 
Building intelligent systems (that can explain)
Building intelligent systems (that can explain)Building intelligent systems (that can explain)
Building intelligent systems (that can explain)
Vrije Universiteit Amsterdam
 
dotte.ppt
dotte.pptdotte.ppt
dotte.ppt
qwerty309340
 
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
eswcsummerschool
 
Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001
Dan Brickley
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Science
petermurrayrust
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
cseij
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
Scott Abel
 
Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1
iotest
 

Similar to Revealing digital documents - concealed structures in data (20)

Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
Open data and reuse of public information
Open data and reuse of public informationOpen data and reuse of public information
Open data and reuse of public information
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Building intelligent systems (that can explain)
Building intelligent systems (that can explain)Building intelligent systems (that can explain)
Building intelligent systems (that can explain)
 
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural HeritageBuild Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
 
Graphic Editor For Multilingual Ontologies
Graphic Editor For Multilingual OntologiesGraphic Editor For Multilingual Ontologies
Graphic Editor For Multilingual Ontologies
 
Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...Mapping of extensible markup language-to-ontology representation for effectiv...
Mapping of extensible markup language-to-ontology representation for effectiv...
 
Ontology Engineering
Ontology EngineeringOntology Engineering
Ontology Engineering
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
 
Building intelligent systems (that can explain)
Building intelligent systems (that can explain)Building intelligent systems (that can explain)
Building intelligent systems (that can explain)
 
dotte.ppt
dotte.pptdotte.ppt
dotte.ppt
 
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
 
Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001
 
Semantic Web in Physical Science
Semantic Web in Physical ScienceSemantic Web in Physical Science
Semantic Web in Physical Science
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
 
Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1
 

More from Jakob .

Einheitliche Normdatendienste der VZG
Einheitliche Normdatendienste der VZGEinheitliche Normdatendienste der VZG
Einheitliche Normdatendienste der VZG
Jakob .
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
Jakob .
 
Linked Open Data in Bibliotheken, Archiven & Museen
Linked Open Data in Bibliotheken, Archiven & MuseenLinked Open Data in Bibliotheken, Archiven & Museen
Linked Open Data in Bibliotheken, Archiven & Museen
Jakob .
 
Collaborative Creation of a Wikidata handbook
Collaborative Creation of a Wikidata handbookCollaborative Creation of a Wikidata handbook
Collaborative Creation of a Wikidata handbook
Jakob .
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
Jakob .
 
On the Way to a Holding Ontology
On the Way to a Holding OntologyOn the Way to a Holding Ontology
On the Way to a Holding Ontology
Jakob .
 
Stand und Planungen im Bereich der Schnittstellen in der VZG
Stand und Planungen im Bereich der Schnittstellen in der VZGStand und Planungen im Bereich der Schnittstellen in der VZG
Stand und Planungen im Bereich der Schnittstellen in der VZG
Jakob .
 
Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...
Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...
Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...
Jakob .
 
Beschreibung von Bibliotheks-Dienstleistungen mit Mikro-Ontologien
Beschreibung von Bibliotheks-Dienstleistungen mit Mikro-OntologienBeschreibung von Bibliotheks-Dienstleistungen mit Mikro-Ontologien
Beschreibung von Bibliotheks-Dienstleistungen mit Mikro-Ontologien
Jakob .
 
Linking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization SystemsLinking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization Systems
Jakob .
 
Encoding Patron Information in RDF
Encoding Patron Information in RDFEncoding Patron Information in RDF
Encoding Patron Information in RDF
Jakob .
 
Libraries in a data-centered environment
Libraries in a data-centered environmentLibraries in a data-centered environment
Libraries in a data-centered environment
Jakob .
 
Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...
Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...
Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...
Jakob .
 
FRBR light with Simplified Ontology for Bibliographic Resource
FRBR light with Simplified Ontology for Bibliographic ResourceFRBR light with Simplified Ontology for Bibliographic Resource
FRBR light with Simplified Ontology for Bibliographic Resource
Jakob .
 
RDF-Daten in eigenen Anwendungen nutzen
RDF-Daten in eigenen Anwendungen nutzenRDF-Daten in eigenen Anwendungen nutzen
RDF-Daten in eigenen Anwendungen nutzen
Jakob .
 
Linked Data Light - Linkaggregation mit BEACON
Linked Data Light - Linkaggregation mit BEACONLinked Data Light - Linkaggregation mit BEACON
Linked Data Light - Linkaggregation mit BEACON
Jakob .
 
Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...
Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...
Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...
Jakob .
 
Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...
Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...
Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...
Jakob .
 
Linked Data: Die Zukunft der Nutzung von Katalogdaten
Linked Data: Die Zukunft der Nutzung von KatalogdatenLinked Data: Die Zukunft der Nutzung von Katalogdaten
Linked Data: Die Zukunft der Nutzung von Katalogdaten
Jakob .
 
We were promised Xanadu
We were promised XanaduWe were promised Xanadu
We were promised Xanadu
Jakob .
 

More from Jakob . (20)

Einheitliche Normdatendienste der VZG
Einheitliche Normdatendienste der VZGEinheitliche Normdatendienste der VZG
Einheitliche Normdatendienste der VZG
 
Connections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystifiedConnections that work: Linked Open Data demystified
Connections that work: Linked Open Data demystified
 
Linked Open Data in Bibliotheken, Archiven & Museen
Linked Open Data in Bibliotheken, Archiven & MuseenLinked Open Data in Bibliotheken, Archiven & Museen
Linked Open Data in Bibliotheken, Archiven & Museen
 
Collaborative Creation of a Wikidata handbook
Collaborative Creation of a Wikidata handbookCollaborative Creation of a Wikidata handbook
Collaborative Creation of a Wikidata handbook
 
Another RDF Encoding Form
Another RDF Encoding FormAnother RDF Encoding Form
Another RDF Encoding Form
 
On the Way to a Holding Ontology
On the Way to a Holding OntologyOn the Way to a Holding Ontology
On the Way to a Holding Ontology
 
Stand und Planungen im Bereich der Schnittstellen in der VZG
Stand und Planungen im Bereich der Schnittstellen in der VZGStand und Planungen im Bereich der Schnittstellen in der VZG
Stand und Planungen im Bereich der Schnittstellen in der VZG
 
Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...
Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...
Verwaltung dokumentenorientierter DTDs für den Dokument- und Publikationsserv...
 
Beschreibung von Bibliotheks-Dienstleistungen mit Mikro-Ontologien
Beschreibung von Bibliotheks-Dienstleistungen mit Mikro-OntologienBeschreibung von Bibliotheks-Dienstleistungen mit Mikro-Ontologien
Beschreibung von Bibliotheks-Dienstleistungen mit Mikro-Ontologien
 
Linking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization SystemsLinking Folksonomies to Knowledge Organization Systems
Linking Folksonomies to Knowledge Organization Systems
 
Encoding Patron Information in RDF
Encoding Patron Information in RDFEncoding Patron Information in RDF
Encoding Patron Information in RDF
 
Libraries in a data-centered environment
Libraries in a data-centered environmentLibraries in a data-centered environment
Libraries in a data-centered environment
 
Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...
Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...
Was gibt's wie und wo? Informationen zu Standorten, Exemplaren und Dienstleis...
 
FRBR light with Simplified Ontology for Bibliographic Resource
FRBR light with Simplified Ontology for Bibliographic ResourceFRBR light with Simplified Ontology for Bibliographic Resource
FRBR light with Simplified Ontology for Bibliographic Resource
 
RDF-Daten in eigenen Anwendungen nutzen
RDF-Daten in eigenen Anwendungen nutzenRDF-Daten in eigenen Anwendungen nutzen
RDF-Daten in eigenen Anwendungen nutzen
 
Linked Data Light - Linkaggregation mit BEACON
Linked Data Light - Linkaggregation mit BEACONLinked Data Light - Linkaggregation mit BEACON
Linked Data Light - Linkaggregation mit BEACON
 
Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...
Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...
Wie kommen unsere Sacherschließungsdaten ins Semantic Web? Vom lokalen Normda...
 
Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...
Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...
Herausforderungen und Lösungen bei der Publikation und Nutzung von Normdaten ...
 
Linked Data: Die Zukunft der Nutzung von Katalogdaten
Linked Data: Die Zukunft der Nutzung von KatalogdatenLinked Data: Die Zukunft der Nutzung von Katalogdaten
Linked Data: Die Zukunft der Nutzung von Katalogdaten
 
We were promised Xanadu
We were promised XanaduWe were promised Xanadu
We were promised Xanadu
 

Recently uploaded

Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
Steven Carlson
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
Google Developer Group - Harare
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
313mohammedarshad
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
digitalxplive
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 

Recently uploaded (20)

Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Google I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged SlidesGoogle I/O Extended Harare Merged Slides
Google I/O Extended Harare Merged Slides
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 

Revealing digital documents - concealed structures in data

  • 1. Jakob Voß Revealing digital documents Concealed structures in data http://arxiv.org/abs/1105.5832 http://aboutdata.org International Conference on Theory and Practice in Digital Libraries (TPDL) Doctoral Consortium, Berlin 2011-09-25
  • 2. question how are (digital) documents structured and described? Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 3. what is a document? “[...] any physical or symbolic sign, preserved or recorded, intended to represent, to reconstruct, or to demonstrate a physical or conceptual phenomenon” – Suzanne Briet “[...] consists of anything that someone wishes to store. A document is something designated by a person to be a document [...]“ – Ted Nelson Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 4. scope digital documents somehow recorded (stable), eventually as sequence of bits Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 5. CR2, AAF, AAT, ADL, AES Core Audio, AES Process History, AGLS, Alleg SCII, ASN.1, Atom, BIBO, BibTeX, BISAC, BPEL, BPMN, BSON, CanCor CO, CDR, CDWA, CDWA Lite, CIDOC/CRM, CQL, CSDGM, CSV, DACS ata Committee Content Standard, DC, DCAM, DDC, DDI, DDL, DFDL, DI G35, DjVU, DOM, DTD, Dublin Core, DwC, EAC, EAC-CPF, EAD, ebXM ECN, Ediakt, EDIFAKT, eduPerson, EML, ERM, Etch, EXIF, Federal eographic, FOAF, FRAD, FRBR, FRSAD, FRSAR, GEM, GILS, GKD, GM ssian, HTML, HTTP, ID3, IDL, IEEE/LOM, indecs, inetOrgPerson, INI, IPT I, ISAAR(CPF), ISAD(G), ISBD, ISBN, ISO 19115, ISO 19119, JSON, KM there is not one LCC, LCSH, LDAP, Linked Data, LMER, MAB2, MADS, MARC, MARC21 RC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rig single document format MFC, MGraph, MIX, MO, MODS, MOTS, MPEG-21 , MPEG-7, MSchema seumDat, MusicXML, MXF, NewsML, NFC, NFD, NFKC, NFKD, NIAM, O OAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media, OODBMS OpenDocument, OpenSearch, OpenURL, ORM, OWL, PB Core, PDF, PI ca+, Pica3, PND, PREMIS, PRISM, Proto, QDC, RAD, RAK, RDA, RDBM DF, RDFS, RDF/XML, Relax NG, RELAX NG, Resource, RIS, RSS, RSW Schematron, SCORM, SDXF, Seel, S-EXP, SGML, SIOC, SKOS, SMIL, PECTRUM, SQL, SRU/SRW, SWAP, SWB, TEI, TEX, TextMD, TGM I, TG TGN, Thrift, Topic Maps, UCS, ULAN, UML, unAPI, UNIMARC, URI, UTF ard, Vorbis Comment, VRA, VSO Data Model, XDR, XMetaDiss, XML, XM
  • 6. thesis but there are common patterns on all levels of description, independent from particular technologies Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 7. examples of particular technologies XML relational databases ● Unicode ● Relational Model ● XML Infoset ● SQL ● XML Schema ● Entity-Relationship- ● Xpath Diagrams families of related standards Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 8. method not statistical this would limit my research to one level and technology of description Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 9. method phenomenological data description in all of its forms as it appears in our experience Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 10. phenomenological method data description analyzed as phenomena: 1. critical intuiting (experience) 2. analyzing structures, Hegel free of known Husserl categories Merleau-Ponty* 3. describing the essence * Image CC-BY Pierre-Alain Gouanvic Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 11. results 1) Categorization of data structuring methods 2) Collection of data structuring paradigms 3) Pattern language of data patterns Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 12. result 1: categorization of methods ● encodings express data (UTF-8 Unicode, IEEE floating point, Base64…) ● file and database systems store data ● identifiers and query languages refer to data ● data structuring and markup languages structure data ● schema languages constrain and validate data ● conceptual models describe data ¡Concrete methods appear as combinations of categories! Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 13. result 2: paradigms ● Document- or Object-oriented approach ● Document-oriented (e.g. ordered tree with tagged character strings: XML, Relax NG…) ⇒ descriptive data description ● Object-oriented (objects with properties and defined value spaces: XML Schema, UML…) ⇒ prescriptive data description Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 14. result 2: paradigms ● Entities and connections Jakob 1979 born Jakob 1979 Jakob Birth 1979 Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 15. result 2: paradigms ● Layers of abstraction ● Standards and rules ● Collections and types ● Granularity Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 16. result 3: patterns ● patterns as systematic tool for describing good design practice, introduced by Christopher Alexander: “Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem […]” ● Adopted as design patterns in software engineering ● Collected in a pattern language with meaningful connections between patterns (network of patterns). Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 17. result 3: patterns collection separator known size sequence position ordered set array Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 18. applications ● data archeology ● In 200 years someone finds snapshots and archives of Wikipedia in different forms (SQL, XML, Wikitext, DBPedia, HTML…) ● What are significant parts? How relate parts to each other? Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 19. Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 20. … another document to give a simple example… Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 21. … another document sequence with delimiter Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 22. … another document sequence with delimiter grouping of sequences with delimiter Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 23. … another document sequence with delimiter grouping of sequences with delimiter encoding (morse code) D A T A P A T T E R N S Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org