Jakob Voß
Revealing digital documents
 Concealed structures in data
     http://arxiv.org/abs/1105.5832
          http://aboutdata.org


          International Conference on Theory
          and Practice in Digital Libraries (TPDL)
          Doctoral Consortium, Berlin 2011-09-25
question




           how are (digital) documents
            structured and described?



Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th     http://aboutdata.org
what is a document?

          “[...] any physical or symbolic sign, preserved
                or recorded, intended to represent, to
            reconstruct, or to demonstrate a physical or
             conceptual phenomenon” – Suzanne Briet

       “[...] consists of anything that someone wishes
       to store. A document is something designated
      by a person to be a document [...]” – Ted Nelson



scope




                digital documents
            somehow recorded (stable),
          ultimately as a sequence of bits



CR2, AAF, AAT, ADL, AES Core Audio, AES Process History, AGLS, Alleg
SCII, ASN.1, Atom, BIBO, BibTeX, BISAC, BPEL, BPMN, BSON, CanCor
CO, CDR, CDWA, CDWA Lite, CIDOC/CRM, CQL, CSDGM, CSV, DACS
ata Committee Content Standard, DC, DCAM, DDC, DDI, DDL, DFDL, DI
G35, DjVU, DOM, DTD, Dublin Core, DwC, EAC, EAC-CPF, EAD, ebXM
ECN, Ediakt, EDIFAKT, eduPerson, EML, ERM, Etch, EXIF, Federal
eographic, FOAF, FRAD, FRBR, FRSAD, FRSAR, GEM, GILS, GKD, GM
ssian, HTML, HTTP, ID3, IDL, IEEE/LOM, indecs, inetOrgPerson, INI, IPT
I, ISAAR(CPF), ISAD(G), ISBD, ISBN, ISO 19115, ISO 19119, JSON, KM
LCC, LCSH, LDAP, Linked Data, LMER, MAB2, MADS, MARC, MARC21
RC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rig
MFC, MGraph, MIX, MO, MODS, MOTS, MPEG-21, MPEG-7, MSchema
seumDat, MusicXML, MXF, NewsML, NFC, NFD, NFKC, NFKD, NIAM, O
OAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media, OODBMS
OpenDocument, OpenSearch, OpenURL, ORM, OWL, PB Core, PDF, PI
ca+, Pica3, PND, PREMIS, PRISM, Proto, QDC, RAD, RAK, RDA, RDBM
DF, RDFS, RDF/XML, Relax NG, RELAX NG, Resource, RIS, RSS, RSW
Schematron, SCORM, SDXF, Seel, S-EXP, SGML, SIOC, SKOS, SMIL,
PECTRUM, SQL, SRU/SRW, SWAP, SWB, TEI, TEX, TextMD, TGM I, TG
TGN, Thrift, Topic Maps, UCS, ULAN, UML, unAPI, UNIMARC, URI, UTF
ard, Vorbis Comment, VRA, VSO Data Model, XDR, XMetaDiss, XML, XM

               there is not one
           single document format
thesis



       but there are common patterns
          on all levels of description,
                independent of
            particular technologies


examples of particular technologies
     XML
      ●   Unicode
      ●   XML Infoset
      ●   XML Schema
      ●   XPath

     relational databases
      ●   Relational Model
      ●   SQL
      ●   Entity-Relationship Diagrams



                      families of related standards


method


                   not statistical
           this would limit my research to
             one level and technology of
                     description


method




              phenomenological
      data description in all of its forms
       as it appears in our experience



phenomenological method

      data description analyzed as phenomena:
      1. critical intuiting (experience)
      2. analyzing structures, free of known categories
      3. describing the essence

      Pictured: Hegel, Husserl, Merleau-Ponty*

  * Image CC-BY Pierre-Alain Gouanvic

results
      1) Categorization
         of data structuring methods
      2) Collection
         of data structuring paradigms
      3) Pattern language
         of data patterns




result 1: categorization of methods
      ●   encodings express data
          (UTF-8 Unicode, IEEE floating point, Base64…)
      ●   file and database systems store data
      ●   identifiers and query languages refer to data
      ●   data structuring and markup languages
          structure data
      ●   schema languages constrain and validate data
      ●   conceptual models describe data

    Concrete methods appear as combinations of categories!
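
A minimal sketch of the first category, encodings expressing data, using the three encodings named on the slide. The sample values ("Voß", 1979.0) and all variable names are illustrative assumptions, not part of the talk.

```python
import base64
import struct

# Sample values chosen for illustration only.
text = "Voß"
number = 1979.0

# UTF-8 expresses Unicode characters as bytes.
utf8_bytes = text.encode("utf-8")

# Base64 expresses arbitrary bytes as ASCII characters.
b64_text = base64.b64encode(utf8_bytes).decode("ascii")

# IEEE 754 double precision expresses a number as 8 bytes (big-endian here).
float_bytes = struct.pack(">d", number)

print(utf8_bytes.hex())
print(b64_text)
print(float_bytes.hex())
```

Each step can be reversed with the matching decoder, which also shows the slide's closing remark in small: a Base64 string of UTF-8 bytes already combines two encodings.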

result 2: paradigms
      ●   Document- or Object-oriented approach
            ●   Document-oriented (e.g. ordered tree with
                tagged character strings: XML, Relax NG…)
                ⇒ descriptive data description
            ●   Object-oriented (objects with properties and
                defined value spaces: XML Schema, UML…)
                ⇒ prescriptive data description
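
The contrast between the two approaches might be sketched as follows. The `person` record, the `Person` class and its type check are hypothetical examples made up for illustration; the talk names XML and UML only as representatives of each approach.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Document-oriented: an ordered tree of tagged character strings.
# The structure is described by reading what is there.
doc = ET.fromstring("<person><name>Jakob</name><born>1979</born></person>")
described = [(child.tag, child.text) for child in doc]


# Object-oriented: properties with defined value spaces.
# The structure is prescribed before any data exists.
@dataclass
class Person:
    name: str
    born: int

    def __post_init__(self):
        # Dataclass annotations are not enforced at runtime, so check here.
        if not isinstance(self.born, int):
            raise TypeError("born must be an integer year")


person = Person(name=doc.findtext("name"), born=int(doc.findtext("born")))

print(described)
print(person)
```

In the tree, "1979" is just a character string; in the object it has to fit the value space declared for `born`.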




result 2: paradigms
      ●   Entities and connections

               Jakob ── 1979

               Jakob ── born ── 1979

               Jakob ── Birth ── 1979
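
One possible reading of the three variants in the diagram, sketched as plain Python data: a direct value, a labeled connection, and a connection that becomes an entity of its own. The identifier `birth1` and the property names `type`, `of` and `year` are assumptions introduced for this sketch.

```python
# Variant 1: the value is attached directly to the entity.
direct = {"Jakob": 1979}

# Variant 2: a labeled connection between two entities (a triple).
labeled = [("Jakob", "born", 1979)]

# Variant 3: the connection becomes an entity of its own.
reified = [
    ("birth1", "type", "Birth"),
    ("birth1", "of", "Jakob"),
    ("birth1", "year", 1979),
]


def year_direct(name):
    return direct[name]


def year_labeled(name):
    return next(o for s, p, o in labeled if s == name and p == "born")


def year_reified(name):
    # First find the Birth entity connected to the person, then its year.
    event = next(s for s, p, o in reified if p == "of" and o == name)
    return next(o for s, p, o in reified if s == event and p == "year")


print(year_direct("Jakob"), year_labeled("Jakob"), year_reified("Jakob"))
```

All three structures carry the same statement; they differ in how much of the connection is made explicit and can therefore be described further.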


result 2: paradigms
      ●   Layers of abstraction
      ●   Standards and rules
      ●   Collections and types
      ●   Granularity




result 3: patterns
      ●   patterns as systematic tool for describing good design
          practice, introduced by Christopher Alexander:
          “Each pattern describes a problem which occurs over and
            over again in our environment, and then describes the
                   core of the solution to that problem […]”
      ●   Adopted as design patterns in software engineering
      ●   Collected in a pattern language with meaningful
          connections between patterns (network of patterns).




result 3: patterns
      network of patterns (diagram): collection, sequence, ordered set, array,
      separator, position, known size
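
The pattern names in the diagram suggest several ways to realize a sequence. The sketch below shows three of them under assumed conventions: a comma as separator, a `size:` prefix for known size, and integer keys for position. The sample items and function names are hypothetical.

```python
# Sample items for illustration only.
items = ["XML", "SQL", "RDF"]

# Separator: elements are divided by a reserved character.
with_separator = ",".join(items)


# Known size: each element is preceded by its length, so no character
# has to be reserved and elements may contain the separator themselves.
def with_known_size(elements):
    return "".join(f"{len(e)}:{e}" for e in elements)


def parse_known_size(data):
    result, i = [], 0
    while i < len(data):
        colon = data.index(":", i)
        size = int(data[i:colon])
        start = colon + 1
        result.append(data[start:start + size])
        i = start + size
    return result


# Position: each element is addressed by an explicit index.
by_position = dict(enumerate(items))

print(with_separator)
print(with_known_size(items))
print(by_position)
```

The separator variant breaks as soon as an element contains a comma, while the known-size variant does not; trade-offs of this kind are what links between patterns in a pattern language can record.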



applications
      ●   data archeology
            ●   In 200 years someone finds snapshots and
                archives of Wikipedia in different forms
                (SQL, XML, Wikitext, DBpedia, HTML…)
            ●   What are the significant parts?
                How do the parts relate to each other?




… another document




                               to give a simple example…





                                   sequence with delimiter



                     grouping of sequences with delimiter



                                   encoding (morse code)
                                   D A T A   P A T T E R N S

  • 1. Jakob Voß Revealing digital documents Concealed structures in data http://arxiv.org/abs/1105.5832 http://aboutdata.org International Conference on Theory and Practice in Digital Libraries (TPDL) Doctoral Consortium, Berlin 2011-09-25
  • 2. question how are (digital) documents structured and described? Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 3. what is a document? “[...] any physical or symbolic sign, preserved or recorded, intended to represent, to reconstruct, or to demonstrate a physical or conceptual phenomenon” – Suzanne Briet “[...] consists of anything that someone wishes to store. A document is something designated by a person to be a document [...]“ – Ted Nelson Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 4. scope digital documents somehow recorded (stable), eventually as sequence of bits Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  • 5. CR2, AAF, AAT, ADL, AES Core Audio, AES Process History, AGLS, Alleg SCII, ASN.1, Atom, BIBO, BibTeX, BISAC, BPEL, BPMN, BSON, CanCor CO, CDR, CDWA, CDWA Lite, CIDOC/CRM, CQL, CSDGM, CSV, DACS ata Committee Content Standard, DC, DCAM, DDC, DDI, DDL, DFDL, DI G35, DjVU, DOM, DTD, Dublin Core, DwC, EAC, EAC-CPF, EAD, ebXM ECN, Ediakt, EDIFAKT, eduPerson, EML, ERM, Etch, EXIF, Federal eographic, FOAF, FRAD, FRBR, FRSAD, FRSAR, GEM, GILS, GKD, GM ssian, HTML, HTTP, ID3, IDL, IEEE/LOM, indecs, inetOrgPerson, INI, IPT I, ISAAR(CPF), ISAD(G), ISBD, ISBN, ISO 19115, ISO 19119, JSON, KM there is not one LCC, LCSH, LDAP, Linked Data, LMER, MAB2, MADS, MARC, MARC21 RC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rig single document format MFC, MGraph, MIX, MO, MODS, MOTS, MPEG-21 , MPEG-7, MSchema seumDat, MusicXML, MXF, NewsML, NFC, NFD, NFKC, NFKD, NIAM, O OAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media, OODBMS OpenDocument, OpenSearch, OpenURL, ORM, OWL, PB Core, PDF, PI ca+, Pica3, PND, PREMIS, PRISM, Proto, QDC, RAD, RAK, RDA, RDBM DF, RDFS, RDF/XML, Relax NG, RELAX NG, Resource, RIS, RSS, RSW Schematron, SCORM, SDXF, Seel, S-EXP, SGML, SIOC, SKOS, SMIL, PECTRUM, SQL, SRU/SRW, SWAP, SWB, TEI, TEX, TextMD, TGM I, TG TGN, Thrift, Topic Maps, UCS, ULAN, UML, unAPI, UNIMARC, URI, UTF ard, Vorbis Comment, VRA, VSO Data Model, XDR, XMetaDiss, XML, XM
  • 6. thesis: but there are common patterns on all levels of description, independent of particular technologies
  • 7. examples of particular technologies (families of related standards): XML ● Unicode ● XML Infoset ● XML Schema ● XPath; relational databases ● Relational Model ● SQL ● Entity-Relationship Diagrams
  • 8. method: not statistical; this would limit my research to one level and technology of description
  • 9. method: phenomenological; data description in all of its forms as it appears in our experience
  • 10. phenomenological method: data description analyzed as phenomena: 1. critical intuiting (experience) 2. analyzing structures, free of known categories 3. describing the essence. Pictured: Hegel, Husserl, Merleau-Ponty* (* Image CC-BY Pierre-Alain Gouanvic)
  • 11. results: 1) Categorization of data structuring methods 2) Collection of data structuring paradigms 3) Pattern language of data patterns
  • 12. result 1: categorization of methods ● encodings express data (UTF-8 Unicode, IEEE floating point, Base64…) ● file and database systems store data ● identifiers and query languages refer to data ● data structuring and markup languages structure data ● schema languages constrain and validate data ● conceptual models describe data. Concrete methods appear as combinations of categories!
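The claim on slide 12 that concrete methods combine several categories can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `record` dictionary and the choice of JSON as the structuring language are hypothetical and not taken from the talk.

```python
import base64
import json
import struct

# Hypothetical example record (an assumption, not data from the slides).
record = {"name": "Voß", "born": 1979}

# "structure data": a data structuring language (here JSON) arranges the values.
structured = json.dumps(record, ensure_ascii=False, sort_keys=True)

# "express data": encodings turn abstract values into concrete bytes.
utf8_bytes = structured.encode("utf-8")                      # UTF-8 for Unicode text
float_bytes = struct.pack(">d", 0.5)                         # IEEE 754 double, big-endian
base64_text = base64.b64encode(utf8_bytes).decode("ascii")   # bytes as ASCII text

# Reading the data back has to undo each layer in reverse order.
restored = json.loads(base64.b64decode(base64_text).decode("utf-8"))
```

The Base64 text here is already a combination of three methods: a JSON structure, expressed in UTF-8, expressed again in Base64.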
  • 13. result 2: paradigms ● Document- or Object-oriented approach ● Document-oriented (e.g. ordered tree with tagged character strings: XML, Relax NG…) ⇒ descriptive data description ● Object-oriented (objects with properties and defined value spaces: XML Schema, UML…) ⇒ prescriptive data description
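The contrast on slide 13 between the document-oriented and the object-oriented approach might be sketched as follows. The `Person` class, its fields, and the XML snippet are hypothetical names chosen for illustration; the talk does not prescribe this code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Document-oriented: an ordered tree of tagged character strings.
# The tree describes whatever happens to be in the document.
doc = ET.fromstring("<person><name>Jakob</name><born>1979</born></person>")
tags_in_order = [child.tag for child in doc]   # element order is part of the data
born_text = doc.findtext("born")               # the value is still a character string


# Object-oriented: an object with properties and defined value spaces.
# The class prescribes what a valid instance may contain.
@dataclass
class Person:
    name: str
    born: int

    def __post_init__(self):
        # Value-space check (an assumption made for this sketch).
        if not isinstance(self.born, int) or self.born < 0:
            raise ValueError("born must be a non-negative integer")


person = Person(name=doc.findtext("name"), born=int(born_text))
```

Moving from `doc` to `person` requires an explicit conversion (`int(born_text)`), which is where the descriptive tree meets the prescriptive value space.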
  • 14. result 2: paradigms ● Entities and connections [diagram: Jakob and 1979 connected in three ways: directly (Jakob, 1979); by a connection labelled “born”; via an entity “Birth” placed between them]
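One possible reading of the Jakob/1979 example on slide 14 is that the same fact can be modelled with connections of different richness. The sketch below is an interpretation of the slide, not the author's notation; all variable names are hypothetical.

```python
# 1. Unlabelled connection: the two entities are simply linked.
plain = {("Jakob", "1979")}

# 2. Labelled connection: the link itself carries a name.
labelled = {("Jakob", "born", "1979")}

# 3. Connection turned into an entity: "Birth" is a node of its own,
#    linked to both Jakob and 1979.
reified = {("Jakob", "Birth"), ("Birth", "1979")}

# Entities and labels that each model makes explicit.
entities_plain = {e for pair in plain for e in pair}
labels = {label for _, label, _ in labelled}
entities_reified = {e for pair in reified for e in pair}
```

The third form is the only one in which further statements could be attached to the birth itself, because it is an entity rather than a link.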
  • 15. result 2: paradigms ● Layers of abstraction ● Standards and rules ● Collections and types ● Granularity
  • 16. result 3: patterns ● patterns as systematic tool for describing good design practice, introduced by Christopher Alexander: “Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem […]” ● Adopted as design patterns in software engineering ● Collected in a pattern language with meaningful connections between patterns (network of patterns).
  • 17. result 3: patterns [diagram of connected patterns: collection, separator, known size, sequence, position, ordered set, array]
  • 18. applications ● data archeology ● In 200 years someone finds snapshots and archives of Wikipedia in different forms (SQL, XML, Wikitext, DBpedia, HTML…) ● What are the significant parts? How do the parts relate to each other?
  • 19. (no text on this slide)
  • 20. … another document, to give a simple example…
  • 21. … another document: sequence with delimiter
  • 22. … another document: sequence with delimiter; grouping of sequences with delimiter
  • 23. … another document: sequence with delimiter; grouping of sequences with delimiter; encoding (Morse code): DATA PATTERNS