Revealing digital documents - concealed structures in data

Presented September 25th, 2011 at the Doctoral Consortium of the Conference on Theory and Practice in Digital Libraries (TPDL), Berlin

Published in: Technology, Education

  1. Jakob Voß: Revealing digital documents. Concealed structures in data. http://arxiv.org/abs/1105.5832 http://aboutdata.org International Conference on Theory and Practice in Digital Libraries (TPDL), Doctoral Consortium, Berlin, 2011-09-25
  2. Question: how are (digital) documents structured and described? (Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th, http://aboutdata.org)
  3. What is a document?
     “[...] any physical or symbolic sign, preserved or recorded, intended to represent, to reconstruct, or to demonstrate a physical or conceptual phenomenon” – Suzanne Briet
     “[...] consists of anything that someone wishes to store. A document is something designated by a person to be a document [...]” – Ted Nelson
  4. Scope: digital documents, somehow recorded (stable), eventually as a sequence of bits.
  5. There is not one single document format. Word cloud of formats and standards (cropped at the slide edges): CR2, AAF, AAT, ADL, AES Core Audio, AES Process History, AGLS, AllegSCII, ASN.1, Atom, BIBO, BibTeX, BISAC, BPEL, BPMN, BSON, CanCor CO, CDR, CDWA, CDWA Lite, CIDOC/CRM, CQL, CSDGM, CSV, DACSata Committee Content Standard, DC, DCAM, DDC, DDI, DDL, DFDL, DI G35, DjVU, DOM, DTD, Dublin Core, DwC, EAC, EAC-CPF, EAD, ebXM ECN, Ediakt, EDIFAKT, eduPerson, EML, ERM, Etch, EXIF, Federaleographic, FOAF, FRAD, FRBR, FRSAD, FRSAR, GEM, GILS, GKD, GMssian, HTML, HTTP, ID3, IDL, IEEE/LOM, indecs, inetOrgPerson, INI, IPTI, ISAAR(CPF), ISAD(G), ISBD, ISBN, ISO 19115, ISO 19119, JSON, KM LCC, LCSH, LDAP, Linked Data, LMER, MAB2, MADS, MARC, MARC21 RC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rig MFC, MGraph, MIX, MO, MODS, MOTS, MPEG-21, MPEG-7, MSchemaseumDat, MusicXML, MXF, NewsML, NFC, NFD, NFKC, NFKD, NIAM, OOAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media, OODBMSOpenDocument, OpenSearch, OpenURL, ORM, OWL, PB Core, PDF, PIca+, Pica3, PND, PREMIS, PRISM, Proto, QDC, RAD, RAK, RDA, RDBMDF, RDFS, RDF/XML, Relax NG, RELAX NG, Resource, RIS, RSS, RSW Schematron, SCORM, SDXF, Seel, S-EXP, SGML, SIOC, SKOS, SMIL, PECTRUM, SQL, SRU/SRW, SWAP, SWB, TEI, TEX, TextMD, TGM I, TG TGN, Thrift, Topic Maps, UCS, ULAN, UML, unAPI, UNIMARC, URI, UTF ard, Vorbis Comment, VRA, VSO Data Model, XDR, XMetaDiss, XML, XM
  6. Thesis: but there are common patterns on all levels of description, independent of particular technologies.
  7. Examples of particular technologies (families of related standards)
     ● XML: Unicode, XML Infoset, XML Schema, XPath
     ● Relational databases: Relational Model, SQL, Entity-Relationship Diagrams
  8. Method: not statistical. This would limit my research to one level and technology of description.
  9. Method: phenomenological. Data description in all of its forms, as it appears in our experience.
  10. Phenomenological method: data description analyzed as phenomena
      1. critical intuiting (experience)
      2. analyzing structures, free of known categories
      3. describing the essence
      Portraits: Hegel, Husserl, Merleau-Ponty* (* Image CC-BY Pierre-Alain Gouanvic)
  11. Results
      1) Categorization of data structuring methods
      2) Collection of data structuring paradigms
      3) Pattern language of data patterns
  12. Result 1: categorization of methods
      ● encodings express data (UTF-8 Unicode, IEEE floating point, Base64…)
      ● file and database systems store data
      ● identifiers and query languages refer to data
      ● data structuring and markup languages structure data
      ● schema languages constrain and validate data
      ● conceptual models describe data
      Concrete methods appear as combinations of categories!
  13. Result 2: paradigms
      ● Document- or object-oriented approach
        ● Document-oriented (e.g. ordered tree with tagged character strings: XML, Relax NG…) ⇒ descriptive data description
        ● Object-oriented (objects with properties and defined value spaces: XML Schema, UML…) ⇒ prescriptive data description
  14. Result 2: paradigms
      ● Entities and connections (diagram; labels: Jakob 1979 born Jakob 1979 Jakob Birth 1979)
  15. Result 2: paradigms
      ● Layers of abstraction
      ● Standards and rules
      ● Collections and types
      ● Granularity
  16. Result 3: patterns
      ● Patterns as a systematic tool for describing good design practice, introduced by Christopher Alexander: “Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem […]”
      ● Adopted as design patterns in software engineering
      ● Collected in a pattern language with meaningful connections between patterns (network of patterns)
  17. Result 3: patterns (diagram; labels: collection, separator, known size, sequence, position, ordered set, array)
  18. Applications
      ● Data archeology
        ● In 200 years someone finds snapshots and archives of Wikipedia in different forms (SQL, XML, Wikitext, DBPedia, HTML…)
        ● What are the significant parts? How do the parts relate to each other?
  20. … another document: to give a simple example…
  21. … another document: sequence with delimiter
  22. … another document: sequence with delimiter; grouping of sequences with delimiter
  23. … another document: sequence with delimiter; grouping of sequences with delimiter; encoding (Morse code): DATA PATTERNS
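
The closing example (slides 20 to 23) names three layered patterns: a sequence with delimiter, a grouping of sequences with delimiter, and an encoding. A minimal Python sketch of how these layers could be peeled apart is given here. The concrete delimiters (a single space between letters, " / " between words) and the function name `decode` are assumptions made for illustration; the slides do not state which delimiters the pictured document uses. The table contains only the International Morse code letters needed for this message.

```python
# Partial International Morse code table (only the letters used below).
MORSE = {
    "-..": "D",
    ".-": "A",
    "-": "T",
    ".--.": "P",
    ".": "E",
    ".-.": "R",
    "-.": "N",
    "...": "S",
}


def decode(signal: str, letter_sep: str = " ", word_sep: str = " / ") -> str:
    """Decode a Morse string; both separators are assumed conventions."""
    decoded_words = []
    # Pattern: grouping of sequences with delimiter (words).
    for word in signal.split(word_sep):
        # Pattern: sequence with delimiter (letters within a word).
        symbols = word.split(letter_sep)
        # Pattern: encoding (each dot/dash symbol stands for one letter).
        decoded_words.append("".join(MORSE[symbol] for symbol in symbols))
    return " ".join(decoded_words)


message = "-.. .- - .- / .--. .- - - . .-. -. ..."
print(decode(message))  # prints "DATA PATTERNS"
```

Each layer is only meaningful once the layer above it is known: without the word delimiter the letter sequences run together, and without the encoding table the symbols stay uninterpreted.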

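
Slide 12 lists encodings (UTF-8 Unicode, IEEE floating point, Base64) as the category of methods that express data. The following Python sketch, using only the standard library, shows one value passing through each of the three. The sample values ("Voß" and 1979.0) and the helper names are chosen for illustration and are not taken from the talk.

```python
import base64
import struct


def utf8(text: str) -> bytes:
    """Express a character string as UTF-8 bytes."""
    return text.encode("utf-8")


def ieee754_double(number: float) -> bytes:
    """Express a number as an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", number)


def base64_text(raw: bytes) -> str:
    """Express arbitrary bytes as Base64 ASCII text."""
    return base64.b64encode(raw).decode("ascii")


name_bytes = utf8("Voß")            # "ß" takes two bytes in UTF-8
year_bytes = ieee754_double(1979.0)

print(name_bytes.hex())
print(year_bytes.hex())
print(base64_text(name_bytes))      # an encoding applied on top of another encoding
```

The last line illustrates the remark on the same slide that concrete methods appear as combinations: Base64 here expresses bytes that are themselves a UTF-8 expression of text.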