Revealing digital documents - concealed structures in data

  1. Jakob Voß: Revealing digital documents. Concealed structures in data. http://arxiv.org/abs/1105.5832 http://aboutdata.org International Conference on Theory and Practice in Digital Libraries (TPDL), Doctoral Consortium, Berlin, 2011-09-25
  2. question: how are (digital) documents structured and described? Jakob Voß: Revealing digital documents. Concealed structures in data. TPDL 2011, Sep. 25th http://aboutdata.org
  3. what is a document? “[...] any physical or symbolic sign, preserved or recorded, intended to represent, to reconstruct, or to demonstrate a physical or conceptual phenomenon” (Suzanne Briet) “[...] consists of anything that someone wishes to store. A document is something designated by a person to be a document [...]” (Ted Nelson)
  4. scope: digital documents, somehow recorded (stable), eventually as a sequence of bits
  5. there is not one single document format (slide background, a wall of format and standard names, cropped at the slide edges): CR2, AAF, AAT, ADL, AES Core Audio, AES Process History, AGLS, Alleg SCII, ASN.1, Atom, BIBO, BibTeX, BISAC, BPEL, BPMN, BSON, CanCor CO, CDR, CDWA, CDWA Lite, CIDOC/CRM, CQL, CSDGM, CSV, DACS ata Committee Content Standard, DC, DCAM, DDC, DDI, DDL, DFDL, DI G35, DjVU, DOM, DTD, Dublin Core, DwC, EAC, EAC-CPF, EAD, ebXM ECN, Ediakt, EDIFAKT, eduPerson, EML, ERM, Etch, EXIF, Federal eographic, FOAF, FRAD, FRBR, FRSAD, FRSAR, GEM, GILS, GKD, GM ssian, HTML, HTTP, ID3, IDL, IEEE/LOM, indecs, inetOrgPerson, INI, IPT I, ISAAR(CPF), ISAD(G), ISBD, ISBN, ISO 19115, ISO 19119, JSON, KM LCC, LCSH, LDAP, Linked Data, LMER, MAB2, MADS, MARC, MARC21 RC Relator Codes, MARCXML, MathML, MEI, MESH, METS, METS Rig MFC, MGraph, MIX, MO, MODS, MOTS, MPEG-21, MPEG-7, MSchema seumDat, MusicXML, MXF, NewsML, NFC, NFD, NFKC, NFKD, NIAM, O OAI-ORE, OAI-PMH, OAIS, ODRL, ONIX, Ontology for Media, OODBMS OpenDocument, OpenSearch, OpenURL, ORM, OWL, PB Core, PDF, PI ca+, Pica3, PND, PREMIS, PRISM, Proto, QDC, RAD, RAK, RDA, RDBM DF, RDFS, RDF/XML, Relax NG, RELAX NG, Resource, RIS, RSS, RSW Schematron, SCORM, SDXF, Seel, S-EXP, SGML, SIOC, SKOS, SMIL, PECTRUM, SQL, SRU/SRW, SWAP, SWB, TEI, TEX, TextMD, TGM I, TG TGN, Thrift, Topic Maps, UCS, ULAN, UML, unAPI, UNIMARC, URI, UTF ard, Vorbis Comment, VRA, VSO Data Model, XDR, XMetaDiss, XML, XM
  6. thesis: but there are common patterns on all levels of description, independent of particular technologies
  7. examples of particular technologies (families of related standards) ● XML: Unicode, XML Infoset, XML Schema, XPath ● relational databases: Relational Model, SQL, Entity-Relationship Diagrams
  8. method: not statistical; this would limit my research to one level and technology of description
  9. method: phenomenological; data description in all of its forms, as it appears in our experience
  10. phenomenological method: data description analyzed as phenomena: 1. critical intuiting (experience) 2. analyzing structures, free of known categories 3. describing the essence (portraits: Hegel, Husserl, Merleau-Ponty*) * Image CC-BY Pierre-Alain Gouanvic
  11. results: 1) Categorization of data structuring methods 2) Collection of data structuring paradigms 3) Pattern language of data patterns
  12. result 1: categorization of methods ● encodings express data (UTF-8 Unicode, IEEE floating point, Base64…) ● file and database systems store data ● identifiers and query languages refer to data ● data structuring and markup languages structure data ● schema languages constrain and validate data ● conceptual models describe data. Concrete methods appear as combinations of categories!
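A minimal sketch of the "encodings express data" category and of one combination of categories, using only the Python standard library. The sample string, the record and all variable names are illustrative assumptions, not taken from the slides.

```python
import base64
import json
import struct

# Encodings express data: the same text as UTF-8 bytes and as Base64.
text = "Voß"
utf8_bytes = text.encode("utf-8")                        # "ß" takes two bytes in UTF-8
b64_text = base64.b64encode(utf8_bytes).decode("ascii")  # same bytes, ASCII-safe form

# IEEE floating point: one number expressed as 8 bytes (double, big-endian).
float_bytes = struct.pack(">d", 0.5)

# A combination of categories: a structuring language (JSON)
# on top of an encoding (UTF-8).
record = {"name": text, "year": 1979}
json_bytes = json.dumps(record, ensure_ascii=False).encode("utf-8")
```

Decoding reverses each layer in the opposite order, for example `json.loads(json_bytes.decode("utf-8"))` for the last value.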
  13. result 2: paradigms ● Document- or object-oriented approach ● Document-oriented (e.g. ordered tree with tagged character strings: XML, RELAX NG…) ⇒ descriptive data description ● Object-oriented (objects with properties and defined value spaces: XML Schema, UML…) ⇒ prescriptive data description
  14. result 2: paradigms ● Entities and connections (diagram with three variants; labels: Jakob, 1979, born / Jakob, 1979 / Jakob, Birth, 1979)
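The diagram itself is not recoverable from the transcript, so the following is only one possible reading of its labels: the same fact modelled as a plain connection, as a labelled connection, and with the connection turned into an entity of its own. The data layout and the helper `neighbours` are hypothetical.

```python
# Variant A: two entities joined by a plain, unlabelled connection.
plain = [("Jakob", "1979")]

# Variant B: the connection carries a label (kept here only for comparison).
labelled = [("Jakob", "born", "1979")]

# Variant C: the connection becomes an entity ("Birth") that is
# itself connected to both original entities.
reified = [("Jakob", "Birth"), ("Birth", "1979")]


def neighbours(entity, connections):
    """Return every entity directly connected to `entity` (unlabelled pairs)."""
    found = set()
    for a, b in connections:
        if a == entity:
            found.add(b)
        elif b == entity:
            found.add(a)
    return found
```

In variant C, "Jakob" and "1979" are no longer direct neighbours; both are neighbours of "Birth", which could then carry further connections such as a place.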
  15. result 2: paradigms ● Layers of abstraction ● Standards and rules ● Collections and types ● Granularity
  16. result 3: patterns ● patterns as systematic tool for describing good design practice, introduced by Christopher Alexander: “Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem […]” ● Adopted as design patterns in software engineering ● Collected in a pattern language with meaningful connections between patterns (network of patterns).
  17. result 3: patterns (pattern network diagram; node labels: collection, separator, known size, sequence, position, ordered set, array)
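As a sketch of how two of these labels might relate, a sequence of strings can be serialized either with a separator or with a known size stored in front of each item. How the slide actually connects the patterns is not recoverable, so this pairing, the function names and the `size:item` layout are assumptions.

```python
def encode_with_separator(items, sep=","):
    """Join items with a separator; items must not contain the separator."""
    if any(sep in item for item in items):
        raise ValueError("item contains the separator")
    return sep.join(items)


def decode_with_separator(data, sep=","):
    """Split on the separator; an empty string decodes to an empty sequence."""
    return data.split(sep) if data else []


def encode_with_sizes(items):
    """Prefix every item with its length and a colon, e.g. '3:abc'."""
    return "".join(f"{len(item)}:{item}" for item in items)


def decode_with_sizes(data):
    """Read 'size:item' records one after another until the data ends."""
    items, pos = [], 0
    while pos < len(data):
        colon = data.index(":", pos)
        size = int(data[pos:colon])
        start = colon + 1
        items.append(data[start:start + size])
        pos = start + size
    return items
```

The separator variant is easier to read but cannot carry items that contain the separator, and it cannot tell an empty sequence from a sequence holding one empty item. The known-size variant handles both cases at the cost of extra bookkeeping.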
  18. applications ● data archeology ● In 200 years someone finds snapshots and archives of Wikipedia in different forms (SQL, XML, Wikitext, DBpedia, HTML…) ● What are the significant parts? How do the parts relate to each other?
  19. (slide without text)
  20. … another document, to give a simple example…
  21. … another document: sequence with delimiter
  22. … another document: sequence with delimiter; grouping of sequences with delimiter
  23. … another document: sequence with delimiter; grouping of sequences with delimiter; encoding (morse code): D A T A P A T T E R N S
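The three patterns named on the last slide can be sketched in a few lines: letters form a sequence with a delimiter, words group those sequences with a second delimiter, and a code table supplies the encoding. The concrete delimiters (a space between letters, " / " between words) are a common convention assumed here, not something the transcript shows, and the table covers only the letters needed for this message.

```python
# Partial International Morse code table (only the letters used below).
MORSE = {
    ".-": "A", "-..": "D", ".": "E", "-.": "N",
    ".--.": "P", ".-.": "R", "...": "S", "-": "T",
}


def decode(signal, letter_sep=" ", word_sep=" / "):
    """Decode a signal: words are groups of letters, letters are code strings."""
    words = []
    for group in signal.split(word_sep):       # grouping of sequences with delimiter
        letters = group.split(letter_sep)      # sequence with delimiter
        words.append("".join(MORSE[code] for code in letters))  # encoding
    return " ".join(words)


message = decode("-.. .- - .- / .--. .- - - . .-. -. ...")
```

With this input, `message` is the phrase spelled out on the slide, "DATA PATTERNS".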
