Your SlideShare is downloading. ×
0
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Significant Characteristics In Planets Manfred Thaller

1,286

Published on

3rd Annual WePreserve Conference Nice 2008

3rd Annual WePreserve Conference Nice 2008

Published in: Technology, Business
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,286
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
0
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Significant characteristics in Planets Manfred Thaller Universität zu* Köln *University at not of Cologne
  • 2. What are “significant characteristics”? Those properties of a digital file which have to be known to enable the processing of the file within a specific setup.
  • 3. Why extract them by software? To create technical metadata as required by organizational models for long term preservation. (NLNZ)
  • 4. Within Planets … … served by solutions to identify formats: formats registry / PRONOM / DROID. … and a solution for extracting and processing such characteristics: XCL.
  • 5. A Vision Extractor tiff XCDL tiff 93% Migrator Comparator png tiff XCEL png XCEL png XCDL
  • 6. A Vision Extractor Comparator Appropriate XCELs C-Set
  • 7. Why automate? 1 million objects: use one second for each. == 16666.7 minutes == 277.8 hours == 11.57 working days of a computer == 34.7 8-hour days for a Human == 7 working weeks
  • 8. Why automate? 1 million objects: use five minutes for each. == 416 666.7 hours == 52 803.4 8-hour days for a Human
  • 9. Why automate? Assumption: Preservation is only feasible, if the content of two digital objects can be compared without human intervention, giving a numerical estimate of their degree of similarity.
  • 10. Demo
  • 11. Abstract solution I (1) Language to represent the complete content of a digital object. XCDL (2) Language to describe any machine readable format in a formal language. XCEL (3) Software to extract the content of a file based upon a description as under (2) and express it in the language as specified under (1). “extractor” (4) Software to compare two such content descriptions. “comparator”
  • 12. <XCELDocument...> ... <xcdl> <formatDescription>.... <object id=quot;o1quot; > <symbol identifier=quot;ID01_I01_I01_S02quot; <normData id=quot;nd1quot; > ... </normData> originalName=quot;height“ interpretation=quot;uint32quot;> <property id=quot;p1quot; source=quot;rawquot; <range><startposition xsi:type=quot;sequential“> cat=quot;descrquot; > </startposition> <length xsi:type=quot;fixedquot;>4</length></range> <name> compression</name> <name>height</name> <valueSet id=quot;i_i1_s6quot; > <rawValue>0 </rawValue> </symbol> <labValue>...</labValue> <symbol identifier=quot;ID01_I01_I01_S04quot; originalName=quot;colourTypequot;> <dataRef ind=quot;normAllquot; /> <range> <propRel/> <startposition xsi:type=quot;sequentialquot;> </valueSet> </startposition> </property> <length xsi:type=quot;fixedquot;>1</length></range> <property id=quot;p2quot; source=quot;rawquot; <valueInterpretation> cat=quot;descrquot; > <valueLabel>greyscale</valueLabel> <name> height</name> <value>0</value></valueinterpretation> <valueSet id=quot;i_i1_s3quot; > <name>imageType</name> <rawValue>0 0 1 ad </rawValue> </symbol> <labValue> <symbol identifier=quot;ID01_I01_I01_S05quot; <val>429</val> originalName=quot;compressionMethodquot;> <type>uint32</type> <range> </labValue> <startposition xsi:type=quot;sequential“> <dataRef ind=quot;normAllquot; /> </startposition> <propRel/> <length </valueSet> xsi:type=quot;fixedquot;>1</length></range> </property> <valueInterpretation> <property id=quot;p3quot; source=quot;rawquot; <valueLabel>zlibDeflateInflate</valueLabel> cat=quot;descrquot; > <value>0</value></valueInterpretation> <name> imageType</name> <name>compression</name> ..... </symbol>...
  • 13. <request2> <property id=quot;2quot; <measurementRequest> name=quot;imageHeightquot; <source name=quot;XCDL1.xmlquot;/> unit=quot;pixelquot; <target name=quot;XCDL2.xmlquot;/> compStatus=quot;completequot;> <property id=quot;45quot; name=quot;rgbPalettequot;> <values type=quot;intquot;> <metric id=quot;10quot; name=quot;hammingDistancequot;/> <src>32</src> </property> <property id=quot;300quot; name=quot;normDataquot;> <tar>32</tar> <metric id=quot;10quot; </values> name=quot;hammingDistancequot;/> <metric id=quot;50quot; name=quot;RMSEquot;/> <metric id=quot;200quot; </property> name=quot;equalquot; result=quot;truequot;/> <property id=quot;2quot; <metric id=quot;201quot; name=quot;imageHeightquot; unit=quot;pixelquot;> name=quot;intDiffquot; result=quot;0quot;/> <metric id=quot;200quot; name=quot;equalquot;/> <metric id=quot;210quot; <metric id=quot;201quot; name=quot;intDiffquot;/> name=quot;percDevquot; <metric id=quot;210quot; name=quot;percDevquot;/> result=quot;0.000000quot;/> </property> </property> <property id=quot;30quot; name=quot;imageWidthquot; unit=quot;pixelquot;> <metric id=quot;200quot; name=quot;equalquot;/> <metric id=quot;201quot; name=quot;intDiffquot;/> <metric id=quot;210quot; name=quot;percDevquot;/> </property>
  • 14. Abstract solution I (1) Language to represent the complete content of a digital object. XCDL (2) Language to describe any machine readable format in a formal language. XCEL (3) Software to extract the content of a file based upon a description as under (2) and express it in the language as specified under (1). “extractor” (4) Software to compare two such content descriptions. “comparator”
  • 15. Are the following two items equal: VIII  8
  • 16. eight eight VIII  8
  • 17. otto eight eight otto VIII  8
  • 18. otto acht eight eight otto VIII  8 acht
  • 19. 8.0 otto acht eight eight otto VIII  8 acht
  • 20. Information model: „an image“ otto acht eight eight otto VIII  8 acht
  • 21. information model: „an image“ format ontology: „what terms are used in formats to describe image properties“ VIII  8
  • 22. Information model: „what is an image“ Format ontology: „what terms are used in formats to describe image properties“ Extraction language: “how to get the terms describing an image out of a file”
  • 23. Abstract solution II (1) A theoretical model of information (not: data) types – “image”, “text”, “audio” ... (2) Ontologies, which map existing file format terminologies onto these model. (3) A language – XCDL – which allows to express the content of files in different formats using the vocabulary of the ontologies and the “grammar” of the information model.
  • 24. XCDL eXtensible Characterisation Definition Language Purpose: Describe the contents of a file in terms of an abstract model.
  • 25. XCDL: text model (1) A text (= <object>) is composed of data (= <normData>) plus interpretations of data according to the underlying format specification (= <property>).
  • 26. XCDL: text model (2) Or, one level of abstraction higher, a text is composed of content carrying tokens, accompanied by rendering info plus deployment info plus historical info.
  • 27. This text is a <refData id=quot;1quot;>54 68 69 73 20 69 73 20 61 20 74 65 78 74</refData> … <property> <name>fontsize</name> <rawVal> <val>48</val> <type>unsignedInt8</type> </rawVal> <dataRef> <!-- property refers to discrete part of reference data--> <ref id=quot;1quot; start=quot;0quot; end=quot;3quot;/> <ref id=quot;1quot; start=“10quot; end=quot;12quot;/> </dataRef> </property>
  • 28. This text is a <refData id=quot;1quot;>54 68 69 73 20 69 73 20 61 20 74 65 78 74</refData> … <property> <name>fontsize</name> <rawVal> <val>48</val> <type>unsignedInt8</type> </rawVal> <dataRef> <!-- property refers to discrete part of reference data--> <ref id=quot;1quot; start=quot;0quot; end=quot;3quot;/> <ref id=quot;1quot; start=“10quot; end=quot;12quot;/> </dataRef> </property>
  • 29. Thank you! Questions? Manfred.thaller@uni-koeln.de

×