FRBR Applied to Scientific Data by Joseph A. Hourclé


Published in: Technology, Business

  1. 1. FRBR Applied to Scientific Data Joseph A. Hourclé 2008-Sept-22 ASIS&T PVC
  2. 2. About Me
  3. 3. Functional Requirements for Bibliographic Records (FRBR) <ul><li>Reference Model for the design of bibliographic catalog systems. </li></ul><ul><li>Defines four different concepts of ‘book’ that might be cataloged. </li></ul><ul><ul><li>Work </li></ul></ul><ul><ul><li>Expression </li></ul></ul><ul><ul><li>Manifestation </li></ul></ul><ul><ul><li>Item </li></ul></ul>
  4. 4. FRBR Group 1 Entities <ul><li>Work </li></ul><ul><ul><li>A distinct intellectual or artistic creation </li></ul></ul><ul><li>Expression </li></ul><ul><ul><li>The intellectual or artistic realization of a work in the form of alpha-numeric, … sound, image, object, movement, etc … </li></ul></ul><ul><li>Manifestation </li></ul><ul><ul><li>The physical embodiment of an expression of a work </li></ul></ul><ul><li>Item </li></ul><ul><ul><li>A single exemplar of a manifestation </li></ul></ul>
  5. 5. What questions can we ask of each level? <ul><li>Work </li></ul><ul><ul><li>Who wrote it? What is the subject? </li></ul></ul><ul><li>Expression </li></ul><ul><ul><li>What language is it in? </li></ul></ul><ul><li>Manifestation </li></ul><ul><ul><li>What size is the font or book? </li></ul></ul><ul><li>Item </li></ul><ul><ul><li>Is the individual copy available to me? </li></ul></ul>
  6. 6. Why ask these questions? <ul><li>Work </li></ul><ul><ul><li>Who wrote it? What is the subject? </li></ul></ul><ul><ul><li>Determine interest / Applicability </li></ul></ul><ul><li>Expression </li></ul><ul><ul><li>What language is it in? </li></ul></ul><ul><ul><li>Usability / Accessibility (of content) </li></ul></ul><ul><li>Manifestation </li></ul><ul><ul><li>What size is the font or book? </li></ul></ul><ul><ul><li>Usability / Accessibility (of content within carrier) </li></ul></ul><ul><li>Item </li></ul><ul><ul><li>Is the individual copy available to me? </li></ul></ul><ul><ul><li>Availability / Accessibility (of the carrier) </li></ul></ul>
  7. 7. FRBR Applied to Scientific Data
  8. 8. Two Extra Entities <ul><li>Sensor </li></ul><ul><ul><li>Converts information about its environment to a digital signal </li></ul></ul><ul><li>Observation </li></ul><ul><ul><li>Data created by the sensor </li></ul></ul><ul><ul><li>Necessary to unambiguously track if two works are different interpretations of the same data </li></ul></ul>
  9. 9. In this model … <ul><li>Item </li></ul><ul><ul><li>Is a logical item that might be identified via a URL. </li></ul></ul><ul><ul><li>Two items of the same manifestation would be bytewise identical copies </li></ul></ul><ul><li>Manifestation </li></ul><ul><ul><li>A logical embodiment, to include aspects of the carrier </li></ul></ul><ul><ul><ul><li>How each datum is organized within the package </li></ul></ul></ul><ul><ul><ul><li>File format and encoding </li></ul></ul></ul><ul><ul><li>Typically contains multiple expressions </li></ul></ul><ul><ul><li>Two manifestations of the same expression contain identical values within each datum </li></ul></ul>
  10. 10. In this model … <ul><li>Work </li></ul><ul><ul><li>Calibrated state of the data </li></ul></ul><ul><ul><ul><li>Translation of the sensor output to remove sensor issues or to physical units </li></ul></ul></ul><ul><ul><li>Two works of the same observation would be interpretations of the same raw sensor data </li></ul></ul><ul><ul><li>Also includes catalogs and metadata </li></ul></ul><ul><ul><ul><li>But through other expressions, not directly derived from the observation </li></ul></ul></ul><ul><li>Expression </li></ul><ul><ul><li>The numeric values encoded in the file </li></ul></ul><ul><ul><li>Two expressions of the same work would have been generated from the same calibration of the observation </li></ul></ul>
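
The entities described in slides 8 to 10 can be sketched as a small set of Python dataclasses. This is only an illustrative reading of the model, not the author's implementation: every class field, the helper `same_observation`, and the sample values (`example-imager`, `exposure-0001`, the URL) are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sensor:
    name: str  # hypothetical identifier for the instrument


@dataclass(frozen=True)
class Observation:
    sensor: Sensor
    label: str  # hypothetical identifier for the raw sensor output


@dataclass(frozen=True)
class Work:
    observation: Observation
    calibration: str  # which calibration was applied to the raw data


@dataclass(frozen=True)
class Expression:
    work: Work
    description: str  # e.g. "full resolution" or "2x2 binned"


@dataclass
class Manifestation:
    expressions: List[Expression]  # a package typically holds several
    file_format: str


@dataclass
class Item:
    manifestation: Manifestation
    url: str  # a logical item that might be identified via a URL


def same_observation(a: Work, b: Work) -> bool:
    """True if two works are interpretations of the same raw sensor data."""
    return a.observation == b.observation


# Two calibrations of one observation are two works of that observation.
sensor = Sensor("example-imager")
raw = Observation(sensor, "exposure-0001")
work_v1 = Work(raw, "calibration v1")
work_v2 = Work(raw, "calibration v2")
other = Work(Observation(sensor, "exposure-0002"), "calibration v1")

package = Manifestation([Expression(work_v1, "full resolution")], "FITS")
copy_a = Item(package, "https://example.org/a/exposure-0001.fits")
```

Keeping `Observation` as its own entity is what lets `same_observation(work_v1, work_v2)` answer the question the slides raise: whether two works are different interpretations of the same data.
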
  11. 11. Limitations <ul><li>Scientific Discipline </li></ul><ul><ul><li>Each discipline has different requirements for attributes describing their data </li></ul></ul><ul><li>Digital Objects </li></ul><ul><ul><li>Does not deal with digitization from analog sources or generation of physical items </li></ul></ul><ul><li>Non-Human Workflow </li></ul><ul><ul><li>May need to model software and other aspects of the data workflow </li></ul></ul>
  12. 12. Limitations <ul><li>Data Collection vs. Data Granule </li></ul><ul><ul><li>Do we model each successive data object, or the full set of aggregated objects? </li></ul></ul><ul><ul><ul><li>Similar to tracking journals vs. articles </li></ul></ul></ul><ul><li>Individual Objects vs. Dynamic Packaging </li></ul><ul><ul><li>Scientific archives are moving to packaging on distribution, rather than storing the data in files </li></ul></ul><ul><li>Data Archives Without Attached Metadata </li></ul><ul><ul><li>To prepare for this eventuality, metadata is tracked as a supplementary work that may be contained in the same manifestation </li></ul></ul>
  13. 13. Sunspot on 15 July 2002 from the Swedish 1-m Solar Telescope on La Palma
  14. 14. [email_address]
  15. 16. Different Observations 171Å 195Å 284Å 304Å
  16. 17. Different Works
  17. 18. Different Expressions <ul><li>Downsampled data </li></ul><ul><ul><li>2x2 binned </li></ul></ul><ul><ul><li>5-min averages </li></ul></ul><ul><ul><li>8-bit vs. 16-bit pixels </li></ul></ul><ul><li>Lossy compression </li></ul><ul><ul><li>JPEG / JPEG2000 </li></ul></ul><ul><li>Datum extrapolation to fit a different coordinate system </li></ul><ul><li>Any form of data loss </li></ul><ul><li>Any form of data ‘creation’ to fill in missing data </li></ul>
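
As a rough illustration of how a new expression arises from the same work, the sketch below derives a 2x2 binned version and a reduced bit-depth version of a tiny image. The pixel values, the function names `bin_2x2` and `to_8bit`, and the choice of integer averaging are all assumptions made for the example; real pipelines may bin by summing or rescale bit depth differently.

```python
from typing import List

Image = List[List[int]]


def bin_2x2(image: Image) -> Image:
    """Average each non-overlapping 2x2 block, using integer division."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def to_8bit(image: Image) -> Image:
    """Reduce 16-bit pixel values to 8 bits by dropping the low byte."""
    return [[value >> 8 for value in row] for row in image]


# Hypothetical 4x4 image of 16-bit calibrated values.
calibrated = [
    [256, 512, 1024, 2048],
    [768, 512, 1024, 2048],
    [4096, 4096, 65535, 65535],
    [4096, 4096, 65535, 65535],
]

binned = bin_2x2(calibrated)   # [[512, 1536], [4096, 65535]]
reduced = to_8bit(calibrated)  # first row becomes [1, 2, 4, 8]
```

Neither result can be turned back into `calibrated`, so under the model each one is a different expression of the same work rather than a repackaging of it.
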
  18. 19. Different Manifestations <ul><li>Changes in Carrier / Packaging: </li></ul><ul><ul><li>Different metadata attached </li></ul></ul><ul><ul><li>Different file formats </li></ul></ul><ul><ul><ul><li>FITS vs. CDF vs. HDF </li></ul></ul></ul><ul><ul><li>Different aggregation </li></ul></ul><ul><ul><ul><li>individual images vs. an hourly collection </li></ul></ul></ul>
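
A small sketch of the manifestation test: the same values packaged in two file formats differ as bytes, yet decode to identical data. JSON and CSV from the Python standard library stand in here for scientific formats such as FITS, CDF, or HDF, and the helper names and sample values are hypothetical.

```python
import csv
import io
import json
from typing import List

Rows = List[List[int]]


def to_json_bytes(rows: Rows) -> bytes:
    return json.dumps(rows).encode("utf-8")


def from_json_bytes(data: bytes) -> Rows:
    return json.loads(data.decode("utf-8"))


def to_csv_bytes(rows: Rows) -> bytes:
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")


def from_csv_bytes(data: bytes) -> Rows:
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    return [[int(cell) for cell in row] for row in reader]


values = [[1, 2, 3], [4, 5, 6]]  # one expression: the numeric values

as_json = to_json_bytes(values)  # first manifestation
as_csv = to_csv_bytes(values)    # second manifestation

# Different packaging, identical values within each datum.
packaging_differs = as_json != as_csv
values_match = from_json_bytes(as_json) == from_csv_bytes(as_csv) == values
```
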
  19. 20. Different Items <ul><li>Bytewise identical </li></ul><ul><li>Stored in different locations </li></ul>
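
One way to check whether two stored files could be items of the same manifestation is to compare a cryptographic digest of their bytes. The sketch below assumes SHA-256 and local file paths; the function names are hypothetical, and matching digests are treated as strong evidence of bytewise identity rather than as a formal proof.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def are_copies(first: Path, second: Path) -> bool:
    """True if the two files have the same digest (bytewise identical)."""
    return file_digest(first) == file_digest(second)
```

For example, `are_copies(Path("mirror_a/image.fits"), Path("mirror_b/image.fits"))` would compare two copies held at different locations; the paths are placeholders.
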