Functional and Architectural
           Requirements for Metadata:
                  Supporting Discovery and
                 Management of Scientific Data

      Jian Qin                Alex Ball                Jane Greenberg
School of Information   Digital Curation Centre    School of Information and
       Studies                   UKOLN                  Library Science
 Syracuse University      University of Bath      University of North Caroline
         USA                       UK                  Chapel Hill, USA



          International Conference of Dublin Core Metadata
                Kuching, Malaysia, September 5, 2012
Metadata standards for scientific data

                     Ecological
                     Metadata
                     Language (EML)




CSDGM

                                Access to Biological
                                Collections Data –
                                ABCD

                                Darwin Core
Many tools have been developed…
Many tools have been developed…
Many tools have been developed…
Motivation for adoption
• Standardize the format and terminology of
  metadata
• Enable fast and effective discovery of datasets
  across different data repositories
• Enable data sharing, reuse, and preservation
• Provide information for obtaining datasets
  from the data owners
Hindering factors


• large numbers of           • steep learning curve
  elements                   • difficult to automate
• many layers in structure     metadata generation
   – “Unwieldy to apply”     • unnecessary duplicate
• been created for             data entry
  manual data entry,         • high costs in time,
  rather than for              resource, and
  automatic generation         personnel expertise
Same entity data repeated
                 in the same record…
Seamless Daily Precipitation for the Conterminous United States

Metadata:
Identification_Information
Data_Quality_Information
Spatial_Data_Organization_Information
Spatial_Reference_Information
Entity_and_Attribute_Information
Distribution_Information
Metadata_Reference_Information
…and they are already in…
Research questions

• What functions do metadata standards for
  scientific data serve?

• How should metadata standards for scientific
  data be modeled to support these functions
  by meeting the associated requirements?
Functions expected
• Resource discovery and use,
• Data interoperability,
• Automatic and semi-automatic metadata
  generation,
• Linking of publications and underlying
  datasets,
• Data/metadata quality control, and
• Data security.
Metadata requirements for
     scientific data
Architectural view
Identity metadata


• Person: researcherID,     • Name repositories
  URI                       • Linked data architecture
• Institution: ORCID, URI   • Customizable research
• Data object: DOI,           group/community
  Handle, URI                 member name lists
• Associated publication:
  DOI
Semantic metadata
• Large semantic resources available in linked data format, but
  usually not suitable for representing scientific data because
  they are designed for publications, especially books and
  journals (containers)
• Format is contemporary but the content is far from it



     Smaller, specialized semantic
resources are necessary for automatic
    semantic metadata generation
Contextual
                             metadata
• Provenance

               Provenance data model
               (W3C, http://www.w3.org/TR/prov-primer/)

               Provenance data represent the origins of
               digital objects and describe the entities and
               activities involved in producing and delivering
               or otherwise influencing a given object.
Geospatial metadata
                 Biological sciences                                    Climate
                                        Ecological
                       Darwin           Metadata
  Biological            Core                                             NetCDF
 Data Profile                           Language
                       (DwC)              (EML)                       Climate and
                                                                      Forecast (CF)
                                Georeferencing                         Metadata
  Shoreline                     elements                              Conventions
  Metadata
   Profile                        FGDC               Georeferencing
                                                     elements           Astronomy
CSDGM Profiles
                                 CSDGM
                                                                         Astronomy
                                                                        Visualization
                                                                         Metadata
                           ISO 19115: 2003                                Standard
                        Geographic information -
                              Metadata.
                                       18
Temporal metadata

•   Mean solar time    • Different measurement
•   Civil time           systems result in
•   GPS time             different units and
                         format
•   Terrestrial time
                       • Conversion between
•   Atomic time          systems

• Geologic time
• The least effort principle
• The infrastructure service principle
• The portable principle
TABLE 1. Mapping data user tasks with metadata functions and
                     architectural building blocks

Data          Metadata function       Architectural building block
user
tasks
Discover    Descriptive metadata Identity and semantic metadata
Identify    Descriptive metadata Identity metadata
Select      Descriptive, technical
                                 Identity, semantic, scientific
            metadata             context, geospatial, temporal,
                                 miscellany metadata
Obtain      Descriptive metadata Identify metadata
Verify      Descriptive metadata Scientific context metadata
Analyze                          Scientific context, geospatial, and
                                 temporal metadata
Manage      Descriptive,         Identify, semantic, scientific
            administrative,      context, geospatial, temporal,
            structural, and      miscellany metadata
            technical metadata
Archive     Descriptive,         Identify, semantic, scientific
            administrative,      context, geospatial, temporal,
            structural, and      miscellany metadata
            technical metadata
Publish     Descriptive metadata Identity, semantic, scientific
                                 context, geospatial, and temporal
                                 metadata
Cite        Descriptive metadata Identify metadata
Development potentials
• An infrastructure of metadata services
  – Entities as linked data
  – Tools for “slicing” members by research group,
    community, or institution to customize the entity
    set
  – Tools for grabbing entity data from existing
    resources through interoperability protocols
Application scenarios
• Cross-domain discovery and verification
• Automatically populating entity information
  from customized slices of entities into
  metadata records
• And more…
Conclusion
• Scientific data are inherently complex and diverse
• Functional metadata requirements should be
  translated into an effective and efficient architecture
   – Three principles for modeling metadata for
     scientific data
• Metadata for scientific data (or other domains at
  large) should adopt an infrastructure service approach
• Much to be explored, experimented, and evaluated
Thank you!

Questions?

Functional and Architectural Requirements for Metadata: Supporting Discovery and Management of Scientific Data

  • 1.
    Functional and Architectural Requirements for Metadata: Supporting Discovery and Management of Scientific Data Jian Qin Alex Ball Jane Greenberg School of Information Digital Curation Centre School of Information and Studies UKOLN Library Science Syracuse University University of Bath University of North Caroline USA UK Chapel Hill, USA International Conference of Dublin Core Metadata Kuching, Malaysia, September 5, 2012
  • 2.
    Metadata standards forscientific data Ecological Metadata Language (EML) CSDGM Access to Biological Collections Data – ABCD Darwin Core
  • 3.
    Many tools havebeen developed…
  • 4.
    Many tools havebeen developed…
  • 5.
    Many tools havebeen developed…
  • 6.
    Motivation for adoption •Standardize the format and terminology of metadata • Enable fast and effective discovery of datasets across different data repositories • Enable data sharing, reuse, and preservation • Provide information for obtaining datasets from the data owners
  • 7.
    Hindering factors • largenumbers of • steep learning curve elements • difficult to automate • many layers in structure metadata generation – “Unwieldy to apply” • unnecessary duplicate • been created for data entry manual data entry, • high costs in time, rather than for resource, and automatic generation personnel expertise
  • 8.
    Same entity datarepeated in the same record… Seamless Daily Precipitation for the Conterminous United States Metadata: Identification_Information Data_Quality_Information Spatial_Data_Organization_Information Spatial_Reference_Information Entity_and_Attribute_Information Distribution_Information Metadata_Reference_Information
  • 9.
    …and they arealready in…
  • 10.
    Research questions • Whatfunctions do metadata standards for scientific data serve? • How should metadata standards for scientific data be modeled to support these functions by meeting the associated requirements?
  • 11.
    Functions expected • Resourcediscovery and use, • Data interoperability, • Automatic and semi-automatic metadata generation, • Linking of publications and underlying datasets, • Data/metadata quality control, and • Data security.
  • 12.
  • 14.
  • 15.
    Identity metadata • Person:researcherID, • Name repositories URI • Linked data architecture • Institution: ORCID, URI • Customizable research • Data object: DOI, group/community Handle, URI member name lists • Associated publication: DOI
  • 16.
    Semantic metadata • Largesemantic resources available in linked data format, but usually not suitable for representing scientific data because they are designed for publications, especially books and journals (containers) • Format is contemporary but the content is far from it Smaller, specialized semantic resources are necessary for automatic semantic metadata generation
  • 17.
    Contextual metadata • Provenance Provenance data model (W3C, http://www.w3.org/TR/prov-primer/) Provenance data represent the origins of digital objects and describe the entities and activities involved in producing and delivering or otherwise influencing a given object.
  • 18.
    Geospatial metadata Biological sciences Climate Ecological Darwin Metadata Biological Core NetCDF Data Profile Language (DwC) (EML) Climate and Forecast (CF) Georeferencing Metadata Shoreline elements Conventions Metadata Profile FGDC Georeferencing elements Astronomy CSDGM Profiles CSDGM Astronomy Visualization Metadata ISO 19115: 2003 Standard Geographic information - Metadata. 18
  • 19.
    Temporal metadata • Mean solar time • Different measurement • Civil time systems result in • GPS time different units and format • Terrestrial time • Conversion between • Atomic time systems • Geologic time
  • 20.
    • The leasteffort principle • The infrastructure service principle • The portable principle
  • 21.
    TABLE 1. Mappingdata user tasks with metadata functions and architectural building blocks Data Metadata function Architectural building block user tasks Discover Descriptive metadata Identity and semantic metadata Identify Descriptive metadata Identity metadata Select Descriptive, technical Identity, semantic, scientific metadata context, geospatial, temporal, miscellany metadata Obtain Descriptive metadata Identify metadata Verify Descriptive metadata Scientific context metadata Analyze Scientific context, geospatial, and temporal metadata Manage Descriptive, Identify, semantic, scientific administrative, context, geospatial, temporal, structural, and miscellany metadata technical metadata Archive Descriptive, Identify, semantic, scientific administrative, context, geospatial, temporal, structural, and miscellany metadata technical metadata Publish Descriptive metadata Identity, semantic, scientific context, geospatial, and temporal metadata Cite Descriptive metadata Identify metadata
  • 22.
    Development potentials • Aninfrastructure of metadata services – Entities as linked data – Tools for “slicing” members by research group, community, or institution to customize the entity set – Tools for grabbing entity data from existing resources through interoperability protocols
  • 23.
    Application scenarios • Cross-domaindiscovery and verification • Automatically populating entity information from customized slices of entities into metadata records • And more…
  • 24.
    Conclusion • Scientific dataare inherently complex and diverse • Functional metadata requirements should be translated into an effective and efficient architecture – Three principles for modeling metadata for scientific data • Metadata for scientific data (or other domains at large) should adopt an infrastructure service approach • Much to be explored, experimented, and evaluated
  • 25.