Data Equivalence
Mark A. Parsons
and the ESIP Preservation and Stewardship Committee


NISO Forum:
   Tracking it Back to the Source: Managing and Citing Research Data
Denver, Colorado, USA
24 September 2012
The National Snow and Ice Data Center…
• Manages and distributes scientific data
• Performs scientific and informatics research
• Supports data users
• Creates tools for data access
• Educates the public about the cryosphere
http://nsidc.org
Minimum Arctic Sea Ice Extent.
Image courtesy NASA/Goddard Scientific Visualization Studio
21 Sept. 2012


Minimum, 16 Sept.: 3,412,196 km²*
*5-day running mean of daily values
Derived from the NSIDC Sea Ice Index (Fetterer et al., 2009)
Minimum, 16 Sept.: 3,541,406 km²*
*5-day running mean of daily values
From the Arctic Sea Ice Monitor at the IARC-JAXA Information System (IJIS)
http://www.ijis.iarc.uaf.edu/en/home/seaice_extent.htm
IJIS minimum: 3.8% greater than NSIDC’s value
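The 3.8% figure can be checked from the two minima quoted on these slides. A minimal sketch; the function and variable names are illustrative and not from the talk:

```python
def percent_greater(value, reference):
    """Return how much larger `value` is than `reference`, in percent."""
    return (value - reference) / reference * 100.0


# 16 Sept. 2012 minima (5-day running means) as quoted on the slides.
nsidc_km2 = 3_412_196  # NSIDC Sea Ice Index
ijis_km2 = 3_541_406   # IJIS Arctic Sea Ice Monitor

difference = percent_greater(ijis_km2, nsidc_km2)
print(f"IJIS is {difference:.1f}% greater than NSIDC")
```

Both numbers describe the same physical quantity on the same day, so the difference comes from the data sources and processing, not from the ice.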
Example of differences in SSM/I derived ice concentration
 values calculated with three different passive microwave
 algorithms (Meier et al. 2001)
Designating User Communities by Parsons and Duerr; CODATA, Berlin, 8 Nov. 2004
Metaphor is for most people a device
of the poetic imagination and the
rhetorical flourish— a matter of
extraordinary rather than ordinary
language. Moreover, metaphor is
typically viewed as characteristic of
language alone, a matter of words
rather than thought or action. For this
reason, most people think they can get
along perfectly well without metaphor.

We have found, on the contrary, that
metaphor is pervasive in everyday life,
not just in language but in thought and
action. Our ordinary conceptual
system, in terms of which we both
think and act, is fundamentally
metaphorical in nature.
“Is Data Publication the Right Metaphor?”
 Parsons and Fox. (in press). Data Science Journal

  Preprint at http://mp-datamatters.blogspot.com
Purpose of Data Citation

• Aid scientific reproducibility through direct, unambiguous
  connection to the precise data used
• Credit for data authors and stewards
• Accountability for creators and stewards
• Track impact of data set
• Help identify data use (e.g., trackbacks)
   • Data authors can verify how their data are being used.
   • Users can better understand the application of the data.


• A locator/reference mechanism not a discovery mechanism per se


“Bridging Data Lifecycles: Tracking Data Use
via Data Citations” UCAR Workshop Report

• Recommendation 1:
  Identify what you want to achieve via data citations
• Recommendation 2: Understand the options for
  actionable identifier schemes
• Recommendation 3: Engage stakeholders
• Recommendation 4: Start with well-bounded cases
• Recommendation 5: Plan for long-term implications


• http://library.ucar.edu/data_workshop/
How data citation is currently done

• Citation of traditional publication that actually contains the data,
  e.g. a parameterization value.
• Not mentioned, just used, e.g., in tables or figures
• Reference to name or source of data in text
• URL in text (with variable degrees of specificity)
• Citation of related paper (e.g. CRU Temp. records recommend
  citing two old journal articles which do not contain the actual
  data or full description of methods)
• Citation of actual data set typically using recommended citation
  given by data center
• Citation of data set including a persistent identifier/locator,
  typically a DOI
“MODIS Snow Cover Data” in Google Scholar
[Bar chart: Formal Citation vs. Total Entries by year; horizontal axis 0 to 600. Percentage label shown for each year:]
2002: 1.3%
2003: 1.0%
2004: 0.7%
2005: 0.7%
2006: 1.3%
2007: 0.9%
2008: 1.3%
2009: 1.7%
Data Citation Guidelines

• Federation of Earth Science Information Partners. 2012. http://bit.ly/data_citation
  and related guidelines for the Group on Earth Observations (GEO)
    • Best available for Earth system science. Not yet widely adopted but growing.
• Digital Curation Centre. 2011. http://www.dcc.ac.uk/resources/how-guides/cite-datasets
    • Best overall guide. Not yet widely adopted but growing.
• DataCite—a well-recognized consortium of libraries and related organizations
  working to define a citation approach around DOIs and working to get data
  citations included in citation indices.
• DataVerse Network Project—a standard from the social science community using
  a Handle locator and “Universal Numerical Fingerprint” as a unique identifier.
• New CODATA Task Group in collaboration with ICSTI. Report due soon.
• NASA DAACs, NCAR, some NOAA centers adopting ESIP-based approaches.
• More consistency is emerging but there is still great variation in recommended
  approach. They range from specific data citation, to general acknowledgement,
  to recommending citing a journal article, or even a presentation.
Basic data citation form and content

Per DataCite:
Creator. PublicationYear. Title. [Version]. Publisher.
[ResourceType]. Identifier.




Per ESIP:
Author(s). ReleaseDate. Title, [version]. [editor(s)]. Archive and/or
Distributor. Locator. [date/time accessed]. [subset used].
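As an illustration of the ESIP element order above, here is a minimal sketch that assembles a citation string from named fields. The class name, field names, and the "Edited by" and "Accessed" wording are assumptions made for this example, not part of the ESIP guideline; the sample values come from the CLPX citation used elsewhere in this talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Fields follow the ESIP order; bracketed (optional) ones default to None."""
    authors: str
    release_date: str
    title: str
    archive: str
    locator: str
    version: Optional[str] = None
    editors: Optional[str] = None
    accessed: Optional[str] = None
    subset: Optional[str] = None


def format_esip(c: DataCitation) -> str:
    """Join the elements in ESIP order, ending each with a single period."""
    title = c.title if c.version is None else f"{c.title}, {c.version}"
    parts = [c.authors, c.release_date, title]
    if c.editors:
        parts.append(f"Edited by {c.editors}")
    parts.append(c.archive)
    parts.append(c.locator)
    if c.accessed:
        parts.append(f"Accessed {c.accessed}")
    if c.subset:
        parts.append(c.subset)
    return " ".join(p.rstrip(".") + "." for p in parts)


example = DataCitation(
    authors="Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston",
    release_date="2002, Updated 2003",
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    version="ver. 2.0",
    editors="M. Parsons and M. J. Brodzik",
    archive="Boulder, CO: National Snow and Ice Data Center",
    locator="http://nsidc.org/data/nsidc-0175.html",
    accessed="2008-05-14",
)
print(format_esip(example))
```

The printed string keeps the strict template order, so the access date follows the locator; the hand-written citations on the slides phrase this as "Data set accessed ... at ..." instead.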




An Example Citation

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston.
  2002, Updated 2003. CLPX-Ground: ISA snow depth
  transects and related measurements, ver. 2.0. Edited by M.
  Parsons and M. J. Brodzik. Boulder, CO: National Snow
  and Ice Data Center. Data set accessed 2008-05-14 at
  http://nsidc.org/data/nsidc-0175.html.
Version: ver. 2.0
Locator: http://nsidc.org/data/nsidc-0175.html
Locator

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston.
  2002, Updated 2003. CLPX-Ground: ISA snow depth
  transects and related measurements, ver. 2.0. Edited by M.
  Parsons and M. J. Brodzik. Boulder, CO: National Snow
  and Ice Data Center. Data set accessed 2012-09-22 at
  http://dx.doi.org/10.5060/D4H41PBP.
Identifier vs. Locator

• Human ID: Mark Alan Parsons (son of Robert A. and Ann M., etc.)
    • every term is defined independently and is only unique in
      context/provenance (remember that).
    • An alternative, such as a social security number, requires a very
      well controlled central authority.
• Human Locator: 1540 30th St., Room 201, Boulder CO 80303.
    • every term has a naming authority


• Data Set IDs: data set title, filename, database key, object id code (e.g.
  UUID), etc.
• Data set Locators: URL, directory structure, catalog number, registered
  locator (e.g. DOI), etc.
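The distinction can be sketched in a few lines: the identifier names the data set and never changes, while the locator records where it currently lives and may be updated. The class below is a hypothetical in-memory registry written for this example; it is not how any particular resolver (Handle, DOI, etc.) is implemented, and the second URL is a made-up address.

```python
import uuid


class LocatorRegistry:
    """Maps a stable data set identifier to its current locator."""

    def __init__(self):
        self._locations = {}

    def register(self, locator):
        """Mint a new identifier (a random UUID) for a data set at `locator`."""
        dataset_id = str(uuid.uuid4())
        self._locations[dataset_id] = locator
        return dataset_id

    def move(self, dataset_id, new_locator):
        """Record a new location; the identifier itself is unchanged."""
        if dataset_id not in self._locations:
            raise KeyError(f"unknown data set identifier: {dataset_id}")
        self._locations[dataset_id] = new_locator

    def resolve(self, dataset_id):
        """Return the current locator for an identifier."""
        return self._locations[dataset_id]


registry = LocatorRegistry()
ds_id = registry.register("http://nsidc.org/data/nsidc-0175.html")
registry.move(ds_id, "http://example.org/archive/nsidc-0175")  # hypothetical new home
print(ds_id, "->", registry.resolve(ds_id))
```

Note that the registry plays the role of the naming authority: the identifier is only useful as long as someone maintains the mapping.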
An assessment of identification schemes for digital Earth science data
[Table: each ID scheme is rated Good, Fair, or Poor, at the data set level and at the item level, for four uses: Unique Identifier, Unique Locator, Citable Locator, Scientifically Unique ID. The individual ratings did not survive in this transcript.]
ID schemes assessed: URL/N/I, PURL, XRI, Handle, DOI, ARK, LSID, OID, UUID
Adapted from Duerr, R. E., et al. 2011. On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics. 4:139-160. http://dx.doi.org/10.1007/s12145-011-0083-6
An assessment of identification schemes for digital Earth science data
• Locators: URL/N/I, PURL, XRI, Handle, DOI, ARK
• Identifiers: LSID, OID, UUID
Why the DOI?

• Not perfect but well understood by publishers
• DataCite working with Thomson Reuters to get data citations in their
  index.


  But...
   • What is the citable unit?
   • How do we handle different versions?
   • What about “retired” data?
   • When is a DOI assigned?



Issues largely resolved by...

• A defined versioning scheme
• Good tracking and documentation of the versions
• Due diligence in archive and release practices




When to assign a DOI?

• First principle: Data should be reference-able as soon as they are
  available for use by anyone other than the original authors.
• But...
    • Most people (falsely) believe that a DOI assures permanence so
      how do we cite transient data?
    • Some believe that a DOI should not be assigned until the data has
      undergone some level of review (e.g. Lawrence et al. 2011). So
      how do we cite data used before the review?
    • Data are often used by friends and collaborators in a raw,
      “unpublished” state. Should this use be cited with a DOI?
    • Near real time or preliminary data may only be available for a short,
      uncurated period, and there may not be a good match between
      the submission package and the distribution package. What gets
      the DOI? When?
Versioning approach recommended by DCC

• “As DOIs are used to cite data as evidence, the dataset to which a
  DOI points should also remain unchanged, with any new version
  receiving a new DOI.”
• “There are two possible approaches the data repository can take: time
  slices and snapshots.”




Versioning and locators:
some suggestions from NSIDC

• major version.minor version.[archive version]
• Individual stewards need to determine which are major vs. minor versions and describe
  the nature and file/record range of every version.
• Assign DOIs to major versions.
• Old DOIs should be maintained and point to some appropriate page that explains what
  happened to the old data if they were not archived.
• A new major version leads to the creation of a new collection-level metadata record that
  is distributed to appropriate registries. The older metadata record should remain with a
  pointer to the new version and with explanation of the status of the older version data.
• Major and minor version (after the first version) should be exposed with the data set title
  and recommended citation.
• Minor versions should be explained in documentation, ideally in file-level metadata.
• Applying UUIDs to individual files upon ingest aids in tracking minor versions and
  historical citations.
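The suggestions above can be sketched as follows: a version string of the form major.minor.[archive], a rule that only a new major version receives a new DOI, and a UUID attached to every file at ingest. All names here are hypothetical and the logic is a simplified reading of the bullets, not NSIDC's actual system.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    archive: Optional[int] = None  # the optional third component

    @classmethod
    def parse(cls, text):
        """Parse 'major.minor' or 'major.minor.archive'."""
        parts = [int(p) for p in text.split(".")]
        if len(parts) not in (2, 3):
            raise ValueError(f"expected major.minor[.archive], got {text!r}")
        return cls(*parts)

    def __str__(self):
        base = f"{self.major}.{self.minor}"
        return base if self.archive is None else f"{base}.{self.archive}"


def needs_new_doi(old, new):
    """DOIs are assigned to major versions, so only a major bump needs one."""
    return new.major > old.major


def ingest(filenames):
    """Assign a random UUID to each file so it can be tracked across versions."""
    return {name: str(uuid.uuid4()) for name in filenames}


print(needs_new_doi(Version.parse("1.4"), Version.parse("2.0")))  # major bump
print(needs_new_doi(Version.parse("2.0"), Version.parse("2.1")))  # minor bump
print(ingest(["transects_iop4.txt", "transects_iop4.shp"]))       # made-up file names
```

Deciding which changes count as major remains a judgment for the individual steward; the code only enforces whatever decision is encoded in the version number.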


Basic data citation form and content

Author(s). ReleaseDate. Title, Version. [editor(s)]. Archive and/or
Distributor. Locator. [date/time accessed]. [subset used].




The best solution may be to have unique identifiers or query IDs
for subsets, but that won’t be available for most data sets for a
long time, so we need alternative solutions...




February 8, 2011, 4:45 PM

    Page Numbers for Kindle Books an Imperfect Solution
    Neither solution is perfect—‘locations’ or
    page numbers—because the problem is
    unsolvable. The best we can hope for is a
    choice...




    Amazon’s Kindle will have page numbers that correspond to real books and locations by passage.


http://pogue.blogs.nytimes.com/2011/02/08/page-numbers-for-kindle-books-an-imperfect-solution/
Chapter and Verse

• Bible

• Koran

• Bhagavad-Gita and
  Ramayana

• other sacred texts




• A “structural index”
The “Archival Information Unit”

“An Archival Information Package whose Content Information is not
further broken down into other Content Information components, each
of which has its own complete Preservation Description Information. It
can be viewed as an ‘atomic’ AIP”
“From an Access viewpoint, new subsetting and manipulation
capabilities are beginning to blur the distinction between AICs and AIUs.
Content objects which used to be viewed as atomic can now be viewed
as containing a large variation of contents based on the subsetting
parameters chosen. In a more extreme example, the Content
Information of an AIU may not exist as a physical entity. The Content
Information could consist of several input files (or pointers to the AIPs
containing these data files) and an algorithm which uses these files to
create the data object of interest.”
• CCSDS. 2002. Reference Model for An Open Archival Information System (OAIS)
  CCSDS 650.0-B-1 Issue 1. Washington, DC: CCSDS Secretariat. p. 1-8, 4-38.	
Citation scenarios and production patterns

• What kind of “atomic” item is being cited, i.e., what is the “Archival
  Information Unit (AIU)” (e.g., a data file, a data element within a file, a
  relational (or other) database, a job “residue”)?
• How many AIUs are in a typical citation for the scenario being
  considered?
• What other digital or physical objects need to be available to make the
  unit usable—the “Preservation Description Information (PDI)”?


  Key Question:
• What structure or structures can we use to organize data collections
  that might be common across Earth sciences?


An example production pattern for Cline et al. (2003).
A production pattern for Cline et al., 2003
[Figure: production diagram built up over several slides. Labels as laid out left to right:]
• analog to digital w/ QC: field notebook, Excel v1, printout, Excel v2, ascii files, shapefiles
• born digital: camera, interim jpgs, collated and named jpgs
• Labels added in later builds: distributed data set; + HTML Doc.; tarball; 100s (shown twice); 1000s

Sample of the ascii files shown in the figure:
TRANSECT,IOP ,DATE      ,TIME,UTME  ,UTMN   ,DEPTH,SWET,SRUF,CNPY,TEMP ,SURVEYOR,QC,COMMENTS
        ,    ,          ,    ,      ,       ,cm   ,    ,    ,    ,deg-F,        ,  ,
FAA01.1 ,iop4,2003-03-25,1017,425941,4410860,  104,   d,   y,   n,-999,"Fitzgerald, Matous, Dundas","QC(000)",""
FAA01.2 ,iop4,2003-03-25,1017,425956,4410860,   13,   d,   n,   n,-999,"Fitzgerald, Matous, Dundas","QC(000)",""
...
FAA04.1 ,iop4,2003-03-25,1221,425938,4411193,  325,   d,   y,   n,-999,"Fitzgerald, Matous, Dundas","QC(000)","Couldn't find post, used GPS 5940 1197; FAA4.4 and FAA4.5 unsafe, avalanche area!

Callout in the final build: Couldn't find post, used GPS 5940 1197; FAA4.4 and FAA4.5 unsafe, avalanche area!
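The ascii records in the figure above look like comma-separated values with quoted text fields, so one record is roughly one citable "item". A minimal sketch of reading them with Python's standard csv module follows. The column names are taken from the figure; the units row is omitted from the sample, and treating -999 as a missing-value marker is an assumption, since the slides do not say what it means.

```python
import csv
import io

# Two records copied from the figure (units row left out, padding shortened).
SAMPLE = '''TRANSECT,IOP,DATE,TIME,UTME,UTMN,DEPTH,SWET,SRUF,CNPY,TEMP,SURVEYOR,QC,COMMENTS
FAA01.1 ,iop4,2003-03-25,1017,425941,4410860,        104,   d,    y,   n,-999,"Fitzgerald, Matous, Dundas   ","QC(000)   ",""
FAA01.2 ,iop4,2003-03-25,1017,425956,4410860,         13,   d,    n,   n,-999,"Fitzgerald, Matous, Dundas   ","QC(000)   ",""
'''


def read_transects(text, missing="-999"):
    """Parse records into dicts, stripping the fixed-width padding."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key.strip(): value.strip() for key, value in raw.items()}
        row["DEPTH"] = int(row["DEPTH"])  # snow depth in cm, per the units row
        # ASSUMPTION: -999 marks a missing temperature.
        row["TEMP"] = None if row["TEMP"] == missing else float(row["TEMP"])
        rows.append(row)
    return rows


for record in read_transects(SAMPLE):
    print(record["TRANSECT"], record["DEPTH"], record["SURVEYOR"])
```

The quoted SURVEYOR field contains commas, which is why a real CSV parser is used here instead of splitting each line on commas.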
Crude, inaccurate production pattern for MODIS/Aqua Snow Cover
         Daily L3 Global 500m Grid V005 (Hall et al., 2007)
[Figure labels: MODAPs Processing; Archives: GSFC, EDC, NSIDC]
• 1 file/day/tile (grid cell)
• Each file contains metadata describing previous inputs and detailed versioning
• 1,000,000s
Doing it as best we can...?

• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007,
  updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid
  V005.3, Oct. 2007- Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder,
  Colorado USA: National Snow and Ice Data Center. Data set accessed
  2008-11-01 at http://dx.doi.org/10.1234/xxx.
• Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007,
  updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid
  V005.3, Oct. 2007- Sep. 2008, Tiles (15,2;16,0;16,1;16,2;17,0;17,1).
  Boulder, Colorado USA: National Snow and Ice Data Center. Data set
  accessed 2008-11-01 at http://dx.doi.org/10.1234/xxx.
• Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002,
  Updated 2003. CLPX-Ground: ISA snow depth transects and related
  measurements, Version 2.0, shapefiles. Edited by M. Parsons and M.
  J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set
  accessed 2008-05-14 at http://dx.doi.org/10.5060/D4H41PBP.
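The "subset used" element in the citations above works as a structural index into the data set. The sketch below shows one way to generate the tile-based form of that element from a list of tiles. The function names are illustrative, and reading each tile as a pair of grid indices is an assumption based on the second citation.

```python
def tile_subset(tiles):
    """Format tile index pairs the way the second citation above writes them."""
    pairs = ";".join(f"{first},{second}" for first, second in tiles)
    return f"Tiles ({pairs})"


def title_with_subset(title, temporal_range, spatial_subset):
    """Append the temporal and spatial subset to a versioned data set title."""
    return f"{title}, {temporal_range}, {spatial_subset}"


tiles = [(15, 2), (16, 0), (16, 1), (16, 2), (17, 0), (17, 1)]
line = title_with_subset(
    "MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3",
    "Oct. 2007- Sep. 2008",
    tile_subset(tiles),
)
print(line)
```

Generating the subset text from the same parameters used to request the data keeps the citation and the actual download consistent.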
Sea Ice Index (Fetterer et al. 2009)
• “Near Real Time”: Maslanik and Stroeve 1999
• “Preliminary”: Meier et al. 2006
• “Final”: Cavalieri et al. 1996
Sea Ice Production – Data Workflow
Last updated: D Scott 12/2011

Color Key: Source (data); Value added product; Near-Real-Time product;
Preliminary Product; Final Product; Not part of discussion

Diagram nodes:
• Remote Sensing Systems TBs (Wentz)
• NSIDC-0001 SSM/I Polar Stereo Tbs
• NSIDC-0002 SSM/I Daily and Monthly Polar Gridded Bootstrap
• Deaccession to begin 1/2012
• Goddard Space Flight Center (GSFC)
• NSIDC-0051 Preliminary Sea Ice Concentrations from SMMR and SSM/I
  Passive Microwave Data
• NSIDC-0079 Preliminary Bootstrap Sea Ice Concentrations from
  Nimbus-7 SMMR and DMSP SSM/I
• Data not yet distributed
• NSIDC-0051 Sea Ice Concentrations from SMMR and SSM/I Passive
  Microwave Data
• NSIDC-0079 Bootstrap Sea Ice Concentrations from SMMR and SSM/I
• NASA Team production line
• Fowler, University of CO
• Anderson, University of NE
• NSIDC-0116 Polar Pathfinder Daily 25 km EASE-Grid Sea Ice Motion Vectors
• NSIDC-0105 Snow Melt Onset Over Arctic Sea Ice from SMMR and SSM/I Tbs
• G00791 IABP Drifting Buoy Pressure, Temperature, Position, and
  Interpolated Ice Velocity
• NSIDC-0066 AVHRR Polar Pathfinder Twice-Daily 5 km EASE-Grid Composites
• G02202 NOAA/NSIDC CDR
• NSIDC-0081 Near-Real-Time DMSP SSM/I Daily Polar Gridded Sea Ice
  Concentrations
• NSIDC-0192 Sea Ice Trends and Climatologies from SMMR and SSM/I
• Robinson Pseudo-weekly Snow Cover, Rutgers
• NSIDC-0046 NH EASE-Grid Weekly Snow Cover and Sea Ice Extent V3
• NSIDC-0080 Near-Real-Time DMSP SSM/I Daily Polar Gridded Tbs
• NISE
• NISE on NEO
• World Wide Web Arctic Sea Ice News and Analysis
• G02135 Sea Ice Index
• CLASS F17 TBs
Diagram labels: MODIS, SeaWiFS, ...; level 0, level 1a, level 1b, level 2, …;
GSM, algorithms, software, calibration, ...; chlorophyll, phytoplankton,
particulates, …; reprocess, revalidate
                                                  slide courtesy Greg Janee, UCSB
Hypothesis: ~80% of citation scenarios for 80% of
Earth system science data can be addressed with
basic citations (Author(s). ReleaseYear. Title, Version.
[editor(s)]. Archive. Locator. [date/time accessed]. [subset used].),
a solid methods section, and reasonable due
diligence by archives.
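
As a rough illustration of the basic citation form above, the sketch below assembles a citation string in that element order. The class and field names (`DataCitation`, `release_year`, and so on) are hypothetical, and the "Edited by" and "Accessed" wording is one assumed rendering of the bracketed optional elements, not a prescribed format. The sample values come from the Cline et al. citation shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the basic citation elements."""
    authors: str
    release_year: str
    title: str
    version: str
    archive: str
    locator: str
    editors: Optional[str] = None   # optional element
    accessed: Optional[str] = None  # optional element
    subset: Optional[str] = None    # optional element

    def format(self) -> str:
        # Order follows: Author(s). ReleaseYear. Title, Version.
        # [editor(s)]. Archive. Locator. [date/time accessed]. [subset used].
        parts = [self.authors, self.release_year, f"{self.title}, {self.version}"]
        if self.editors:
            parts.append(f"Edited by {self.editors}")  # assumed wording
        parts.append(self.archive)
        parts.append(self.locator)
        if self.accessed:
            parts.append(f"Accessed {self.accessed}")  # assumed wording
        if self.subset:
            parts.append(f"Subset used: {self.subset}")  # assumed wording
        # Terminate every element with exactly one period.
        return " ".join(p.rstrip(".") + "." for p in parts)


citation = DataCitation(
    authors="Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston",
    release_year="2002",
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    version="Version 2.0",
    archive="National Snow and Ice Data Center",
    locator="http://dx.doi.org/10.5060/D4H41PBP",
    editors="M. Parsons and M. J. Brodzik",
    accessed="2008-05-14",
)
print(citation.format())
```

Leaving the optional elements as `None` simply drops them from the string, which mirrors the bracketed elements in the form above.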
Future directions on scientific equivalence
Content equivalence and provenance equivalence
serve as a proxy for scientific equivalence.

Content Equivalence: Is there an algorithm that can consider the
content of a file and come up with a unique identifier that will be the
same for objects in the same “content equivalence class”?
   • Exact content equivalence can use digital signature or
     cryptographic techniques (MD5, SHA-1, etc.).
   • Universal Numerical Fingerprint (UNF) takes a digital signature of a
     “canonical” representation of the information, but defining
     “canonical” is a problem.
   • How can we define loose or scientific content equivalence? Are
     the shapefiles and text files in Cline et al. (2003) equivalent?
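
The difference between exact and looser content equivalence can be sketched as follows. This is a toy illustration only, not the UNF algorithm: the canonical form here (each number written in exponent notation at a fixed number of significant digits) is an assumption chosen for the example, the helper names are hypothetical, and SHA-256 stands in for the MD5/SHA-1 techniques named above.

```python
import hashlib
from typing import List


def exact_digest(data: bytes) -> str:
    """Digest of the raw bytes: equal only for bit-identical files."""
    return hashlib.sha256(data).hexdigest()


def parse_numbers(data: bytes) -> List[float]:
    """Hypothetical reader: accepts comma- or space-separated numbers."""
    text = data.decode("utf-8").replace(",", " ")
    return [float(token) for token in text.split()]


def canonical_digest(values: List[float], sig_digits: int = 7) -> str:
    """Digest of an assumed canonical form of the numeric content."""
    precision = sig_digits - 1
    # Write every value the same way, regardless of how the file spelled it.
    canonical = "\n".join(format(v, f".{precision}e") for v in values)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two files holding the same numbers in different formats.
csv_file = b"1.50,2.0,3\n"
text_file = b"1.5 2.00 3.0\n"

same_bytes = exact_digest(csv_file) == exact_digest(text_file)
same_content = (canonical_digest(parse_numbers(csv_file))
                == canonical_digest(parse_numbers(text_file)))
print(same_bytes, same_content)
```

The two files fail the exact test but fall into the same equivalence class under this canonical form. Choosing that form (precision, ordering, units, missing values) is exactly where the difficulty noted above lies.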




Content equivalence and provenance equivalence
serve as a proxy for scientific equivalence.

Provenance Equivalence: A process is reproducible when there are
sufficient creation provenance details for someone else to make an
equivalent file. If someone follows those provenance details to re-create
the object, the resulting object will be equivalent to the original.
   • If we can enumerate/list sufficient or “essential” creation
     provenance details to make an equivalent file, then we can
     describe an algorithm to produce an identifier that will be the
     same for files that match those provenance details.
   • Algorithm can be similar to the content equivalence algorithm
     before: take a digital signature of a canonical representation of the
     information, in this case a canonical serialization of processes.
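
A minimal sketch of such a provenance identifier, under stated assumptions: the set of “essential” details (`ESSENTIAL_KEYS`), the record layout, the file names, and the decision to treat input order as irrelevant are all hypothetical choices made for illustration. Canonical serialization is approximated with sorted-key JSON.

```python
import hashlib
import json

# Assumed "essential" creation provenance details (hypothetical selection).
ESSENTIAL_KEYS = ("algorithm", "algorithm_version", "inputs", "parameters")


def provenance_id(record: dict) -> str:
    """Identifier that is the same for matching essential provenance."""
    essential = {key: record[key] for key in ESSENTIAL_KEYS}
    # Assumption: the order in which inputs were listed does not matter.
    essential["inputs"] = sorted(essential["inputs"])
    # Canonical serialization: sorted keys, fixed separators.
    canonical = json.dumps(essential, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = {
    "algorithm": "snow_cover",
    "algorithm_version": "5.3",
    "inputs": ["granule_A.hdf", "granule_B.hdf"],
    "parameters": {"grid": "500m", "threshold": 0.4},
    "host": "node17",             # incidental detail, ignored
    "run_date": "2008-11-01",     # incidental detail, ignored
}
run_b = {
    "run_date": "2012-09-24",
    "host": "laptop",
    "parameters": {"threshold": 0.4, "grid": "500m"},
    "inputs": ["granule_B.hdf", "granule_A.hdf"],
    "algorithm_version": "5.3",
    "algorithm": "snow_cover",
}
run_c = dict(run_a, algorithm_version="6.0")

print(provenance_id(run_a) == provenance_id(run_b))
print(provenance_id(run_a) == provenance_id(run_c))
```

Runs that differ only in incidental details (host, run date, key order) share an identifier, while a change to an essential detail such as the algorithm version produces a different one.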




Thank You
parsonsm@nsidc.org




                     photo courtesy NOAA
Backup slides
More examples
• Glacier Photos: chemical creation of image, digitization, multi-source, no
  versioning
• Hurricane Ike: direct digital creation, embedded software, single-source, no
  versioning
• Rock or Ice Cores: physical specimens with later experimental operations to
  create data collections – digital results may be stored in databases with complex
  provenance
• SeaDataNet: Many individual Conductivity, Temperature, Depth measurements
  from many individual cruises compiled into one database.
• Global Historical Climate Network: In situ measurements of Temp and Humidity,
  recorded as time series and appended to files (with some quality control) – multi-
  source, some versioning
• Radiosonde Network: Balloon-borne Temp and Humidity from about 20,000 sites
  every 12 or 6 hours assimilated into weather forecasts – archives maintained at
  NCAR, GSFC, and elsewhere – reanalysis can create new versions
• Large-scale Satellite Data Production: complex, large-scale data flows from
  multiple sources, systematic approaches to versioning where the structure of one
  version of a data product resembles the next
Basic data citation form and content

Mandatory:                    Optional:
 Author(s)                     Editor(s)
 Release Date                  Archive or Distributor Place
 Title                         Other Institutional Role
 Version                       Indication of subset used
 Archive and/or Distributor
 Locator or Persistent
 Location service
 Time, date accessed


Serving, Citing and Publishing Data
Citation forms an important part of the scientific record.

We draw a clear distinction between:
publishing = making available for consumption (e.g. on the web),
    and
Publishing = publishing after some formal process which adds value
    for the consumer:
•   e.g. PLoS ONE type review, or
•   EGU journal type public review, or
•   More traditional peer review.
AND
•   provides commitment to persistence

0. Serving of data sets: This is what data centres do as our day job – take
   in data supplied by scientists and make it available to other interested
   parties.
1. Data set Citation: This is our first step for this project – formulate and
   formalise a way of citing data sets. Will provide benefits to our users –
   and a carrot to get them to provide data to us!
2. Publication of data sets: This involves the peer-review of data sets, and
   gives the “stamp of approval” associated with traditional journal
   publications. Can’t be done without effective linking/citing of the data
   sets.

Diagram labels: Doi:10232/123ro; Doi:10232/123

                                                                         slide courtesy S. Callaghan, BADC
                                          VO Sandpit, November 2009

More Related Content

What's hot

Data managementbasics issr_20130301
Data managementbasics issr_20130301Data managementbasics issr_20130301
Data managementbasics issr_20130301
Rebecca Reznik-Zellen
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability Science
Robert H. McDonald
 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarship
tsbbbu
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities Class
Aaron Collie
 
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Dimitrios Koureas
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
Aaron Collie
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRS
Robert Oostenveld
 
IASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrIASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrCarly Strasser
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
Rachel Lovinger
 
Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...
ASIS&T
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
University of Arizona
 
NISO DCMI Webinar bibframe-20130123
NISO DCMI Webinar bibframe-20130123NISO DCMI Webinar bibframe-20130123
NISO DCMI Webinar bibframe-20130123
National Information Standards Organization (NISO)
 
A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4
Leon Osinski
 
Small Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific RepositoriesSmall Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific Repositories
Anita de Waard
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataGudmundur Thorisson
 
Making Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLMaking Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDL
Carly Strasser
 
The Donders Repository
The Donders RepositoryThe Donders Repository
The Donders Repository
Robert Oostenveld
 
Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current Landscape
Carly Strasser
 
Michener Plenary PPSR2012
Michener Plenary PPSR2012Michener Plenary PPSR2012
Michener Plenary PPSR2012
CitizenScience.org
 

What's hot (20)

Data managementbasics issr_20130301
Data managementbasics issr_20130301Data managementbasics issr_20130301
Data managementbasics issr_20130301
 
Building a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability ScienceBuilding a Data Discovery Network for Sustainability Science
Building a Data Discovery Network for Sustainability Science
 
Preserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of ScholarshipPreserving the Inputs and Outputs of Scholarship
Preserving the Inputs and Outputs of Scholarship
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities Class
 
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
 
The Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRSThe Brain Imaging Data Structure and its use for fNIRS
The Brain Imaging Data Structure and its use for fNIRS
 
IASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrIASSIST identifiers By Joan Starr
IASSIST identifiers By Joan Starr
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
 
Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
NISO DCMI Webinar bibframe-20130123
NISO DCMI Webinar bibframe-20130123NISO DCMI Webinar bibframe-20130123
NISO DCMI Webinar bibframe-20130123
 
A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4
 
Small Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific RepositoriesSmall Data: Bridging the Gap Between Generic and Specific Repositories
Small Data: Bridging the Gap Between Generic and Specific Repositories
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
 
Making Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLMaking Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDL
 
The Donders Repository
The Donders RepositoryThe Donders Repository
The Donders Repository
 
Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current Landscape
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector AdministrationNISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector Administration
 
Michener Plenary PPSR2012
Michener Plenary PPSR2012Michener Plenary PPSR2012
Michener Plenary PPSR2012
 

Similar to NISO Forum, Denver, Sept. 24, 2012: Data Equivalence

Identity, Location, and Citation at NEON
Identity, Location, and Citation at NEONIdentity, Location, and Citation at NEON
Identity, Location, and Citation at NEON
Mark Parsons
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Natsuko Nicholls
 
RDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
RDAP13 Jian Qin: Functional and Architectural Requirements for MetadataRDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
RDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
ASIS&T
 
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Jian Qin
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
DataONE
 
DataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefDataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefCrossref
 
GBIF and reuse of research data, Bergen (2016-12-14)
GBIF and reuse of research data, Bergen (2016-12-14)GBIF and reuse of research data, Bergen (2016-12-14)
GBIF and reuse of research data, Bergen (2016-12-14)
Dag Endresen
 
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
nabo_ghea
 
Data Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic RecordData Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic Record
Kerstin Lehnert
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
Robert H. McDonald
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
aceas13tern
 
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
Publishing of Scientific Data  - Science Foundation Ireland Summit 2010Publishing of Scientific Data  - Science Foundation Ireland Summit 2010
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
jodischneider
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
Philippe Rocca-Serra
 
TDWG_2010_Chavan_data_citation
TDWG_2010_Chavan_data_citationTDWG_2010_Chavan_data_citation
TDWG_2010_Chavan_data_citation
Vishwas Chavan
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
iedadata
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
Academy of Science of South Africa (ASSAf)
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
Anita de Waard
 
Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Data Management Planning - 02/21/13
Data Management Planning - 02/21/13
Lizzy_Rolando
 
Inter-university Upper atmosphere Global Observation NETwork (IUGONET)
Inter-university Upper atmosphere Global Observation NETwork (IUGONET) Inter-university Upper atmosphere Global Observation NETwork (IUGONET)
Inter-university Upper atmosphere Global Observation NETwork (IUGONET) Iugo Net
 

Similar to NISO Forum, Denver, Sept. 24, 2012: Data Equivalence (20)

Identity, Location, and Citation at NEON
Identity, Location, and Citation at NEONIdentity, Location, and Citation at NEON
Identity, Location, and Citation at NEON
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
 
RDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
RDAP13 Jian Qin: Functional and Architectural Requirements for MetadataRDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
RDAP13 Jian Qin: Functional and Architectural Requirements for Metadata
 
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
 
DataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefDataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRef
 
GBIF and reuse of research data, Bergen (2016-12-14)
GBIF and reuse of research data, Bergen (2016-12-14)GBIF and reuse of research data, Bergen (2016-12-14)
GBIF and reuse of research data, Bergen (2016-12-14)
 
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
Managing Social Science Data from the Arctic with ELOKA, ACADIS, NSIDC, and (...
 
Data Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic RecordData Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic Record
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
 
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
Publishing of Scientific Data  - Science Foundation Ireland Summit 2010Publishing of Scientific Data  - Science Foundation Ireland Summit 2010
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
 
TDWG_2010_Chavan_data_citation
TDWG_2010_Chavan_data_citationTDWG_2010_Chavan_data_citation
TDWG_2010_Chavan_data_citation
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014IEDA Overview & Updates, March 2014
IEDA Overview & Updates, March 2014
 
Open Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon HodsonOpen Science - Global Perspectives/Simon Hodson
Open Science - Global Perspectives/Simon Hodson
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
Data Management Planning - 02/21/13
Data Management Planning - 02/21/13Data Management Planning - 02/21/13
Data Management Planning - 02/21/13
 
Inter-university Upper atmosphere Global Observation NETwork (IUGONET)
Inter-university Upper atmosphere Global Observation NETwork (IUGONET) Inter-university Upper atmosphere Global Observation NETwork (IUGONET)
Inter-university Upper atmosphere Global Observation NETwork (IUGONET)
 

NISO Forum, Denver, Sept. 24, 2012: Data Equivalence

  • 1. Data Equivalence. Mark A. Parsons and the ESIP Preservation and Stewardship Committee. NISO Forum: Tracking it Back to the Source: Managing and Citing Research Data. Denver, Colorado, USA, 24 September 2012.
  • 2. The National Snow and Ice Data Center… manages and distributes scientific data; performs scientific and informatics research; supports data users; creates tools for data access; educates the public about the cryosphere. http://nsidc.org
  • 3. Minimum Arctic Sea Ice Extent. Image courtesy NASA/Goddard Scientific Visualization Studio
  • 4. 21 Sept. 2012. Minimum, 16 Sept.: 3,412,196 km²*. Derived from the NSIDC Sea Ice Index (Fetterer et al., 2009). *5-day running mean of daily values.
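The footnote on this slide says the reported minimum is a 5-day running mean of daily values. A minimal sketch of such a smoothing step follows, assuming a trailing window (the slide does not say whether the window is trailing or centered) and using hypothetical daily extents in million km², not values from either data source.

```python
def trailing_mean(values, window=5):
    """Return the trailing running mean; the first window-1 days have no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    for end in range(window - 1, len(values)):
        chunk = values[end - window + 1 : end + 1]
        means.append(sum(chunk) / window)
    return means


# Hypothetical daily extents (million km^2), for illustration only.
daily = [3.50, 3.46, 3.43, 3.41, 3.40, 3.42, 3.45]
smoothed = trailing_mean(daily)

# The smoothed minimum need not equal the raw daily minimum.
print(min(daily), min(smoothed))
```

Because the smoothed series is shorter and flatter than the daily one, two groups that smooth differently can report different minima even from similar inputs.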
  • 5. Minimum, 16 Sept.: 3,541,406 km²*. From the Arctic Sea Ice Monitor at the IARC-JAXA Information System (IJIS), http://www.ijis.iarc.uaf.edu/en/home/seaice_extent.htm. *5-day running mean of daily values.
  • 6. Minimum, 16 Sept.: 3,541,406 km²*, 3.8% greater than NSIDC’s value. From the Arctic Sea Ice Monitor at the IARC-JAXA Information System (IJIS), http://www.ijis.iarc.uaf.edu/en/home/seaice_extent.htm. *5-day running mean of daily values.
  • 7. Example of differences in SSM/I derived ice concentration values calculated with three different passive microwave algorithms (Meier et al. 2001). Designating User Communities by Parsons and Duerr; CODATA, Berlin, 8 Nov. 2004.
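The 3.8% figure can be checked directly from the two minima quoted on the slides. The short sketch below uses only those two numbers; the function and constant names are illustrative.

```python
NSIDC_MIN_KM2 = 3_412_196  # NSIDC Sea Ice Index minimum, 16 Sept. 2012
IJIS_MIN_KM2 = 3_541_406   # IJIS Arctic Sea Ice Monitor minimum, 16 Sept. 2012


def percent_greater(value, reference):
    """Percentage by which value exceeds reference."""
    return 100.0 * (value - reference) / reference


difference = percent_greater(IJIS_MIN_KM2, NSIDC_MIN_KM2)
print(f"IJIS minimum is {difference:.1f}% greater than the NSIDC minimum")
```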
  • 8. Metaphor is for most people a device of the poetic imagination and the rhetorical flourish— a matter of extraordinary rather than ordinary language. Moreover, metaphor is typically viewed as characteristic of language alone, a matter of words rather than thought or action. For this reason, most people think they can get along perfectly well without metaphor. We have found, on the contrary, that metaphor is pervasive in everyday life, not just in language but in thought and action. Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature.
  • 9. “Is Data Publication the Right Metaphor?” Parsons and Fox. (in press). Data Science Journal. Preprint at http://mp-datamatters.blogspot.com
  • 10. Purpose of Data Citation
      • Aid scientific reproducibility through direct, unambiguous connection to the precise data used
      • Credit for data authors and stewards
      • Accountability for creators and stewards
      • Track impact of data set
      • Help identify data use (e.g., trackbacks)
      • Data authors can verify how their data are being used.
      • Users can better understand the application of the data.
      • A locator/reference mechanism, not a discovery mechanism per se
  • 11. “Bridging Data Lifecycles: Tracking Data Use via Data Citations” UCAR Workshop Report
      • Recommendation 1: Identify what you want to achieve via data citations
      • Recommendation 2: Understand the options for actionable identifier schemes
      • Recommendation 3: Engage stakeholders
      • Recommendation 4: Start with well-bounded cases
      • Recommendation 5: Plan for long-term implications
      • http://library.ucar.edu/data_workshop/
  • 12. How data citation is currently done
      • Citation of traditional publication that actually contains the data, e.g. a parameterization value.
      • Not mentioned, just used, e.g., in tables or figures
      • Reference to name or source of data in text
      • URL in text (with variable degrees of specificity)
      • Citation of related paper (e.g. CRU Temp. records recommend citing two old journal articles which do not contain the actual data or full description of methods)
      • Citation of actual data set, typically using recommended citation given by data center
      • Citation of data set including a persistent identifier/locator, typically a DOI
  • 13. “MODIS Snow Cover Data” in Google Scholar. [Bar chart: formal citations versus total entries per year, on an axis from 0 to 600. Percentages shown by year: 2002 1.3%, 2003 1.0%, 2004 0.7%, 2005 0.7%, 2006 1.3%, 2007 0.9%, 2008 1.3%, 2009 1.7%.]
  • 14. Data Citation Guidelines
      • Federation of Earth Science Information Partners. 2012. http://bit.ly/data_citation and related guidelines for the Group on Earth Observations (GEO)
          • Best available for Earth system science. Not yet widely adopted but growing.
      • Digital Curation Centre. 2011. http://www.dcc.ac.uk/resources/how-guides/cite-datasets
          • Best overall guide. Not yet widely adopted but growing.
      • DataCite: a well-recognized consortium of libraries and related organizations working to define a citation approach around DOIs and working to get data citations included in citation indices.
      • DataVerse Network Project: a standard from the social science community using a Handle locator and “Universal Numerical Fingerprint” as a unique identifier.
      • New CODATA Task Group in collaboration with ICSTI. Report due soon.
      • NASA DAACs, NCAR, some NOAA centers adopting ESIP-based approaches.
      • More consistency is emerging, but there is still great variation in recommended approaches. They range from specific data citation, to general acknowledgement, to recommending citing a journal article, or even a presentation.
  • 15. Basic data citation form and content
      Per DataCite: Creator. PublicationYear. Title. [Version]. Publisher. [ResourceType]. Identifier.
      Per ESIP: Author(s). ReleaseDate. Title, [version]. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used].
  • 16. An Example Citation: Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.
  • 17. Version: Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.
  • 18. Locator: Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0175.html.
  • 19. Locator: Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, ver. 2.0. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2012-09-22 at http://dx.doi.org/10.5060/D4H41PBP.
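The ESIP element order (Author(s). ReleaseDate. Title, [version]. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used].) can be sketched as a small formatter. This is an illustration only: the `DataCitation` class and its field names are hypothetical, and the output places the access date after the locator as in the ESIP element list, which differs slightly from the wording of the example citation on the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the ESIP citation elements."""
    authors: str
    release_date: str
    title: str
    archive: str
    locator: str
    version: Optional[str] = None
    editors: Optional[str] = None
    accessed: Optional[str] = None
    subset: Optional[str] = None

    def format(self) -> str:
        # Optional elements are skipped when absent.
        title = f"{self.title}, {self.version}" if self.version else self.title
        parts = [self.authors, self.release_date, title]
        if self.editors:
            parts.append(f"Edited by {self.editors}")
        parts.extend([self.archive, self.locator])
        if self.accessed:
            parts.append(f"Accessed {self.accessed}")
        if self.subset:
            parts.append(self.subset)
        return ". ".join(part.rstrip(".") for part in parts) + "."


citation = DataCitation(
    authors="Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston",
    release_date="2002, Updated 2003",
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    version="ver. 2.0",
    editors="M. Parsons and M. J. Brodzik",
    archive="Boulder, CO: National Snow and Ice Data Center",
    locator="http://dx.doi.org/10.5060/D4H41PBP",
    accessed="2012-09-22",
)
print(citation.format())
```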
  • 20. Identifier vs. Locator
      • Human ID: Mark Alan Parsons (son of Robert A. and Ann M., etc.)
          • every term defined independently and only unique in context/provenance (remember that).
          • Alternative like a social security number requires a very well controlled central authority.
      • Human Locator: 1540 30th St., Room 201, Boulder CO 80303.
          • every term has a naming authority
      • Data Set IDs: data set title, filename, database key, object id code (e.g. UUID), etc.
      • Data Set Locators: URL, directory structure, catalog number, registered locator (e.g. DOI), etc.
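The distinction between identifying a data set and locating it can be sketched with a small lookup table: the identifier stays fixed while the location it resolves to can change. The `LocatorRegistry` class below is hypothetical and much simpler than a real registered-locator service; the two URLs are the ones from the example citation.

```python
import uuid


class LocatorRegistry:
    """Hypothetical registry mapping a fixed identifier to a changeable location."""

    def __init__(self):
        self._locations = {}

    def register(self, location):
        # The identifier says nothing about where the data live.
        identifier = str(uuid.uuid4())
        self._locations[identifier] = location
        return identifier

    def move(self, identifier, new_location):
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        return self._locations[identifier]


registry = LocatorRegistry()
data_set_id = registry.register("http://nsidc.org/data/nsidc-0175.html")
registry.move(data_set_id, "http://dx.doi.org/10.5060/D4H41PBP")
print(data_set_id, "->", registry.resolve(data_set_id))
```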
  • 21. An assessment of identification schemes for digital Earth science data. [Table: ID schemes URL/N/I, PURL, XRI, Handle, DOI, ARK, LSID, OID, and UUID, each rated Good, Fair, or Poor as a Unique Identifier, Unique Locator, Citable Locator, and Scientifically Unique ID, for both Data Set and Item.] Adapted from Duerr, R. E., et al. 2011. On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics. 4:139-160. http://dx.doi.org/10.1007/s12145-011-0083-6
  • 22. An assessment of identification schemes for digital Earth science data. [Same table, with the overlay label “Locators” beside the upper rows (URL/N/I, PURL, XRI) and “Identifiers” beside the lower rows (LSID, OID, UUID).] Adapted from Duerr, R. E., et al. 2011. On the utility of identification schemes for digital Earth science data: An assessment and recommendations. Earth Science Informatics. 4:139-160. http://dx.doi.org/10.1007/s12145-011-0083-6
  • 23. Why the DOI?
      • Not perfect but well understood by publishers
      • DataCite working with Thomson Reuters to get data citations in their index.
      But...
      • What is the citable unit?
      • How do we handle different versions?
      • What about “retired” data?
      • When is a DOI assigned?
  • 24. Issues largely resolved by...
      • A defined versioning scheme
      • Good tracking and documentation of the versions
      • Due diligence in archive and release practices
  • 25. When to assign a DOI?
      • First principle: Data should be reference-able as soon as they are available for use by anyone other than the original authors.
      • But...
          • Most people (falsely) believe that a DOI assures permanence, so how do we cite transient data?
          • Some believe that a DOI should not be assigned until the data have undergone some level of review (e.g. Lawrence et al. 2011). So how do we cite data used before the review?
          • Data are often used by friends and collaborators in a raw, “unpublished” state. Should this use be cited with a DOI?
          • Near real time or preliminary data may only be available for a short, uncurated period, and there may not be a good match between the submission package and the distribution package. What gets the DOI? When?
  • 26. Versioning approach recommended by DCC
      • “As DOIs are used to cite data as evidence, the dataset to which a DOI points should also remain unchanged, with any new version receiving a new DOI.”
      • “There are two possible approaches the data repository can take: time slices and snapshots.”
  • 27. Versioning and locators: some suggestions from NSIDC
      • major version.minor version.[archive version]
      • Individual stewards need to determine which are major vs. minor versions and describe the nature and file/record range of every version.
      • Assign DOIs to major versions.
      • Old DOIs should be maintained and point to some appropriate page that explains what happened to the old data if they were not archived.
      • A new major version leads to the creation of a new collection-level metadata record that is distributed to appropriate registries. The older metadata record should remain with a pointer to the new version and with an explanation of the status of the older version data.
      • Major and minor versions (after the first version) should be exposed with the data set title and recommended citation.
      • Minor versions should be explained in documentation, ideally in file-level metadata.
      • Applying UUIDs to individual files upon ingest aids in tracking minor versions and historical citations.
  • 28. Basic data citation form and content
      Author(s). ReleaseDate. Title, Version. [editor(s)]. Archive and/or Distributor. Locator. [date/time accessed]. [subset used].
      The best solution may be to have unique identifiers or query IDs for subsets, but that won’t be available for most data sets for a long time, so we need alternative solutions...
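The “major version.minor version.[archive version]” scheme and the rule “Assign DOIs to major versions” can be sketched as follows. The `Version` type and the helper names are hypothetical, and the sketch assumes each component is a plain integer, which the slide does not specify.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    """Hypothetical representation of major.minor.[archive]."""
    major: int
    minor: int
    archive: Optional[int] = None  # the archive component is optional


def parse_version(text):
    parts = [int(part) for part in text.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected major.minor[.archive], got {text!r}")
    return Version(*parts)


def needs_new_doi(old, new):
    # Only a change in the major version triggers a new DOI in this sketch.
    return new.major > old.major


print(needs_new_doi(parse_version("1.3"), parse_version("2.0")))    # major bump
print(needs_new_doi(parse_version("2.0"), parse_version("2.1.4")))  # minor bump
```

Under these assumptions a minor or archive-level change keeps the existing DOI and is instead explained in the documentation, as the suggestions above recommend.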
  • 29. February 8, 2011, 4:45 PM Page Numbers for Kindle Books an Imperfect Solution Neither solution is perfect—‘locations’ or page numbers—because the problem is unsolvable. The best we can hope for is a choice... Amazon’s Kindle will have page numbers that correspond to real books and locations by passage. http://pogue.blogs.nytimes.com/2011/02/08/page-numbers-for-kindle-books-an-imperfect-solution/
  • 30. Chapter and Verse • Bible • Koran • Bhagavad-Gita and Ramayana • other sacred texts • A “structural index”
  • 31. The “Archive Information Unit”
      “An Archival Information Package whose Content Information is not further broken down into other Content Information components, each of which has its own complete Preservation Description Information. It can be viewed as an ‘atomic’ AIP.”
      “From an Access viewpoint, new subsetting and manipulation capabilities are beginning to blur the distinction between AICs and AIUs. Content objects which used to be viewed as atomic can now be viewed as containing a large variation of contents based on the subsetting parameters chosen. In a more extreme example, the Content Information of an AIU may not exist as a physical entity. The Content Information could consist of several input files (or pointers to the AIPs containing these data files) and an algorithm which uses these files to create the data object of interest.”
      • CCSDS. 2002. Reference Model for An Open Archival Information System (OAIS). CCSDS 650.0-B-1, Issue 1. Washington, DC: CCSDS Secretariat. p. 1-8, 4-38.
  • 32. Citation scenarios and production patterns
      • What kind of “atomic” item is being cited: the “Archive Information Unit (AIU)” (e.g., a data file, a data element within a file, a relational (or other) database, a job “residue”)?
      • How many AIU items are in a typical citation for the scenario being considered?
      • What other digital or physical objects need to be available to make the unit usable: the “Preservation Description Information (PDI)”?
      Key Question:
      • What structure or structures can we use to organize data collections that might be common across Earth sciences?
  • 33. An example production pattern for Cline et al. (2003).
  • 45-50. A production pattern for Cline et al., 2003 (one diagram built up over six slides). Diagram labels: field notebook; Excel v1; printout; Excel v2; analog to ascii files; digital w/ QC; shapefiles; born digital; camera; interim jpgs; collated and named jpgs; tarball; distributed data set + HTML Doc. File-count annotations: 100s, 100s, 1000s. Callout on the final build: “Couldn't find post, used GPS 5940 1197; FAA4.4 and FAA4.5 unsafe, avalanche area!” Sample of the ascii data shown on the slide:
    TRANSECT,IOP ,DATE ,TIME,UTME ,UTMN ,DEPTH ,SWET,SRUF,CNPY, TEMP,SURVEYOR ,QC ,COMMENTS
    , , , , , ,cm , , , , deg-F, , ,
    FAA01.1 ,iop4,2003-03-25,1017,425941,4410860, 104, d, y, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) "," "
    FAA01.2 ,iop4,2003-03-25,1017,425956,4410860, 13, d, n, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) "," "
    ...
    FAA04.1 ,iop4,2003-03-25,1221,425938,4411193, 325, d, y, n, -999,"Fitzgerald, Matous, Dundas ","QC(000) ","Couldn't find post, used GPS 5940 1197; FAA4.4 and FAA4.5 unsafe, avalanche area!
  • 51. Crude, inaccurate production pattern for MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005 (Hall et al., 2007)
  • 52. Crude, inaccurate production pattern for MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005 (Hall et al., 2007). Diagram labels: Archives (GSFC, EDC, NSIDC); MODAPS processing.
  • 53. Crude, inaccurate production pattern for MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005 (Hall et al., 2007). Diagram labels: Archives (GSFC, EDC, NSIDC); 1 file/day/tile (grid cell); each file contains metadata describing previous inputs and detailed versioning; MODAPS processing.
  • 54. Crude, inaccurate production pattern for MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005 (Hall et al., 2007). Diagram labels: Archives (GSFC, EDC, NSIDC); 1,000,000s; 1 file/day/tile (grid cell); each file contains metadata describing previous inputs and detailed versioning; MODAPS processing.
  • 55. Doing it as best we can...? • Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007-Sep. 2008, 84°N, 75°W; 44°N, 10°W. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at http://dx.doi.org/10.1234/xxx. • Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005.3, Oct. 2007-Sep. 2008, Tiles (15,2;16,0;16,1;16,2;17,0;17,1). Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-11-01 at http://dx.doi.org/10.1234/xxx. • Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, updated 2003. CLPX-Ground: ISA snow depth transects and related measurements, Version 2.0, shapefiles. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://dx.doi.org/10.5060/D4H41PBP.
  • 56. Sea Ice Index (Fetterer et al. 2009). Diagram labels: “Near Real Time” (Maslanik and Stroeve 1999); “Preliminary” (Meier et al. 2006); “Final” (Cavalieri et al. 1996).
  • 57. Sea Ice Production: Data Workflow (diagram; last updated: D. Scott, 12/2011). Color key: source (data); value added product; near-real-time product; preliminary product; final product; not part of discussion. Sources and producers shown: Remote Sensing Systems TBs (Wentz); CLASS F17 TBs; Goddard Space Flight Center (GSFC); NASA Team production line; Fowler, University of CO; Anderson, University of NE; Robinson, Rutgers. Data sets and products shown: NSIDC-0001 SSM/I Polar Stereo Tbs; NSIDC-0002 SSM/I Daily and Monthly Polar Gridded Bootstrap; NSIDC-0051 Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I Passive Microwave Data (preliminary and final); NSIDC-0079 Bootstrap Sea Ice Concentrations from SMMR and SSM/I (preliminary and final); NSIDC-0105 Snow Melt Onset Over Arctic Sea Ice from SMMR and SSM/I Tbs; NSIDC-0116 Polar Pathfinder Daily 25 km EASE-Grid Sea Ice Motion Vectors; NSIDC-0066 AVHRR Polar Pathfinder Twice-Daily 5 km EASE-Grid Composites; G00791 IABP Drifting Buoy Pressure, Temperature, Position, and Interpolated Ice Velocity; G02202 NOAA/NSIDC CDR; NSIDC-0080 Near-Real-Time DMSP SSM/I Daily Polar Gridded Tbs; NSIDC-0081 Near-Real-Time DMSP SSM/I Daily Polar Gridded Sea Ice Concentrations; NSIDC-0192 Sea Ice Trends and Climatologies from SMMR and SSM/I; NSIDC-0046 NH EASE-Grid Weekly Snow Cover and Sea Ice Extent V3; Robinson Pseudo-weekly Snow Cover; NISE; NISE on NEO; G02135 Sea Ice Index; Arctic Sea Ice News and Analysis (World Wide Web). Annotations: “Deaccession to begin 1/2012”; “Data not yet distributed”.
  • 58. MODIS chlorophyll level 0 SeaWiFS GSM phytoplankton … level 1a algorithms ... particulates … software level 1b calibration ... level 2 reprocess, … revalidate. Slide courtesy Greg Janee, UCSB.
  • 59. Hypothesis: ~80% of citation scenarios for 80% of Earth system science data can be addressed with basic citations (Author(s). ReleaseYear. Title, Version. [editor(s)]. Archive. Locator. [date/time accessed]. [subset used].), a solid methods section, and reasonable due diligence by archives.
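The basic citation form in the hypothesis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataCitation` class, the `format_citation` function, and the field names are hypothetical, and the punctuation between elements is a guess at a reasonable rendering rather than a rule taken from any published guideline. The example values come from the Cline et al. citation shown earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Fields of the basic citation form; bracketed (optional) ones default to None."""
    authors: str
    release_year: str
    title: str
    version: str
    archive: str
    locator: str
    editors: Optional[str] = None
    accessed: Optional[str] = None
    subset: Optional[str] = None


def format_citation(c: DataCitation) -> str:
    """Join the fields in the order given on the slide, skipping absent optional ones."""
    parts = [f"{c.authors}.", f"{c.release_year}.", f"{c.title}, {c.version}."]
    if c.editors:
        parts.append(f"Edited by {c.editors}.")
    parts.append(f"{c.archive}.")
    parts.append(f"{c.locator}.")
    if c.accessed:
        parts.append(f"Accessed {c.accessed}.")
    if c.subset:
        parts.append(f"Subset used: {c.subset}.")
    return " ".join(parts)


example = DataCitation(
    authors="Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston",
    release_year="2002, updated 2003",
    title="CLPX-Ground: ISA snow depth transects and related measurements",
    version="Version 2.0",
    archive="Boulder, CO: National Snow and Ice Data Center",
    locator="http://dx.doi.org/10.5060/D4H41PBP",
    editors="M. Parsons and M. J. Brodzik",
    accessed="2008-05-14",
    subset="shapefiles",
)
print(format_citation(example))
```

Keeping the subset as a free-text field mirrors the slide: it can hold a bounding box, a tile list, or a file format, and the sketch does not try to validate it.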
  • 60. Future directions on scientific equivalence
  • 61. Content equivalence and provenance equivalence serve as a proxy for scientific equivalence. Content Equivalence: Is there an algorithm that can consider the content of a file and come up with a unique identifier that will be the same for objects in the same “content equivalence class”? • Exact content equivalence can use digital signature or cryptographic techniques (MD5, SHA-1, etc.). • The Universal Numerical Fingerprint (UNF) takes a digital signature of a “canonical” representation of the information, but defining what is canonical is a problem. • How can we define loose or scientific content equivalence? Are the shapefiles and text files in Cline et al. (2003) equivalent?
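The difference between exact and looser content equivalence can be sketched as follows. This is a toy illustration, not the UNF algorithm: the canonical form used here (trim whitespace around each field, drop blank lines, sort the rows) is an assumption chosen only to make the point, the comma split ignores quoted fields, and SHA-256 stands in for the MD5 and SHA-1 digests named on the slide. The function names are hypothetical; the sample rows are shortened from the Cline et al. text file.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """Exact content equivalence: any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()


def canonical_digest(csv_text: str) -> str:
    """Looser equivalence under an ASSUMED canonical form:
    trim whitespace around each field, drop blank lines, sort the rows.
    Naive comma split: quoted fields containing commas are not handled."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(",")]
        rows.append(",".join(fields))
    canonical = "\n".join(sorted(rows))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same two records, different padding and row order.
a = (
    "FAA01.1 ,iop4,2003-03-25,1017,425941,4410860, 104\n"
    "FAA01.2 ,iop4,2003-03-25,1017,425956,4410860, 13\n"
)
b = (
    "FAA01.2,iop4,2003-03-25,1017,425956,4410860,13\n"
    "FAA01.1,iop4,2003-03-25,1017,425941,4410860,104\n"
)

print(exact_digest(a.encode()) == exact_digest(b.encode()))  # False
print(canonical_digest(a) == canonical_digest(b))            # True
```

Whether row order or padding is scientifically irrelevant depends on the data set, which is exactly the "canonical is a problem" point: each choice of canonical form defines a different equivalence class.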
  • 62. Content equivalence and provenance equivalence serve as a proxy for scientific equivalence. Provenance Equivalence: A process is reproducible when there are sufficient creation provenance details for someone else to make an equivalent file. If someone follows those provenance details to re-create the object, the resulting object will be equivalent to the original. • If we can enumerate sufficient or “essential” creation provenance details to make an equivalent file, then we can describe an algorithm to produce an identifier that will be the same for files that match those provenance details. • The algorithm can be similar to the content equivalence algorithm: take a digital signature of a canonical representation of the information, in this case a canonical serialization of processes.
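A provenance identifier of this kind could look roughly like the sketch below. It assumes the "essential" provenance details can be written as an ordered list of process records and uses sorted-key JSON as the canonical serialization; both choices, the `provenance_id` function, and the process names ("transcribe", "quality control") are hypothetical illustrations, with only the stage names "field notebook", "Excel v1", and "Excel v2" taken from the Cline et al. production pattern.

```python
import hashlib
import json


def provenance_id(steps: list) -> str:
    """Digest of a canonical serialization of the essential process details.

    sort_keys and fixed separators make the JSON independent of key order
    and spacing; the order of the steps is kept because it is meaningful.
    """
    canonical = json.dumps(steps, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two descriptions of the same processing chain, written with different key order.
run_1 = [
    {"process": "transcribe", "input": "field notebook", "output": "Excel v1"},
    {"process": "quality control", "input": "Excel v1", "output": "Excel v2"},
]
run_2 = [
    {"output": "Excel v1", "process": "transcribe", "input": "field notebook"},
    {"input": "Excel v1", "output": "Excel v2", "process": "quality control"},
]

# A chain that differs in one essential detail.
run_3 = [dict(step) for step in run_1]
run_3[1]["process"] = "quality control, revised"

print(provenance_id(run_1) == provenance_id(run_2))  # True
print(provenance_id(run_1) == provenance_id(run_3))  # False
```

The hard part is not the hashing but deciding which details count as essential; anything left out of the records is, by construction, treated as irrelevant to equivalence.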
  • 63. Thank You parsonsm@nsidc.org photo courtesy NOAA
  • 65. More examples • Glacier Photos: chemical creation of image, digitization, multi-source, no versioning • Hurricane Ike: direct digital creation, embedded software, single-source, no versioning • Rock or Ice Cores: physical specimens with later experimental operations to create data collections – digital results may be stored in databases with complex provenance • SeaDataNet: Many individual Conductivity, Temperature, Depth measurements from many individual cruises compiled into one database. • Global Historical Climate Network: In situ measurements of Temp and Humidity, recorded as time series and appended to files (with some quality control) – multi-source, some versioning • Radiosonde Network: Balloon-borne Temp and Humidity from about 20,000 sites every 12 or 6 hours assimilated into weather forecasts – archives maintained at NCAR, GSFC, and elsewhere – reanalysis can create new versions • Large-scale Satellite Data Production: complex, large-scale data flows from multiple sources, systematic approaches to versioning where the structure of one version of a data product resembles the next
  • 66. Basic data citation form and content. Mandatory: Author(s); Release Date; Title; Version; Archive and/or Distributor; Locator or Persistent Location service; Time, date accessed. Optional: Editor(s); Archive or Distributor Place; Other Institutional Role; Indication of subset used.
  • 67. Serving, Citing and Publishing Data. Citation forms an important part of the scientific record. We draw a clear distinction between: publishing = making available for consumption (e.g. on the web), and Publishing = publishing after some formal process which adds value for the consumer: • e.g. PLoS ONE type review, or • EGU journal type public review, or • More traditional peer review. AND • provides commitment to persistence. 0. Serving of data sets: This is what data centres do as our day job: take in data supplied by scientists and make it available to other interested parties. 1. Data set Citation: This is our first step for this project: formulate and formalise a way of citing data sets. Will provide benefits to our users, and a carrot to get them to provide data to us! 2. Publication of data sets: This involves the peer-review of data sets, and gives the “stamp of approval” associated with traditional journal publications. Can’t be done without effective linking/citing of the data sets. Figure labels: Doi:10232/123ro; Doi:10232/123. Slide courtesy S. Callaghan, BADC VO Sandpit, November 2009.

Editor's Notes

  1. Props to ESIP: a consortium of ~120 Earth science-related partners; started in 1998; support from NASA, NOAA, and EPA, with growing engagement from USGS, NSF, DOE, and USDA; a neutral forum for community networking, collaboration, and problem solving; limited international participation (strength and weakness). The P&S committee is a few years old and very active. Please join. Google “ESIP preservation”. A grand title for what might be better called “How do I specifically reference the precise data I used, and not the data that are very similar but different?” Data Equivalence by Mark A. Parsons and the ESIP Preservation and Stewardship Cluster is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. See http://creativecommons.org/licenses/by-sa/3.0/
Abstract: Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility, the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
  2. We manage data, educate the public, and do research on the cryosphere: the frozen parts of the world, from kryos, the Greek for butt cold. We are getting more involved with other data, including life and social science data and even Indigenous knowledge. I’d argue we are becoming an important center of interdisciplinary informatics, but we are best known for keeping an eye on that stuff in the background image: sea ice. Images: Top left: Near Real-Time SSM/I EASE-Grid Daily Global Ice Concentration and Snow Extent, 3 February 2008. Top right: Pancake ice seen off the bow of the research vessel Aurora Australis, in the Southern Ocean near Antarctica. Photo courtesy Ted Scambos. Bottom right: Data from the Sea Ice Index, shown on Google Earth and distributed from our Web site in the form of a QuickTime movie. The left image is an animated time series of sea ice extent from 1979 to 2006; the static image on the right compares the extent in 2007. Bottom left: Global Monthly SWE Climatology Browse, from http://nsidc.org/data/nsidc-0271.html. Center: A section of the online data set documentation from Sea Ice Charts of the Russian Arctic in Gridded Format, 1933-2006 (http://nsidc.org/data/g02176.html).
  3. In case you missed it, Arctic sea ice is rapidly declining. This year’s summer minimum sea ice extent, about a week ago, shattered previous records in an ongoing collapse. This image shows this year’s minimum extent relative to the 30-yr average minimum extent (yellow line). Note this average is in a period of rapid decline. “Satellite data reveal how the new record low Arctic sea ice extent, from Sept. 16, 2012, compares to the average minimum extent over the past 30 years (in yellow). Sea ice extent maps are derived from data captured by the Scanning Multichannel Microwave Radiometer aboard NASA's Nimbus-7 satellite and the Special Sensor Microwave Imager on multiple satellites from the Defense Meteorological Satellite Program. Credit: NASA/Goddard Scientific Visualization Studio” http://www.nasa.gov/topics/earth/features/2012-seaicemin.html
  4. So let’s look at this a little more closely. Fetterer, F., K. Knowles, W. Meier, and M. Savoie. 2002, updated 2009. Sea Ice Index. Boulder, Colorado USA: National Snow and Ice Data Center. http://nsidc.org/data/g02135.html
  6. This highlights the issue. In science you have to be very exact in what you are pointing to. Meier W.N., VanWoert M.L., & Bertoia C. (2001) Evaluation of operational SSM/I algorithms. Annals of Glaciology. 33:102-08.
  7. “researchers from Emory University reported in Brain & Language that when subjects in their laboratory read a metaphor involving texture, the sensory cortex, responsible for perceiving texture through touch, became active. Metaphors like “The singer had a velvet voice” and “He had leathery hands” roused the sensory cortex, while phrases matched for meaning, like “The singer had a pleasing voice” and “He had strong hands,” did not.” (Annie Murphy Paul, NY Times, 17 Mar 2012) Metaphors help us create the complex narratives we use to understand our physical and conceptual experience. These complex narratives are made up of smaller, very simple narratives called “frames” or “scripts”. Framing and frame analysis are often used in knowledge representation, social theory, media studies, and psychology, with much of the work stemming from Erving Goffman (1974). These frames present a set of roles and relationships between them, like characters in a play. They also help us define our terms and make sense of language, because words are defined relative to a conceptual frame. The word “sell” does not make sense without some understanding of a commercial transaction and some of the other roles and terms involved, like “buyer,” “money,” and “cost”. Furthermore, by mentioning only one of these concepts, like “buy” or “sell”, the whole commercial transaction scenario is evoked or “activated” in the mind (Fillmore, 1976). Lakoff (2008) further argues that framing is critical to human cognition. The neural circuitry to create a frame is relatively simple, and our brain essentially uses framing as a sort of cognitive processing shortcut. If things are understood in the context of a frame, much is already unconsciously understood and need not be consciously processed. We know what to expect. He shows how language, metaphor, and framing play critical roles in any social enterprise. “Language is at once a surface phenomenon and a source of power. It is a means of expressing, communicating, accessing, and even shaping thought. ... Language gets its power because it is defined relative to frames, prototypes, metaphor, narratives, images and emotions. Part of its power comes from unconscious aspects: we are not consciously aware of all that it evokes in us, but it is there, hidden, always at work. If we hear the same language over and over, we will think more and more in terms of the frames and metaphors activated by that language.” (Lakoff 2008, p. 14) So with that in mind Peter Fox and I wrote a critical essay, asking. Reference: Lakoff, G. (2008) The Political Mind: Why You Can't Understand 21st-Century Politics With An 18th-Century Brain. New York: Penguin Group.
  12. The National Snow and Ice Data Center distributes a variety of different snow cover products derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). The results of a quick analysis of how many scientific papers mention use of "MODIS Snow Cover Data" (according to Google Scholar) and how often the data sets themselves are formally cited show a huge disparity, illustrating the infrequency of proper data citation in practice. Moreover, the lack of data citation standards introduces the possibility that informal references to data do not point to the exact data set actually used. This was a crude estimate to make a point. Gene Major, Heather Piwowar, and others have done manual studies showing it’s not being done and it hasn’t improved in a decade. So the approach to date is to try to develop guidelines and promote their use and acceptance of the practice.
  13. So it’s pretty clear from all this that the author is not being fairly credited.
  14. Digital Mapping Techniques '00 -- Workshop Proceedings. U.S. Geological Survey Open-File Report 00-325. Proposal for Authorship and Citation Guidelines for Geologic Data Sets and Map Images in the Era of Digital Publication, by Stephen M. Richard. When I presented this only a year ago, it was much more diverse. We are converging on the approach; now we need to implement it!
  15. Regarding tracking the impact... Chen, R. S. and Downs, R. R. (2010). Evaluating the Use and Impacts of Scientific Data. National Federation of Advanced Information Services (NFAIS) Workshop, Assessing the Usage and Value of Scholarly and Scientific Output: An Overview of Traditional and Emerging Approaches. Philadelphia, PA, November 10, 2010. http://info.nfais.org/info/ChenDownsNov10.pdf
  18. These databases (a) do not currently support tracking citations to data sets, (b) have expensive subscription fees, and (c) only reveal impact demonstrated through traditional citations.
  21. Resource type is a short controlled list of “The general type of a resource.” Many other optional fields are possible.
  23. Authors are those who put the intellectual effort into creating the data set: study designers, algorithm developers, instrument designers, field team leaders, etc.
  26-27. Editors and other roles help with credit and accountability.
  28. Publisher. Could be multiple, but usually there’s an authority.
  31. They’re different, but sometimes a locator can be used as an ID (the person working in this position at this address). Hence the general use of the term “identifier”, such as in DOI, which is better described as a locator. Chun, Lee, and Choi introduce name and address as complements to identifier and locator. I’m not sure I entirely understand the distinction, but a key point they make is that while the concepts of identifier and locator are distinct, they can and do intersect in different contexts. Sometimes you can identify with a locator and locate with an identifier. What we really need is what John Kunze calls an “actionable locator”. Kunze J. (2003) Towards electronic persistence using ARK identifier. Retrieved September 22, 2012 from the World Wide Web: http://pid.ndk.cz/dokumenty/zakladni-literatura/arkcdl.pdf Chun, W, TH Lee, and T Choi. 2011. YANAIL: yet another definition on names, addresses, identifiers, and locators. Proceedings of the 6th International Conference on Future Internet Technologies. pp. 8-12
  33. Debate in the LSID community weakens it: an LSID is a locator, but the ObjectID part of it is also an identifier, and most people use a UUID for the ObjectID part. OID problem. ARK is a bit better than the rest of the locators because it has additional trust value ... maybe the color should have been more orange than yellow, but I didn’t want to add more colors. From the ARK definition: transparency: no identifier can guarantee stability, and ARK inflections help users make informed judgements.
  34. What is the citable unit with a DOI? A file? A collection of files? How many? Further, it is important to note that data products can be purged from an archive; such deleted information still needs to be able to be referenced. Even if the products themselves are not preserved, the raw data must be preserved along with detailed documentation describing how the product was created, and that documentation must be citable. Some have expressed concern that DOIs are too closely associated with publishers and may restrict open access.
  36. This is where the publication metaphor gets us in trouble. Can we track data packages flowing through an ecosystem instead and recreate
  38. Still need date accessed with DOI. Can we auto-track UUIDs?
  39. “Micro-citation”: the hardest piece of all and the one most critical to scientific repeatability. One might think of this as citing a passage in a publication, e.g. a page number.
  40. A page number requires a canonical format.
  41. A “structural index (chapter-verse)” vs. an “internal location id (page number)”. A structural index is useful when there are many different versions but with fairly consistent structure. The difficulty with the Universal Numerical Fingerprint approach is that it assumes that there is an immutable order to the data elements. This is not always the case; therefore chapter-verse (c-v) needs to refer to “equivalence classes”, not canonical versions. But you can’t deny the human readability of the c-v approach. We probably need both approaches. We need the chapter and verse that makes sense to people and is easily conceived and communicated between people, but then we still need the precise location and identity of that rather mutable verse represented in a way that computers can readily understand and be precise about, i.e. the identifier. And then we can't forget the fact that we have billions if not trillions of "verses" or "granules" that we're dealing with. Our human approach needs to make sense at a high level of aggregation, while the computer approach needs to handle the volumes and precision. Another option is to capture query and date; this relies on really good provenance tracking.
  42. Note the outdated reference and page numbers, a problem with costly ISO standards.
  43. It would probably be wise to distinguish between cases where the file collections are open, but adding relatively stable items, from ones where the files themselves are mutable.
  56-60. The shapefile does not hold all the comments that are in the text file. Go take a photo of the actual field book with a pencil and use that.
  61-63. L3 files record interim files, not L2. Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Snow Cover Daily L3 Global 500m Grid V005. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 21 July 2011 at http://nsidc.org/data/myd10a1.html.
  67. “To preserve data is to preserve ability to process data!” To reference data is to track data.
  68. Include engineers? Data managers? Sponsors? Correct URL? This is where a DOI helps.
  69. Include engineers? Data managers? Correct URL? This is where a DOI helps.
  70. I have modified this hypothesis and I’m having second thoughts as to whether it stands. From a limited scholarly perspective of citing well-controlled data sets, it’s probably true. It is “citable” in some of the credit and location senses, but that is insufficient for full scientific tracking and referencing of data streams.
  73. W3C and PROV-O are doing some good work on this first point. Takeaway: figure out what problem you are trying to solve with referencing and then examine the problem from many perspectives, not just the classic scholarly publication model.
  77. Many other optional fields possible.