Integrated Earth Data Applications: Enhancing Reliable Data Services Through the Use of Persistent Identifiers

Slides from the One NOAA Science Seminar in May 2013, given by Kerstin Lehnert.

  1. Integrated Earth Data Applications: Enhancing Reliable Data Services Through the Use of Persistent Identifiers
  2. Outline
     Data services @ IEDA
     Types of Unique Identifiers @ IEDA
     Use of Unique Identifiers @ IEDA
     •  Data Publication
     •  Linking Data, Samples, & Literature
     •  Data Compliance Support
     •  Interoperability
  3. Thanks to the IEDA Team: S. Carbotte, L. Hsu, A. Johansson, W. Ryan, L. Song, S. Chan, K. Lehnert, D. Walker, B. Chen, J. Morton, R. Weissel, V. Ferrini, A. Goodwillie, S. O’Hara, T. Rivera, J. Ash, E. Bohl, K. McLain, J. Zampas, R. Arko
  4. Integrated Earth Data Applications (IEDA), www.iedadata.org
     “… a community-based facility that serves to support, sustain, and advance the geosciences by providing a centralized location for the registry of and access to data essential for research in the solid-earth and polar sciences.”
  5. IEDA Scope: Solid Earth Observational Data
     Field Data, Derived Data, Sensor-based, Sample-based
  6. IEDA Data Types
     Sensor-based (MGDS)
     •  Field data, e.g. sonar ping files, seismic reflection shot data, side-scan sonar, photographs, gravity field data, temperature (>70 data types)
     •  Derived data, e.g. bathymetric grids, side-scan sonar grids, micro-seismicity catalogs, migrated seismic reflection profiles, gravity MBA grids, magnetization grids (>65 data types)
     Sample-based (EarthChem)
     •  Sample metadata profiles: rocks, sediments, liquids, soils
     •  Analytical lab data, e.g. major & trace element compositions, isotopic ratios, mineralogy, geochronology, age models, P/T model data, calculated end-member compositions (>500 measured properties)
  7. IEDA hosts diverse data
     •  Derived Geophysical Data (highlighted)
     •  Analytical geochemistry data
     •  Geochronological data
     •  Sample metadata
     •  Seismic Reflection Data
     •  Photos and images
     Figure: Multibeam bathymetry data, Marine Geoscience Data System (Soule et al., 2008)
  8. IEDA hosts diverse data
     •  Derived Geophysical Data
     •  Analytical geochemistry data (highlighted)
     •  Geochronological data
     •  Sample metadata
     •  Seismic Reflection Data
     •  Photos and images
     Figure: Major element geochemical analyses, PetDB: The Petrological Database (Standish et al., 2008)
  9. IEDA hosts diverse data
     •  Derived Geophysical Data
     •  Analytical geochemistry data
     •  Geochronological data
     •  Sample metadata
     •  Seismic Reflection Data
     •  Photos and images (highlighted)
     Figure: Web galleries for images, videos, maps, photos; MGDS and IEDA MediaBank (Soule et al., 2008)
  10. IEDA Data Holdings
     •  nearly 24 terabytes, >320,000 files in MGDS
     •  19 million geochemical values from 36,000 publications accessible at EarthChem
     •  ca. 3.8 million samples registered in SESAR
     Figure: EarthChem Portal sample locations
  11. IEDA Systems
     Repositories & registries
     •  Marine Geoscience Data System
     •  EarthChem Library
     •  System for Earth Sample Registration
     Data syntheses & products
     •  GMRT, PetDB, SedDB, Geochron
     Software tools for data discovery, access, visualization and analysis
     •  GeoMapApp, Virtual Ocean, EarthChem
     Portals to complementary data held in other repositories
     •  ASP, EarthChem, USAP-DCC
     Figures: EarthChem TAS plots; MGDS Virtual Ocean
  12. IEDA Foci
     Data Preservation & Curation
     •  QA/QC, documentation
     •  Persistent identification (DOI)
     •  Long-term archiving
     Data Discovery & Access
     Data Analysis
     Investigator Support
  13. IEDA Foci
     Data Preservation & Curation
     Data Discovery & Access
     •  Web-based user interfaces
     •  Programmatic access interfaces
     •  GeoMapApp, GoogleEarth, etc.
     •  Links to the literature
     Data Analysis
     Investigator Support
  14. IEDA Foci
     Data Preservation & Curation
     Data Discovery & Access
     Data Analysis
     •  Visualization tools (GeoMapApp, Virtual Ocean, Earth Observer)
     •  Syntheses & Products
     Investigator Support
  15. IEDA Foci
     Data Preservation & Curation
     Data Discovery & Access
     Data Analysis
     Investigator Support
     •  Web-based data submission
     •  Data Management Plan tool
     •  Data Compliance Report tool
     •  Community
  16. IEDA Services & Architecture
     Diagram labels: Data submission; Data Discovery & Access; IEDA Repository; DOI registration; Metadata Catalogs (datasets); IGSN registration; EarthChem; MGDS; SESAR (samples); datasets; remote data; Data Compliance Support; Synthesis (GMRT, PetDB, SedDB); Long-term Archiving
  17. IEDA needs persistent identifiers
     Persistent identifiers help IEDA achieve greater…
     •  Accessibility: by navigating diverse but related data in the IEDA systems
     •  Reliability: by maintaining links between IEDA and outside systems that persist through time
     •  Citability: by enabling proper attribution to research with long-lived, citable identifiers
  18. What objects need to be identified?
     IDs assigned by IEDA
     •  People
     •  Samples
     •  Datasets / Datafiles / Software
     •  Cruises/Expeditions
     Externally assigned IDs, used in IEDA systems
     •  Publications
     •  Funding Awards
     •  Platforms
     •  Cruises
     •  Organization IDs
     •  Country, State, Language codes
  19. What identifiers are used?
     IDs assigned by IEDA
     •  People
     •  Samples: IGSN
     •  Datasets / Datafiles / Software: DOI (DataCite)
     •  Cruises/Expeditions
     Externally assigned IDs, used in IEDA systems
     •  People: ORCID (coming soon)
     •  Publications: DOI (Publishers)
     •  Funding Awards: NSF Award Numbers
     •  Platforms: ICES Platform Code
     •  Cruises: R2R Cruise ID
     •  Organizations: IANA
     •  Country, State, Language: ISO codes
  20. DOI: Digital Object Identifier (www.doi.org)
     “DOI system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers for use on digital networks. The DOI system implements the Handle System and the indecs Framework.”
  21. Publication DOIs
     Example: 10.1016/j.epsl.2006.09.012
     Externally assigned publication DOIs are used to link to the electronic article and capture IEDA data related to a published article.
  22. EarthChem ‘Landing Page’: Linking Data & Publications
  23. Data DOI
     •  establish easier access to research data on the Internet
     •  increase acceptance of research data as legitimate, citable contributions to the scholarly record
     •  support data archiving that will permit results to be verified and re-purposed for future study
  24. Data DOIs
     Example: 10.1594/IEDA/100041
     Data DOIs are assigned to digital resources (datasets, technical reports, and software) in the IEDA repository. They:
     •  help ensure proper attribution to the author
     •  provide open access
     •  allow versioning
     •  support long-term archiving in Columbia University Libraries
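The two DOI examples on slides 21 and 24 share one syntax: a registrant prefix starting with `10.`, a slash, and a suffix chosen by the registrant. The following is a minimal sketch, not IEDA's implementation, of splitting a DOI and building a resolver link. The function names `split_doi` and `doi_url` are hypothetical, and the use of the public `https://doi.org/` proxy is an assumption that does not come from the slides.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    doi = doi.strip()
    # Accept the "doi:" label used on slide 32 (e.g. "doi:10.1594/IEDA/100050").
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build a resolver URL (assumes the public doi.org proxy)."""
    prefix, suffix = split_doi(doi)
    # Percent-encode unusual characters but keep slashes inside the suffix.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(split_doi("10.1594/IEDA/100041"))
print(doi_url("doi:10.1016/j.epsl.2006.09.012"))
```

Splitting at the first slash only matters for data DOIs such as `10.1594/IEDA/100041`, whose suffix itself contains a slash.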
  25. EarthChem Library Data Publication
     Investigator: Create dataset (guidelines & data templates provided); Create ECL record (enter cataloging metadata); Upload file
     Automatic notification to ECL manager
     EarthChem Data Manager: QC metadata & data; Approve Dataset; Register Dataset with DOI; (Release dataset) (set release date)
  26. QC/Review by Data Managers
     Development of metadata for new data sets
     •  extract from publications
     •  extract from secondary literature
     •  contact authors
     Continued development of metadata schemas and vocabularies to align with evolving community standards
     Ongoing evaluation to ensure completeness of metadata for existing data holdings
     Data verification ensuring that data files are readable
  27. Samples: IGSN (International Geo Sample Number)
     Example: MGD000973
     Provides persistent unique identification for physical samples
     •  URN type syntax
     •  centralized registration via international governance organization IGSN e.V. (DataCite model)
     Ensures access to ‘virtual representations’ of samples
     •  standardized ‘core’ metadata profiles (ISO19115, GeoSciML)
     •  extended metadata profiles at allocating agents (community specific)
  28. IGSN Attributes
     •  persistent
     •  resolvable (via handle service)
     •  broad application
     •  compliant with international standards
     •  internationally governed
     •  does not replace personal or institutional naming protocols
     •  tracks sample genealogies
  29. Need for Unique Sample Identifiers
     Figure: names of dredge sample 3 of the Amphitrite cruise (PetDB database, www.petdb.org)
     The EarthChem Portal shows 75 publications with geochemical data referenced to a sample with the name M1 (or M-1). (www.earthchem.org)
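The name-collision problem on slide 29 can be illustrated with a small sketch. It groups sample names that differ only in case, spacing, or punctuation and reports the names used by more than one source. The records below are invented for illustration and are not taken from PetDB or EarthChem; `normalize` and `ambiguous_names` are hypothetical helper names.

```python
from collections import defaultdict


def normalize(name):
    """Reduce a sample name to upper-case letters and digits only."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


def ambiguous_names(records):
    """Return normalized names that are used by more than one source."""
    sources_by_name = defaultdict(set)
    for name, source in records:
        sources_by_name[normalize(name)].add(source)
    return {
        name: sorted(sources)
        for name, sources in sources_by_name.items()
        if len(sources) > 1
    }


# Invented (sample name, source) pairs, for illustration only.
records = [
    ("M1", "paper A"),
    ("M-1", "paper B"),
    ("m 1", "paper C"),
    ("D3-2", "paper A"),
]

print(ambiguous_names(records))
```

A name alone cannot tell whether these three sources analysed the same rock or three different ones, which is the gap a registered identifier such as an IGSN is meant to close.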
  30. IGSN Metadata Profile
     •  User submitted metadata
     •  QC by IGSN Allocating Agent
     •  Access via IGSN handle or UI search
     •  QR code with URL
     •  Long-term preserved
  31. A Scalable IGSN Architecture
     Diagram labels: IGSN eV; IGSN Registry; Metadata Clearinghouse; SESAR; LDEO; USGS; ExoPlanet (invented example); Near Space Observatory (invented example); Geoscience Australia; ICDP; GFZ; …; Allocating Agent; Investigator; Analytical Lab; Repository; …; Registrant
  32. IGSN Applications
     •  Unambiguously cite physical samples (link to data and publications).
     •  Find, link, & integrate distributed data for a single sample
     •  Build a catalog of available specimens, cores, etc. to find and access these objects and their metadata
     Examples:
     Publication  doi:10.1029/2011GC003804
     Dataset      doi:10.1594/IEDA/100050
     Sample       igsn:OSU0056FT
  33. (Slide courtesy of Bethan Keall, Elsevier)
     •  Author highlights/mentions the IGSN of their sample in the text of the paper (… igsn:HRV0035F0 …)
     •  Elsevier creates a text link to http://www.geosamples.org/profile?igsn:HRV0035F0
     •  Researchers can link through to the sample at SESAR in one click, which is more efficient
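The linking step described on slide 33 can be sketched as a simple text scan. This is an illustration under stated assumptions, not Elsevier's or SESAR's actual pipeline: the regular expression assumes IGSNs appear as `igsn:` followed by upper-case letters and digits (as in the slide examples), the URL template is copied as it appears on the slide, and `find_igsns` and `link_igsns` are hypothetical names.

```python
import re

# Assumed mention format, based only on the examples shown on the slides.
IGSN_MENTION = re.compile(r"\bigsn:([A-Z0-9]+)\b")

# URL form copied from slide 33; treat it as an assumption, not a documented API.
PROFILE_URL = "http://www.geosamples.org/profile?igsn:{code}"


def find_igsns(text):
    """Return the IGSN codes mentioned in the text, in order of appearance."""
    return IGSN_MENTION.findall(text)


def link_igsns(text):
    """Append a profile link in angle brackets after each IGSN mention."""
    def add_link(match):
        url = PROFILE_URL.format(code=match.group(1))
        return f"{match.group(0)} <{url}>"

    return IGSN_MENTION.sub(add_link, text)


sentence = "Sample igsn:HRV0035F0 was analysed; see also igsn:OSU0056FT."
print(find_igsns(sentence))
print(link_igsns(sentence))
```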
  34. People – GeoPass ID
     Example: 148
     GeoPass IDs identify users across multiple IEDA systems (single sign-on)
     Log in allows saved content:
     •  data management plans
     •  database search results
     •  sample metadata profiles
     •  submitted content
  35. Coming Soon: ORCID IDs
     •  registry of unique researcher identifiers
     •  transparent method of linking research activities and outputs to these identifiers
     •  ability to reach across disciplines, research sectors, and national boundaries
     •  open, non-profit, community-based effort
     •  cooperation with other identifier systems
  36. Cruises & Expeditions
     Example: AT15-17
     Cruise IDs group and link documents, sensor data, sample data, and information across IEDA.
     •  Cruise personnel and instruments
     •  Geologic interpretation
     •  Photographs
     •  Bathymetry
     •  Pressure and Temperature
     •  Magnetic
     •  Navigation
     •  Seismic
     •  Photographs
     •  Samples
     •  Fluid Geochemistry
  37. R2R Cruise IDs
     “The Rolling Deck to Repository (R2R) program aims to develop comprehensive fleet-wide management of underway data to ensure preservation of and access to our national oceanographic research data resources.”
  38. Platform IDs
  39. Award numbers
     Example: 0527053
     Award IDs in the Data Compliance Reporting Tool group all data related to a funding award, and generate a dynamic report for funding agencies.
     Figure: Data Compliance Report
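The grouping that slide 39 attributes to award IDs can be sketched as follows. This is a minimal illustration, not the Data Compliance Reporting Tool itself: `compliance_report` is a hypothetical name, and every record below, including its association with award 0527053, is an invented placeholder.

```python
from collections import defaultdict


def compliance_report(records):
    """Group identifiers by award number, then by object type."""
    report = defaultdict(lambda: defaultdict(list))
    for award, kind, identifier in records:
        report[award][kind].append(identifier)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {award: dict(kinds) for award, kinds in report.items()}


# Invented (award, type, identifier) placeholders, for illustration only.
records = [
    ("0527053", "dataset", "doi:10.0000/example-1"),
    ("0527053", "sample", "igsn:EXAMPLE01"),
    ("0527053", "dataset", "doi:10.0000/example-2"),
    ("0000000", "cruise", "EXAMPLE-CRUISE"),
]

report = compliance_report(records)
for kind, identifiers in report["0527053"].items():
    print(kind, identifiers)
```

Award numbers are kept as strings here so that a leading zero, as in 0527053, is not lost.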
  40. Award numbers
     Example: 0527053
     Figure: Data Compliance Report
  41. Identifier challenges and future work
     Challenges
     •  Maintaining identifiers with growing content → needs active management
     •  Incorporating legacy identifiers → listen to community feedback
     •  Updating of content by users → allow versioning
  42. IEDA identifiers in a research workflow
     Workflow steps (for researchers): 1. Background research; 2. Data management plan; 3. Sample management; 4. Dataset publication; 5. Article publication; 6. Funding agency report
     Identifiers shown in the diagram: GeoPassID, Cruise ID, IGSN, Dataset DOI, Publication DOI, NSF Award #
