NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
 

Presentation Transcript

  • Metadata for Managing Scientific Research Data
      NISO/DCMI Webinar: August 22, 2012
      Jane Greenberg, Professor and Director of the SILS Metadata Research Center
      janeg@email.unc.edu
  • Overview
      ▪ Why should we care?
      ▪ What is data?
      ▪ What is metadata's role w.r.t. data?
      ▪ Selected metadata standards
      ▪ Challenges, opportunities, and jumping in
      ▪ Concluding comments
      ▪ Q&A
  • Why should we care?
      BIG stuff
      ▪ Digital data deluge (Hey & Trefethen, 2003)
      ▪ Big data (New York Times, 2008)
      ▪ The fourth paradigm (Jim Gray, 2007)
      Just as important
      ▪ The long tail (Heidorn, 2008)
      ▪ CODATA/Data-at-Risk Task Group
      ▪ Scholarly communications, data citation
      Technological affordances for improving and advancing science
  • Cultural shift toward data sharing
      ▪ National and international policies
          – US NSF and NIH [1, 2]
          – OECD (Organisation for Economic Co-operation and Development) [3]
          – INSPIRE (Infrastructure for Spatial Information in the European Community), EU Commission [4]
          – UK Medical Research Council [5]
      Dryad "enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies." (http://datadryad.org/)
  • Data
      ▪ No single agreed-upon definition
      ▪ One person's data is another person's information
      ▪ Data often implies the "raw" stuff lacking context
          – Scholarly context, written assessment
      ▪ "Essence of science" (Greenberg, et al, 2009)
      ▪ What is science?
          – The Archaeology Data Service (ADS), archaeologydataservice.ac.uk
  • Data
      I know it when I see it
      By example: Traditional observations, numbers, and measures stored in spreadsheets and databases, fossils, phylogenetic trees, and herbarium samples (White, 2008)
      Other disciplines
      ▪ Bioinformatics: Gene expressions, DNA transcription to RNA translation
      ▪ Geology, agriculture, surveillance, and historical manuscript research: Hyperspectral remote sensing
      The Dryad Repository (email w/R. Scherle, July 2012)
        quantity  type
            3162  Plain Text
             476  Microsoft Excel
             308  Adobe Portable Document Format
             302  Comma-separated values
             252  Nexus
             153  Microsoft Excel OpenXML
             108  Microsoft Word
              80  Zip file
              62  JPEG image
              45  Microsoft Word OpenXML
              40  Extensible Markup Language
              35  Hypertext Markup Language
              21  Rich Text Format
              16  FASTA sequence file
              15  Tag Image File Format
              14  Postscript Files
               2  Video Quicktime
               2  Mathematica Notebook
               1  Microsoft Powerpoint
  • Metadata defined
      …data about data…
      …information about data
      ▪ "Metadata or 'data about data' describes the content, quality, condition, and other characteristics of data." (FGDC Metadata WG, 1998)
      ▪ Structured information about an object (data) that facilitates functions associated with the object. (Greenberg, 2002, 2003, 2009)
  • Typical functions
      ▪ Control rights
      ▪ Discover
      ▪ Manage
      ▪ Identify versions
      ▪ Certify authenticity
      ▪ Indicate status
      ▪ Mark content structure
      ▪ Situate geospatially
      ▪ Describe processes
  • It gets messy really quickly
  • Metadata for Scientific Research Data
      Descriptive
      – General to granular
      ▪ Value (addressing a topic, "aboutness")
          – Topical (ontologies, subject heading lists/thesauri, taxonomies)
      ▪ Named entities
          – Name authority files (people, organizations, geographical jurisdictions, structures, and events)
      ▪ Geo-spatial (coordinates)
      ▪ Temporal data (ISO 8601/W3CDTF, or …)
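The temporal bullet above names ISO 8601/W3CDTF. As a minimal sketch of what that encoding looks like in practice, the following formats and parses such timestamps with the Python standard library. The helper name `to_w3cdtf` and the sample collection time are illustrative assumptions, not taken from the slides.

```python
from datetime import datetime, timezone


def to_w3cdtf(moment: datetime) -> str:
    """Format a timezone-aware datetime as a W3CDTF string in UTC.

    W3CDTF is a profile of ISO 8601; "Z" is its designator for UTC.
    """
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical observation time, used only for illustration.
collected = datetime(2012, 8, 22, 13, 0, 0, tzinfo=timezone.utc)
stamp = to_w3cdtf(collected)

# Parsing goes the other way: an ISO 8601 string becomes a datetime object.
parsed = datetime.fromisoformat("2001-03-01T00:00:00")
```

Storing dates in one such profile, rather than as free text, is what lets a repository sort and filter records by time without guessing at the format.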
  • Given the messiness…
      "I cannot tell you exactly what metadata standards, vocabularies, etc. to use…"
  • Examining metadata schemes
      Objectives and principles
        ▪ Objectives
        ▪ Principles
      Domains
        ▪ Discipline
        ▪ Genre
        ▪ Format
      Architectural layout
        ▪ Structural design
        ▪ Extent
        ▪ Granularity
      Metadata Objectives and principles, Domain, and Architectural Layout (MODAL) framework (Greenberg, 2005; Willis, et al, JASIST 2012)
  • Simple schemes [6]
      Objectives and principles
        ▪ Interoperability
        ▪ Easy to generate, lower barrier to produce
      Domains
        ▪ Multi-disciplinary
        ▪ Any genre or format
      Architectural layout
        ▪ Primarily flat
        ▪ Minimal, with means to extend
        ▪ General (not granular)
      Schemes
        ▪ Dublin Core Metadata Element Set (DCMES), ver. 1.1
        ▪ US MARC bibliographic format: need training; primarily flat; extensible
        ▪ DataCite: primarily flat
  • Dublin Core Application Profile: Dryad [7]
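To make "primarily flat" concrete, here is a minimal sketch of a flat Dublin Core description built with the Python standard library. It is not the actual Dryad record from footnote [7]: only the title and DOI are taken from that footnote, the `type` value is an assumption, and the `record` wrapper element and the function name `build_flat_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_flat_record(fields):
    """Build a flat record: one level of dc:* children, no nesting."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


fields = [
    ("title", "Data from: Divergence time estimation using fossils as "
              "terminal taxa and the origins of Lissamphibia"),
    ("identifier", "doi:10.5061/dryad.8120"),
    ("type", "Dataset"),  # assumed value, for illustration only
]
record = build_flat_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
```

Every element sits at the same level and any element may repeat or be omitted, which is much of what keeps the barrier to producing such records low.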
  • DataCite example, ver. 2.2 [8]: National Institute for Environmental Studies and Center for Climate System Research Japan
  • US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library) [9]
  • Simple/moderate schemes
      Objectives and principles
        ▪ Interoperability balanced w/specific needs
        ▪ Generation requires more expertise
      Domains
        ▪ Greater domain focus
        ▪ Genre diversity within a domain
      Architectural layout
        ▪ Primarily flat
        ▪ Extensibility via connecting to other schemes
        ▪ Slightly more granular
      Schemes
        ▪ Darwin Core
        ▪ Access to Biological Collections Data (ABCD): not as flat
        ▪ Ecological Metadata Language
        ▪ DCMI Terms: graph approach
  • Wieczorek, et al. (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS One, 7(1): e29715. doi: 10.1371/journal.pone.0029715.
  • Access to Biological Collections Data (ABCD) (A minimum record)
      <?xml version="1.0" encoding="UTF-8"?>
      <DataSets xmlns="http://www.tdwg.org/schemas/abcd/2.06">
        <DataSet>
          <TechnicalContacts>
            <TechnicalContact>
              <Name>Gerd Müller</Name>
              <Email>gerd@dfb.de</Email>
            </TechnicalContact>
          </TechnicalContacts>
          <ContentContacts>
            <ContentContact>
              <Name>A Another</Name>
              <Email>a.another@fake.org</Email>
            </ContentContact>
          </ContentContacts>
          <Metadata>
            <Description>
              <Representation language="en">
                <Title>PonTaurus collection</Title>
              </Representation>
            </Description>
            <RevisionData>
              <DateModified>2001-03-01T00:00:00</DateModified>
            </RevisionData>
          </Metadata>
          <Units>
            <Unit>
              <SourceInstitutionID>BGBM</SourceInstitutionID>
              <SourceID>PonTaurus</SourceID>
              <UnitID>1136</UnitID>
            </Unit>
          </Units>
        </DataSet>
      </DataSets>
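As a sketch of how a consumer might read such a record, the following pulls the identifying triple of each unit out of a trimmed copy of the minimum record above, using only the Python standard library. The contacts and revision date are omitted for brevity, and the function name `unit_keys` is a hypothetical helper, not part of ABCD.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map for the ABCD 2.06 schema shown on the slide.
ABCD = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Trimmed copy of the minimum record (contacts and revision data omitted).
RECORD = """<DataSets xmlns="http://www.tdwg.org/schemas/abcd/2.06">
  <DataSet>
    <Metadata>
      <Description>
        <Representation language="en">
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID>
        <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>"""


def unit_keys(xml_text):
    """Return (institution, source, unit) identifiers for every Unit."""
    root = ET.fromstring(xml_text)
    keys = []
    for unit in root.iterfind(".//abcd:Unit", ABCD):
        keys.append((
            unit.findtext("abcd:SourceInstitutionID", namespaces=ABCD),
            unit.findtext("abcd:SourceID", namespaces=ABCD),
            unit.findtext("abcd:UnitID", namespaces=ABCD),
        ))
    return keys


keys = unit_keys(RECORD)
title = ET.fromstring(RECORD).findtext(".//abcd:Title", namespaces=ABCD)
```

The nesting that has to be walked here (DataSet, Units, Unit) is what the comparison slide means by "not as flat".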
  • Properties in the /terms/ namespace
      abstract                educationLevel        modified
      accessRights            extent                provenance
      accrualMethod           format                publisher
      accrualPeriodicity      hasFormat             references
      accrualPolicy           hasPart               relation
      alternative             hasVersion            replaces
      audience                identifier            requires
      available               instructionalMethod   rights
      bibliographicCitation   isFormatOf            rightsHolder
      conformsTo              isPartOf              source
      contributor             isReferencedBy        spatial
      coverage                isReplacedBy          subject
      created                 isRequiredBy          tableOfContents
      creator                 issued                temporal
      date                    isVersionOf           title
      dateAccepted            language              type
      dateCopyrighted         license               valid
      dateSubmitted           mediator
      description             medium
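The "graph approach" attributed to DCMI Terms can be sketched as subject, predicate, object statements. The example below uses plain Python tuples rather than an RDF library. The property subset, the sample values other than the title and resource URL from footnote [7], and the function name `to_triples` are illustrative assumptions.

```python
# Namespace of the DCMI Metadata Terms properties.
DCTERMS = "http://purl.org/dc/terms/"

# A small subset of the properties listed on the slide, for illustration.
KNOWN_PROPERTIES = frozenset({
    "title", "identifier", "isPartOf", "spatial",
    "temporal", "license", "created",
})


def to_triples(subject, description):
    """Turn a property-to-value mapping into (subject, predicate, object)
    statements, rejecting property names outside the known subset."""
    unknown = sorted(set(description) - KNOWN_PROPERTIES)
    if unknown:
        raise ValueError(f"not in the known property subset: {unknown}")
    return [
        (subject, DCTERMS + name, value)
        for name, value in sorted(description.items())
    ]


triples = to_triples(
    "http://datadryad.org/resource/doi:10.5061/dryad.8120",
    {
        "title": "Data from: Divergence time estimation using fossils as "
                 "terminal taxa and the origins of Lissamphibia",
        "created": "2012",  # hypothetical value
    },
)
```

Because each statement stands alone, descriptions from different sources can be merged simply by pooling their triples, which a fixed record layout does not allow as easily.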
  • Complex schemes
      Objectives and principles
        ▪ Interoperability level
        ▪ Generation requires greater expertise
      Domains
        ▪ Genre focus
        ▪ Format variation
      Architectural layout
        ▪ Hierarchical
        ▪ Extensive
        ▪ Granular
      Schemes
        ▪ FGDC
        ▪ DDI
      Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC
        1. Identification Information (M)
        2. Data Quality Information
        3. Spatial Data Organization Information
        4. Spatial Reference Information
        5. Entity and Attribute Information
        6. Distribution Information
        7. Metadata Reference Information (M)
      Data Documentation Initiative (DDI)
        1. Concept
        2. Collecting
        3. Processing → Archiving
        4. Distribution → Archiving
        5. Discovery
        6. Analysis
        7. Repurposing
  • Summary for descriptive schemes
      ▪ Simple: interoperable, easy to generate/low barrier, generally multidisciplinary, genre/format agnostic, primarily flat, general (not granular), 15-25 properties
      ▪ Simple/moderate: interoperability balanced w/specific needs, generation requires more expertise, greater domain focus, extensible via connecting to other schemes, more granular, more properties
      ▪ Complex: interoperable level, generation requires expertise, genre focus/format variation, hierarchical, granular, and extensive (100+ properties)
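The (M) marks in the CSDGM list flag the two mandatory sections. A minimal sketch of a completeness check built on that list follows. The data structure and the function name `missing_mandatory` are hypothetical, and a real CSDGM validator would check far more than section presence.

```python
# The seven CSDGM sections from the slide; True marks those flagged (M).
CSDGM_SECTIONS = {
    1: ("Identification Information", True),
    2: ("Data Quality Information", False),
    3: ("Spatial Data Organization Information", False),
    4: ("Spatial Reference Information", False),
    5: ("Entity and Attribute Information", False),
    6: ("Distribution Information", False),
    7: ("Metadata Reference Information", True),
}


def missing_mandatory(present_sections):
    """List mandatory section names absent from a record, in section order."""
    present = set(present_sections)
    return [
        name
        for _number, (name, mandatory) in sorted(CSDGM_SECTIONS.items())
        if mandatory and name not in present
    ]


# Hypothetical record that supplies sections 1 and 2 only.
gaps = missing_mandatory([
    "Identification Information",
    "Data Quality Information",
])
```

Checks like this are one place where automatic techniques can take over routine work and leave human expertise for the descriptive content itself.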
  • Challenges and opportunities
      ▪ Challenge: Workflow/When to generate the metadata?
          – Opportunity: Educate scientists early (Qin, 2009)
          – Stop here
          – Opportunity: Integrate into social setting w/Center for Embedded Networked Sensing (CENS) (Borgman, Mayernik, etc., 2009-current; Mayernik's dissertation, 2011)
      ▪ Challenge: Methods for generating metadata (labor intensive)
          – Opportunity: Use automatic techniques as much as possible, leverage human expertise (Dryad, DataONE Excel project)
      ▪ Challenge: Too many standards. Which one do I use?
          – Opportunity: Don't panic, join communities, look for examples. (If you can't find them?)
      ▪ Challenge: Do I need to implement my metadata as linked data?
          – Opportunity: No. Explore and develop a best practice. Pursue a 2-pronged approach (Greenberg, et al, 2009)
  • Jumping in…
      1. DCMI/NISO Seminars!!
      2. DCMI Science and Metadata Community (http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata)
      3. Digital Curation Centre (DCC) (http://www.dcc.ac.uk/)
      4. The Research Data Management Training, or MANTRA project (http://datalib.edina.ac.uk/mantra/)
      5. DataONE workshops and tutorials (www.dataone.org/)
  • Concluding comments
      ▪ Standards are guidelines; no police
          – Aim for reasonable quality
      ▪ KISS: Keep it simple, stupid
          – What's vital; what will aid reuse?
      ▪ Help to move the practice forward
          – Share what you learn
      ▪ Nothing new/it's all new
          – Data documentation since ancient times
          – SILOS; let's break them down (Willis, et al, 2012)
          – Greater connectivity than ever
          – Cross-disciplinary approaches for problem solving
  • Footnotes
      [1] NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp.
      [2] NIH Data Sharing Policy: http://grants.nih.gov/grants/policy/data_sharing/.
      [3] Organisation for Economic Co-operation and Development, Data and Metadata Reporting and Presentation Handbook: http://www.oecd.org/std/37671574.pdf.
      [4] The INSPIRE (Infrastructure for Spatial Information in the European Community) directive: http://inspire.ec.europa.eu/index.cfm/pageid/48. Released 15 May 2007, the directive will be implemented in various stages, with full implementation required by 2019, and aims to create a European Union (EU) spatial data infrastructure.
      [5] UK Medical Research Council: http://www.mrc.ac.uk/Ourresearch/Ethicsresearchguidance/datasharing/index.html.
      [6] The DCMI Glossary (scroll down for "schema" entry): http://dublincore.org/documents/usageguide/glossary.shtml#schema.
      [7] Dublin Core example: Data from: Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia (Dryad repository): http://datadryad.org/resource/doi:10.5061/dryad.8120?show=full.
      [8] National Institute for Environmental Studies and Center for Climate System Research Japan, animation data (DataCite): http://schema.datacite.org/meta/kernel-2.2/example/datacite-metadata-sample-v2.2.xml.
      [9] US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library): http://mlml.kohalibrary.com/cgi-bin/koha/opac-detail.pl?biblionumber=9282.