Building a Data Discovery Network for Sustainability Science

Robert H. McDonald
Deputy Director, Data to Insight (D2I) Center
Associate Dean for Library Technologies, IU Libraries
Indiana University
rhmcdona@indiana.edu | @mcdonald @SEADdatanet

Presented at the VIVO 2012 Conference
Miami, FL, August 24, 2012
Available from: http://slidesha.re/Q9q8VW

© Trustees of Indiana University
Released under Creative Commons 3.0 unported license; license terms on last slide.
NSF DataNet Program
Motivation:
  “… one of the major challenges of this scientific
  generation: how to develop the new methods,
  management structures and technologies to
  manage the diversity, size, and complexity of
  current and future data sets and data streams.”
Response:
  DataNet creates “a set of exemplar national
  and global data research infrastructure
  organizations” to address this challenge.
Current NSF DataNet Projects
• SEAD
   – http://sead-data.net
• DataONE
   – http://www.dataone.org
• DataNet Federation Consortium
   – http://datafed.org
• Terra Populus
   – https://www.pop.umn.edu/terra_pop
Sustainable Environment Actionable Data (SEAD) - DataNet
SEAD Partners - http://sead-data.net
• SEAD Strategy
  ― Serve scientists and
    researchers in the “long
    tail” of science
  ― Leverage social media for
    discovery of
    data, interest, and
    expertise
  ― Move data curation
    upstream in the data life
    cycle of science
  ― Take advantage of
    existing domain and
    institutional infrastructures
    (Institutional
    Repositories, ICPSR) for
    long-term preservation
SEAD TEAMS
• Michigan: Margaret Hedstrom (PI), Ann Zimmerman (Co-PI), Karen Woollams, George Alter (ICPSR), Bryan Beecher (ICPSR), Jude Yew
• Indiana: Beth Plale (Co-PI), Katy Börner, Robert H. McDonald, Robert Light, Kavitha Chandrasekar, Stacy Kowalczyk, Robert Ping
• Rensselaer: James Myers (Co-PI), Ram Prasanna Govind Krishnan, Lindsay Todd
• Illinois: Praveen Kumar (Co-PI), Md Aktaruzzaman, Terry McLaren (NCSA), Rob Kooper (NCSA), Luigi Marini (NCSA)
SEAD 18-Month Pilot Phase
•   Domain Engagement:
     – National Center for Earth Systems Dynamics (NCESD), Illinois River
        Basin Observatory
     – Requirements, Use Cases, Prioritization of Data Types and Services
•   Active and Social Curation
     – Pilot Active Content Repository, VIVO deployments
     – Exemplar services for Data Ingest, Discovery, Re-use, Curation
        (Tupelo/Medici)
•   CI for Long-term Access (Virtual Archive)
     – Data model, protocol design/development
     – Pilot Federated Repository infrastructure
•   Education, Outreach, and Training
     – Post-doc mentoring
     – Web site, training materials, meetings, workshops, …
•   Project Oversight
     – Management, reporting, committees
     – Business model development
Sustainability Science
[Figure: diagram with the labels Science, Technology, Economics, Poverty & Justice, Policy, and Cooperation]
Data challenges
• Heterogeneity of all kinds
• Multiple scales
• Multidisciplinary
• Many small datasets
The long tail of scientific research


• Small and derived data sets
• Heterogeneous data
• Multiple sources of data
• Short-lived data with long-term
  value
• Value of data grows when combined
  & integrated
SEAD notions of defined Data Phases
• The phases of the data lifecycle acknowledge and accommodate
  the difference between public data and data a researcher is
  still working on.
• Research Data Phase: the data set is a research data
  collection, owned by an individual and under their control.
   – Data need not be licensed at this time because they are not
     ready for broader release
   – Data need not have permanent IDs because they are still a
     work in progress
   – Corresponds to first existence in the Active Curation Repository
• Published Phase: the owner of the research data collection
  determines that the dataset is ready for publication
   – License terms set
   – Persistent ID assigned
   – Made available as part of public profile in VIVO
   – Activated by user-controlled publish event
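The two phases above can be read as a small state machine. The following is a minimal sketch under that reading, not SEAD's implementation: the class and field names, the example owner, the license string, and the DOI are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    RESEARCH = "research"
    PUBLISHED = "published"


@dataclass
class DataCollection:
    """Hypothetical model of a research data collection."""
    owner: str
    phase: Phase = Phase.RESEARCH
    license_terms: Optional[str] = None   # not required in the research phase
    persistent_id: Optional[str] = None   # not required in the research phase

    def publish(self, license_terms: str, persistent_id: str) -> None:
        """User-controlled publish event: research phase -> published phase."""
        if self.phase is Phase.PUBLISHED:
            raise ValueError("collection is already published")
        if not license_terms or not persistent_id:
            raise ValueError("publishing needs license terms and a persistent ID")
        self.license_terms = license_terms
        self.persistent_id = persistent_id
        self.phase = Phase.PUBLISHED


# Placeholder values for illustration only.
collection = DataCollection(owner="researcher@example.org")
collection.publish(license_terms="CC BY 3.0", persistent_id="doi:10.0000/example")
```

Keeping the license and identifier optional until `publish` is called mirrors the slide's point that neither is needed while the data are still a work in progress.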
SEAD CI Technical Approach
[Diagram: Active and Social Curation on one side, OAIS Repository Federation on the other, separated by a Curation Boundary]
• Active and Social Curation
   – Data Acquisition, Analysis and Simulation
   – Metadata Management: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …
   – Appraisal and Selection
   – VIVO/Linked Data
   – Active Content Repository
   – Search, Browse, Annotation, Visualization Tools
   – Use, Reuse, Repurposing Tools
• Curation Boundary
   – Ingest scripts: fixity, integrity, authentication, transformation
• OAIS Repository Federation
   – Automated Curation Workflow/Rule Engine: operates on Metadata, Content Objects and Trigger Events
   – Ingest, AIPs
   – Compound Objects - OAI-ORE
   – Digital Repository Federation (OAIS compliant)
   – Preservation Actions
   – Dissemination Packages
   – Migration and Emulation Tools
   – Access Mechanisms and E-Scholarship Services
   – Scholarly Communication
• Wide-Area File System
• Actors: Contributor, User
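The ingest scripts in the diagram list fixity and integrity among their checks. A minimal sketch of a fixity check follows, assuming SHA-256 as the digest; the slides do not say which algorithm SEAD uses, and the function names and sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, expected_hex: str) -> bool:
    """True when the file's current digest matches the recorded one."""
    return sha256_of(path) == expected_hex.lower()


# Usage: record a digest at ingest time, then re-verify after a change.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "sample.csv"
    sample.write_bytes(b"site,value\nA,1\n")
    recorded = sha256_of(sample)
    unchanged = fixity_ok(sample, recorded)
    sample.write_bytes(b"site,value\nA,2\n")
    change_detected = not fixity_ok(sample, recorded)
```

Recording the digest once and re-checking it later is what lets a repository notice silent corruption or unrecorded edits.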
SEAD CI Technical Approach
[Diagram repeated, highlighting SEAD Active Data Systems; callout text:]
• A standardized data model and federation capability over OAIS-Standard Institutional Repositories
SEAD CI Technical Approach
[Diagram repeated, highlighting the SEAD Trusted Digital Repository Federation (OAIS compliant); callout text:]
• A robust, replicated distributed file system used as a large-scale backing store
SEAD CI Technical Approach
[Diagram repeated, highlighting the Active Content Repository; callout text:]
• An Active Content Repository based on standard global IDs and
  semantic web technologies - to collect and integrate
  data, metadata, and provenance information from multiple sources.
• Example relationships between Content items: DC:Creator, OPM:wasDerivedFrom, SWAN:isEvidenceFor…
• The Wide-Area File System is labeled Lustre File System
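The callout names DC:Creator and OPM:wasDerivedFrom as relationships between content items. The sketch below shows how such statements could be held as subject-predicate-object triples and walked to recover a derivation chain. It is an illustration with plain Python tuples, not SEAD's repository API, and every identifier in the sample data is made up.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical examples.
triples = [
    ("dataset:model-output", "DC:Creator", "person:example-researcher"),
    ("dataset:model-output", "OPM:wasDerivedFrom", "dataset:cleaned"),
    ("dataset:cleaned", "OPM:wasDerivedFrom", "dataset:raw"),
    ("dataset:raw", "DC:Creator", "person:example-field-team"),
]


def build_index(statements):
    """Group objects by subject and predicate for quick lookup."""
    index = {}
    for subject, predicate, obj in statements:
        index.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return index


def derivation_chain(index, subject):
    """Follow OPM:wasDerivedFrom links back toward the original source.

    Only the first recorded source is followed at each step, and a
    'seen' set stops the walk if the data contains a cycle.
    """
    chain, seen, current = [], {subject}, subject
    while True:
        sources = index.get(current, {}).get("OPM:wasDerivedFrom", [])
        if not sources or sources[0] in seen:
            return chain
        current = sources[0]
        seen.add(current)
        chain.append(current)


index = build_index(triples)
chain = derivation_chain(index, "dataset:model-output")
```

A real deployment would more likely use an RDF store and full URIs for the predicates; the tuple form is used here only to keep the example self-contained.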
SEAD CI Technical Approach
[Diagram repeated, highlighting VIVO/Linked Data; callout text:]
• SEAD will run a VIVO instance and may harvest Linked Data from other sources
• VIVO Application: Open Source federatable Researcher Information –
  people, papers, projects, centers, fields, etc.
SEAD CI Technical Approach
[Diagram repeated; callout text:]
• Active and Social Curation Services supporting automated and
  interactive use of SEAD - leveraging standard web
  application/web service toolkits and virtual machine infrastructure
SEAD CI Technical Approach
[Diagram repeated, highlighting the SEAD Trusted Digital Repository Federation; callout text:]
• Curation and Preservation Services also leveraging standard web
  application/web service toolkits and virtual machine infrastructure
SEAD Active/Social Curation Repository
SEAD VIVO: RIS2N3
SEAD Virtual Archive
Faceted search (Solr-based)
[Screenshot; labeled region: Facets]
Search Result
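Since the faceted search is Solr-based, a request for facet counts can be sketched as a query URL. The parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters; the host, core name, and field names below are assumptions, because the slides do not show the Virtual Archive's schema.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, text, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # facet.field may be repeated, one entry per field to facet on.
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core, and field names, for illustration only.
url = facet_query_url(
    "http://localhost:8983/solr/example_core",
    "river basin",
    ["creator", "format"],
)
```

The response to such a request carries the matching documents together with per-field value counts, which is what a results page would render as its facet list.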
A dataset or file looks like this
Geospatial search (from Postgres index)
Geospatial search results
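A geospatial search of this kind often reduces to a bounding-box filter over indexed coordinates. The sketch below shows that query shape using Python's built-in `sqlite3` as a stand-in so it runs without a server; the slides say the real index is in Postgres, and the table, column names, and sample rows here are hypothetical.

```python
import sqlite3

# In-memory stand-in for the geospatial index (the real one is in Postgres).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset_location (dataset_id TEXT, lat REAL, lon REAL)"
)
conn.executemany(
    "INSERT INTO dataset_location VALUES (?, ?, ?)",
    [
        ("illinois-river-sample", 40.5, -89.6),  # made-up sample points
        ("mississippi-sample", 32.3, -90.9),
        ("elsewhere", 51.5, -0.1),
    ],
)


def search_bbox(connection, south, west, north, east):
    """Return dataset IDs whose point lies inside the bounding box.

    This simple form does not handle boxes that cross the antimeridian.
    """
    rows = connection.execute(
        "SELECT dataset_id FROM dataset_location "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY dataset_id",
        (south, north, west, east),
    )
    return [row[0] for row in rows]


hits = search_bbox(conn, south=37.0, west=-92.0, north=43.0, east=-87.0)
```

Datasets with extents rather than single points would need an overlap test instead of a containment test, and a spatial extension would normally handle that in production.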
Login for data upload
Upload file




Files from Medici can also be added
Create collection (can have multiple files)
Upload complete
Data ingested to DSpace (Mississippi example)
SEAD Virtual Archive Architecture
[Diagram: ingest flow from the SEAD Ingest Client/UI into the Virtual Archive]
• SEAD Ingest Client/UI: submits a SIP (Data + Metadata) and receives an Ack
• Processing steps: Data Validation (Fixity, Virus), Feature Extraction from Data, SIP breakdown, AIP + Data
• Preservation Metadata Generation (Events)
• Identifiers: Obtain DOI from DataCite (IU DataCite ID Server); Register DOI with VIVO (VIVO server)
• IR (DSpace): store data object, its metadata object, and its relationship record (latter as RDF) in IR as a collection
• Indexes: register metadata to Solr and PostgreSQL for rapid retrieval of metadata
   – Solr Index: Core Property + Domain Metadata
   – PostgreSQL Index: Geospatial + Temporal Metadata
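The ingest flow in the architecture diagram can be sketched as a short pipeline: validate the SIP, assign an identifier, build the AIP, and register metadata in two indexes. This is a simplified illustration under stated assumptions, not the Virtual Archive's code. The identifier is a local placeholder rather than a DataCite DOI, the indexes are plain dictionaries standing in for Solr and PostgreSQL, and the virus scan, feature extraction, and DSpace storage steps are omitted.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission package: data plus metadata and a declared digest."""
    data: bytes
    metadata: dict
    declared_sha256: str


@dataclass
class AIP:
    """Archival package with its identifier and preservation events."""
    identifier: str
    data: bytes
    metadata: dict
    events: list = field(default_factory=list)


def mint_placeholder_id() -> str:
    # Stand-in for obtaining a DOI; no external service is called.
    return f"placeholder:{uuid.uuid4()}"


def ingest(sip: SIP, text_index: dict, geo_index: dict) -> AIP:
    events = []
    # Data validation: fixity only in this sketch.
    if hashlib.sha256(sip.data).hexdigest() != sip.declared_sha256:
        raise ValueError("fixity check failed")
    events.append("fixity-check-passed")
    identifier = mint_placeholder_id()
    events.append("identifier-assigned")
    aip = AIP(identifier, sip.data, dict(sip.metadata), events)
    # Core/domain metadata go to one index, spatial metadata to the other.
    text_index[identifier] = {k: v for k, v in sip.metadata.items() if k != "bbox"}
    if "bbox" in sip.metadata:
        geo_index[identifier] = sip.metadata["bbox"]
    events.append("indexed")
    return aip


# Usage with made-up data.
payload = b"example dataset bytes"
sip = SIP(
    data=payload,
    metadata={"title": "Example dataset", "bbox": (37.0, -92.0, 43.0, -87.0)},
    declared_sha256=hashlib.sha256(payload).hexdigest(),
)
text_index, geo_index = {}, {}
aip = ingest(sip, text_index, geo_index)
```

Appending an event at each step follows the diagram's Preservation Metadata Generation (Events) band, which sits beneath the processing steps.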
Key Questions for SEAD Prototype
• What could SEAD capture when?
• How can SEAD provide direct value to data producers, users, and curators?
• How can web 2.0/3.0 and social computing lower barriers and reduce/realign costs?
Towards A Shared Data Future
[Figure: layered data infrastructure; side labels: Trust, Data Curation]
• Data Generators and Users: user functionalities, data capture & transfer, virtual research environments
• Community Support Services: data discovery & navigation, workflow generation, annotation, interpretability
• Common Data Services: persistent storage, identification, authenticity (provenance), workflow execution, data mining
Source: EU HLEG Report on Data Deluge: Riding the Wave, pg 31, 2010
Data Interoperability
• NSF OCI: DataNet and INTEROP now
  DIBBs
• EUDAT
• Data Web Forum
• IETF Research Data Identifier BOF
• Upcoming Oct. US Meeting of
  DataNet, INTEROP, Data Web Forum
Acknowledgements
SEAD is funded by the National Science
Foundation under cooperative agreement
#OCI0940824

• For more on SEAD go to: http://sead-data.net

• Follow us on Twitter
  @SEADdatanet
License terms
•   Please cite as: McDonald, R.H. et al. Building a Data Discovery Network for
    Sustainability Science. 3rd International VIVO Conference, Miami, FL, 24
    August 2012. Available from: [http://slidesha.re/Q9q8VW]

•   Thanks to Margaret Hedstrom, who’s guided the team through the (really)
    lengthy review process and to Jim Myers, Beth Plale, Praveen Kumar, Terry
    McLaren, Luigi Marini, Kavitha Chandrasekar and others who provided
    content for this presentation.
•   The concepts and software being leveraged in SEAD represent the work of a
    broad range of people over multiple years – their contributions have been
    critical to launching SEAD.
•   Items indicated with a © are under copyright and used here with permission.
    Such items may not be reused without permission from the holder of copyright
    except where license terms noted on a slide permit reuse.
•   This document is released under the Creative Commons Attribution 3.0
    Unported license (http://creativecommons.org/licenses/by/3.0/). This license
    includes the following terms: You are free to share – to copy, distribute and
    transmit the work and to remix – to adapt the work under the following
    conditions: attribution – you must attribute the work in the manner specified
    by the author or licensor (but not in any way that suggests that they endorse
    you or your use of the work). For any reuse or distribution, you must make clear
    to others the license terms of this work.
Building a Data Discovery Network for Sustainability Science

  • 1. Building a Data Discovery Network for Sustainability Science Robert H. McDonald Deputy Director Data to Insight (D2I) Center Associate Dean – IU Libraries Indiana University rhmcdona@indiana.edu | @mcdonald @SEADdatanet Presented at the VIVO 2012 Conference Miami, FL– August 24, 2012 http://slidesha.re/Q9q8VW Available from: http://slidesha.re/Q9q8VW © Trustees of Indiana University Released under Creative Commons 3.0 unported license; license terms on last slide.
  • 2. NSF DataNet Program Motivation: “… one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams.” Response: DataNet creates “a set of exemplar national and global data research infrastructure organizations” to address this challenge.
  • 3. Current NSF DataNet Projects • SEAD – http://sead-data.net • DataOne – http://www.dataone.org • DataNet Federation Consortium – http://datafed.org • Terra Populous – https://www.pop.umn.edu/terra_pop
  • 4. Sustainable Environment Actionable Data (SEAD) - DataNet • SEAD Strategy SEAD Partners - http://sead-data.net ― Serve scientists and researchers in the “long tail” of science ― Leverage social media for discovery of data, interest, and expertise ― Move data curation upstream in the data life cycle of science ― Take advantage of existing domain and institutional infrastructures (Institutional Repositories, ICPSR) for long-term preservation
  • 5. SEAD TEAMS Margaret Hedstrom-PI, Ann Zimmerman-Co-PI, Karen Michigan Woollams, George Alter (ICPSR), Bryan Beecher (ICPSR), Jude Yew Beth Plale-Co-PI, Katy Börner, Robert H. McDonald, Robert Indiana Light, Kavitha Chandrasekar, Stacy Kowalczyk, Robert Ping James Myers-Co-PI, Ram Prasanna Govind Krishnan, Lindsay Todd Rensselaear Praveen Kumar-Co-PI, Md Aktaruzzaman, Terry McLaren (NCSA), Rob Illinois Kooper (NCSA), Luigi Marini (NCSA)
  • 6. SEAD 18 month Pilot Phase • Domain Engagement: – National Center for Earth Systems Dynamics (NCESD), Illinois River Basin Observatory – Requirements, Use Cases, Prioritization of Data Types and Services • Active and Social Curation – Pilot Active Content Repository, VIVO deployments – Exemplar services for Data Ingest, Discovery, Re-use, Curation (Tupelo/Medici) • CI for Long-term Access (Virtual Archive) – Data model, protocol design/development – Pilot Federated Repository infrastructure • Education, Outreach, and Training – Post-doc mentoring – Web site, training materials, meetings, workshops, … • Project Oversight – Management, reporting, committees – Business model development
  • 7. Sustainability Science Science Cooperation Technology Policy Economics Poverty & Justice 7
  • 8. Data challenges • Heterogeneity of all kinds • Multiple scales • Multidisciplinary • Many small datasets
  • 9. The long tail of scientific research • Small and derived data sets • Heterogeneous data • Multiple sources of data • Short-lived data with long-term value • Value of data grows when combined & integrated
  • 11. SEAD notions of defined Data Phases • Phases of data lifecycle acknowledge and accommodate the difference between public data and data still in work by a researcher. • Research Data Phase: data set is research data collection, owned by individual and under their control. – Data need not be licensed at this time because it is not ready for broader release – Data need not have permanent IDs because still work in progress – Corresponds to first existence in Active Curation Repository • Published Phase: Owner of research data collection determines that dataset is ready for publication – License terms set – Persistent ID – Made available as part of public profile in VIVO – Activated by user-controlled publish event
  • 12. SEAD CI Technical Approach Active and Social Curation OAIS Repository Federation Curation Boundary Automated Curation Data Metadata Workflow/Rule Acquisition, A Engine Management nalysis and Operates on Simulation DDI3. Metadata, Content Objects Scholarly METS, PREMIS, MODS, DC , SensorML, OGC, … and Trigger Events Communication Ingest scripts: Ingest, AIPs Appraisal fixity, integrity, Compound Objects - OAI-ORE and CI Technical Approach authentication, VIVO/ Linked Data Active Selection transformation Digital Repository Federation Content (OAIS compliant) Preservation Repository Actions Dissemination Packages Wide-Area File System Search, Browse Migration , and Access Mechanisms and Annotation, Vis Use, Reuse, Rep Emulation E-Scholarship Services ualization Tools urposing Tools Contributor User Tools
  • 13. SEAD CI Technical Approach Active and Social Curation OAIS Repository Federation Curation Boundary Automated Curation Data Metadata Workflow/Rule Acquisition, A Engine A Management standardized data model nalysis and Operates on DDI3. Scholarly and federation capability Simulation METS, PREMIS, MODS, DC , SensorML, OGC, … Metadata, Content Objects and Trigger Events Communication over OAIS-Standard Ingest scripts: Institutional Repositories Ingest, AIPs Appraisal fixity, integrity, aut Compound Objects - OAI-ORE and CI Technical Approach hentication, transf Selection ormation SEAD Active Data Digital Repository Federation (OAIS compliant) Systems Preservation Actions Dissemination Packages Wide-Area File System Search, Browse Migration , and Access Mechanisms and Annotation, Vis Use, Reuse, Rep Emulation E-Scholarship Services ualization Tools urposing Tools Contributor User Tools
  • 15. SEAD CI Technical Approach Active and Social Curation OAIS Repository Federation Curation Boundary A robust, replicated distributed file system used as a large-scaleCuration Automated backing Workflow/Rule Data Metadata Acquisition, A store Management Engine nalysis and Operates on Simulation DDI3. Metadata, Content Objects Scholarly METS, PREMIS, MODS, DC , SensorML, OGC, … and Trigger Events Communication Ingest scripts: Ingest, AIPs Appraisal fixity, integrity, aut Compound Objects - OAI-ORE and CI Technical Approach hentication, transf VIVO/ Linked Data Active Selection ormation SEAD Trusted Digital Repository Federation Content (OAIS compliant) Preservation Repository Actions Dissemination Packages Wide-Area File System Search, Browse Migration , and Access Mechanisms and Annotation, Vis Use, Reuse, Rep Emulation E-Scholarship Services ualization Tools urposing Tools Contributor User Tools
  • 16. SEAD CI Technical Approach Active and Social Curation OAIS Repository Federation Curation Boundary Automated Curation Data MetadataAn Active Content Repository Workflow/Rule Engine Acquisition, A nalysis and based on standard Management global IDs and Operates on Simulation DDI3. semantic web technologies Scholarly METS, PREMIS, MODS, DC Metadata, Content Objects , SensorML, OGC, … and Trigger Events Communication - to collect and integrate data, metadata, and provenance Ingest scripts: Ingest, AIPs Appraisal fixity, integrity, information from multiple sources. Compound Objects - OAI-ORE and CI Technical Approach authentication, VIVO/ Linked Data Active Selection transformation DC:Creator SEAD Trusted OPM:wasDerivedFrom Digital Repository Federation Content SWAN:isEvidenceFor… (OAIS compliant) Preservation Content Repository Content Actions Content Content Dissemination Packages Wide-Area File System Lustre File System Search, Browse Migration , and Access Mechanisms and Annotation, Vis Use, Reuse, Rep Emulation E-Scholarship Services ualization Tools urposing Tools Contributor User Tools
  • 18. SEAD CI Technical Approach Active and Social Curation OAIS Repository Federation Curation Boundary Automated Curation Data SEAD will run a VIVO instance Metadata Workflow/Rule Acquisition, A Engine nalysis and and may harvest Linked Data from Management Operates on Scholarly Simulation other sources DDI3. Metadata, Content Objects METS, PREMIS, MODS, DC , SensorML, OGC, … and Trigger Events Communication Ingest scripts: Ingest, AIPs Appraisal fixity, integrity, aut Compound Objects - OAI-ORE and CI Technical Approach hentication, transf VIVO/ Linked Data Active Selection ormation Digital Repository Federation Content VIVO Application: Open (OAIS compliant) Source Preservation Repositoryfederatable Researcher Actions Dissemination Packages Information – Wide-Area File System people, papers, projects, center s, fields, etc. Search, Browse Migration , and Access Mechanisms and Annotation, Vis Use, Reuse, Rep Emulation E-Scholarship Services ualization Tools urposing Tools Contributor User Tools
  • 20. SEAD CI Technical Approach. Callout: Active and Social Curation Services supporting automated and interactive use of SEAD, leveraging standard web application/web service toolkits and virtual machine infrastructure. [Architecture diagram repeated from slide 16]
  • 21. SEAD CI Technical Approach. Callout: SEAD Trusted Repository Federation: Active Content and Preservation Services, also leveraging standard web application/web service toolkits and virtual machine infrastructure. [Architecture diagram repeated from slide 16]
  • 22. [Architecture diagram repeated from slide 16, shown without a title or callout]
  • 41. A dataset or file looks like this
  • 44. Login for data upload
  • 45. Upload file Files from Medici can also be added
  • 46. Create collection (can have multiple files)
  • 48. Data ingested to DSpace (Mississippi example)
  • 49. SEAD Virtual Archive Architecture. [Diagram] Components: SEAD Ingest Client/UI; VIVO server; IU IR (DSpace); DataCite ID Server; Solr index; PostgreSQL index. Ingest stages shown: SIP (Data + Core Metadata) submitted from the client, with an Ack returned; Validation (Fixity, Virus); Feature Extraction (Property + Domain Metadata from Data; Geospatial + Temporal Index Metadata); Preservation Metadata Generation (Events); SIP AIP breakdown. Actions: store the data object, its metadata object, and its relationship record (the latter as RDF) in the IR as a collection; obtain a DOI from DataCite; register the DOI with VIVO; register metadata to Solr and PostgreSQL for rapid retrieval of metadata.
  • 50. Key Questions for SEAD Prototype • What could SEAD capture when? • How can SEAD provide direct value to data producers, users, and curators? • How can web 2.0/3.0 and social computing lower barriers and reduce/realign costs?
  • 51. Towards A Shared Data Future. [Layered diagram] Data Generators and Users: user functionalities, data capture & transfer, virtual research environments. Community Support Services: data discovery & navigation, workflow generation, annotation, interpretability. Common Data Services: persistent storage, identification, authenticity (provenance), workflow execution, data mining. Cross-cutting labels: Data Curation; Trust. Source: EU HLEG Report on Data Deluge: Riding the Wave, pg 31, 2010
  • 52. Data Interoperability • NSF OCI: DataNet and INTEROP now DIBBs • EUDAT • Data Web Forum • IETF Research Data Identifier BOF • Upcoming Oct. US Meeting of DataNet, INTEROP, Data Web Forum
  • 53. Acknowledgements SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824 • For more on SEAD go to: http://sead-data.net • Follow us on Twitter @SEADdatanet
  • 54. License terms • Please cite as: McDonald, R.H. et al. Building a Data Discovery Network for Sustainability Science. 3rd International VIVO Conference, Miami, FL, 24 August 2012. Available from: [http://slidesha.re/Q9q8VW] • Thanks to Margaret Hedstrom, who guided the team through the (really) lengthy review process, and to Jim Myers, Beth Plale, Praveen Kumar, Terry McLaren, Luigi Marini, Kavitha Chandrasekar and others who provided content for this presentation. • The concepts and software being leveraged in SEAD represent the work of a broad range of people over multiple years; their contributions have been critical to launching SEAD. • Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse. • This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
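
Slide 49 lists a validation step that includes a fixity check during ingest. The slides do not say how SEAD implements it, so the following is only a minimal sketch of what a checksum-based fixity check can look like. The function names `sha256_of` and `check_fixity`, the choice of SHA-256, and the idea that the submitter supplies an expected digest alongside the SIP are all assumptions made for illustration, not details taken from the presentation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for large data files.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the digest supplied at submission.

    `expected_hex` is a hypothetical value carried with the submitted
    package; the comparison ignores case and surrounding whitespace.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

In a pipeline like the one on slide 49, a failed check would presumably stop the package before feature extraction and storage; how SEAD actually handles failures is not stated in the slides.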

Editor's Notes

  1. A Collection of heterogeneous files. Users can tag and add comments to the entire ‘collection’ and individually tag and comment on the objects in the collection. Note: Extraction services and previewers are all driven by the file MIME type. Extraction services are customizable and are designed to automate derived data products from the file being uploaded. Examples follow…
  2. A Collection of heterogeneous files. Users can tag and add comments to the entire ‘collection’ and individually tag and comment on the objects in the collection. Note: Extraction services and previewers are all driven by the file MIME type. Extraction services are customizable and are designed to automate derived data products from the file being uploaded. Examples follow…
  3. Lidar data saved as .png. The image extraction service does the following: creates the thumbnail and preview image; creates an image pyramid of the image (zoom/pan large images without downloading the entire image, via the SeaDragon web app); extracts all header information from the image file, including Exif, GPS, Interoperability, etc. Extracted data is viewed by clicking on the “Extracted Information” section.
  4. A data set saved as a simple ASCII text file. Users can preview the first 80 lines of the text file.
  5. Preview the contents of .csv files
  6. Simple map image. User-defined information. Image is part of multiple collections. Image is tagged.
  7. 3 images (3 clicks). Standard Medici info. Scroll down to show location and annotation. This image file also contained geolocation data, which becomes visible in “Location”. Geolocation can be extracted from the image Exif data, or authors can add a geolocation to any file in the repository. Note the creator tag and VIVO reference.
  8. TIF support: relatively large 71 MB file. Clicks: Click Zoom to enable SeaDragon to explore the details of the file via zoom and pan with the mouse. Click the lower-right icon to enable full screen. Use the + or – key to zoom (or the mouse wheel); click the image and drag to pan. Click the lower-right icon to return to the embedded window in Medici.
  9. Image file that contains GPS data which is extracted by Medici as part of the upload process.
  10. MPEG file uploads: the extraction service creates a Flash version of the file for preview.
  11. PDF files: the extraction service generates an image per page of the file, in this case a slide set from a presentation. Click ‘Pages’ to enable the slide set mode and click the left or right arrows to navigate the pages. 2 images; click to advance slide.
  12. 3D object support. The previewer provides multiple view options of the object, accessible from the links above the preview.
  13. .shp files: The components of a shapefile are uploaded to Medici as a zip. Medici saves the zip blob, and the extraction service registers the contents of the .shp file with GeoServer. OpenStreetMap displays the contents of the zip. Layers are on by default but can be turned off by clicking the ‘show’ button. The opacity of layers can be varied using the opacity scale. (WIP) We plan to embed OpenStreetMap in Medici as a previewer for .shp and .kml.
  14. All layers off except Illinois Flood Zone map. Map zoomed into the Champaign region of interest.