SlideShare a Scribd company logo
1 of 18
Data Sets, Ensemble Cloud
Computing, and the University
Library:
Getting the Most Out of Research Support
Jim Myers1, Margaret Hedstrom1, Beth A Plale2, Praveen Kumar3, Robert
McDonald4, Rob Kooper5, Luigi Marini5, Inna Kouper4, Kavitha Chandrasekar4

myersjd@umich.edu
1 School on

Information, University of Michigan, Ann Arbor, MI, United States.
School of Informatics and Computing, Indiana University, Bloomington, IN, United States.
3 Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL, United States.
4 Data To Insight Center, Indiana University, Bloomington, IN, United States.
5 National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, IL, United States.
2
Overview
• Technological advances are making it ever easier to
move computation, data, and metadata around
• With decreasing costs and increasing recognition of the
value of data re-use, many organization are exploring
their role in data curation/preservation
• If we look at the nature of the problem
– How should data be curated to scalably support research?
• Lifecycle approaches to manage value-defined research objects

– Can we do it?
• SEAD as an end-to-end demonstration…

– What organization(s) are best positioned/the most capable
of leading/providing such services long-term?
• Primary research organizations have a combination of capability,
motivation, and long-term commitment.
Technology – the world is flat
• Today’s researchers can employ computing
and data resources from anywhere, using
scalable search technologies …

Enough said.
Data as a key resource, Big Data
• Data is increasingly recognized as valuable
beyond its initial use:
–
–
–
–

Data reproducibility
Re-analysis
Reference Data
Data mining/machine learning/…

– NSF Data plan requirement
– Paper publication with data requirements
– Community and institutional collections growing
Data Publication today
• Data cited in papers (to limited depth)
• Project file archives (large, limited description,
gray/dark)
• Reference/analytical data (standardized content,
limited breadth)
• Historical collections (temporal breadth, limited
numbers)

- do any of these solve the problem?
Researchers think, and work, like this:
• Multi
– Disciplinary
– Format
– Model
– Semantics
– Location
and this
–
–
–

–

–

–

Raw and derived data
~5 levels of quality,
processing, maturity
Observations,
calibrations,
experiments, models,
statistical ensembles, …
Also organized by
location, time, variables,
technique, creator,
project, provenance, …
Large amount of
reference information
from external sources
(e.g. NASA)
Evidence for ‘nonorthogonal’ subcollections
What’s Really Needed?
Scalable Research Productivity Requires:
• A way to
– store what you want
– Reference what you want
– Organize how you want (search, filter, tag, collect)

• At the scale, and level of detail/richness, you want
• When you figure that out
• In a way that is self-describing/high-fidelity across
applications and owners
• In the vocabularies and formats you find efficient
• Beyond the lifetime of individual/project interest
• For active use and external credit
• With minimal training/IT support required.
How can we approach magic?
• Global identifiers – data, terms, metadata
• Content management abstractions (blob + type +
metadata)
• Service architectures and automated processing
(conversion, preview, extraction, derivation, cataloging,
…)
• Applications that share these abstractions – write what
you know, display/ignore what you don’t
• Research Object management (structured, interrelated collections)
Web 2.0, Web3.0, + explicit context management …
SEAD: Sustainable Environment Actionable Data
• An NSF DataNet project started in
October, 2011
• An international resource for
sustainability science
• A provider of light-weight Data Services
based on novel technical and business
approaches:
– Supporting the long-tail of research
– Enabling active and social curation
– Providing integrated lifecycle support for data
http://sead-data.net/

Margaret Hedstrom, PI
Praveen Kumar, co-PI
Jim Myers, co-PI
Beth Plale, co-PI
SEAD is:
• Data discovery
• Project workspaces
• A data-aware
community network
• Curation and
preservation services
that link to multiple archives and discovery
services
SEAD is:
• An active repository that creates data pages with
–
–
–
–
–
–
–
–

Previews
Extracted Metadata
Overlays
Tags
Comments
Provenance
Use information
Download/Embed
SEAD is:
• A tool for community exploration:
– Personal and
Project Profiles
– Publications and
Data Citations
– Co-author,
co-investigator
graphs
– Temporal analysis
SEAD is:
• Curation and Preservation Services:
– Research Object
management
– ID assignment
– Matchmaking to
long-term repositories
Citation Generation
– Catalog Registration SEAD’s Virtual Archive allows curators to
access, assess, enhance, package, and submit
data from SEAD project repositories for long– Discovery services
term storage in SEAD-managed storage or
external institutional repositories and cloud
data services.
–
–
–
–

Apps read what they need and write what they know
Curation snapshots meaningful Research Objects
Multiple ROs can be defined/managed re-using the same underlying ‘living’ content
The larger graph can be ~reassembled w/o the ongoing cost of managing at the item level

Flickr-style web management of data

Sensor data

Semantic Content Middleware
over Scalable File System and
Triple Store

Geospatial, social
network mash-ups,
workflows and services
Curation Services to harvest
and package specific data sets

Federation of OAI
repositories for
long-term
preservation
Key Points
• Research Objects have meaning/value but data comes in
smaller chunks
• Research Objects are not orthogonal, but individual data
sets/files are
• Lifecycle approaches for datasets are becoming possible
• Managing intermixed ROs is the problem that needs to be
tackled to meet the research community’s needs

• Research Data Alliance (RDA) can help drive
standardization/scaling
What will drive research data preservation?
• The most valuable data service(s) are
active/actionable research service(s)…
– The ability to define Research Objects is more
important than any given RO

• Led by research organizations as part of their
long-term mission?
– The only organizations with the focus, scope, and
scale to solve the whole problem (end-to-end
research productivity)
Acknowledgements
• SEAD Team @ UM, UI, IU
• NSF
• NCED, IRBO, WSC-Reach, IMLCZO, ICPSR, other
sustainability researchers
• and Thank You!
… stop by the SEAD booth and share your thoughts!

http://sead-data.net/

More Related Content

What's hot

Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...
Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...
Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...SEAD
 
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)SEAD
 
SEAD slide set (October 2011)
SEAD slide set (October 2011)SEAD slide set (October 2011)
SEAD slide set (October 2011)SEAD
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel ASIS&T
 
RDAP 15 Local ICPSR Data Curation Workshop Pilot Project
RDAP 15 Local ICPSR Data Curation Workshop Pilot ProjectRDAP 15 Local ICPSR Data Curation Workshop Pilot Project
RDAP 15 Local ICPSR Data Curation Workshop Pilot ProjectASIS&T
 
Ignite@AGU14
Ignite@AGU14Ignite@AGU14
Ignite@AGU14SEAD
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collectionSherry Lake
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ LibraryARDC
 
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...ASIS&T
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxARDC
 
RDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data AcceptanceRDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data AcceptanceASIS&T
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-supportSherry Lake
 
Sue cook c3 dis dm-ps 1.pptx
Sue cook c3 dis dm-ps 1.pptxSue cook c3 dis dm-ps 1.pptx
Sue cook c3 dis dm-ps 1.pptxARDC
 
John morrissey c3 dis fair working data.pptx
John morrissey c3 dis fair working data.pptxJohn morrissey c3 dis fair working data.pptx
John morrissey c3 dis fair working data.pptxARDC
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset useHeather Piwowar
 

What's hot (20)

Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...
Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...
Using SEAD to Support Collaboration among Land Managers, Scientists, and the ...
 
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011)
 
SEAD slide set (October 2011)
SEAD slide set (October 2011)SEAD slide set (October 2011)
SEAD slide set (October 2011)
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
RDAP 15 Local ICPSR Data Curation Workshop Pilot Project
RDAP 15 Local ICPSR Data Curation Workshop Pilot ProjectRDAP 15 Local ICPSR Data Curation Workshop Pilot Project
RDAP 15 Local ICPSR Data Curation Workshop Pilot Project
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Ignite@AGU14
Ignite@AGU14Ignite@AGU14
Ignite@AGU14
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
 
Data publishing at the UQ Library
Data publishing at the UQ LibraryData publishing at the UQ Library
Data publishing at the UQ Library
 
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema...
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptx
 
Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"Strasser "Effective data management and its role in open research"
Strasser "Effective data management and its role in open research"
 
RDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data AcceptanceRDAP 15 Navigating the Rocky Road to Research Data Acceptance
RDAP 15 Navigating the Rocky Road to Research Data Acceptance
 
Re tooling for data management-support
Re tooling for data management-supportRe tooling for data management-support
Re tooling for data management-support
 
Sue cook c3 dis dm-ps 1.pptx
Sue cook c3 dis dm-ps 1.pptxSue cook c3 dis dm-ps 1.pptx
Sue cook c3 dis dm-ps 1.pptx
 
John morrissey c3 dis fair working data.pptx
John morrissey c3 dis fair working data.pptxJohn morrissey c3 dis fair working data.pptx
John morrissey c3 dis fair working data.pptx
 
No more waiting! Tools that work Today to reveal dataset use
No more waiting!  Tools that work Today to reveal dataset useNo more waiting!  Tools that work Today to reveal dataset use
No more waiting! Tools that work Today to reveal dataset use
 

Viewers also liked

Mobile Cloud Computing
Mobile Cloud ComputingMobile Cloud Computing
Mobile Cloud ComputingSimeon Oriko
 
Penetrating the Cloud: Opportunities & Challenges for Businesses
Penetrating the Cloud: Opportunities & Challenges for BusinessesPenetrating the Cloud: Opportunities & Challenges for Businesses
Penetrating the Cloud: Opportunities & Challenges for BusinessesCompTIA
 
Avoiding Cloud Outage
Avoiding Cloud OutageAvoiding Cloud Outage
Avoiding Cloud OutageNati Shalom
 
The Inevitable Cloud Outage
The Inevitable Cloud OutageThe Inevitable Cloud Outage
The Inevitable Cloud OutageNewvewm
 
Summer School Scale Cloud Across the Enterprise
Summer School   Scale Cloud Across the EnterpriseSummer School   Scale Cloud Across the Enterprise
Summer School Scale Cloud Across the EnterpriseWSO2
 
Can we hack open source #cloud platforms to help reduce emissions?
Can we hack open source #cloud platforms to help reduce emissions?Can we hack open source #cloud platforms to help reduce emissions?
Can we hack open source #cloud platforms to help reduce emissions?Tom Raftery
 
Linthicum what is-the-true-future-of-cloud-computing
Linthicum what is-the-true-future-of-cloud-computingLinthicum what is-the-true-future-of-cloud-computing
Linthicum what is-the-true-future-of-cloud-computingDavid Linthicum
 
2013 State of Cloud Survey SMB Results
2013 State of Cloud Survey SMB Results2013 State of Cloud Survey SMB Results
2013 State of Cloud Survey SMB ResultsSymantec
 
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...Amazon Web Services
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareMark Hinkle
 
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud Computing
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud ComputingLinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud Computing
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud ComputingMark Hinkle
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing StudyNorth Bridge
 
Simplifying The Cloud Top 10 Questions By SMBs
Simplifying The Cloud Top 10 Questions By SMBsSimplifying The Cloud Top 10 Questions By SMBs
Simplifying The Cloud Top 10 Questions By SMBsSun Digital, Inc.
 
Intro to cloud computing — MegaCOMM 2013, Jerusalem
Intro to cloud computing — MegaCOMM 2013, JerusalemIntro to cloud computing — MegaCOMM 2013, Jerusalem
Intro to cloud computing — MegaCOMM 2013, JerusalemReuven Lerner
 
Breaking through the Clouds
Breaking through the CloudsBreaking through the Clouds
Breaking through the CloudsAndy Piper
 
Best Practices for Architecting in the Cloud - Jeff Barr
Best Practices for Architecting in the Cloud - Jeff BarrBest Practices for Architecting in the Cloud - Jeff Barr
Best Practices for Architecting in the Cloud - Jeff BarrAmazon Web Services
 
2013 Future of Cloud Computing - 3rd Annual Survey Results
2013 Future of Cloud Computing - 3rd Annual Survey Results2013 Future of Cloud Computing - 3rd Annual Survey Results
2013 Future of Cloud Computing - 3rd Annual Survey ResultsMichael Skok
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple pptAgarwaljay
 

Viewers also liked (18)

Mobile Cloud Computing
Mobile Cloud ComputingMobile Cloud Computing
Mobile Cloud Computing
 
Penetrating the Cloud: Opportunities & Challenges for Businesses
Penetrating the Cloud: Opportunities & Challenges for BusinessesPenetrating the Cloud: Opportunities & Challenges for Businesses
Penetrating the Cloud: Opportunities & Challenges for Businesses
 
Avoiding Cloud Outage
Avoiding Cloud OutageAvoiding Cloud Outage
Avoiding Cloud Outage
 
The Inevitable Cloud Outage
The Inevitable Cloud OutageThe Inevitable Cloud Outage
The Inevitable Cloud Outage
 
Summer School Scale Cloud Across the Enterprise
Summer School   Scale Cloud Across the EnterpriseSummer School   Scale Cloud Across the Enterprise
Summer School Scale Cloud Across the Enterprise
 
Can we hack open source #cloud platforms to help reduce emissions?
Can we hack open source #cloud platforms to help reduce emissions?Can we hack open source #cloud platforms to help reduce emissions?
Can we hack open source #cloud platforms to help reduce emissions?
 
Linthicum what is-the-true-future-of-cloud-computing
Linthicum what is-the-true-future-of-cloud-computingLinthicum what is-the-true-future-of-cloud-computing
Linthicum what is-the-true-future-of-cloud-computing
 
2013 State of Cloud Survey SMB Results
2013 State of Cloud Survey SMB Results2013 State of Cloud Survey SMB Results
2013 State of Cloud Survey SMB Results
 
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine...
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source Software
 
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud Computing
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud ComputingLinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud Computing
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud Computing
 
2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study2015 Future of Cloud Computing Study
2015 Future of Cloud Computing Study
 
Simplifying The Cloud Top 10 Questions By SMBs
Simplifying The Cloud Top 10 Questions By SMBsSimplifying The Cloud Top 10 Questions By SMBs
Simplifying The Cloud Top 10 Questions By SMBs
 
Intro to cloud computing — MegaCOMM 2013, Jerusalem
Intro to cloud computing — MegaCOMM 2013, JerusalemIntro to cloud computing — MegaCOMM 2013, Jerusalem
Intro to cloud computing — MegaCOMM 2013, Jerusalem
 
Breaking through the Clouds
Breaking through the CloudsBreaking through the Clouds
Breaking through the Clouds
 
Best Practices for Architecting in the Cloud - Jeff Barr
Best Practices for Architecting in the Cloud - Jeff BarrBest Practices for Architecting in the Cloud - Jeff Barr
Best Practices for Architecting in the Cloud - Jeff Barr
 
2013 Future of Cloud Computing - 3rd Annual Survey Results
2013 Future of Cloud Computing - 3rd Annual Survey Results2013 Future of Cloud Computing - 3rd Annual Survey Results
2013 Future of Cloud Computing - 3rd Annual Survey Results
 
Cloud computing simple ppt
Cloud computing simple pptCloud computing simple ppt
Cloud computing simple ppt
 

Similar to Data Sets, Ensemble Cloud Computing, and the University Library: Getting the Most Out of Research Support

Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...SEAD
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management PlanKristin Briney
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxelisarosa29
 
eCitizen Sensible-Data Design Challenge
eCitizen Sensible-Data Design ChallengeeCitizen Sensible-Data Design Challenge
eCitizen Sensible-Data Design Challengehopbeat
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
FSCI Data Discovery
FSCI Data DiscoveryFSCI Data Discovery
FSCI Data DiscoveryARDC
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅kulibrarians
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in EducationPhilip Piety
 
Data management woolfrey
Data management woolfreyData management woolfrey
Data management woolfreypvhead123
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDMMarieke Guy
 

Similar to Data Sets, Ensemble Cloud Computing, and the University Library: Getting the Most Out of Research Support (20)

Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Creating a Data Management Plan
Creating a Data Management PlanCreating a Data Management Plan
Creating a Data Management Plan
 
ROER4D Open Data Initiative
ROER4D Open Data InitiativeROER4D Open Data Initiative
ROER4D Open Data Initiative
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
 
eCitizen Sensible-Data Design Challenge
eCitizen Sensible-Data Design ChallengeeCitizen Sensible-Data Design Challenge
eCitizen Sensible-Data Design Challenge
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Critical infrastructure to promote data synthesis
Critical infrastructure to promote data synthesis Critical infrastructure to promote data synthesis
Critical infrastructure to promote data synthesis
 
FSCI Data Discovery
FSCI Data DiscoveryFSCI Data Discovery
FSCI Data Discovery
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social Sciences
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
 
Rdm slides march 2014
Rdm slides march 2014Rdm slides march 2014
Rdm slides march 2014
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in Education
 
Data management woolfrey
Data management woolfreyData management woolfrey
Data management woolfrey
 
Managing your research data
Managing your research dataManaging your research data
Managing your research data
 
Research Data Management and your PhD
Research Data Management and your PhDResearch Data Management and your PhD
Research Data Management and your PhD
 
Demography pro sem
Demography pro semDemography pro sem
Demography pro sem
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
What is-rdm
What is-rdmWhat is-rdm
What is-rdm
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
 

More from SEAD

Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...
Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...
Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...SEAD
 
Preservation, Publishing, and People: A SEAD View
Preservation, Publishing, and People: A SEAD ViewPreservation, Publishing, and People: A SEAD View
Preservation, Publishing, and People: A SEAD ViewSEAD
 
An Overview of Plans for SEAD
An Overview of Plans for SEADAn Overview of Plans for SEAD
An Overview of Plans for SEADSEAD
 
SEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability ScienceSEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability ScienceSEAD
 
SEAD: Opening Data in the "Long Tail" for Active and Social Curation
SEAD: Opening Data in the "Long Tail" for Active and Social CurationSEAD: Opening Data in the "Long Tail" for Active and Social Curation
SEAD: Opening Data in the "Long Tail" for Active and Social CurationSEAD
 
SEAD: A system to support social and active data curation
SEAD: A system to support social and active data curationSEAD: A system to support social and active data curation
SEAD: A system to support social and active data curationSEAD
 
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...SEAD
 

More from SEAD (7)

Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...
Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...
Poster: Using SEAD to Support Collaboration among Land Managers, Scientists, ...
 
Preservation, Publishing, and People: A SEAD View
Preservation, Publishing, and People: A SEAD ViewPreservation, Publishing, and People: A SEAD View
Preservation, Publishing, and People: A SEAD View
 
An Overview of Plans for SEAD
An Overview of Plans for SEADAn Overview of Plans for SEAD
An Overview of Plans for SEAD
 
SEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability ScienceSEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability Science
 
SEAD: Opening Data in the "Long Tail" for Active and Social Curation
SEAD: Opening Data in the "Long Tail" for Active and Social CurationSEAD: Opening Data in the "Long Tail" for Active and Social Curation
SEAD: Opening Data in the "Long Tail" for Active and Social Curation
 
SEAD: A system to support social and active data curation
SEAD: A system to support social and active data curationSEAD: A system to support social and active data curation
SEAD: A system to support social and active data curation
 
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
 

Recently uploaded

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Data Sets, Ensemble Cloud Computing, and the University Library: Getting the Most Out of Research Support

  • 1. Data Sets, Ensemble Cloud Computing, and the University Library: Getting the Most Out of Research Support Jim Myers1, Margaret Hedstrom1, Beth A Plale2, Praveen Kumar3, Robert McDonald4, Rob Kooper5, Luigi Marini5, Inna Kouper4, Kavitha Chandrasekar4 myersjd@umich.edu 1 School on Information, University of Michigan, Ann Arbor, MI, United States. School of Informatics and Computing, Indiana University, Bloomington, IN, United States. 3 Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL, United States. 4 Data To Insight Center, Indiana University, Bloomington, IN, United States. 5 National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, IL, United States. 2
  • 2. Overview • Technological advances are making it ever easier to move computation, data, and metadata around • With decreasing costs and increasing recognition of the value of data re-use, many organization are exploring their role in data curation/preservation • If we look at the nature of the problem – How should data be curated to scalably support research? • Lifecycle approaches to manage value-defined research objects – Can we do it? • SEAD as an end-to-end demonstration… – What organization(s) are best positioned/the most capable of leading/providing such services long-term? • Primary research organizations have a combination of capability, motivation, and long-term commitment.
  • 3. Technology – the world is flat • Today’s researchers can employ computing and data resources from anywhere, using scalable search technologies … Enough said.
  • 4. Data as a key resource, Big Data • Data is increasingly recognized as valuable beyond its initial use: – – – – Data reproducibility Re-analysis Reference Data Data mining/machine learning/… – NSF Data plan requirement – Paper publication with data requirements – Community and institutional collections growing
  • 5. Data Publication today • Data cited in papers (to limited depth) • Project file archives (large, limited description, gray/dark) • Reference/analytical data (standardized content, limited breadth) • Historical collections (temporal breadth, limited numbers) - do any of these solve the problem?
  • 6. Researchers think, and work, like this: • Multi – Disciplinary – Format – Model – Semantics – Location
  • 7. and this – – – – – – Raw and derived data ~5 levels of quality, processing, maturity Observations, calibrations, experiments, models, statistical ensembles, … Also organized by location, time, variables, technique, creator, project, provenance, … Large amount of reference information from external sources (e.g. NASA) Evidence for ‘nonorthogonal’ subcollections
  • 8. What’s Really Needed? Scalable Research Productivity Requires: • A way to – store what you want – Reference what you want – Organize how you want (search, filter, tag, collect) • At the scale, and level of detail/richness, you want • When you figure that out • In a way that is self-describing/high-fidelity across applications and owners • In the vocabularies and formats you find efficient • Beyond the lifetime of individual/project interest • For active use and external credit • With minimal training/IT support required.
  • 9. How can we approach magic? • Global identifiers – data, terms, metadata • Content management abstractions (blob + type + metadata) • Service architectures and automated processing (conversion, preview, extraction, derivation, cataloging, …) • Applications that share these abstractions – write what you know, display/ignore what you don’t • Research Object management (structured, interrelated collections) Web 2.0, Web3.0, + explicit context management …
  • 10. SEAD: Sustainable Environment Actionable Data • An NSF DataNet project started in October, 2011 • An international resource for sustainability science • A provider of light-weight Data Services based on novel technical and business approaches: – Supporting the long-tail of research – Enabling active and social curation – Providing integrated lifecycle support for data http://sead-data.net/ Margaret Hedstrom, PI Praveen Kumar, co-PI Jim Myers, co-PI Beth Plale, co-PI
  • 11. SEAD is: • Data discovery • Project workspaces • A data-aware community network • Curation and preservation services that link to multiple archives and discovery services
  • 12. SEAD is: • An active repository that creates data pages with – – – – – – – – Previews Extracted Metadata Overlays Tags Comments Provenance Use information Download/Embed
  • 13. SEAD is: • A tool for community exploration: – Personal and Project Profiles – Publications and Data Citations – Co-author, co-investigator graphs – Temporal analysis
  • 14. SEAD is: • Curation and Preservation Services: – Research Object management – ID assignment – Matchmaking to long-term repositories Citation Generation – Catalog Registration SEAD’s Virtual Archive allows curators to access, assess, enhance, package, and submit data from SEAD project repositories for long– Discovery services term storage in SEAD-managed storage or external institutional repositories and cloud data services.
  • 15. – – – – Apps read what they need and write what they know Curation snapshots meaningful Research Objects Multiple ROs can be defined/managed re-using the same underlying ‘living’ content The larger graph can be ~reassembled w/o the ongoing cost of managing at the item level Flickr-style web management of data Sensor data Semantic Content Middleware over Scalable File System and Triple Store Geospatial, social network mash-ups, workflows and services Curation Services to harvest and package specific data sets Federation of OAI repositories for long-term preservation
  • 16. Key Points • Research Objects have meaning/value but data comes in smaller chunks • Research Objects are not orthogonal, but individual data sets/files are • Lifecycle approaches for datasets are becoming possible • Managing intermixed ROs is the problem that needs to be tackled to meet the research community’s needs • Research Data Alliance (RDA) can help drive standardization/scaling
  • 17. What will drive research data preservation? • The most valuable data service(s) are active/actionable research service(s)… – The ability to define Research Objects is more important than any given RO • Led by research organizations as part of their long-term mission? – The only organizations with the focus, scope, and scale to solve the whole problem (end-to-end research productivity)
  • 18. Acknowledgements • SEAD Team @ UM, UI, IU • NSF • NCED, IRBO, WSC-Reach, IMLCZO, ICPSR, other sustainability researchers • and Thank You! … stop by the SEAD booth and share your thoughts! http://sead-data.net/