SEAD: A system to support active and social data curation in sustainability science

SEAD: Sustainable Environment - Actionable Data

Jude Yew, Ann Zimmerman, Margaret Hedstrom, Praveen Kumar, Robert McDonald, James Myers, Beth Plale

University of Michigan
University of Illinois at Urbana-Champaign
Indiana University
Rensselaer Polytechnic Institute

1. Problem/Domain

- Sustainability science is a data-intensive area that focuses on the complex interactions between nature and human activities.
- Sustainability research requires access to data from the physical and social sciences.
- But data are difficult to find, obtain and use because different disciplines collect, describe and store their data in different ways.

2. Data Challenges in Sustainability Science

The long tail of scientific research data:
- Small and derived data sets
- Heterogeneous data
- Multiple sources of data
- Short-lived data with long-term value
- Value of data grows when combined & integrated

3. SEAD Goals

SEAD will address the needs of sustainability researchers to search for, aggregate, and maintain valuable data for the long term.

To do this, the project seeks to build a prototype that:
- Applies existing tools and services to sustainability research
- Integrates these services into a generalizable “Active and Social Curation” infrastructure
- Enables researchers to collaborate and share data during active projects
- Packages and migrates data valued by the users to a federated repository for long-term preservation

4. SEAD Strategy

Active curation
- Move data curation upstream in the data life-cycle
- Record metadata at ingest

Social curation
- Leverage social media for discovery of data, interest & expertise
- Support annotation of data by users of data
- Record conversations and comments surrounding the data
- Make connections between data & researchers through social networking

Networked data
- Tag and annotate data
- Overlay it with reference data
- Organize it in domain terminology
- Link it to people, papers, projects, conversations

Long-term archive solution
- Take advantage of existing infrastructures (Institutional Repositories, ICPSR) for long-term preservation

5. SEAD Use Cases

i) Able to ingest a variety of data types
- Users can store, manage and share heterogeneous data types (e.g. images, geo-spatial images, sensor data)

ii) Support data discovery
- Provide links between data, people & publications

iii) Add value to existing data
- Users of data can provide additional metadata and annotations

iv) Create new data
- Combine data from multiple sources and contribute derived data back to SEAD

v) Community curation of data
- The community identifies and curates data of value
- These valued data will be moved to existing institutional repositories for long-term storage

- The SEAD team will work closely with the community of sustainability scientists to evolve these use cases.
- In the first two years of the project, SEAD will collaborate with scientists studying the Upper Great Lakes and Upper Mississippi River Basin.
- Through this collaboration, SEAD will prototype a system that helps researchers manage their data and motivates them to share data and information about their data with others.

SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824.
