SEAD: A system to support active and social data curation in sustainability science

SEAD (Sustainable Environment - Actionable Data)

Jude Yew, Ann Zimmerman, Margaret Hedstrom, Praveen Kumar, Robert McDonald, James Myers, Beth Plale

University of Michigan, University of Illinois at Urbana-Champaign, Indiana University, Rensselaer Polytechnic Institute




1. Problem/Domain
- Sustainability science is a data-intensive area that focuses on the complex interactions between nature and human activities.
- Sustainability research requires access to data from the physical and social sciences.
- But data are difficult to find, obtain and use because different disciplines collect, describe and store their data in different ways.

2. Data Challenges in Sustainability Science
The long tail of scientific research data:
- Small and derived data sets
- Heterogeneous data
- Multiple sources of data
- Short-lived data with long-term value
- Value of data grows when combined & integrated

3. SEAD Goals
SEAD will address the needs of sustainability researchers to search for, aggregate, and maintain valuable data for the long term.

To do this, the project seeks to build a prototype that:
- Applies existing tools and services to sustainability research
- Integrates these services into a generalizable “Active and Social Curation” infrastructure
- Enables researchers to collaborate and share data during active projects
- Packages and migrates data valued by the users to a federated repository for long-term preservation

4. SEAD Strategy
Active curation
- Move data curation upstream in the data life-cycle
- Record metadata at ingest

Social curation
- Leverage social media for discovery of data, interest & expertise
- Support annotation of data by users of data
- Record conversations and comments surrounding the data
- Make connections between data & researchers through social networking

Networked data
- Tag and annotate data
- Overlay it with reference data
- Organize it in domain terminology
- Link it to people, papers, projects, conversations

Long-term archive solution
- Take advantage of existing infrastructures (Institutional Repositories, ICPSR) for long-term preservation

5. SEAD Use Cases
i) Able to ingest a variety of data types
- Users can store, manage and share heterogeneous data types (e.g. images, geo-spatial images, sensor data etc.)

ii) Support data discovery
- Provide links between data, people & publications

iii) Add value to existing data
- Users of data can provide additional metadata and annotations

iv) Create new data
- Combine data from multiple sources and contribute derived data back to SEAD

v) Community curation of data
- The community identifies and curates data of value
- These valued data will be moved to existing institutional repositories for long-term storage

- The SEAD team will work closely with the community of sustainability scientists to evolve these use cases.
- In the first two years of the project, SEAD will collaborate with scientists studying the Upper Great Lakes and Upper Mississippi River Basin.
- Through this collaboration, SEAD will prototype a system that helps researchers manage their data and motivates them to share data and information about their data with others.

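
The active and social curation ideas on this poster (record metadata at ingest, let users tag and annotate data, link data to people and papers, then move valued data to a long-term repository) can be illustrated with a small data model. The sketch below is purely hypothetical: the class names, function names, fields, and the archive threshold are assumptions made for illustration and do not describe SEAD's actual schema or software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Annotation:
    """A user comment attached to a data set (hypothetical structure)."""
    author: str
    text: str


@dataclass
class DatasetRecord:
    """Hypothetical curation record; not SEAD's real schema."""
    title: str
    data_type: str                       # e.g. "image", "sensor"
    ingested_at: datetime
    metadata: dict = field(default_factory=dict)
    tags: set = field(default_factory=set)
    annotations: list = field(default_factory=list)
    links: dict = field(default_factory=dict)   # kind -> list of identifiers


def ingest(title, data_type, **metadata):
    """Active curation: capture metadata when the data first enters the system."""
    return DatasetRecord(
        title=title,
        data_type=data_type,
        ingested_at=datetime.now(timezone.utc),
        metadata=dict(metadata),
    )


def annotate(record, author, text, tags=()):
    """Social curation: users of the data add comments and tags."""
    record.annotations.append(Annotation(author, text))
    record.tags.update(tags)


def link(record, kind, identifier):
    """Networked data: connect a data set to people, papers, or projects."""
    record.links.setdefault(kind, []).append(identifier)


def ready_for_archive(record, min_annotations=1):
    """Community curation: an assumed, illustrative rule for flagging valued data."""
    return bool(record.metadata) and len(record.annotations) >= min_annotations


if __name__ == "__main__":
    record = ingest("Stream gauge readings", "sensor",
                    region="Upper Mississippi River Basin")
    annotate(record, "researcher_a", "Gaps in winter months", tags={"hydrology"})
    link(record, "person", "researcher_a")
    print(ready_for_archive(record))
```

In this sketch the decision to archive depends only on whether metadata and at least one annotation exist. A real system would presumably rely on richer signals of community interest, which the poster does not specify.
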
SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824.
