Stewarding Big Data: Perspectives on Public Access
   to Federally Funded Scientific Research Data

 Big Data and Big Challenges for Law and Legal Information
                   Georgetown Law Library
                      January 30, 2013



                    William G. LeFurgy
                    Library of Congress
                         @blefurgy
My Perspective on Big Data Stewardship
• Realizing the full potential of big data depends on
  keeping it accessible over time
• Accessibility depends on life cycle management,
  most especially preservation
• Advocate for a collaborative, distributed model
• Understand that “stewardship” has a different
  meaning for many data creators
White House RFI Input Is Instructive
• Request for Information on Public Access to
  Federally Funded Scientific Research Data, Nov.
  2011
• Invited interested individuals and organizations
  to provide recommendations on approaches for
  ensuring long-term stewardship and
  encouraging broad public access
• Input provided to inform development of agency
  policies and standards for managing big data
Summary of Responses

• 118 individual responses
  – 50% from academic research departments and
    professional organizations
  – 35% from libraries, repositories and allied
    organizations
  – 10% from publishers and commercial organizations
  – 5% other
• An excellent (unstructured!) data set for analyzing
  current thinking on big data stewardship
Top-Level Policy Recommendations

• A remarkable degree of congruence among
  comments
  – Broadly allocate adequate resources for data
    stewardship
  – Extend a collaborative national digital stewardship
    infrastructure
  – Institute and enforce a data preservation mandate
  – Strongly encourage policies that support secondary
    use of, and respect for, data
• But… respondents were conflicted about IP, copyright,
  and privacy
Need: Resources
• Funders should include money in awards for data
  stewardship
• Cost models and other guidance are needed for
  estimating data life cycle costs
• Allocate expanded resources to support national
  data repositories
Need: National Digital Stewardship Infrastructure
• Leverage current institutional efforts to
  define best practices, tools, services
• Extend the community of practice for data
  stewardship through collaborative action
  across disciplines
• Develop a skilled workforce with data
  stewardship expertise
Need: A Data Preservation Mandate
• Incentivize grant applicants to make realistic
  plans for data
  – Stronger data management requirements in the
    application process
  – Tie future awards to demonstrated success with data
    stewardship
  – Enable direct support of PIs by data stewardship
    specialists
Support: Secondary Use, Respect for Data

• Broadly apply a citation mechanism for
  data sets (e.g., DataCite, DOIs)
• Tie criteria for evaluating grant applications
  to secondary use of data
• Give equal credit for publishing articles
  and data sets
• Develop robust metrics to track data
  publication and use
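The citation bullet above can be made concrete with a small sketch. The Python below assembles a data-set citation from the elements DataCite recommends including: creator, publication year, title, publisher and identifier. It also turns the DOI into a resolvable https://doi.org/ link.

The following are all hypothetical illustrations: the DataSetRecord class, the format_citation function, the exact punctuation, and the sample record (names, title, repository, DOI). None of them is a DataCite API or a real data set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSetRecord:
    """Hypothetical minimal metadata for one data set (illustration only)."""
    creators: List[str]   # "Family, Given" strings
    year: int             # publication year
    title: str
    publisher: str        # repository or data center holding the data
    doi: str              # bare DOI such as "10.5072/EXAMPLE-123"


def format_citation(record: DataSetRecord) -> str:
    """Build 'Creators (Year): Title. Publisher. https://doi.org/DOI'."""
    # Reject prefixed forms like "doi:10..." so the link below stays valid.
    if not record.doi.startswith("10."):
        raise ValueError(f"not a bare DOI: {record.doi!r}")
    creators = "; ".join(record.creators)
    # DOIs resolve through the doi.org proxy, so the identifier doubles as a link.
    return (
        f"{creators} ({record.year}): {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


if __name__ == "__main__":
    example = DataSetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        year=2012,
        title="Example Survey Responses",
        publisher="Example Data Repository",
        doi="10.5072/EXAMPLE-123",
    )
    # Prints: Doe, Jane; Roe, Richard (2012): Example Survey Responses.
    #         Example Data Repository. https://doi.org/10.5072/EXAMPLE-123
    print(format_citation(example))
```

The DOI, rather than a repository URL, is the persistent part of the citation. It can be redirected if the data set moves, which is what lets citations and reuse metrics follow the data over time.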
Muddled Picture for IP
• Opinions diverge about the role of copyright,
  patents, etc., with regard to research data
  – Commercial interests see IP as critical
  – Many data users favor a Creative Commons or public
    domain approach
  – Data creators fall between these positions
• Significant concern raised regarding privacy in
  connection with IRB review and personal data
Next Steps

• Two interagency working groups within the
  National Science and Technology Council are
  reviewing the recommendations
• The groups will develop science agency policies for
  data dissemination and stewardship
• Potential for major change, as the policies may be
  tied to funding from the Federal science agencies
Websites
Request for Information: Public Access to Digital Data Resulting From
Federally Funded Scientific Research, http://ow.ly/ePB93
Your Comments on Access to Federally Funded Scientific Research
Results, http://ow.ly/ePBb9
National Science and Technology Council, http://ow.ly/h87Li

More Related Content

What's hot

Standardising research data policies, research data network
Standardising research data policies, research data networkStandardising research data policies, research data network
Standardising research data policies, research data network
Jisc RDM
 
2013 ICPSR Data Services
2013 ICPSR Data Services2013 ICPSR Data Services
2013 ICPSR Data Services
ICPSR
 
Frances Burton on sensitive data
Frances Burton on sensitive dataFrances Burton on sensitive data
Frances Burton on sensitive data
Jisc RDM
 
Connected health cities
Connected health citiesConnected health cities
Connected health cities
Jisc
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
Research Data Alliance
 
The African Open Science Platform/Susan Veldsman
The African Open Science Platform/Susan VeldsmanThe African Open Science Platform/Susan Veldsman
The African Open Science Platform/Susan Veldsman
African Open Science Platform
 
ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...
ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...
ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...
ASIS&T
 
Journal research data policy update
Journal research data policy updateJournal research data policy update
Journal research data policy update
Jisc RDM
 
‘Good, better, best’? Examining the range and rationales of institutional dat...
‘Good, better, best’? Examining the range and rationales of institutional dat...‘Good, better, best’? Examining the range and rationales of institutional dat...
‘Good, better, best’? Examining the range and rationales of institutional dat...
Robin Rice
 
Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...
Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...
Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...
Wilfrid Laurier University
 
A SWOT Analysis of Data Science @ NIH
A SWOT Analysis of Data Science @ NIHA SWOT Analysis of Data Science @ NIH
A SWOT Analysis of Data Science @ NIH
Philip Bourne
 
Making Biomedical Research More Like Airbnb
Making Biomedical Research More Like AirbnbMaking Biomedical Research More Like Airbnb
Making Biomedical Research More Like Airbnb
Philip Bourne
 
NIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATSNIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATS
Susanna-Assunta Sansone
 
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
ASIS&T
 
HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021
Jisc RDM
 
Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013
Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013
Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013
SALCTG
 
Managing sensitive data at the University of Bristol
Managing sensitive data at the University of BristolManaging sensitive data at the University of Bristol
Managing sensitive data at the University of Bristol
Jisc RDM
 
Borgman - Privacy, Policy and Data Governance in the University
Borgman - Privacy, Policy and Data Governance in the UniversityBorgman - Privacy, Policy and Data Governance in the University
Borgman - Privacy, Policy and Data Governance in the University
National Information Standards Organization (NISO)
 
Libraries, RDM and e-infrastructure requirements
Libraries, RDM and e-infrastructure requirementsLibraries, RDM and e-infrastructure requirements
Libraries, RDM and e-infrastructure requirements
Susan Reilly
 
State of open research data open con
State of open research data   open conState of open research data   open con
State of open research data open con
Amye Kenall
 

What's hot (20)

Standardising research data policies, research data network
Standardising research data policies, research data networkStandardising research data policies, research data network
Standardising research data policies, research data network
 
2013 ICPSR Data Services
2013 ICPSR Data Services2013 ICPSR Data Services
2013 ICPSR Data Services
 
Frances Burton on sensitive data
Frances Burton on sensitive dataFrances Burton on sensitive data
Frances Burton on sensitive data
 
Connected health cities
Connected health citiesConnected health cities
Connected health cities
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
 
The African Open Science Platform/Susan Veldsman
The African Open Science Platform/Susan VeldsmanThe African Open Science Platform/Susan Veldsman
The African Open Science Platform/Susan Veldsman
 
ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...
ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...
ESIP Federation: Community-Driven, Collaborative Governance - Carol Beaton Me...
 
Journal research data policy update
Journal research data policy updateJournal research data policy update
Journal research data policy update
 
‘Good, better, best’? Examining the range and rationales of institutional dat...
‘Good, better, best’? Examining the range and rationales of institutional dat...‘Good, better, best’? Examining the range and rationales of institutional dat...
‘Good, better, best’? Examining the range and rationales of institutional dat...
 
Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...
Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...
Research Week 2014: Tri-council Open-Access Policies and Data Management Plan...
 
A SWOT Analysis of Data Science @ NIH
A SWOT Analysis of Data Science @ NIHA SWOT Analysis of Data Science @ NIH
A SWOT Analysis of Data Science @ NIH
 
Making Biomedical Research More Like Airbnb
Making Biomedical Research More Like AirbnbMaking Biomedical Research More Like Airbnb
Making Biomedical Research More Like Airbnb
 
NIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATSNIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATS
 
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
 
HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021HESA data, describing research activity and #REF2021
HESA data, describing research activity and #REF2021
 
Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013
Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013
Joy Davidson “Data Management Planning: an introduction” SALCTG June 2013
 
Managing sensitive data at the University of Bristol
Managing sensitive data at the University of BristolManaging sensitive data at the University of Bristol
Managing sensitive data at the University of Bristol
 
Borgman - Privacy, Policy and Data Governance in the University
Borgman - Privacy, Policy and Data Governance in the UniversityBorgman - Privacy, Policy and Data Governance in the University
Borgman - Privacy, Policy and Data Governance in the University
 
Libraries, RDM and e-infrastructure requirements
Libraries, RDM and e-infrastructure requirementsLibraries, RDM and e-infrastructure requirements
Libraries, RDM and e-infrastructure requirements
 
State of open research data open con
State of open research data   open conState of open research data   open con
State of open research data open con
 

Similar to Stewarding Big Data

Data Management Planning for Engineers
Data Management Planning for EngineersData Management Planning for Engineers
Data Management Planning for Engineers
Sherry Lake
 
Data Publishing Overview
Data Publishing OverviewData Publishing Overview
Data Publishing Overview
Richard Huffine
 
Library resources and services for grant development
Library resources and services for grant developmentLibrary resources and services for grant development
Library resources and services for grant development
rds-wayne-edu
 
Guidelines for OSTP Data Access Plans
Guidelines for OSTP Data Access PlansGuidelines for OSTP Data Access Plans
Guidelines for OSTP Data Access Plans
ICPSR
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
ICPSR
 
Data management: The new frontier for libraries
Data management: The new frontier for librariesData management: The new frontier for libraries
Data management: The new frontier for libraries
LEARN Project
 
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
ICPSR
 
Open Data: an Open and Shut Case?
Open Data: an Open and Shut Case?Open Data: an Open and Shut Case?
Open Data: an Open and Shut Case?
Dublinked .
 
Overview and library support for data management/sharing
Overview and library support for data management/sharingOverview and library support for data management/sharing
Overview and library support for data management/sharing
rds-wayne-edu
 
ACRL STS Liaisons Forum - AIBS
ACRL STS Liaisons Forum - AIBSACRL STS Liaisons Forum - AIBS
ACRL STS Liaisons Forum - AIBS
Virginia Pannabecker
 
Open data: an open and shut case?
Open data: an open and shut case?Open data: an open and shut case?
Open data: an open and shut case?
robkitchin
 
Overview of Emerging Requirements for Data Management of Federally Funded Res...
Overview of Emerging Requirements for Data Management of Federally Funded Res...Overview of Emerging Requirements for Data Management of Federally Funded Res...
Overview of Emerging Requirements for Data Management of Federally Funded Res...
Richard Huffine
 
Alain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersAlain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersIncisive_Events
 
Federal funder mandates
Federal funder mandatesFederal funder mandates
Federal funder mandates
Sherry Lake
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
ICPSR
 
Data!
Data!Data!
David Carr: Maximising the availability and use of research outputs – a funde...
David Carr: Maximising the availability and use of research outputs – a funde...David Carr: Maximising the availability and use of research outputs – a funde...
David Carr: Maximising the availability and use of research outputs – a funde...NeilStewartCity
 
Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...
Carolyn Ten Holter
 
Why managedata
Why managedataWhy managedata
Why managedata
Sherry Lake
 
Yale Day of Data
Yale Day of Data Yale Day of Data
Yale Day of Data
Philip Bourne
 

Similar to Stewarding Big Data (20)

Data Management Planning for Engineers
Data Management Planning for EngineersData Management Planning for Engineers
Data Management Planning for Engineers
 
Data Publishing Overview
Data Publishing OverviewData Publishing Overview
Data Publishing Overview
 
Library resources and services for grant development
Library resources and services for grant developmentLibrary resources and services for grant development
Library resources and services for grant development
 
Guidelines for OSTP Data Access Plans
Guidelines for OSTP Data Access PlansGuidelines for OSTP Data Access Plans
Guidelines for OSTP Data Access Plans
 
From Data Sharing to Data Stewardship
From Data Sharing to Data StewardshipFrom Data Sharing to Data Stewardship
From Data Sharing to Data Stewardship
 
Data management: The new frontier for libraries
Data management: The new frontier for librariesData management: The new frontier for libraries
Data management: The new frontier for libraries
 
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
 
Open Data: an Open and Shut Case?
Open Data: an Open and Shut Case?Open Data: an Open and Shut Case?
Open Data: an Open and Shut Case?
 
Overview and library support for data management/sharing
Overview and library support for data management/sharingOverview and library support for data management/sharing
Overview and library support for data management/sharing
 
ACRL STS Liaisons Forum - AIBS
ACRL STS Liaisons Forum - AIBSACRL STS Liaisons Forum - AIBS
ACRL STS Liaisons Forum - AIBS
 
Open data: an open and shut case?
Open data: an open and shut case?Open data: an open and shut case?
Open data: an open and shut case?
 
Overview of Emerging Requirements for Data Management of Federally Funded Res...
Overview of Emerging Requirements for Data Management of Federally Funded Res...Overview of Emerging Requirements for Data Management of Federally Funded Res...
Overview of Emerging Requirements for Data Management of Federally Funded Res...
 
Alain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producersAlain Frey Research Data for universities and information producers
Alain Frey Research Data for universities and information producers
 
Federal funder mandates
Federal funder mandatesFederal funder mandates
Federal funder mandates
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Data!
Data!Data!
Data!
 
David Carr: Maximising the availability and use of research outputs – a funde...
David Carr: Maximising the availability and use of research outputs – a funde...David Carr: Maximising the availability and use of research outputs – a funde...
David Carr: Maximising the availability and use of research outputs – a funde...
 
Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...
 
Why managedata
Why managedataWhy managedata
Why managedata
 
Yale Day of Data
Yale Day of Data Yale Day of Data
Yale Day of Data
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Stewarding Big Data

  • 1. Stewarding Big Data: Perspectives on Public Access to Federally Funded Scientific Research Data Big Data and Big Challenges for Law and Legal Information Georgetown Law Library January 30, 2013 William G. LeFurgy Library of Congress @blefurgy
  • 2. My Perspective on Big Data Stewardship • Realizing full potential from big data depends keeping it accessible over time • Accessibility depends on life cycle management, most especially preservation • Advocate for collaborative, distributed model • Understand that “stewardship” has a different meaning for many data creators
  • 3. White House RFI Input Instructive • Request for Information on Public Access to Federally Funded Scientific Research Data, Nov. 2011 • Interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access • Input provided to inform development of agency policies and standards for managing big data
  • 4. Summary of Responses • 118 individual responses – 50% from academic research departments, professional organizations – 35% from libraries, repositories and allied organizations – 10% from publishers and commercial organizations – 5% other • Excellent (unstructured!) data set to analyze current thinking on big data stewardship
  • 5. Top-Level Policy Recommendations • Remarkable degree of congruence among comments – Broadly allocate adequate resources for data stewardship – Extend a collaborative national digital stewardship infrastructure – Institute and enforce a data preservation mandate – Strongly encourage policies to support secondary use, respect for data • But… conflicted about IP, copyright, privacy
  • 6. Need: Resources • Funders to include money in awards for data stewardship • Need cost models, other guidance for estimating data life cycle costs • Allocate expanded resources to support national data repositories
  • 7. Need: National Digital Stewardship Infrastructure • Leverage current institutional efforts to define best practices, tools, services • Extend community of practice for data stewardship through collaborative action across disciplines • Develop a skilled workforce with data stewardship expertise
  • 8. Need: A Data Preservation Mandate • Incentivize grant applicants to make realistic plans for data – Stronger data management requirements in application process – Tie future awards to demonstrated success with data stewardship – Enable direct support of PIs by data stewardship specialists
  • 9. Support: Secondary Use, Respect for Data • Broadly apply a citation mechanism for data sets (e.g., DataCite, DOIs) • Criteria for evaluating grant applications tied to secondary use of data • Give equal credit for publishing articles and data sets • Develop robust metrics to track data publication and use
  • 10. Muddled Picture for IP • Opinions diverge about role of copyright, patents, etc., in regard to research data – Commercial interests see IP as critical – Many data users favor Creative Commons or public domain approach – Data creators fall between these positions • A significant degree of concern raised regarding privacy in connection with IRB, personal data
  • 11. Next Steps • Two interagency working groups within the National Science and Technology Council reviewing recommendations • Groups will develop science agency policies for data dissemination and stewardship • Potential for major change, as policies may be tied to funding from the Federal science agencies
  • 12. Websites Request for Information: Public Access to Digital Data Resulting From Federally Funded Scientific Research, http://ow.ly/ePB93 Your Comments on Access to Federally Funded Scientific Research Results, http://ow.ly/ePBb9 National Science and Technology Council, http://ow.ly/h87Li

Editor's Notes

  1. Thanks for having me here today. I’m going to do my best to give you an overview from the perspective of libraries and archives on keeping big data for scholarship and public policy.
  2. I like the term “stewarding” to sum up all the activities involved in acquiring, preserving and making available data sets. Stewarding is essential if we as a society are going to realize the full potential of big data. It’s a pretty basic proposition: somebody must devote time and effort to keeping data and to helping users access it. If this doesn’t happen, data will be hard to use, scattered and even lost. There are two basic considerations here. First, collecting organizations need to concern themselves with the full life cycle of data, from initial creation, through use, to “archiving,” to long-term preservation and access. Second, the job is bigger than any one organization can handle; the volume and complexity of data require many organizations to work together in new ways.
  3. I thought a good way to frame this discussion would be to summarize what a variety of organizations said in response to a recent White House request for information. This request asked for input about ensuring stewardship and encouraging broad public access to federally funded scientific research data. The White House will use the information submitted to draft revised agency policies in connection with big data. This has huge potential. The revised policies could cover requirements for data management tied to billions in funding from the National Science Foundation, the National Institutes of Health and other funding agencies.
  4. The White House says it received 118 individual responses, all of which are available on its website. There’s an interesting mix of respondents. Half came from discipline-specific academic research departments or professional organizations; I’d characterize them as data creators and data users. About a third of the submissions came from libraries, archives and other collecting entities. The rest came from a mix of individuals, publishers and commercial organizations. What we have here is an excellent data set that offers a broad-based snapshot of current thinking on data preservation. The response data set is seriously unstructured, as it is made up of inconsistently formatted text documents, but it is fairly easy to analyze.
  5. I was pleasantly surprised at the degree of congruence among the comments. Nearly everyone enthusiastically agreed that enhanced data stewardship was critical, both to support primary scientific research and to enable broad secondary use by the public. Most submissions explicitly called for increased resources for data stewardship. There was broad agreement that a distributed national digital stewardship infrastructure was the right vehicle for the infusion of new funding. Apart from money, the comments also aligned in calling for a strong data preservation mandate from funding agencies. The basic idea is that receipt of funding awards should be tied to a clear expectation for long-term data management. Many of you won’t be surprised to hear that there was much less agreement on traditionally thorny topics such as intellectual property, copyright and personal privacy.
  6. In terms of a push for increased resources, the comments clustered around three points. First, individual funding awards should include a dedicated line item for data stewardship. Second, there is a need for models that project the lifetime cost of keeping data. Third, funding needs to be channeled to a national infrastructure, most especially to support a distributed network of data repositories.
  7. The focus on a national data infrastructure zeroed in on ideas for extending work that’s already underway on standards, tools and best practices. There was enthusiasm for boosting the present community of practice for data stewardship, most particularly in a way that bridges different research disciplines. This really makes sense to me. While there is excellent work going on, much of it tends to reside in specialty silos. We need to accept that, at a certain level, data stewardship has a common set of requirements that are best addressed collectively. Related to this is a pressing need for a much expanded workforce of data stewards.
  8. The need for a data preservation mandate comes down to what economists call “incentivizing.” In other words, if we want better data management, principal investigators have to be properly motivated. This motivation can come in different forms. Funding applications could call for detailed attention to data management. Decisions on future awards could be tied to prior demonstrated success with data stewardship. And there could be provisions for data stewardship specialists to support PIs.
  9. There was strong support for what I characterize as “respect for data,” which is linked to recognizing the broad potential for secondary use. Ideas for enabling this include adoption of a citation mechanism for data sets, such as that offered by the DataCite organization. Related to this was the proposal to give the same credit for providing useful data sets as is now given for published articles. Securing this kind of credit depends on developing a new set of metrics to track data sets and their use.
  10. It’s no shock that consensus evaporated when it came to traditional hot-button issues. Commercial interests see control of IP as critical, while data users want relaxed IP barriers. Data creators fall between these endpoints: some want more stringent control, while others see the benefit of wider use. Concerns about data privacy, most especially in connection with personal data collected under IRB rules, were strongly voiced by a number of creators, some of whom said that these rules essentially barred any secondary use of certain data sets.
  11. In terms of next steps, the ball is in the White House’s court. Two interagency working groups are mulling over the comments and will use them to draft new policies governing data stewardship. As I noted earlier, we have the potential for major improvements in how federally funded data is kept and used. But I hasten to add that the outcome is still uncertain. What is clear, however, is that there is a strong consensus among data producers, users and keepers about what should happen.
  12. Here is a list of the websites I used in developing this presentation. Thank you.