UVA LIBRARY SCIENTIFIC DATA
CONSULTING GROUP (SCIDAC):
NEW PARTNERSHIPS AND SERVICES TO
SUPPORT SCIENTIFIC DATA IN THE LIBRARY


   Andrew Sallans
   Head of Strategic Data Initiatives


   Sherry Lake
   Senior Scientific Data Consultant


   IASSIST 2011
   1 June 2011
OUTLINE
 Phase 1 – Research Computing Lab
 Phase 2 – Scientific Data Consulting Group
    1.   Data assessment interviews
    2.   Data management planning
    3.   Integration of processes with IR
    Partnerships
        Internal
        External
    Challenges
    Future opportunities
BACKGROUND ON THE UNIVERSITY OF VIRGINIA
                     “Mr. Jefferson’s University”
                     Size
                       About 14,000 undergraduate
                        students
                       About 6,000 graduate students
                       About 2,000 faculty

                     Annual research dollars –
                      FY10 $375 million
                         DE (Ed) - $10 million
                         DOE - $10 million
                         DOD - $15 million
                         NSF - $29 million
                         DHHS - $197 million
Source: “Men's Lacrosse NCAA CHAMPS! (by Matt Riley) 5/31/2011” photo gallery, http://www.virginiasports.com/
PHASE 1: RESEARCH COMPUTING LAB
   Began planning in 2005.
   Central IT: seeking greater
    visibility.
   Library: seeking new ways
    to support scientific research.
   Co-location provided mutual
    benefits.
   Staff combined in 2006,
    moved to Library locations
    (Research Computing Lab &
    Scholars’ Lab), set up new
    service points and services.
RESEARCH COMPUTING LAB RESPONSE
 Aiming to provide support across the entire
  scientific research data lifecycle
 Staff with expertise in:
     Data
     Quantitative data, statistics
     Modeling, visualization
     Scientific publishing

 Emphasis on consulting, not drop-off services
 Partnership with traditional librarians to help
  ease transition to new support models
SAMPLE RCL CONSULTATIONS
   STS Undergrad Environmental Justice (2008)
     Development of technology solutions for empowering the
      citizen scientist
     Web 2.0 tools, data collection/management
     Data analysis
   Economics Graduate Student (2008/2009)
     Airline flight price modeling
     Screen scraping, data collection/management
     Data analysis
   Mountain Lake Beetle Project (2009)
     Mobile data acquisition/collection solution
     Database development/management, programming
     Data analysis
   Archiving of dissertation data (2009)
     EVSC student, ModelMaker 4.0 data
     Biology student, IDL, Matlab, R code
TAKE-AWAYS
 This is the future
 Heavily growing space, lots of opportunity

 Requires big investment and commitment, the
  biggest being training and priority alignment
 Libraries and institutions need to make decisions
  on what to do and what not to do
 It’s a culture change for libraries,
  institutions, and researchers



PHASE 2 - SCIENTIFIC DATA CONSULTING
GROUP
    December 2009/January 2010: rethinking the
     model
      Budgetary pressures
      Changes in organizational priorities
      Emerging demands in research community

  Spring 2010: decision to focus on data
  May 2010: close of RCL, start of SciDaC




WHAT’S HOT IN 2010?

 Open data: growing governmental interest in
  making publicly-funded research more
  transparent and more available (NIH, NSF)
 Broader critical review: greater interest in
  evaluating original research data (Nature)
 Technological advances: sharing of research
  results easier and faster (Repositories, Web 2.0)
 Reuse/preservation of research data:
  increased consideration of the cost and value of
  research data and need to ensure its longevity
“SCIENTISTS SEEKING NSF FUNDING WILL SOON BE
REQUIRED TO SUBMIT DATA MANAGEMENT PLANS”
Press Release 10-077, May 5, 2010


     Current Policy:
     o “To advance science by encouraging data sharing among
       researchers”
     o Data obtained with federal funds should be accessible to the general
       public
     o Grantees must develop and submit specific plans to share
       materials collected with NSF support, except where this is
       inappropriate or impossible

     On or around October 2010:
     o All new NSF proposals will be required to include a data
       management plan in the form of a two-page supplementary document
       (peer reviewed)
     o New policy is meant to be a first step toward a more
       comprehensive approach to data management
     o Exact requirements vague
THE CHALLENGE FOR INSTITUTIONS

Data is expensive
 Time, instrumentation, inability to reproduce

Increasing regulation
 Granting agencies and journals require
  submission
Inadequate training
 No formal data management curriculum

Preservation is not a priority
 For most researchers, preservation takes time
  away from the work that is rewarded
  (publication, teaching)
SO…WHO’S GOING TO TAKE THIS ON?
 Researchers?
 VPR?

 CIO?

 OSP?

 UL?




WHY THE LIBRARY?
 Neutral: works across the entire institution
 Strong in relationship building: has
  experience fostering discussion and relationships,
  and cultivates an existing support network
 Intellectual Property experts: has dealt with
  copyright, can translate to data
 Service-oriented: uniquely positioned as an
  intellectual service unit within the institution



GETTING STARTED…
   Take what we learned in the RCL experience and
    apply it to the focused demands around data

Steps:
 Conduct a stakeholder analysis

 Make a short term plan (12 months)

 Develop clear priorities

 Refine and standardize consulting methods

 Communicate heavily


STAKEHOLDER ANALYSIS (ABBREVIATED)
Internal
 Researchers
 Graduate Students
 Grant Administrators
 Deans
 VP/CIO
 VPR
 OSP
 UL

External
 Funding agencies
 Broader research community
 “The Public”
SHORT TERM PLAN
 Survey OSP to match grant holders with
  regulations
 Educate/engage subject librarians

 Build political awareness/support

 Build partnerships with
  local/national/international groups

Resource requests:
 Staffing commitment

 Travel/partnership support
 Promotion of initiative to institution
CLEAR PRIORITIES
1.   Data interviews/assessments
2.   Response to NSF Data Management Plan
     (DMP) Mandate
3.   Leadership on data for the Institutional
     Repository (IR)




CONSULTING ACTIVITIES
 Interviews/assessments
 Data management planning templates

 LOTS of documentation

 Constant and continuous refinement of process

 Focus on helping researchers improve process




COMMUNICATE HEAVILY
   Internal
     Inform staff of processes, priorities, and progress
     Keep stakeholders engaged
     Reach the consumers from many angles

   External
     Discuss and share experiences with colleagues at other
      institutions
     Create partnerships to share, build upon resources and
      experiences, collaborate on tools
      Networking (Twitter, LinkedIn, listservs, conference calls,
      conference presentations)


    Bottom line: this is a big culture shift, and you do have to
        say the same thing many times in different ways
PRIORITY 1 – DATA ASSESSMENT INTERVIEWS
 Initially a means of growing awareness of
  consulting service and doing assessment, now a
  means of establishing a baseline for research
  data management practices with any new “client”
 Protocol involves:
     60 minute interview discussion (researcher / SciDaC
      consultants / subject librarian)
     Development of a report
     SciDaC consultants give researchers
      recommendations to improve data management
     SciDaC consultants work with researchers to
      implement recommended solutions
   Approach has proven to be very effective thus far
PRIORITY 2 – DATA MANAGEMENT
PLANNING

 Highest priority: responding to and addressing
  support needs for funding agency requirements
  (e.g., NSF and others)
 Getting a handle on data management as a
  means of institutional risk management
 Coordination of effort across institution




NSF DATA MANAGEMENT PLAN MANDATE
 Official mandate became active Jan. 18, 2011
 New NSF Directorates/Divisions continue to
  release and specify guidelines (examples below)
       Education and Human Resources (EHR)
       Engineering (ENG)
       Geosciences (GEO)
       Mathematical and Physical Sciences (MPS)
       Social, Behavioral, and Economic Sciences (SBE)
   Researchers continue to be mostly unaware of the
    mandate and how to prepare a DMP
UVA SCIDAC NSF DMP RESPONSE
UVa Library’s Original Request
 Develop boilerplate for researchers to use in proposals


SciDaC Group’s Response
 No boilerplate, successful proposals need customized plans
 Our approach involves:
    Knowledge across many communities (“translational” opportunities)
    Leadership on policy/infrastructure development
    Development of a template that simplifies writing the plan


Principles
 Must be easy for researcher
 Must be supportable by available UVA resources/infrastructure
 Must be feasible to follow through on if grant is awarded
PRIORITY 3 – INTEGRATION WITH IR
   Institutional repository “Libra” (http://libra.virginia.edu)
     Built upon Hydra architecture
     Three components: open access publications, data, and electronic
      theses/dissertations
   Working out storage and cost models to
    support management of big and small data from across
    institution’s research community
   Will provide preservation assurance for data in form of
    “blobs” or packages (bit preservation, no format migration)
   Currently in process of developing user
    interface/ingestion prototype that addresses needs of
    small data for release in late July 2011
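
To illustrate what "bit preservation, no format migration" usually means in practice, here is a minimal Python sketch of a fixity check: record a checksum for each file in a deposited package, then re-verify later that the bits are unchanged. This is an assumption-laden illustration, not a description of how Libra or Hydra works; the function names (sha256_of, build_manifest, verify_manifest) and the directory-as-package layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the package) to its checksum."""
    return {
        str(p.relative_to(package_dir)): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths whose files are missing or whose bits changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = package_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

A repository following this pattern would store the manifest at ingest and run the verification on a schedule; the file contents are never interpreted or converted, which is the sense in which no format migration takes place.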
COLLABORATIONS
   Internal
     Library / VPR / CIO / OSP
     Institutional Repository Team
     Kuali Coeus team


   External
     DMP Tool
     DataONE
     Conference/professional networks



CHALLENGES
 Involving subject librarians?
 Gaining institutional buy-in?

 Meeting demand?




HOW TO INVOLVE SUBJECT LIBRARIANS?
UVa Library Staff Model
 Scientific Data Consultants
 Subject Librarians


Current Training Model
 Brown Bag Data Curation
  Discussions
 Data Interviews


Goals and Objectives
 Build Data Literacy
 Create Collaborative Opportunities
 Establish the Library for Data
  Preservation
HOW TO GAIN INSTITUTIONAL BUY-IN?
 Regulations are helpful
 Partnerships between key stakeholders:
     University libraries (UL)
     Central IT (CIO)
     Research Office (VP for Research)
     Sponsored Programs/Research

   Strategic investment: take ownership, allocate
    resources, and demonstrate capability



HOW TO MEET DEMAND?
   Time: how to best manage staff time
       NSF research support alone is going to be very time
        consuming (UVA had about 140 proposals over the past
        year, 44 in November alone)


   Funding: work with leaders to find money
     Redirection/reallocation of grant overhead dollars
     Write-in of library staff on grants


   Strategy: decide how to invest
     How might units be reorganized?
     How do we expand to other disciplines?
     How could staff resources and expertise be refocused?
     What additional partnerships would add value?
FUTURE DIRECTIONS
 Addressing data management needs of other
  disciplines across the institution
 Integration into formal research proposal
  process
 Broader data management education
 Increased funded research project
  consulting
 Technology consulting
 Expansion of virtual organization partners
  and creation of research advisory board
 Guiding of policy revision to address new
  interests in data management and preservation
THANK YOU!
Andrew Sallans
Head of Strategic Data Initiatives, SciDaC Group
University of Virginia Library
Email: als9q@virginia.edu
Twitter: asallans
http://www.lib.virginia.edu/brown/data




