UNDERSTANDING THE
BIG PICTURE OF E-SCIENCE

Andrew Sallans
Head of Strategic Data Initiatives
University of Virginia Library

E-Science Bootcamp
Claude Moore Health Sciences Library, University of Virginia
4 March 2011
OUTLINE
 What it's all about
 Examples

 Implications

 UVA Libraries Response (Round 1)




WHAT IT'S ALL ABOUT (AROUND 1999)
"e-Science is about global collaboration in key areas
  of science, and the next generation of
  infrastructure that will enable it."

"e-Science will change the dynamic of the way
  science is undertaken."

                    Dr Sir John Taylor
                    Director General of Research Councils,
                    Office of Science and Technology
                    United Kingdom
                    Source: http://webscience.org/person/8.html
WHAT MADE THIS POSSIBLE?
 Internet/World Wide Web
 Faster networking (fiber, special research
  networks, advances in grids)
 Better storage (higher capacity, faster access,
  better reliability)
 Cheap storage (costs keep decreasing)

 Major funding initiatives

 Broader interest in collaboration




SOME COMMON TERMS
 Computational science
 Scientific computing

 Research computing

 High-performance computing

 Cyberscience

 Cyberinfrastructure




CLIMATOLOGY RESEARCH




Sources:
1) Climate Simulation on Cray XT5 “Jaguar” supercomputer, ORNL
   (http://www.ornl.gov/info/ornlreview/v42_3_09/article02.shtml)
2) Cray XT5 “Jaguar” supercomputer, ORNL
   (http://www.ornl.gov/info/ornlreview/v42_1_09/images/a05_p04_xt5_full.jpg)
LARGE HADRON COLLIDER AT CERN
                                               Circumference: 26,659 meters
                                               Magnets: 9,300
                                               Speed: protons move at
                                                99.9999991% of the speed of light
                                               Collisions/second: 600 million
                                               Data produced: equivalent to
                                                100,000 dual layer DVDs per
                                                year
                                               LHC Grid: tens of thousands
                                                of computers around the world
                                                used collectively to analyze
                                                data (will take 15 years)

Source: CERN website (http://cdsweb.cern.ch/record/975468/files/its-2006-003.gif?subformat=icon)
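
To put the DVD comparison in more familiar storage units, the following is a rough sketch of the arithmetic. It takes the slide's figure of 100,000 discs at face value and assumes the nominal 8.5 GB capacity of a dual-layer DVD; the variable names are illustrative only.

```python
# Rough conversion of "100,000 dual layer DVDs per year" into terabytes.
# Assumption: a dual-layer DVD holds a nominal 8.5 GB (decimal gigabytes).
DVD_DUAL_LAYER_GB = 8.5
DVDS_PER_YEAR = 100_000  # figure quoted on the slide

gb_per_year = DVDS_PER_YEAR * DVD_DUAL_LAYER_GB
tb_per_year = gb_per_year / 1_000  # 1 TB = 1,000 GB in decimal units

print(f"{gb_per_year:,.0f} GB per year")
print(f"{tb_per_year:,.0f} TB per year")
```

Under these assumptions the figure comes to a little under one petabyte per year.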
CANCER BIOMEDICAL INFORMATICS GRID (CABIG)
 Launched as a test in 2004
 Adopted by over 50 NCI-designated cancer centers

 Focused on:
       Connecting scientists and practitioners through a
        shareable and interoperable infrastructure
       Development of standard rules and a common
        language to more easily share information
       Building or adapting tools for collecting, analyzing,
        integrating, and disseminating information associated
        with cancer research and care

    Source: caBIG website, National Cancer Institute (https://cabig.nci.nih.gov/)
CITIZEN SCIENCE…THE SOCIAL SIDE




   34,617,406 clicks done by 82,931 users!

 Source: Zooniverse, Real Science Online (http://www.zooniverse.org/home)
IMPLICATIONS FOR RESEARCH
 Greater emphasis on technology
 Increase in interdisciplinary research and
  collaboration
 Often bigger data, with far more complex
  associated issues (storage, access, expertise,
  funding, preservation, etc.)
 Need for innovative approaches and integration
  into education/curriculum



DATA TSUNAMI




     IDC estimate of about 1.7 zettabytes (1 zettabyte = 1 billion terabytes) around 2011
     ….twice the available space
Sources:
1) The Great Wave off Kanagawa, Katsushika Hokusai. Found on Wikipedia.
2) The Diverse and Exploding Digital Universe, IDC, May 2010
   (http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf)
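
As a unit check on the scale involved, here is a minimal sketch of the zettabyte-to-terabyte conversion. It assumes decimal (SI) prefixes, where a terabyte is 10^12 bytes and a zettabyte is 10^21 bytes; the function name is made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to_terabytes(zb: float) -> float:
    """Convert zettabytes to terabytes using SI prefixes."""
    return zb * ZETTABYTE / TERABYTE


terabytes_per_zb = ZETTABYTE // TERABYTE  # exact integer ratio
estimate_tb = zettabytes_to_terabytes(1.7)  # the IDC estimate quoted above

print(f"1 ZB = {terabytes_per_zb:,} TB")
print(f"1.7 ZB is about {estimate_tb:,.0f} TB")
```

One zettabyte works out to one billion terabytes, so the estimate is on the order of 1.7 billion terabytes.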
BUT, NOT ALL DATA IS EQUAL….




 Source: Long Tail, Wikipedia (http://en.wikipedia.org/wiki/The_Long_Tail)
CASE STUDY: UVA LIBRARIES RESPONSE
(ROUND 1)
 Collaboration established around 2005 through
  discussions between ITC and the Library, with
  impetus from Frye Institute capstones.
 Research Computing Support services needed
  greater visibility and the Library sought ways to
  support changes in scientific research; collocation
  provides mutual benefits.
 In 2006, staff moved to Library locations
  (Research Computing Lab & Scholars' Lab) and
  set up new service points and services.

RESEARCH IN THE E-SCIENCE WORLD
 Heavy use of electronic information resources
 Work is predominantly done from a lab/office, not
  in the Library
 Collaboration is fundamental, but researchers don't
  always know people in other domains
 Grad students usually bring new
  technology/methods into the team (a research study
  now under way is learning more about grad students)



IDENTIFIED E-SCIENCE TRENDS
   Various components
     Computationally intensive science
     IT/software/infrastructure
     Collaboration
     Data

   Often intertwined with Open Access initiatives




E-SCIENCE IN OTHER LIBRARIES
   Purdue University
     Focus on data curation
     IATUL Conference, June 2010

   University of Illinois at Urbana-Champaign
     Focus on data curation
     Summer Institute on Data Curation

   Cornell University
       Metadata consulting services
   University of New Mexico
       Major DataONE grant
RESEARCH COMPUTING LAB RESPONSE
 Aiming to provide support across the entire
  scientific research data lifecycle
 Staff with expertise in:
     Data
     Quantitative data, statistics
     Modeling, visualization
     Scientific publishing

 Emphasis on consulting, not drop-off services
 Partnership with traditional librarians to help
  ease transition to new support models
RCL OUTREACH
University Community
 Speaker series 2006, 2007, 2008
 Research 2.0 Symposium
 Partnerships with courses, other units (e.g.,
  MLBS)
 Short course series each semester


Library Community
 Panel at the ACCS Conference in 2007
 Poster at ARL/CNI Forum in 2008
 Poster at STS Section of ALA in 2009
 Journal article in JLA in 2009
SAMPLE RCL CONSULTATIONS
   STS Undergrad Environmental Justice (2008)
     Development of technology solutions for empowering the
      citizen scientist
     Web 2.0 tools, data collection/management
     Data analysis
   Economics Graduate Student (2008/2009)
     Airline flight price modeling
     Screen scraping, data collection/management
     Data analysis
   Mountain Lake Beetle Project (2009)
     Mobile data acquisition/collection solution
     Database development/management, programming
     Data analysis
   Archiving of dissertation data (2009)
     EVSC student, ModelMaker 4.0 data
     Biology student, IDL, Matlab, R code
SPECIFICS FOR MEDICAL CENTER
 At least 600 RCL support requests from Medical
  Center from October '07 through December '09
 Medical Center patrons are heavy users of
  computational software like Matlab, SAS,
  LabView
 Increasing emphasis on collaboration
  (translational research)
 Greater attention to open access (NIH policy)

 Growing interest in areas like image integrity



TAKE-AWAYS
 This is the future
 Rapidly growing space, lots of opportunity

 Requires big investment and commitment, the
  biggest being training and priority alignment
 Libraries and institutions need to make decisions
  on what to do and what not to do
 It's a culture change for libraries,
  institutions, and researchers



COMING LATER….(ROUND 2)
   “Practical Applications of e-Science” in UVA
    Libraries today




QUESTIONS?
   Please feel free to contact me with questions:
     als9q@virginia.edu
     434-243-2180
     Twitter: asallans




ADDITIONAL INFORMATION
   E-Science Talking Points for ARL Deans and
    Directors, Elisabeth Jones, University of
    Washington, October 2008
    (http://www.arl.org/rtl/escience/)




