Understanding the Big Picture of e-Science


A. Sallans. "Understanding the Big Picture of e-Science." Presented at the 2011 eScience Bootcamp at the University of Virginia's Claude Moore Health Sciences Library. 4 March 2011

  1. UNDERSTANDING THE BIG PICTURE OF E-SCIENCE
     Andrew Sallans, Head of Strategic Data Initiatives, University of Virginia Library
     E-Science Bootcamp, Claude Moore Health Sciences Library, University of Virginia, 4 March 2011
  2. OUTLINE
     • What it's all about
     • Examples
     • Implications
     • UVA Libraries Response (Round 1)
  3. WHAT IT'S ALL ABOUT (AROUND 1999)
     "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
     "e-Science will change the dynamic of the way science is undertaken."
     Dr Sir John Taylor, Director General of Research Councils, Office of Science and Technology, United Kingdom
     Source: http://webscience.org/person/8.html
  4. WHAT MADE THIS POSSIBLE?
     • Internet/World Wide Web
     • Faster networking (fiber, special research networks, advances in grids)
     • Better storage (higher capacity, faster access, better reliability)
     • Cheap storage (costs keep decreasing)
     • Major funding initiatives
     • Broader interest in collaboration
  5. SOME COMMON TERMS
     • Computational science
     • Scientific computing
     • Research computing
     • High-performance computing
     • Cyberscience
     • Cyberinfrastructure
  6. CLIMATOLOGY RESEARCH
     Sources:
     1) Climate Simulation on Cray XT5 “Jaguar” supercomputer, ORNL (http://www.ornl.gov/info/ornlreview/v42_3_09/article02.shtml)
     2) Cray XT5 “Jaguar” supercomputer, ORNL (http://www.ornl.gov/info/ornlreview/v42_1_09/images/a05_p04_xt5_full.jpg)
  7. LARGE HADRON COLLIDER AT CERN
     • Circumference: 26,659 meters
     • Magnets: 9,300
     • Speed: protons move at 99.9999991% of the speed of light
     • Collisions/second: 600 million
     • Data produced: equivalent to 100,000 dual-layer DVDs per year
     • LHC Grid: tens of thousands of computers around the world used collectively to analyze data (will take 15 years)
     Source: CERN website (http://cdsweb.cern.ch/record/975468/files/its-2006-003.gif?subformat=icon)
  8. BIOMEDICAL INFORMATICS GRID (CABIG)
     • Launched as test in 2004
     • Adopted by over 50 NCI-designated cancer centers
     • Focused on:
       • Connecting scientists and practitioners through a shareable and interoperable infrastructure
       • Development of standard rules and a common language to more easily share information
       • Building or adapting tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care
     Source: caBIG website, National Cancer Institute (https://cabig.nci.nih.gov/)
  9. CITIZEN SCIENCE… THE SOCIAL SIDE
     • 34,617,406 clicks done by 82,931 users!
     Source: Zooniverse, Real Science Online (http://www.zooniverse.org/home)
  10. IMPLICATIONS FOR RESEARCH
     • Greater emphasis on technology
     • Increase in interdisciplinary research and collaboration
     • Often bigger data, with far more complex associated issues (storage, access, expertise, funding, preservation, etc.)
     • Need for innovative approaches and integration into education/curriculum
  11. DATA TSUNAMI
     • IDC estimate of about 1.7 zettabytes (about 1.7 billion terabytes) around 2011… twice the available space
     Sources:
     1) The Great Wave off Kanagawa, Katsushika Hokusai. Found on Wikipedia.
     2) The Diverse and Exploding Digital Universe, IDC, May 2010 (http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf)
  12. BUT, NOT ALL DATA IS EQUAL…
     Source: Long Tail, Wikipedia (http://en.wikipedia.org/wiki/The_Long_Tail)
  13. CASE STUDY: UVA LIBRARIES RESPONSE (ROUND 1)
     • Collaboration established around 2005 through discussions between ITC and Library, and impetus of Frye Institute capstones.
     • Research Computing Support services in need of greater visibility; Library seeking ways to support changes in scientific research; collocation provides mutual benefits.
     • In 2006, staff moved to Library locations (Research Computing Lab & Scholars' Lab) and set up new service points and services.
  14. RESEARCH IN THE E-SCIENCE WORLD
     • Heavy use of electronic information resources
     • Work is predominantly done from a lab/office, not in the Library
     • Collaboration is fundamental, but don't always know people in other domains
     • Grad students are usually bringing new technology/methods into the team (learning more about grad students in a research study now)
  15. IDENTIFIED E-SCIENCE TRENDS
     • Various components
       • Computationally intensive science
       • IT/software/infrastructure
       • Collaboration
       • Data
     • Often intertwined with Open Access initiatives
  16. E-SCIENCE IN OTHER LIBRARIES
     • Purdue University
       • Focus on data curation
       • IATUL Conference, June 2010
     • University of Illinois at Urbana-Champaign
       • Focus on data curation
       • Summer Institute on Data Curation
     • Cornell University
       • Metadata consulting services
     • University of New Mexico
       • Major DataONE grant
  17. RESEARCH COMPUTING LAB RESPONSE
     • Aiming to provide support across the entire scientific research data lifecycle
     • Staff with expertise in:
       • Data
       • Quantitative data, statistics
       • Modeling, visualization
       • Scientific publishing
     • Emphasis on consulting, not drop-off services
     • Partnership with traditional librarians to help ease transition to new support models
  18. RCL OUTREACH
     University Community
     • Speaker series 2006, 2007, 2008
     • Research 2.0 Symposium
     • Partnerships with courses, other units (i.e., MLBS)
     • Short course series each semester
     Library Community
     • Panel at the ACCS Conference in 2007
     • Poster at ARL/CNI Forum in 2008
     • Poster at STS Section of ALA in 2009
     • Journal article in JLA in 2009
  19. SAMPLE RCL CONSULTATIONS
     • STS Undergrad Environmental Justice (2008)
       • Development of technology solutions for empowering the citizen scientist
       • Web 2.0 tools, data collection/management
       • Data analysis
     • Economics Graduate Student (2008/2009)
       • Airline flight price modeling
       • Screen scraping, data collection/management
       • Data analysis
     • Mountain Lake Beetle Project (2009)
       • Mobile data acquisition/collection solution
       • Database development/management, programming
       • Data analysis
     • Archiving of dissertation data (2009)
       • EVSC student, ModelMaker 4.0 data
       • Biology student, IDL, Matlab, R code
  20. SPECIFICS FOR MEDICAL CENTER
     • At least 600 RCL support requests from Medical Center from October '07 through December '09
     • Medical Center patrons are heavy users of computational software like Matlab, SAS, LabView
     • Increasing emphasis on collaboration (translational research)
     • Greater attention to open access (NIH policy)
     • Growing interest in areas like image integrity
  21. TAKE-AWAYS
     • This is the future
     • Heavily growing space, lots of opportunity
     • Requires big investment and commitment, the biggest being training and priority alignment
     • Libraries and institutions need to make decisions on what to do and what not to do
     • It's a culture change for libraries, institutions, and researchers
  22. COMING LATER… (ROUND 2)
     • “Practical Applications of e-Science” in UVA Libraries today
  23. QUESTIONS?
     Please feel free to contact me with questions:
     • als9q@virginia.edu
     • 434-243-2180
     • Twitter: asallans
  24. ADDITIONAL INFORMATION
     • E-Science Talking Points for ARL Deans and Directors, Elisabeth Jones, University of Washington, October 2008 (http://www.arl.org/rtl/escience/)
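
The figures quoted on slides 7 and 11 can be sanity-checked with a few lines of arithmetic. The Python sketch below is not part of the original talk. It assumes decimal (SI) prefixes and a nominal dual-layer DVD capacity of 8.5 GB, and it takes the slides' numbers at face value. The helper names (`convert`, `laps_per_second`) are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope checks for figures quoted on slides 7 and 11.
# Not from the original talk; decimal (SI) prefixes are assumed throughout.

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by SI definition

# Bytes per unit, decimal prefixes (1 TB = 10**12 bytes, 1 ZB = 10**21 bytes).
BYTES_PER = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a data volume between decimal units (hypothetical helper)."""
    return value * BYTES_PER[from_unit] / BYTES_PER[to_unit]


def laps_per_second(circumference_m, fraction_of_c):
    """Laps per second for a particle circulating in a ring (hypothetical helper)."""
    return fraction_of_c * SPEED_OF_LIGHT_M_PER_S / circumference_m


# Slide 7: LHC ring of 26,659 m, protons at 99.9999991% of the speed of light.
lhc_laps = laps_per_second(26_659, 0.999999991)

# Slide 7: "100,000 dual-layer DVDs per year".
# Assumption: 8.5 GB nominal capacity per dual-layer DVD.
DVD_DUAL_LAYER_GB = 8.5
lhc_tb_per_year = convert(100_000 * DVD_DUAL_LAYER_GB, "GB", "TB")

# Slide 11: IDC estimate of about 1.7 zettabytes, expressed in terabytes.
digital_universe_tb = convert(1.7, "ZB", "TB")

print(f"LHC proton laps per second: {lhc_laps:,.0f}")
print(f"LHC data per year (from DVD figure): {lhc_tb_per_year:,.0f} TB")
print(f"1.7 ZB expressed in TB: {digital_universe_tb:,.0f}")
```

Run as a script, this prints the three derived quantities. The DVD-based volume is only as good as the 8.5 GB assumption and the slide's 100,000-disc figure, so it should be read as an order-of-magnitude illustration rather than an official CERN number.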