E-science involves large-scale collaborative research enabled by new technologies like high-speed networks and cheap data storage. It produces massive amounts of complex data from areas like climate modeling, particle physics experiments, biomedical research grids, and citizen science projects. This represents a major change for research that requires new infrastructure, expertise, and approaches. Universities like UVA are responding by establishing research computing support services in their libraries to help scientists with the computational and data aspects of e-science throughout the research lifecycle.
Understanding the Big Picture of e-Science
1. UNDERSTANDING THE BIG PICTURE OF E-SCIENCE
Andrew Sallans
Head of Strategic Data Initiatives
University of Virginia Library
E-Science Bootcamp
Claude Moore Health Sciences Library, University of Virginia
4 March 2011
2. OUTLINE
What it's all about
Examples
Implications
UVA Libraries Response (Round 1)
3. WHAT IT'S ALL ABOUT (AROUND 1999)
"e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
"e-Science will change the dynamic of the way science is undertaken."
Dr Sir John Taylor
Director General of Research Councils, Office of Science and Technology, United Kingdom
Source: http://webscience.org/person/8.html
4. WHAT MADE THIS POSSIBLE?
Internet/World Wide Web
Faster networking (fiber, special research networks, advances in grids)
Better storage (higher capacity, faster access, better reliability)
Cheap storage (costs keep decreasing)
Major funding initiatives
Broader interest in collaboration
5. SOME COMMON TERMS
Computational science
Scientific computing
Research computing
High-performance computing
Cyberscience
Cyberinfrastructure
7. LARGE HADRON COLLIDER AT CERN
Circumference: 26,659 meters
Magnets: 9,300
Speed: protons move at 99.9999991% of the speed of light
Collisions/second: 600 million
Data produced: equivalent to 100,000 dual layer DVDs per year
LHC Grid: tens of thousands of computers around the world used collectively to analyze data (will take 15 years)
Source: CERN website (http://cdsweb.cern.ch/record/975468/files/its-2006-003.gif?subformat=icon)
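For a rough sense of scale, the DVD comparison on this slide can be converted to terabytes. This is a minimal sketch, assuming a nominal capacity of 8.5 GB per dual-layer DVD and decimal units; the capacity constant and the function name are illustrative assumptions, not figures from CERN.

```python
# Rough size of the slide's "100,000 dual layer DVDs per year" figure.
# Assumption: 8.5 GB nominal capacity per dual-layer DVD, decimal units.
DVD_DUAL_LAYER_GB = 8.5
DVDS_PER_YEAR = 100_000  # figure quoted on the slide


def dvds_to_terabytes(n_dvds, gb_per_dvd=DVD_DUAL_LAYER_GB):
    """Convert a number of DVDs to terabytes (1 TB = 1000 GB)."""
    return n_dvds * gb_per_dvd / 1000


yearly_tb = dvds_to_terabytes(DVDS_PER_YEAR)
print(f"{yearly_tb:.0f} TB per year")
```

Under these assumptions the comparison works out to roughly 850 TB per year.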
8. BIOMEDICAL INFORMATICS GRID (CABIG)
Launched as test in 2004
Adopted by over 50 NCI-designated cancer centers
Focused on:
Connecting scientists and practitioners through a shareable and interoperable infrastructure
Development of standard rules and a common language to more easily share information
Building or adapting tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care
Source: caBIG website, National Cancer Institute (https://cabig.nci.nih.gov/)
9. CITIZEN SCIENCE…THE SOCIAL SIDE
34,617,406 clicks done by 82,931 users!
Source: Zooniverse, Real Science Online (http://www.zooniverse.org/home)
10. IMPLICATIONS FOR RESEARCH
Greater emphasis on technology
Increase in interdisciplinary research and collaboration
Often bigger data, with far more complex associated issues (storage, access, expertise, funding, preservation, etc.)
Need for innovative approaches and integration into education/curriculum
11. DATA TSUNAMI
IDC estimate of about 1.7 zettabytes (1 zettabyte = 1 trillion gigabytes) around 2011
….twice the available space
Source:
1) The Great Wave off Kanagawa, Katsushika Hokusai. Found on Wikipedia.
2) The Diverse and Exploding Digital Universe, IDC, May 2010 (http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf)
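To put the zettabyte estimate into more familiar units, the conversion can be sketched as below. It assumes decimal (SI) prefixes, where one zettabyte is 10^21 bytes, one terabyte is 10^12 bytes, and one gigabyte is 10^9 bytes; the constant and function names are illustrative only.

```python
# Unit conversion for the IDC estimate, using decimal (SI) prefixes.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21


def zettabytes_to(zb, unit_bytes):
    """Express a size given in zettabytes in a smaller unit."""
    return zb * BYTES_PER_ZB / unit_bytes


estimate_zb = 1.7  # approximate figure quoted on the slide
print(f"{zettabytes_to(estimate_zb, BYTES_PER_TB):,.0f} TB")
print(f"{zettabytes_to(estimate_zb, BYTES_PER_GB):,.0f} GB")
```

With these prefixes, 1.7 zettabytes is about 1.7 billion terabytes, or 1.7 trillion gigabytes.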
12. BUT, NOT ALL DATA IS EQUAL….
Source: Long Tail, Wikipedia (http://en.wikipedia.org/wiki/The_Long_Tail)
13. CASE STUDY: UVA LIBRARIES RESPONSE (ROUND 1)
Collaboration established around 2005 through discussions between ITC and the Library, and the impetus of Frye Institute capstones.
Research Computing Support services needed greater visibility, and the Library was seeking ways to support changes in scientific research; co-location provides mutual benefits.
In 2006, staff moved to Library locations (Research Computing Lab & Scholars' Lab) and set up new service points and services.
14. RESEARCH IN THE E-SCIENCE WORLD
Heavy use of electronic information resources
Work is predominantly done from a lab/office, not in the Library
Collaboration is fundamental, but researchers don't always know people in other domains
Grad students are usually bringing new technology/methods into the team (learning more about grad students in a research study now)
15. IDENTIFIED E-SCIENCE TRENDS
Various components
Computationally intensive science
IT/software/infrastructure
Collaboration
Data
Often intertwined with Open Access initiatives
16. E-SCIENCE IN OTHER LIBRARIES
Purdue University
Focus on data curation
IATUL Conference, June 2010
University of Illinois at Urbana-Champaign
Focus on data curation
Summer Institute on Data Curation
Cornell University
Metadata consulting services
University of New Mexico
Major DataONE grant
17. RESEARCH COMPUTING LAB RESPONSE
Aiming to provide support across the entire scientific research data lifecycle
Staff with expertise in:
Data
Quantitative data, statistics
Modeling, visualization
Scientific publishing
Emphasis on consulting, not drop-off services
Partnership with traditional librarians to help ease transition to new support models
18. RCL OUTREACH
University Community
Speaker series 2006, 2007, 2008
Research 2.0 Symposium
Partnerships with courses, other units (e.g. MLBS)
Short course series each semester
Library Community
Panel at the ACCS Conference in 2007
Poster at ARL/CNI Forum in 2008
Poster at STS Section of ALA in 2009
Journal article in JLA in 2009
19. SAMPLE RCL CONSULTATIONS
STS Undergrad Environmental Justice (2008)
Development of technology solutions for empowering the citizen scientist
Web 2.0 tools, data collection/management
Data analysis
Economics Graduate Student (2008/2009)
Airline flight price modeling
Screen scraping, data collection/management
Data analysis
Mountain Lake Beetle Project (2009)
Mobile data acquisition/collection solution
Database development/management, programming
Data analysis
Archiving of dissertation data (2009)
EVSC student, ModelMaker 4.0 data
Biology student, IDL, Matlab, R code
20. SPECIFICS FOR MEDICAL CENTER
At least 600 RCL support requests from Medical Center from October '07 through December '09
Medical Center patrons are heavy users of computational software like Matlab, SAS, LabView
Increasing emphasis on collaboration (translational research)
Greater attention to open access (NIH policy)
Growing interest in areas like image integrity
21. TAKE-AWAYS
This is the future
Heavily growing space, lots of opportunity
Requires big investment and commitment, the biggest being training and priority alignment
Libraries and institutions need to make decisions on what to do and what not to do
It's a culture change for libraries, institutions, and researchers alike
23. QUESTIONS?
Please feel free to contact me with questions:
als9q@virginia.edu
434-243-2180
Twitter: asallans
24. ADDITIONAL INFORMATION
E-Science Talking Points for ARL Deans and Directors, Elisabeth Jones, University of Washington, October 2008 (http://www.arl.org/rtl/escience/)