NSF SI2 program discussion at 2014 SI2 PI meeting - Daniel S. Katz
This document discusses software as infrastructure for science and engineering research. It outlines how software is essential to many areas of science, with about half of recent science papers involving software-intensive projects. It also discusses how "long-tail" scientists need advanced infrastructure to handle large data and simulations. The document notes challenges around larger teams, more data and complex systems, and changing hardware and software. It positions software as a critical part of cyberinfrastructure and outlines NSF programs like SI2 and CDS&E that support development of sustainable scientific software infrastructure.
The document summarizes the 3DPAS theme of the e-Science Institute, which focuses on dynamic distributed data-intensive programming systems and applications. It discusses 14 example applications in different domains that were analyzed in terms of their data, infrastructure usage, and dynamic properties. It provides more detailed descriptions of 6 applications, covering their data sources, processing workflows, and current and future infrastructure usage. The goal of the 3DPAS theme is to understand the unique challenges of data-intensive applications and identify requirements to support future exascale applications involving distributed and dynamic data and computing.
Overview of XSEDE and Introduction to XSEDE 2.0 and Beyond - John Towns
This presentation will briefly review XSEDE, its past mission and accomplishments, and give insight into the direction and vision for the second round of XSEDE.
The document provides an annual report for the Advanced Community Information Systems (ACIS) group at RWTH Aachen University from October 2013 to September 2014. It summarizes the group's research projects, achievements, community activities, software demonstrations, publications, and theses completed during this period. The group conducted research on mobile community information systems and technology enhanced learning, involved in community services like editorial boards and conference organization, and engaged in open source software development.
The Social Semantic Server: A Flexible Framework to Support Informal Learning... - tobold
The document describes the Social Semantic Server (SSS), a flexible framework developed to support informal learning in workplace settings. The SSS was designed based on theories of distributed cognition and meaning making to help learners interact through shared digital artifacts. It implements a service-oriented architecture with various microservices to integrate different learning tools. Examples of tools built on the SSS include Bits & Pieces for sensemaking experiences, KnowBrain for collaborative discussions, and Bookmarker/Attacher for exploring online topics. The SSS aims to provide a technical infrastructure that can capture workplace learning interactions and support the social construction of shared meaning.
International Symposium NLHPC 2013: Innovation at the frontier of HPC
Title: XSEDE: an ecosystem of advanced digital services accelerating scientific discovery
Abstract:
The XSEDE program (Extreme Science and Engineering Discovery Environment) has recently entered its third year of operation. In this talk we will discuss the vision, mission and goals of this project and some of the distinguishing characteristics of the program. This will be accompanied by a review of current status and look ahead at where the program is headed over the next several years.
The document summarizes a presentation about using the Hydra framework to build an institutional repository at the University of Hull. Some key points:
- Hydra allows the repository to support different types of content through customizable templates and handle relationships between items.
- The repository has been used to archive research outputs, events, student works, and experimental data from the history department.
- Customizations were made to integrate maps, DOIs, and additional metadata fields for different data management needs.
- The repository provides a platform for data preservation and access, helping the university comply with research policies like those from funders.
SPARC Repositories conference in Baltimore - Nov 2010 - Jisc
1. The document discusses the reasons for and vision of creating a global network of repositories to openly share knowledge and data.
2. Key reasons for a global network include enabling open access to information, supporting science through linked data, and aligning with universities' responsibilities to the public.
3. The ideal vision is to build socio-technical infrastructure similar to what was created in the 1880s to support electricity, in order to manage and share linked, open, and trusted data globally through repository networks.
XSEDE: an ecosystem of advanced digital services accelerating scientific disc... - John Towns
XSEDE (the Extreme Science and Engineering Discovery Environment) is a project that coordinates and provides access to advanced digital services and cyberinfrastructure resources to accelerate scientific discovery. It aims to enhance researcher productivity by providing seamless access to computing resources, expertise, and services. XSEDE integrates resources from various institutions and locations to form a distributed cyberinfrastructure ecosystem for researchers. It supports over $767 million in research annually and has enabled over 10,600 publications.
Supporting Research Communities with XSEDE - John Towns
XSEDE is a major research infrastructure with collaborations worldwide supporting thousands of researchers across a wide range of domains. XSEDE has taken an integrative and holistic approach to supporting researchers in the use of the varying resources and services available via XSEDE. This presentation will briefly review XSEDE and its vision and provide a discussion of the efforts within XSEDE targeted at supporting research communities.
Presentation given by Stuart Macdonald at the International Workshop on ICT and e-Knowledge for the Developing World in Shanghai International Convention Center, Pudong, Shanghai.
Slides | Research data literacy and the library - Colleen DeLory
Slides from the Dec. 8, 2016 Library Connect webinar "Research data literacy and the library" with Sarah Wright, Christian Lauersen and Anita de Waard. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=226043
This document summarizes a presentation on emerging technologies given by Robert McDonald. It discusses bleeding edge vs leading edge technologies, highlights several technologies on Gartner's 2011 education hype cycle including cloud computing and mobile learning, and explores trends in areas like business intelligence and future technologies for higher education. The presentation provides an overview of new initiatives and considerations for emerging technologies.
This slide deck was prepared for the sole purpose of supporting the survey.
All images were taken from Google, and the information from eresearchSA.edu.au.
Survey link: http://tinyurl.com/c2uoarm (Google Docs)
This document summarizes research computing needs and resources at Florida A&M University. It finds that FAMU lacks central organization and sufficient resources for research computing. It proposes creating a research computing director position, obtaining membership in state and national research networks, expanding the campus computer cluster, and establishing a computational science program to develop student and faculty skills. Additional infrastructure needs include upgrading the campus network and developing a centralized campus data center and cloud computing access. Meeting these needs would help FAMU build competitive research programs and train students for the 21st century workforce.
Hull presentation to Fedora UK&I meeting, 21st March 2013 - Chris Awre
This document discusses the Hydra repository system implemented at the University of Hull. It provides an overview of Hydra and how it has been used at Hull to create a multi-purpose institutional repository based on Fedora. Some key points:
- Hydra was initially launched at Hull in 2011/2012 and has been successfully established as the institutional repository solution, though upgrades have been challenging.
- Hydra allows the repository to support different content types through customizable models/templates and provides a single interface for users.
- Hull is looking to further develop the repository by upgrading to the latest Hydra version, improving image management, and integrating library and research systems.
- The repository has been used to support data
XSEDE (Extreme Science and Engineering Discovery Environment) is a digital infrastructure that provides researchers with integrated advanced computing, data, and visualization resources. It aims to enhance scientific productivity through access to these resources and expert support services. XSEDE involves multiple partner institutions and over $100 million in computing resources. It seeks to support open science, enable new multidisciplinary collaborations, and help tackle society's grand challenges.
Some facts and figures about JISC digitisation impact - Paola Marchionni
The content of these slides (or rather, the great majority of it) derives from an initial analysis of the results of a survey that the JISC Content team circulated among previously funded projects in the areas of digitisation and content. Extensive comments on each slide have been incorporated into the slides themselves. The survey aimed to find out more about how digitised collections were being used and the impact such projects have had on their hosting institutions and more broadly.
High Performance Cyberinfrastructure and Data Services - Jerry Sheehan
The document summarizes the high performance computing, networking, and data services available through the Information Technology Center at Montana State University. It discusses the university's wide area network connectivity, science DMZ for improved data transfer, use of Globus for large data transfer, network performance testing results, Hyalite high performance computing cluster, CHAMP cluster for student use, participation in XSEDE and other national programs, research data census and needs, and new research data services collaboration between the ITC and library.
Slides from NITLE Digital Scholarship Seminar: National Perspective, Jennifer Serventi, Senior Program Officer, Office of Digital Humanities, National Endowment for the Humanities
The document provides information about research data management (RDM) services and initiatives at the University of Edinburgh. It describes the EDINA National Data Centre and Data Library, which provide online resources and data management support. It outlines several JISC-funded RDM projects undertaken by the Data Library, including building the Edinburgh DataShare repository. It also summarizes the Research Data MANTRA training module and the university's RDM roadmap, which lays out a multi-phase plan to improve RDM support and services by 2015 in line with funder requirements.
The document summarizes the activities of EDINA and the Data Library at the University of Edinburgh related to research data management. It describes EDINA as a national data center that provides online resources for education and research. The Data Library assists university researchers with discovering, accessing, using and managing research datasets. It also outlines several projects the Data Library is involved in to develop training, policies and services to support best practices in research data management according to funder requirements. This includes developing an institutional research data management roadmap to help the university meet funder expectations by 2015.
UK e-Infrastructure: Widening Access, Increasing Participation - Neil Chue Hong
A talk given at the ICHEC Annual Seminar by Neil Chue Hong, reflecting on the rise of Grid and Web 2.0, and how this might enable increased participation and use of computing infrastructure for e-Science and research.
Learning Analytics and Sensemaking in Digital Learning Ecosystems - Examples ... - tobold
Presentation given at the Seminar "Opportunities and Challenges of Learning with Technologies: Evidence-based Education" at the Permanent Representation of Estonia to the EU on 12 November 2014 in Brussels.
The document is a presentation on Massachusetts tidelands law given to prospective clients. It provides an agenda that covers the history of tidelands law from Roman law to colonial ordinances to modern cases. It discusses how neighbors and owners view beach use and ownership, tracing the legal rights and divisions of tidelands from early English common law through various colonial ordinances and modern legislation. The presentation aims to explain the complex legal issues around tidelands ownership and recent related court cases in Massachusetts.
Opinions on the State of Production Distributed Infrastructure (PDI) - Daniel S. Katz
This document discusses the state of production distributed infrastructure (PDI) and open challenges. It describes three types of existing PDIs - academic/public for science, academic/public for research, and commercial. Key open challenges include measuring delivered science, developing integrated infrastructure and tools, representing small users, the role of virtualization in high-performance computing, and defining an overall vision and architecture with interfaces. The path forward requires a single agreed upon vision and metrics to measure progress towards enabling maximum science delivery.
Perspectives on Undergraduate Education in Parallel and Distributed Computing - Daniel S. Katz
NSF SI2 program discussion at 2013 SI2 PI meeting - Daniel S. Katz
This document discusses software infrastructure challenges and opportunities in science. It notes that software is essential to much of modern science and is a form of infrastructure. It outlines NSF's vision and strategies for supporting software infrastructure through the CIF21 initiative and specific programs like SI2, CDS&E, and XPS. The document discusses the SI2 program's activities in supporting software elements, frameworks, and institutes. It raises general questions about supporting existing infrastructure, deciding when to stop support, encouraging reuse, measuring impact, and supporting software developer careers.
Advancing Science through Coordinated Cyberinfrastructure - Daniel S. Katz
How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at First Workshop - Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil 10 APR 2014
Discussing Software Citation and related topics at Workshop on Data and Software Citation (June 6-7 at Harvard Medical School, http://www.software4data.com/#!nsf-workshop/jghgb)
US University Research Funding, Peer Reviews, and Metrics - Daniel S. Katz
My part of the "Digital Science Webinar: Articulating Research Impact – Strategies from Around the Globe" (http://www.digital-science.com/events/digital-science-webinar-articulating-research-impact-strategies-from-around-the-globe/)
Daniel S. Katz will discuss how reviewers at the National Science Foundation (USA) consider the “intellectual merit” and “broader impacts” criteria for funding, and in particular how metrics might help applicants understand their impacts in these areas. Dan will also talk about how reviewers might use qualitative and quantitative altmetrics data to inform their peer reviews of grant applications. He will address many of the salient questions around this use of metrics; for example, do reviewers take metrics seriously, and what types of metrics are of most value to them?
Working towards Sustainable Software for Science (an NSF and community view) - Daniel S. Katz
This document discusses challenges and opportunities for developing sustainable software for science. It notes that software is increasingly important for science but current practices and incentives do not support long-term sustainability. The document summarizes discussions from the Working Towards Sustainable Software for Science conference, which identified key issues around developing sustainable software, best practices, policies around credit and careers, and building supportive communities. It proposes that better measuring contributions to software could help address incentives, career paths, and sustainability of software over the long term.
A description of software as infrastructure at NSF, and how Apache projects may be similar. What lessons can be shared from one organization to the other? How does science software compare with more general software?
Scientific Software Challenges and Community Responses - Daniel S. Katz
a talk given at RTI International on 7 December 2015, discussing 12 scientific software challenges and how the scientific software community is responding to them
What is eScience, and where does it go from here? - Daniel S. Katz
eScience has evolved from focusing on global scientific collaborations enabled by distributed computing infrastructure to emphasizing joint advances in digital infrastructure and how that infrastructure enables new research. This symbiotic relationship between research and infrastructure development could be called Research and Infrastructure Development Symbiosis (RaIDS). Going forward, RaIDS conferences should focus on improving communication between infrastructure developers and researchers to facilitate new collaborations, ensure research publications appropriately attribute enabling infrastructure advances, and standardize catalogs of available infrastructure and research challenges.
Birgit Plietzsch “RDM within research computing support” SALCTG June 2013 - SALCTG
An overview of Research Data Management: the research process from developing ideas to preservation of data; funder perspectives, the impact on the wider service, Data Asset Frameworks, preservation and access, and cost implications.
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube
This webinar features project overviews of all EarthCube Awards (Building Blocks, Research Coordination Networks, Conceptual Designs, and Test Governance), followed by a call for involvement, and a Q&A session.
Agenda:
EarthCube Awards – Project Overviews
1. EarthCube Web Services (Building Block)
2. EC3: Earth-Centered Community for Cyberinfrastructure (RCN)
3. GeoSoft (Building Block)
4. Specifying and Implementing ODSIP (Building Block)
5. A Broker Framework for Next Generation Geoscience (BCube) (Building Block)
6. Integrating Discrete and Continuous Data (Building Block)
7. EAGER: Collaborative Research (Building Block)
8. A Cognitive Computer Infrastructure for Geoscience (Building Block)
9. Earth System Bridge (Building Block)
10. CINERGI – Community Inventory of EC Resources for Geoscience Interoperability (BB)
11. Building a Sediment Experimentalist Network (RCN)
12. C4P: Collaboration and Cyberinfrastructure for Paleogeosciences (RCN)
13. Developing a Data-Oriented Human-centric Enterprise for Architecture (CD)
14. Enterprise Architecture for Transformative Research and Collaboration (CD)
15. EC Test Enterprise Governance: An Agile Approach (Test Governance)
A Call for Involvement!
The document discusses open data initiatives and tools for data sharing. It describes projects from the EDINA National Data Centre, DISC-UK DataShare project which investigated legal and technical issues around research data sharing, and tools for visualizing and sharing numeric and spatial data online like Many Eyes, Gapminder and OpenStreetMap. It also covers barriers to data sharing, harnessing collective intelligence through open science, and citizens contributing geographic data through tools like geograph.
Scientific Software Innovation Institutes (S2I2s) as part of NSF’s SI2 programDaniel S. Katz
This talk, presented at a computational chemistry institute conceptualization project (https://sites.google.com/site/s2i2biomolecular/), discusses a view of Scientific Software Innovation Institutes (S2I2s) as part of NSF's Software Infrastructure for Sustained Innovation (SI2) program
This document provides an overview of the XSEDE project from its director John Towns. It discusses what XSEDE is, highlights from the past year including new resources and services added, the results of an annual review, and objectives for the coming year. The goals of XSEDE are outlined, including deepening the impact of cyberinfrastructure, preparing researchers, collaborating with institutions, creating an open environment, expanding capabilities, and raising awareness. Challenges faced as a distributed virtual organization are also noted.
SGCI - URSSI - Research Software Engineers, Science Gateway Developers and Cy...Sandra Gesing
The conceptualization of the US Research Software Sustainability Institute (URSSI) received funding in December 2017 and aims to build a focal point for RSEs in the US, similar to the SSI in the UK. The Science Gateways Community Institute (SGCI), opened in August 2016, provides free resources, services, experts, and ideas for creating and sustaining science gateways at the national and international levels. Science gateways – also called virtual research environments or virtual labs – allow science and engineering communities to access shared data, software, computing services, instruments, and other resources specific to their disciplines, and to use them in teaching environments. The goals of the workforce development and incubator services in particular overlap broadly with RSE initiatives to improve career paths for developers and to build on-campus developer teams. ACI-REFs (Advanced Cyberinfrastructure Research and Education Facilitators) play a role comparable to RSEs, and that project's trainings likewise aim at building a network and training the trainers for efficient research software support. The talk gives an overview of these diverse initiatives and highlights possibilities for international collaboration.
Data-intensive bioinformatics on HPC and CloudOla Spjuth
The document discusses data-intensive bioinformatics and challenges with analyzing large genomic datasets on high performance computing (HPC) resources. It summarizes that storage is the biggest challenge as sequencing projects generate very large amounts of data and users do not clean up data. The strategies discussed to address this include assessing costs of storage and analysis upfront, limiting project lifetimes, moving to tiered storage, and improving efficiency. It also discusses using cloud computing resources through virtual clusters and containers to enable flexible, on-demand access and pay-per-use pricing models. Scientific workflows and microservices approaches are presented as ways to automate and orchestrate large-scale genomic analyses on distributed computing resources.
SGCI - S2I2: Science Gateways Community InstituteSandra Gesing
This document discusses science gateways and the Science Gateways Community Institute (SGCI). Science gateways provide access to advanced computing resources, instruments, data, software, and collaboration tools to help researchers tackle complex science questions. The SGCI aims to support science gateways through expertise in areas like technology planning, business planning, security, sustainability, and evaluation. It offers incubator services, extended developer support, and brings together the science gateway community.
Supporting Research Communities with XSEDEJohn Towns
XSEDE is a major research infrastructure in the United States with collaborations worldwide supporting thousands of researchers across a wide range of domains. XSEDE has taken an integrative and holistic approach to supporting researchers in the use of the varying resources and services available via XSEDE. This presentation will briefly review XSEDE and its vision and provide a discussion of the efforts within XSEDE targeted at supporting research communities with a focus on connections to campus efforts.
Institutional repositories capture, preserve, and provide access to the intellectual output of an institution. They consist of formally organized and managed collections of digital content generated by faculty, staff, and students. Institutional repositories allow for the dissemination of knowledge outside the institution, complement traditional forms of publication, and make works visible to colleagues and potential employers or funders. They contribute to an institution's prestige by managing and preserving relevant information that would otherwise remain scattered or inaccessible.
Presenting the following paper “Science Gateways: The Long Road to the Birth of an Institute” by Sandra Gesing, Nancy Wilkins-Diehr, Maytal Dahan, Katherine Lawrence, Michael Zentner, Marlon Pierce, Linda Hayden, Suresh Marru at HICSS50 Conference.
Data-intensive applications on cloud computing resources: Applications in lif...Ola Spjuth
Presentation at the de.NBI 2017 symposium “The Future Development of Bioinformatics in Germany and Europe” held at the Center for Interdisciplinary Research (ZiF) of Bielefeld University, October 23-25, 2017.
https://www.denbi.de/symposium2017
SGCI Science Gateways: Ushering in a New Era of Sustainability Sandra Gesing
The computational landscape has never evolved as quickly as in the last decade. Computational scientific methods tackle an increasing breadth and diversity of topics, analyzing data on a large scale and accessing high-performance computing infrastructures, cutting-edge hardware, and instruments. Novel technologies such as next-generation sequencing and the Square Kilometre Array, the world's largest radio telescope, now allow data to be created at exascale dimensions. While the availability of this data makes it possible to answer research questions that would not have been feasible before, its sheer volume creates new challenges that clearly require novel computational solutions. Such solutions call for integrative approaches by multidisciplinary teams across geographical boundaries, approaches that improve the usability of scientific methods tailored to the target user communities and aim at achieving reproducibility of science. Science gateways, also called virtual research environments or virtual laboratories, follow exactly this goal: they provide easy-to-use end-to-end solutions that hide the complex underlying infrastructure, supporting researchers with intuitive user interfaces so they can focus on their research questions instead of becoming acquainted with technological details.
Science gateways are often developed by research teams who are not necessarily from the computer science domain, and science projects depend on academic funding. Centralized research programmer teams, which can provide broad experience and contribute to the sustainability of solutions, are rare at universities, and there is still a lack of incentives for interested developers to stay in academia. One of the future challenges for science gateways, and thus for computational scientific methods, will be to increase sustainability and become less dependent on successful proposals. The US National Science Foundation has recognized the importance of this topic for research and has funded the Science Gateways Community Institute (SGCI), which supports not only teams developing science gateways but also communities looking for ways to sustain their favorite science gateway for conducting their research. This talk goes into detail on current challenges, the landscape around science gateways, the services of SGCI, and approaches to reaching sustainability.
Research Software Sustainability
The document discusses the importance of research software and challenges in ensuring its sustainability. It notes that research software is increasingly essential in research but often lacks proper maintenance. Three key points are made:
1) Research software is widely used across many fields and agencies invest billions in its development, yet researchers are not rewarded for its creation and maintenance.
2) Without maintenance, research software will collapse over time as it becomes outdated or broken. Many projects rely on just one or two developers.
3) Changing incentives, career paths, training, and funding models is needed to improve the sustainability of research software for the long-term benefit of science.
(a slightly updated version of this talk is at https://doi.org/10.6084/m9.figshare.10301741.v1)
A talk on the role of software in research and how NCSA is responding in terms of people and roles - given at the 2019 Data Science Leadership Summit (https://sites.google.com/msdse.org/datascienceleadership2019/).
This is partially based on a previous paper: Daniel S. Katz, Kenton McHenry, Caleb Reinking, Robert Haines, "Research Software Development & Management in Universities: Case Studies from Manchester's RSDS Group, Illinois' NCSA, and Notre Dame's CRC", 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science (SE4Science)
doi: https://doi.org/10.1109/SE4Science.2019.00009
preprint: https://arxiv.org/abs/1903.00732
Parsl: Pervasive Parallel Programming in PythonDaniel S. Katz
The document summarizes Parsl, a Python library for pervasive parallel programming. Parsl allows users to naturally express parallelism in Python programs and execute tasks concurrently across different computing platforms while respecting data dependencies. It supports various use cases from small machine learning workloads to extreme-scale simulations involving millions of tasks and thousands of nodes. Parsl provides simple, scalable, and flexible parallel programming while hiding complexity of parallel execution.
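Parsl's programming model builds on Python futures: decorated functions return futures immediately, and passing one task's future to another expresses a data dependency. A minimal standard-library sketch of that dataflow pattern (this is an illustration of the pattern, not Parsl's actual API, which uses the `@python_app` decorator and configurable executors):

```python
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x

def total(futures):
    # Block on each upstream task before combining its result:
    # this is the data dependency the futures express.
    return sum(f.result() for f in futures)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Independent tasks run concurrently.
    futures = [pool.submit(double, i) for i in range(5)]
    # A dependent task consumes the futures of the tasks above.
    result = pool.submit(total, futures).result()

print(result)  # 2*(0+1+2+3+4) = 20
```

In Parsl itself, the executor boilerplate disappears: apps are submitted implicitly when called, and the runtime resolves dependencies while scheduling tasks across threads, clusters, or supercomputers.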
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ...Daniel S. Katz
This document discusses publicly-funded research software, algorithms, and workflows. It argues that software is fundamentally different than data and requires different policies regarding public access. The document outlines that a large portion of research is software-intensive and relies on software. However, software faces sustainability issues like "software collapse" if not actively maintained. The document recommends that funding agencies take steps to incentivize open source software and long-term maintenance through funding and career incentives. It suggests defaulting to open source models but allowing other options if justified, with the goal of software remaining useful over time beyond the initial funding period.
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...Daniel S. Katz
FAIR principles are not fully sufficient for software. While FAIR aims to make data findable, accessible, interoperable, and reusable, software has key differences from data. FAIR needs expansion to properly address software citation, availability, and quality. Specifically, it should encourage explicitly crediting software contributors, promoting open source as the default for availability, and potentially assessing quality as an additional principle. Simply applying FAIR as is for data does not adequately account for software's nature as both a creative work and executable tool.
How different groups think about software sustainability, what "equations" we might use to measure it, and how it really can't be measured looking forward but only predicted.
Slides for:
"Software Citation in Theory and Practice," by Daniel S. Katz and Neil P. Chue Hong (published paper - https://doi.org/10.1007/978-3-319-96418-8_34; preprint - https://arxiv.org/abs/1807.08149), presented at International Congress on Mathematical Software (ICMS 2018)
Abstract. In most fields, computational models and data analysis have become a significant part of how research is performed, in addition to the more traditional theory and experiment. Mathematics is no exception to this trend. While the system of publication and credit for theory and experiment (journals and books, often monographs) has developed into an expected part of the culture, shaping how research is shared and how candidates are evaluated for hiring and promotion, software (and data) do not have the same history. A group working as part of the FORCE11 community developed a set of principles for software citation that fit software into the journal citation system and allow software to be published and then cited; over 50,000 DOIs have now been issued for software. However, some challenges remain, including: promoting the idea of software citation to developers and users; collaborating with publishers to ensure that systems collect and retain required metadata; ensuring that the rest of the scholarly infrastructure, particularly indexing sites, includes software; working with communities so that software efforts count; and understanding how best to cite software that has not been published.
A talk about "Conceptualizing a US Research Software Sustainability Institute (URSSI)" presented at the Toward a New Computational Fluid Dynamics Software Infrastructure (CFDSI, https://www.colorado.edu/events/cfdsi/) workshop in Boulder, CO, 16 May 2018.
Research Software Sustainability: WSSSPE & URSSIDaniel S. Katz
The document discusses research software sustainability efforts by the WSSSPE and proposed URSSI institute. It provides an overview of WSSSPE which promotes sustainable research software through community activities and working groups addressing various aspects of the software lifecycle. It also outlines the goals and activities of the conceptualized URSSI institute which aims to establish a US research software sustainability organization through workshops, surveys, and ethnographic studies to understand needs and develop a concrete institute plan.
A brief status of software citation work presented at AAS splinter meeting on implementing the FORCE11 Software Citation Principles in Astronomy (2018-01-11)
A talk about citation and reproducibility in software, presented at the HSF (High Energy Physics Software Foundation) meeting at SDSC, San Diego, CA, USA, 23 January 2017
Based on citation work done by the FORCE11 Software Citation Working Group as well as recent reproducibility discussions, blogs, and papers
Software Citation: Principles, Implementation, and ImpactDaniel S. Katz
The document discusses software citation principles proposed by the FORCE11 Software Citation Working Group. It provides motivation for better recognizing software as a research output and measuring its impact and contributions through citation. The working group developed six software citation principles around importance, credit, unique identification, persistence, accessibility, and specificity. It also discusses implementing the principles through publishing software and citing other software in research papers, and next steps around endorsement and implementation efforts.
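One concrete way projects implement the unique-identification, credit, and accessibility principles is to ship machine-readable citation metadata alongside the software, for example in the Citation File Format (a CITATION.cff file in the repository root). The entry below is a purely hypothetical illustration; the project name, author, and DOI are placeholders, not values from the talk:

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "ExampleSolver"            # hypothetical project name
version: "1.4.0"
doi: "10.5281/zenodo.0000000"     # placeholder archival DOI
date-released: "2018-06-01"
authors:
  - family-names: "Doe"
    given-names: "Jane"
    orcid: "https://orcid.org/0000-0000-0000-0000"
```

Indexing services and repository hosts can read such a file to generate correctly attributed citations, addressing the credit and specificity principles without manual effort from users.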
The document summarizes the history and plans of the Working towards Sustainable Software for Science: Practice and Experience (WSSSPE) workshops. It discusses that WSSSPE1-3 identified challenges in developing sustainable scientific software and proposed solutions through working groups. Some groups made progress, such as on software credit principles, while others did not due to lack of follow through. WSSSPE4 plans to further the vision of sustainable open-use research software through workshops on building the future and sharing practices and experiences.
Working towards Sustainable Software for Science: Practice and Experience (WS...Daniel S. Katz
This was a short talk about the WSSSPE events, given at the Dagstuhl workshop on Engineering Academic Software, 20 June 2016. It mostly discusses the working groups that formed gradually over the WSSSPE meetings, specifically those that worked through WSSSPE3, and what they have done since then.
Looking at Software Sustainability and Productivity Challenges from NSFDaniel S. Katz
The document discusses challenges in software sustainability and productivity faced by the National Science Foundation (NSF). It notes that NSF typically only funds software projects for 5 years, though many projects require support for 20+ years. It also discusses issues like a lack of career paths for software-focused researchers, inconsistent incentives and credit systems, training needs, challenges of interdisciplinary work, and ensuring software portability and dissemination. While the NSF has made some improvements through programs like SI2, the document concludes that more work remains to be done to address these challenges and push academic culture to better support long-term software projects.
Scientific research: What Anna Karenina teaches us about useful negative resultsDaniel S. Katz
a panel talk for the 1st Workshop on E-science ReseaRch leading tO negative Results (ERROR), held in conjunction with the 11th eScience conference on 3 September 2015 in Munich, Germany
2. Big Science and Infrastructure
• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
– Sensors and data as inputs
• Humans: what have they built, where are they, what will they do
– Data and models as inputs
• Infrastructure:
– Urgent/scheduled processing, workflows
– Software applications, workflows
– Networks
– Decision-support systems, visualization
– Data storage, interoperability
3. Long-tail Science and Infrastructure
• Exploding data volumes &
powerful simulation methods
mean that more researchers
need advanced infrastructure
• Such “long-tail” researchers
cannot afford expensive
expertise and unique
infrastructure
• Challenge: Outsource and/or
automate time-consuming
common processes
– Tools, e.g., Globus Online
and data management
• Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012: >20 PB, >20M files
– Gateways, e.g., nanoHUB,
CIPRES, access to scientific
simulation software
(Chart: NSF grant size, 2007, from “Dark data in the long tail of science”, B. Heidorn)
4. Cyberinfrastructure (e-Research)
• “Cyberinfrastructure consists of computing systems,
data storage systems, advanced instruments and
data repositories, visualization environments, and
people, all linked together by software and high
performance networks to improve research
productivity and enable breakthroughs not otherwise
possible.”
-- Craig Stewart
• Infrastructure elements:
– parts of an infrastructure,
– developed by individuals and groups,
– international,
– developed for a purpose,
– used by a community
5. Cyberinfrastructure Framework for 21st Century
Science and Engineering (CIF21)
• Cross-NSF portfolio of activities to provide integrated cyber resources
that will enable new multidisciplinary research opportunities in all
science and engineering fields by leveraging ongoing investments and
using common approaches and components (http://www.nsf.gov/cif21)
• ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp)
– Campus Bridging, Cyberlearning & Workforce Development, Data
& Visualization, Grand Challenges, HPC, Software for Science &
Engineering
– Included recommendation for NSF-wide CDS&E program
• Vision and Strategy Reports
– ACI - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051
– Software - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
– Data - http://www.nsf.gov/od/oci/cif21/DataVision2012.pdf
• Implementation
– Implementation of Software Vision
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
6. Infrastructure Role & Lifecycle
• Create and maintain a software ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
• Support the foundational research necessary to continue to efficiently advance scientific software
• Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced software and services
• Transform practice through new policies for software addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of software authorship
• Develop a next-generation, diverse workforce of scientists and engineers equipped with essential skills to use and develop software, with software and services used in both the research and education process
7. Learning and Workforce Development
Workforce as Cyberinfrastructure:
• CI-focused Cyber Scientists, to develop new capabilities
• CI-enabled Area Scientists, to exploit new capabilities
• CI-skilled Professional Staff, to support new capabilities
8. Example: Interactions in
Understanding the Universe (I2U2)
• An "educational virtual organization" to strengthen education
and outreach activities of scientific experiments at U.S.
universities and laboratories (CMS, Cosmic Rays, LIGO)
• Creates and maintains infrastructure and common fabric to
develop hands-on laboratory course content and provide
interactive learning experience bringing tangible aspects of
experiments into an accessible “virtual laboratory”
– e-Labs: for classrooms, using web tools
– i-Labs: for museums, using physical interactive interfaces
• Collaboration of scientists, computer scientists and educators
to grow and sustain scientific workforce, and to promote public
appreciation of and support for the complex collaborations of
our national scientific programs
• https://www.i2u2.org
9. Example: Data Science for the
Social Good (DSSG)
• Eric & Wendy Schmidt Data Science for Social Good
fellowship is a University of Chicago summer program for
aspiring data scientists to work on data mining, machine
learning, big data, and data science projects with social
impact
• 48 undergrad and grad students come to Chicago, work in
small teams with governments and non-profits, led by full-time
mentors, to tackle real-world problems in education, health,
energy, transportation, etc.
• http://dssg.io
10. Common Elements of Examples
• Bring together students and
educators/teachers/mentors
• Use/build tools (preexisting for I2U2,
some preexisting, some novel for
DSSG)
• Have impact for students, and
ideally for science/society
12. Curricula Activities (CS)
• Parallel Computing
– TCPP model curriculum
• http://www.cs.gsu.edu/~tcpp/curriculum/
– Others also exploring this space
• E.g., new curriculum at CMU - http://hiperfit.dk/pdf/HIPERFIT-2-harper.pdf
– Intel Academic Community
• http://software.intel.com/en-us/academic
• Distributed Computing
– NSF Workshop: Designing Tools and Curricula for Undergraduate
Courses in Distributed Systems
• Parallel and Distributed Computing
– ACM/IEEE-CS Computer Science Curricula 2013
• Common elements that can apply outside CS
– Concerned faculty come together
– Work through needed changes (not just additions)
– Work with early adopters
– Update and expand
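As a flavor of what such parallel-computing curricula cover at the introductory level, here is a minimal, illustrative Python sketch (not taken from any of the curricula above) of the classic first exercise: partition work across processes with no shared state, then combine partial results.

```python
# Illustrative data-parallel exercise of the kind introductory
# parallel-computing curricula (e.g., TCPP-style courses) include:
# split the input across worker processes, compute independently,
# then reduce the partial results.
from multiprocessing import Pool

def partial_sum_of_squares(chunk):
    # Each worker computes its share independently (no shared state).
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Deal elements round-robin into one chunk per worker,
    # then map the chunks across a process pool and reduce.
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))

if __name__ == "__main__":
    data = list(range(1000))
    assert parallel_sum_of_squares(data) == sum(x * x for x in data)
```

The point of the exercise is the decomposition pattern (partition, map, reduce), not the arithmetic; the same structure recurs in MPI and distributed-computing coursework.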
13. HPC University (HPCU)
• A virtual organization
• Goal: to provide a cohesive, persistent, and sustainable on-line
environment to share educational and training materials for the
continuum of HPC environments, from desktops to the highest-end
facilities
• Resources to guide researchers, educators and students to
– Choose successful paths for HPC learning and workforce development
– Contribute high-quality and pedagogically effective materials that allow
individuals at all levels and in all fields of study to advance scientific discovery
• Actively seeks participation from all parts of HPC community to:
– Assess the learning and workforce development needs and requirements of
the community
– Catalog, disseminate and promote peer-reviewed and persistent HPC
resources
– Develop new content to fill the gaps to address community needs
– Broaden access by a larger and more diverse community via a variety of
delivery methods
– Pursue other activities as needed to address community needs
• http://hpcuniversity.org/
15. The role of training
• There’s a lot of computer/computing training available
• Synchronous and asynchronous
• Often led by centers, and universities with computational
research programs (XSEDE, DOE, etc.)
• Also supported by organizations such as Shodor Foundation,
http://www.computationalscience.org/
• Some aimed at users, others at educators (both K-12 and
university)
• Some is specific to a computational science domain (e.g., molecular
dynamics), while other training is fairly general (parallel programming)
• OSG offers training for grid computing, but mostly for actually
running or using OSG or HTC systems (Condor) -
https://opensciencegrid.org/bin/view/Education/WebHome
– E.g. security for users, security for admins
– Also see http://www.campusgrids.org, and U Wisconsin and U Nebraska
classes and curricula based on their experience with and collaboration with
OSG
16. Software Carpentry
• Helps researchers be more productive by teaching them
basic computing skills
• Runs boot camps at dozens of sites around the world, and
also provides open access material online for self-paced
instruction
– Introduction to Unix shell; introduce pipes, loops, history, and the
idea of scripting
– Introduction to Python, to building components that can be used in
pipelines, and to when and why to break code into reusable
functions.
– Version control for file sharing, collaboration, and reproducibility.
– Testing (both the mechanics and the use of tests to define problems
more precisely).
– An introduction to either databases or NumPy, depending on the
audience
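The Python and testing lessons above can be pictured with a small sketch in the Software Carpentry spirit (illustrative only, not actual lesson code): factor repeated script lines into a reusable function, then use tests to pin down its behavior, including edge cases.

```python
# Software Carpentry-style lesson in miniature: instead of
# copy-pasting the same calculation through a script, write a
# reusable function -- then write tests that define the problem
# precisely, including what should happen on bad input.

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)

def test_mean():
    # Ordinary cases.
    assert mean([1, 2, 3]) == 2
    assert mean([0.5, 1.5]) == 1.0
    # Edge case: an empty input should fail loudly, not return garbage.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("empty input should raise ValueError")

if __name__ == "__main__":
    test_mean()
    print("all tests passed")
```

The same function can then be reused in pipelines and across analyses, which is exactly the leap from one-off scripts to reusable components the curriculum aims at.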
17. Software Carpentry Model
• Good expansion model – Initial instructors still are active,
but new instructors come from those trained who want to
share what they have learned, using the material provided
• Material is all open for use and improvement
• In some sense, Software Carpentry has become the
coordinator between those who want to learn and those
who want to teach
• http://software-carpentry.org
18. Argonne Training Program on
Extreme-Scale Computing (ATPESC)
• Two-week program for computational scientists
• Provides intensive hands-on training on the key
skills, approaches, and tools to design,
implement, and execute computational science
and engineering applications on current high-end
computing systems and the leadership-class
computing systems of the future
• As a bridge to that future, this program fills the
gap that exists in the training computational
scientists typically receive through formal
education or other shorter courses.
19. How can NSF help
(if you are in the US)
• Research Experiences for
Undergraduates (REU)
• Division of Undergraduate Education
(EHR/DUE)
• Cyberlearning
• NSF Research Traineeship (NRT)
• Support of workshops
• Support undergraduate travel to
conferences
20. Research Experiences for
Undergraduates - Supplements
• Goals: expand student participation in all kinds of research;
attract diversified pool of talented students into careers in
science and engineering; help ensure that they receive the best
education possible
• REU supplements typically provide support for 1-2 undergrads
to participate in research, can be more for large projects
• Mentoring is important; project should develop students'
research skills, involve them in the culture of research in the
discipline, and connect their research experience with their
overall course of study
• Support for undergrads involved in carrying out research should
be included as part of the research proposal rather than as a
post-award supplement, unless it was not foreseeable at the
time of the original proposal
• REU supplement requests are handled by the NSF program
officer for the underlying research grant
21. Research Experiences for
Undergraduates - Sites
• REU Sites host a summer cohort of
undergraduates for a structured research-learning
experience
– Vision: Extend research participation to students who
would otherwise lack such opportunities
• At least 50% from institutions other than the host
• At least 50% from schools with limited STEM research
opportunities
• Outreach to underrepresented groups is a plus
– Implementation: Create empowering cohort
experience that promotes STEM engagement
• Coherent intellectual focus to research topics
• Research mentoring and support
• Professional development, grad school prep
• Cohort building, networking opportunities, social events
22. Education and Human Resources /
Undergraduate Education (DUE)
• DUE goals include:
– Support Curriculum Development
• Stimulate and support research on learning.
• Promote development of exemplary materials and strategies for
education.
• Support model assessment programs and practices.
• Effect broad dissemination of effective pedagogy and materials.
• Enable long-term sustainability of effective activities.
– Prepare the Workforce
• Promote technological, quantitative, and scientific literacy.
• Support an increase in diversity, size, and quality of the next
generation of STEM professionals who enter the workforce with
two- or four-year degrees or who continue their studies in
graduate and professional schools.
• Invest in the nation's future K-12 teacher workforce.
• Fund research to evaluate and improve workforce initiatives.
• http://www.nsf.gov/div/index.jsp?org=DUE
23. Cyberlearning and Future Learning
Technologies NSF 14-526
• Vision:
– New technologies will transform learning opportunities, interests,
and outcomes in all phases of life, making it possible for learning to
be tailored to individuals and groups
– Best technological genres and socio-technical systems designed for
these purposes will be informed by how people learn
– Can make progress in understanding learning, moving toward
predictive computational models of individual and group learning
• Aims:
– Learning how to design and effectively use the learning technologies
of the future (Future Learning Technologies)
– Understanding processes involved in learning when learners can
have experiences that only technology allows (Cyberlearning)
• Every project addresses and connects 3 thrusts:
– Innovation
– Advancing understanding of how people learn in technology-rich
learning environments
– Promoting generalizability and transferability of new genres
24. NSF Research Traineeship
(NRT) NSF14-548
• To develop bold, new, potentially transformative, and
scalable models for STEM graduate training
• Ensure that graduate students develop the skills,
knowledge, and competencies needed to pursue a range
of STEM careers
• 1 initial priority research theme - Data-Enabled Science
and Engineering (DESE) - but other crosscutting,
interdisciplinary themes are also allowed, aligned with
national research priorities
• Emphasizes the development of competencies for both
research and research-related careers
• Creation of sustainable programmatic capacity at
institutions is an expected outcome.
• Replaces IGERT
25. Other events
• Grace Hopper Celebration of Women in
Computing
– World’s largest gathering of technical women in
computing
– Place where technical women gather to network, find
or be mentors, create collaborative proposals, and
increase the visibility of women’s contributions to
computing
• Tapia Celebration of Diversity in Computing
– Brings together undergraduate and graduate students,
faculty, researchers, and professionals in computing
from all backgrounds and ethnicities
• Both aim to promote their attendees’ work and
increase networking & mentoring
• Both can be inspirational for undergraduates!!
26. Learning and Workforce Development
Workforce as Cyberinfrastructure:
• CI-focused Cyber Scientists, to develop new capabilities
• CI-enabled Area Scientists, to exploit new capabilities
• CI-skilled Professional Staff, to support new capabilities
27. Conclusions
• Lots of demand for trained staff and users
– Those who have the need are trying to provide training to fill that need
(pull)
• Seems to be lots of demand for educated developers, staff, users
– In general, the burden for filling this need seems to be on the
traditional academic system (push)
– Software Carpentry as an exception?
• In CS, lots of people teaching
– Starting to share experiences and lessons learned
– Moving towards some consensus on what to teach or how to teach it?
• Cyberinfrastructure is a common point where big science, long-tail
science, and education meet; it has a dual role
– Used to train/educate
– Needs to be refreshed by trained/educated developers
• Lots of opportunities exist!
– Form a community
– Decide what needs to be done
– Find the right opportunity