Some thoughts on how research and infrastructure software are supported by NSF (and possibly other agencies), for the "What can academia learn from open source?" Academia Town Hall - https://ti.to/github-events/academia-town-hall-
Looking at Software Sustainability and Productivity Challenges from NSF - Daniel S. Katz
The document discusses challenges in software sustainability and productivity faced by the National Science Foundation (NSF). It notes that NSF typically only funds software projects for 5 years, though many projects require support for 20+ years. It also discusses issues like a lack of career paths for software-focused researchers, inconsistent incentives and credit systems, training needs, challenges of interdisciplinary work, and ensuring software portability and dissemination. While the NSF has made some improvements through programs like SI2, the document concludes that more work remains to be done to address these challenges and push academic culture to better support long-term software projects.
Working towards Sustainable Software for Science: Practice and Experience (WS... - Daniel S. Katz
This was a short talk about the WSSSPE events, given at the Dagstuhl workshop on Engineering Academic Software, 20 June 2016. It mostly discusses the working groups that have formed gradually over the WSSSPE meetings, specifically those that worked through WSSSPE3, and what they have done since then.
Discussing Software Citation and related topics at Workshop on Data and Software Citation (June 6-7 at Harvard Medical School, http://www.software4data.com/#!nsf-workshop/jghgb)
The document discusses the need for a Science Gateways Community Institute (SGCI) to support science gateway developers. Science gateways are online interfaces that provide access to advanced computing resources, software, and data for research. Currently, gateway developers often work in isolation without shared resources or expertise. The proposed SGCI would provide free services like expertise in various areas of gateway development, project planning, and continued support. This would help promote more efficient, effective, and sustainable development of science gateways to enable scientific discovery.
NSF SI2 program discussion at 2014 SI2 PI meeting - Daniel S. Katz
This document discusses software as infrastructure for science and engineering research. It outlines how software is essential to many areas of science, with about half of recent science papers involving software-intensive projects. It also discusses how "long-tail" scientists need advanced infrastructure to handle large data and simulations. The document notes challenges around larger teams, more data and complex systems, and changing hardware and software. It positions software as a critical part of cyberinfrastructure and outlines NSF programs like SI2 and CDS&E that support development of sustainable scientific software infrastructure.
NSF SI2 program discussion at 2013 SI2 PI meeting - Daniel S. Katz
This document discusses software infrastructure challenges and opportunities in science. It notes that software is essential to much of modern science and is a form of infrastructure. It outlines NSF's vision and strategies for supporting software infrastructure through the CIF21 initiative and specific programs like SI2, CDS&E, and XPS. The document discusses the SI2 program's activities in supporting software elements, frameworks, and institutes. It raises general questions about supporting existing infrastructure, deciding when to stop support, encouraging reuse, measuring impact, and supporting software developer careers.
Scientific Software Challenges and Community Responses - Daniel S. Katz
A talk given at RTI International on 7 December 2015, discussing 12 scientific software challenges and how the scientific software community is responding to them.
Working towards Sustainable Software for Science (an NSF and community view) - Daniel S. Katz
This document discusses challenges and opportunities for developing sustainable software for science. It notes that software is increasingly important for science but current practices and incentives do not support long-term sustainability. The document summarizes discussions from the Working Towards Sustainable Software for Science conference, which identified key issues around developing sustainable software, best practices, policies around credit and careers, and building supportive communities. It proposes that better measuring contributions to software could help address incentives, career paths, and sustainability of software over the long term.
Requiring Publicly-Funded Software, Algorithms, and Workflows to be Made Publ... - Daniel S. Katz
This document discusses publicly-funded research software, algorithms, and workflows. It argues that software is fundamentally different than data and requires different policies regarding public access. The document outlines that a large portion of research is software-intensive and relies on software. However, software faces sustainability issues like "software collapse" if not actively maintained. The document recommends that funding agencies take steps to incentivize open source software and long-term maintenance through funding and career incentives. It suggests defaulting to open source models but allowing other options if justified, with the goal of software remaining useful over time beyond the initial funding period.
Scientific Software Innovation Institutes (S2I2s) as part of NSF’s SI2 program - Daniel S. Katz
This talk, presented at a computational chemistry institute conceptualization project (https://sites.google.com/site/s2i2biomolecular/), discusses a view of Scientific Software Innovation Institutes as part of NSF's Software Infrastructure for Sustained Innovation (SI2) program.
(a slightly updated version of this talk is at https://doi.org/10.6084/m9.figshare.10301741.v1)
A talk on the role of software in research and how NCSA is responding in terms of people and roles - given at the 2019 Data Science Leadership Summit (https://sites.google.com/msdse.org/datascienceleadership2019/).
This is partially based on a previous paper: Daniel S. Katz, Kenton McHenry, Caleb Reinking, Robert Haines, "Research Software Development & Management in Universities: Case Studies from Manchester's RSDS Group, Illinois' NCSA, and Notre Dame's CRC", 2019 IEEE/ACM 14th International Workshop on Software Engineering for Science (SE4Science)
doi: https://doi.org/10.1109/SE4Science.2019.00009
preprint: https://arxiv.org/abs/1903.00732
ORION Workshop: XSEDE and Building a National/International Cyberinfrastructure - John Towns
This document discusses XSEDE and building national/international cyberinfrastructure. It introduces John Towns, the director of XSEDE, and outlines some of his background and roles. It then provides high-level information about XSEDE, including that it is a 5-year, $121 million project funded by the NSF to provide advanced digital services and resources. XSEDE acts as a virtual organization comprising resources from 20 partner institutions. The document discusses some lessons learned from previous similar projects and challenges in managing a distributed organization like XSEDE. It also provides strategies for successful partnerships, emphasizing the importance of clear goals, stakeholder needs, decision-making processes, and flexibility.
Research Software Sustainability: WSSSPE & URSSI - Daniel S. Katz
The document discusses research software sustainability efforts by the WSSSPE and proposed URSSI institute. It provides an overview of WSSSPE which promotes sustainable research software through community activities and working groups addressing various aspects of the software lifecycle. It also outlines the goals and activities of the conceptualized URSSI institute which aims to establish a US research software sustainability organization through workshops, surveys, and ethnographic studies to understand needs and develop a concrete institute plan.
XSEDE is a major research infrastructure with collaborations worldwide supporting thousands of researchers across a wide range of domains. XSEDE has taken an integrative and holistic approach to supporting researchers in the use of the varying resources and services available via XSEDE. This presentation will briefly review XSEDE and its vision and provide a discussion of the efforts within XSEDE targeted at supporting research communities.
This document describes the Science Gateways Community Institute (SGCI), a new NSF-funded institute aimed at helping the scientific community more effectively build online gateways and resources for research. The SGCI will provide consulting services, training, developer support, opportunities for students and educators, and a forum for the gateway community to connect and exchange knowledge. The goal is for the SGCI to become a central resource for all aspects of building and supporting science gateways.
The document summarizes the history and plans of the Working towards Sustainable Software for Science: Practice and Experience (WSSSPE) workshops. It discusses that WSSSPE1-3 identified challenges in developing sustainable scientific software and proposed solutions through working groups. Some groups made progress, such as on software credit principles, while others did not due to lack of follow through. WSSSPE4 plans to further the vision of sustainable open-use research software through workshops on building the future and sharing practices and experiences.
International Symposium NLHPC 2013: Innovation at the frontier of HPC
Title: XSEDE: an ecosystem of advanced digital services accelerating scientific discovery
Abstract:
The XSEDE program (Extreme Science and Engineering Discovery Environment) has recently entered its third year of operation. In this talk we will discuss the vision, mission and goals of this project and some of the distinguishing characteristics of the program. This will be accompanied by a review of current status and look ahead at where the program is headed over the next several years.
This document discusses metrics for measuring the impact of open source software and proposes a vision for tracking software usage and citations. It presents scenarios for different types of open source software and the metrics that could apply. The vision involves products being registered with credit maps to track contributors, and usage being recorded to provide credit to developers. It notes challenges around privacy, defining usage, and tying later products to past usage. The document argues a lack of credit discourages sharing and providing metrics could incentivize collaboration.
US University Research Funding, Peer Reviews, and Metrics - Daniel S. Katz
My part of the "Digital Science Webinar: Articulating Research Impact – Strategies from Around the Globe" (http://www.digital-science.com/events/digital-science-webinar-articulating-research-impact-strategies-from-around-the-globe/)
Daniel S. Katz will discuss how reviewers at the National Science Foundation (USA) consider the “intellectual merit” and “broader impacts” criteria for funding and, in particular, how metrics might help applicants understand their impacts in these areas. Dan will also talk about how reviewers might use qualitative and quantitative altmetrics data to inform their peer reviews for grant applications. He will address many of the salient questions around this use of metrics: for example, do reviewers take metrics seriously, and what types of metrics are of most value to them?
The document outlines the services provided by the Science Gateways Community Institute (SGCI) to support the development and use of science gateways. The SGCI offers expertise through an Incubator program to guide gateway projects through all stages. It provides dedicated support staff to directly assist with building and enhancing gateways. It also aims to leverage existing gateway technologies by providing reusable software components. The goal is to help gateway creators focus on their science by utilizing SGCI resources and expertise.
Module 6 - Systems Planning bak.pptx.pdf - MASantos15
This document provides an overview of systems planning. It discusses strategic planning, including conducting a SWOT analysis and developing a mission statement, goals, and objectives. It also covers factors to consider for information systems projects, such as internal and external influences. The document outlines the steps of a feasibility study, including assessing operational, technical, economic, and schedule feasibility. Finally, it discusses the preliminary investigation process for planning an information systems project, which involves understanding the problem, defining scope and constraints, fact-finding, feasibility evaluation, estimating time and costs, and presenting results to management.
- The document discusses software analytics and presents a ranked list of 145 questions that software engineers want data scientists to answer. It describes conducting surveys of over 1,500 engineers to develop this list.
- The list identifies important topics for researchers, practitioners, and educators to focus on based on the needs of industry. These include understanding software processes and practices, and the impact of code quality.
- Developing this catalog of questions can guide the collection and analysis of data, as well as the development of tools and techniques, to better support decision-making in software projects.
Research Software Sustainability
The document discusses the importance of research software and challenges in ensuring its sustainability. It notes that research software is increasingly essential in research but often lacks proper maintenance. Three key points are made:
1) Research software is widely used across many fields and agencies invest billions in its development, yet researchers are not rewarded for its creation and maintenance.
2) Without maintenance, research software will collapse over time as it becomes outdated or broken. Many projects rely on just one or two developers.
3) Changing incentives, career paths, training, and funding models is needed to improve the sustainability of research software for the long-term benefit of science.
Parsl: Pervasive Parallel Programming in Python - Daniel S. Katz
The document summarizes Parsl, a Python library for pervasive parallel programming. Parsl allows users to naturally express parallelism in Python programs and execute tasks concurrently across different computing platforms while respecting data dependencies. It supports various use cases from small machine learning workloads to extreme-scale simulations involving millions of tasks and thousands of nodes. Parsl provides simple, scalable, and flexible parallel programming while hiding complexity of parallel execution.
What is eScience, and where does it go from here?Daniel S. Katz
eScience has evolved from focusing on global scientific collaborations enabled by distributed computing infrastructure to emphasizing joint advances in digital infrastructure and how that infrastructure enables new research. This symbiotic relationship between research and infrastructure development could be called Research and Infrastructure Development Symbiosis (RaIDS). Going forward, RaIDS conferences should focus on improving communication between infrastructure developers and researchers to facilitate new collaborations, ensure research publications appropriately attribute enabling infrastructure advances, and standardize catalogs of available infrastructure and research challenges.
FAIR is not Fair Enough, Particularly for Software Citation, Availability, or...Daniel S. Katz
FAIR principles are not fully sufficient for software. While FAIR aims to make data findable, accessible, interoperable, and reusable, software has key differences from data. FAIR needs expansion to properly address software citation, availability, and quality. Specifically, it should encourage explicitly crediting software contributors, promoting open source as the default for availability, and potentially assessing quality as an additional principle. Simply applying FAIR as is for data does not adequately account for software's nature as both a creative work and executable tool.
How different groups think about software sustainability, what "equations" we might use to measure it, and how it really can't be measured looking forward but only predicted.
Slides for:
"Software Citation in Theory and Practice," by Daniel S. Katz and Neil P. Chue Hong (published paper - https://doi.org/10.1007/978-3-319-96418-8_34; preprint - https://arxiv.org/abs/1807.08149), presented at International Congress on Mathematical Software (ICMS 2018)
Abstract. In most fields, computational models and data analysis have become a significant part of how research is performed, in addition to the more traditional theory and experiment. Mathematics is no exception to this trend. While the system of publication and credit for theory and experiment (journals and books, often monographs) has developed and has become an expected part of the culture, how research is shared and how candidates for hiring, promotion are evaluated, software (and data) do not have the same history. A group working as part of the FORCE11 community developed a set of principles for software citation that fit software into the journal citation system, allow software to be published and then cited, and there are now over 50,000 DOIs that have been issued for software. However, some challenges remain, including: promoting the idea of software citation to developers and users; collaborating with publishers to ensure that systems collect and retain required metadata; ensuring that the rest of the scholarly infrastructure, particu- larly indexing sites, include software; working with communities so that software efforts count; and understanding how best to cite software that has not been published.
A talk about "Conceptualizing a US Research Software Sustainability Institute (URSSI)" presented at the Toward a New Computational Fluid Dynamics Software Infrastructure (CFDSI, https://www.colorado.edu/events/cfdsi/) workshop in Boulder, CO, 16 May 2018.
A brief status of software citation work presented at AAS splinter meeting on implementing the FORCE11 Software Citation Principles in Astronomy (2018-01-11)
A talk about citation and reproducibility in software, presented at the HSF (High Energy Physics Software Foundation) meeting at SDSC, San Diego, CA, USA, 23 January 2017
Based on citation work done by the FORCE11 Software Citation Working Group as well as recent reproducibility discussions, blogs, and papers
Software Citation: Principles, Implementation, and ImpactDaniel S. Katz
The document discusses software citation principles proposed by the FORCE11 Software Citation Working Group. It provides motivation for better recognizing software as a research output and measuring its impact and contributions through citation. The working group developed six software citation principles around importance, credit, unique identification, persistence, accessibility, and specificity. It also discusses implementing the principles through publishing software and citing other software in research papers, and next steps around endorsement and implementation efforts.
Scientific research: What Anna Karenina teaches us about useful negative resultsDaniel S. Katz
a panel talk for the 1st Workshop on E-science ReseaRch leading tO negative Results (ERROR), held in conjunction with the 11th eScience conference on 3 September 2015 in Munich, Germany
Panel: Our Scholarly Recognition System Doesn’t Still WorkDaniel S. Katz
A panel at the 2015 Science of Team Science (SciTS) Conference
Organizers: Daniel S. Katz (U. of Chicago & Argonne National Laboratory), Amy Brand (Digital Science), Melissa Haendel (Oregon Health & Science University), Holly J. Falk-Krzesinski (Elsevier)
Panelists: Robin Champieux (Oregon Health & Science University) Holly Falk-Krzesinski (Elsevier)Daniel S. Katz (U. of Chicago & Argonne National Laboratory)Philippa Saunders (University of Edinburgh)
Abstract: http://bit.ly/scholarly-recognition
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
The Swift scripting language was created to provide a simple, compact way to write parallel scripts that run many copies of ordinary programs concurrently in various workflow patterns, reducing the need for complex parallel programming or arcane scripting to achieve this common high-level task. The result was a highly portable programming model based on implicitly parallel functional dataflow. The same Swift script runs on multi-core computers, clusters, grids, clouds, and supercomputers, and is thus a useful tool for moving workflow computations from laptop to distributed and/or high performance systems.
Swift has proven to be very general, and is in use in domains ranging from earth systems to bioinformatics to molecular modeling. It’s more recently been adapted to serve as a programming model for much finer-grain in-memory workflow on extreme scale systems, where it can perform task rates in the millions to billion-per-second.
In this talk, we describe the state of Swift’s implementation, present several Swift applications, and discuss ideas for of the future evolution of the programming model on which it’s based.
A Method to Select e-Infrastructure Components to SustainDaniel S. Katz
This is a talk presented at International Symposium on Grids and Clouds (ISGC), Taipei, Taiwan, March 20, 2015.
Abstract:
Reusable infrastructure (systems created by one or more people and intended to be used by other people) has become essential for many types of research over the last century, from microscopes to telescopes, and from sequencers to colliders. Over the past few decades, much research infrastructure has become digital, and many new digital systems have been developed. Here we discuss e-Research infrastructure (also called cyberinfrastructure), which has been defined by Craig Stewart as consisting of “... computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.” While the research infrastructure as a whole is important, it is useful to consider infrastructure elements as well, as they comprise the overall infrastructure. Each element has a technical context (which allows one to ask questions about its architecture, such as: How does it fit into the overall infrastructure? How does it interact with other infrastructure elements?), a social context (which allows one to ask questions about its developers, such as: Who has developed the element?, and it users, such as: Who uses the element?, and its purpose, such as: What is the intended use of the element?), and a political context (which allows one to ask questions about its funders, such as: Who funds the development and maintenance?, and about its political scope, such as: Is the element national? International?). Understanding how a particular infrastructure element can be created and sustained requires answering two pairs of questions: What resources are needed to create it, and how can those resources be assembled and applied? What resources are needed to sustain it, and how can those resources be assembled and applied? 
In this paper, we focus on the second half of the two questions, since the amount and type of needed resources vary with the specific element being discussed. We believe elements of e-Research infrastructure can be placed in a three-dimensional space, consisting of temporal duration, spatial extent, and purpose. Note that the number of users of a given element should be larger the farther the element is from the origin in any direction, as should the cost. These two elements (number of users and cost) can be generically called ‘scale’ in this context. Alternatively, we can attempt to map impact, rather than usage, as an element of scale. In either case, scale is thus a metric of the space, though it is not orthogonal to any of the three axes. This talk with discuss how placing potential elements in this space allows decisions to be made on which elements should be pursued.
Multi-component Modeling with Swift at Extreme ScaleDaniel S. Katz
This document discusses using the Swift parallel scripting system to model multi-component systems at extreme scale. Swift allows defining tasks that can run concurrently across many resources. It has been used for modeling problems with coupled components, like climate models with interacting atmosphere, ocean, and land components. The document outlines how Swift could express increasingly complex climate models and orchestrate in-situ analytics for extreme-scale simulations.
The Application Fault Tolerance (AFT) portion of the Jet Propulsion Laboratory-led Remote Exploration and Experimentation (REE) final review, May 2001, with references to REE-produced AFT papers added after the review (last three slides)
Metrics & Citation for Software (and Data)Daniel S. Katz
A talk about why metrics and citation for software (and data) are important to NSF and the science & engineering community, and what a number of projects are trying to do to improve the situation. Presented at "Workshop on Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution", Washington, DC, 29 Jan 2015
Funding Software in Academia
1. Funding Software in Academia
Daniel S. Katz
dkatz@nsf.gov & d.katz@ieee.org
@danielskatz
Program Director, Division of Advanced Cyberinfrastructure
(http://www.slideshare.net/danielskatz/funding-softwareinadademia)
Academia Town Hall at UW, Seattle, WA, 2 Feb 2015
2. Software Science
Software
Computing
Infrastructure
• Software (including services) is essential for the bulk of science
- About half the papers in recent issues of Science were software-intensive
- Research is becoming dependent upon advances in software
- Wide range of software types: system, applications, modeling, gateways, analysis, algorithms, middleware, libraries
- Significant software-intensive projects across NSF, e.g., NEON, OOI, NEES, NCN, iPlant, etc.
• Software is not a one-time effort; it must be sustained
• Development, production, and maintenance are people-intensive
• Software lifetimes are long compared with hardware lifetimes
• Software has under-appreciated value
3. Research Software vs Infrastructure Software
• Some software is intended for research
– Funded by many parts of NSF, sometimes explicitly, often implicitly
– Intended for use by developer
• Other software is intended as infrastructure
– Funded by many parts of NSF, often ACI, almost always explicitly
– Intended for use by community
4. Research Software Challenges
• Things are mostly good
• Challenges
– “Software Engineering” skills for developers
– Reproducibility
– Inheritance
• All of these are also important for infrastructure software, plus...
5. Software as Infrastructure
• Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21)
– Cross-NSF portfolio of activities to provide integrated cyber resources that will enable new multidisciplinary research opportunities in all science and engineering fields by leveraging ongoing investments and using common approaches and components (http://www.nsf.gov/cif21)
• ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp)
– Campus Bridging, Cyberlearning & Workforce Development, Data & Visualization, Grand Challenges, HPC, Software for Science & Engineering
• Software Vision and Strategy Report
– http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
• Implementation of Software Vision
– http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
6. NSF Software Infrastructure Projects & Software Infrastructure for Sustained Innovation (SI2)
• See http://bit.ly/sw-ci for current projects
• SI2: 5 rounds of funding, 65 SSEs
• SI2: 4 rounds of funding, 35 SSIs
• SI2: 2 rounds of funding, 14 S2I2 conceptualizations
• SSE & SSI – NSF 14-520: Cross-NSF, all Directorates participating
• Next SSEs due today; Next SSIs due June 2015
7. NSF Software as Infrastructure Challenges
• In these programs, ACI works with other NSF units to support projects that lead to software as an element of infrastructure
• Issue: amount of software that is infrastructure grows over time, and grows faster than NSF funding
Q: How can NSF ensure that software as infrastructure continues to appear, without funding all of it?
A: Incentives
To judge software, need to understand/forecast impact
8. Other Software Challenges
• Working Towards Sustainable Software for Science: Practice and Experience (WSSSPE)
– http://wssspe.researchcomputing.org.uk
– 3 workshops held
• Lessons:
– Many of the issues in developing sustainable software are social, not technical
– Software work is inadequately visible in ways that “count” within the reputation system underlying science
9. Challenges & Hypothesis
• To judge software, need to understand/forecast impact
• Q: How can NSF ensure that software as infrastructure continues to appear, without funding all of it?
• A: Incentives
• Many of the issues in developing sustainable software are social, not technical
• Software work is inadequately visible in ways that “count” within the reputation system underlying science
Hypothesis: better measurement of contributions can lead to rewards (incentives), leading to career paths, willingness to join communities, leading to more sustainable software
10. Consequences & Discussion
• Metrics – How to measure software contributions, particularly in the academic system?
– Not just authors by order, but for all contributors
– Need institutional buy-in, e.g., researcher profiles
– Publisher involvement is essential
• Career paths – Is there a role for non-tenure-track researchers who produce software, data, etc. in universities?
– Assuming yes, do universities recognize and support this? If not, how to get them to?
• Reproducibility and related requirements
• Tools, including provenance systems
• Lots of players, e.g., NSF, NIH, DOE, JISC, RCUK, Sloan & Moore, Wellcome, universities, Mozilla, Apache, Zenodo, GitHub, publishers, DataCite, CrossRef, VIVO, ...