How have the consequences of software problems changed over the past 30 years? To begin to answer this question, we analyzed 386,381 news articles reporting on software problems published between 1980 and 2012, spanning sources from widely circulated newspapers to small trade magazines. Our results show that after an increase in reporting just prior to Y2K, news on software problems has declined in North America but increased in the rest of the world. Most articles report only minor consequences such as frustration, confusion, anger, or at worst, having to delay some activity for a few hours, usually due to service outages in government, transportation, finance, and information services. However, about once per month, the news reports at least one death, injury, or threatened access to food or shelter due to software problems. Reports of these severe consequences are also increasing, driven primarily by stories about transportation and government software.
1. Thirty Years of Software Problems in the News
Bryan Dosono, Syracuse University
Andrew J. Ko, University of Washington
Neeraja Duriseti, AT&T
2. Related work
• Software transforms society as it reshapes globalization, social relationships, and other aspects of everyday work
• Software isn’t perfect; it can fail spectacularly
• Most water, gas, and electricity failures now involve software failures (Rahman, Beznosov & Martí, 2009)
• Security breaches are now more prevalent and more costly than ever before (Garrison & Ncube, 2011)
• Cars recalled more frequently than ever because of software defects (Mark, Bagdouri, Palen, Martin, Al-Ani & Anderson, 2012)
3. Research question
• How have the consequences of software problems changed over the past 30 years?
4. Data collection
• Gathered text to build a corpus for analysis
• Three major news archives: ProQuest, LexisNexis, Factiva
• Databases included 93 of the 100 most circulated newspapers
• Only considered English news coverage
• Searched for words commonly referring to software problems
• i.e., bug, defect, error, fault, flaw, malfunction, glitch, mistake
• “Glitch” consistently referred to software problems (60%)
• 386,381 articles published from 1980–2012 with the word “glitch”
5. Terms considered for retrieving articles
Term        | % About Software | # Articles With Term | # About Software | Dominant Topics
glitch      | 60%              | 49,537               | ~30,000          | software problems
bug         | 11%              | 182,150              | ~20,000          | insects, illness
malfunction | 8%               | 24,154               | ~2,000           | mechanical problems
defect      | 3%               | 92,300               | ~3,000           | taxes, birth defects
flaw        | 3%               | 106,240              | ~3,000           | construction, decisions
error       | 2%               | 629,832              | ~12,000          | sports, news, medicine
mistake     | 1%               | 586,720              | ~6,000           | decision making
fault       | 1%               | 265,383              | ~3,000           | politics
6. Clustering articles into stories
• Developed a new clustering algorithm
• Converted each article’s words into a vector of tf-idf scores
• Iterated through all articles in order of publication date
• Compared each article to all subsequent articles published within a 7-day window
7. Story measurements
Quantitative variables
• Normalized article count
• Frequency of news source location by geographic region
Qualitative variables
• Categorized articles with specific codes among the three authors (Table 2)
• Three rounds of redundant coding until reaching satisfactory intercoder reliability
8. Codes used to classify each story
Code     | Description
software | system failure caused by a software defect
death    | one or more people lost their lives
harm     | one or more people were physically harmed
basic    | one or more people lost access to food or shelter
property | one or more people permanently lost material goods, money, or data
delay    | one or more people had to postpone an activity due to the problem
9. Top 10 stories by normalized article count
Software Problem Reported                          | Year | # Articles
The aftermath of the year 2000 millennium bug      | 2000 | 17,193
E-voting defects in the 2004 US elections          | 2004 | 2,509
Preparations for the year 2000 millennium bug      | 1998 | 1,745
Medicare drug eligibility defect                   | 2006 | 1,983
Toyota Prius sudden acceleration bug               | 2010 | 3,782
Delayed e-voting tallies in the 2008 US elections  | 2008 | 1,796
Incorrect voting tallies from e-voting machines    | 2006 | 1,236
Incorrect New Mexico voting tally                  | 2000 | 1,003
Atlanta Olympics results delivery to news delayed  | 1996 | 447
E-voting defects in the 2002 US mid-term elections | 2002 | 636
16. Discussion
• Several possible causes could underlie the observed news reporting trends
• Software problems are becoming harder to observe
• Most of the stories in our data are based on press releases or investigations of public organizations
• Software failures could be more common within private organizations but are simply not reported
17. Discussion
• New directions for software engineering practice
• Software should be used as a supplement to legacy process systems rather than as a cure-all for problems
• Software design practices should consider the potential for failure in all parts of a system’s functionality
• Adapt frameworks on universal human needs into a software inspection checklist
18. Discussion
• Current software lacks
• A middle ground between working and not working
• Tools to quickly isolate and identify causes of failure
• Highly informative error messages to speed the recovery process
• Software should be designed with the recovery process in mind
19. Conclusion
• Consequences of software problems on society can range from trivial to catastrophic
• The majority of reported software problems are brief inconveniences
• Software engineering practices still need improvement as we design software for the future
20. Any questions?
• Acknowledgements
• Thank you to Parmit Chilana for her comments on early drafts of this work
• This material is based in part upon work supported by the National Science Foundation under Grant Numbers CCF-0952733 and CCF-1153625
• Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation
Editor's Notes
Hello, everyone. My name is Bryan Dosono and I am a PhD student in Information Science and Technology at Syracuse University. This is my first time at CHASE and I am incredibly thrilled to be here! On behalf of Andrew Ko (a professor at the University of Washington Information School) and Neeraja Duriseti (of AT&T), I will be presenting research we conducted a couple of summers ago, when I was still an undergraduate at the University of Washington, investigating the past thirty years of software problems in the news.
The ubiquity of software has transformed society in innumerable ways, reshaping globalization, social relationships, work, and countless other aspects of everyday life. With these changes, however, has also come a dramatic shift in how society works: behind much of our modern infrastructure is now the automated digital logic of algorithms, often supplementing or even replacing the slow subjectivity of human decision making. Such automation naturally has tradeoffs: it can greatly increase productivity and lower costs, but it can also fail spectacularly. Most water, gas, and electricity failures now involve software failures, security breaches are now more prevalent and more costly than ever before, and cars are recalled more frequently than ever because of software defects.
How have the consequences of software problems changed over the past 30 years? In this paper, we begin to answer this question at the macro scale, investigating the trends in software problems and their consequences by mining archives of failures. Perhaps the most extensive archive of software problems and their consequences is the news: it is probably the largest record of the acute effects of software failure on society. Through both quantitative and qualitative methods, we investigate the problems reported, the consequences they have had on people and society, and how the consequences reported in the news have changed over the last 30 years.
We retrieved news from ProQuest, LexisNexis, and Factiva, the three major news archives. These databases included full-text articles for 93 of the top 100 most circulated newspapers in the United States. This data set may represent one of the most comprehensive samples of stories about software problems in history. To retrieve articles about software problems, we began by searching for words commonly used to refer to them: bug, defect, error, fault, flaw, malfunction, glitch, and mistake.
We selected a random sample of search results for each of these terms to assess their viability, randomly selecting one article per quarter, per database, per term, from 1980 to 2012, for a total of 260 articles per term (2,080 articles overall). We then analyzed each article for whether it referred to a software problem or some other type of problem. As seen in this table, the only term that consistently referred to software problems was glitch. Only 11% of articles with the word bug and 8% of articles with the word malfunction referred to software: across the sources we sampled, these terms typically referred to insects, illness, and mechanical failures. Thus, we decided to focus only on articles with the word glitch.
We developed an algorithm that converted each article’s words into a vector of tf-idf scores and then iterated through all articles in order of publication date, comparing each article to all subsequent articles published within a 7-day window. If the cosine similarity of any two articles’ tf-idf term vectors was .25 or greater (representing a modest similarity), then an edge was created between the two articles, creating a graph of related articles. When a similar article was found, the comparison window was then set to 7 days ahead of the date of the new relevant article. This allowed articles to be linked across weeks, months, and years, as long as at least one similar article was published within a week. After comparisons for each article were complete for a day, all disjoint graph components of the article graph whose latest dated article was more than 7 days before the current date were removed from the graph and treated as a story cluster. We chose 7 days because of the weekly news cycle.
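A minimal sketch of this procedure, assuming scikit-learn for the tf-idf vectors and cosine similarities; the data layout, union-find grouping, and function names below are illustrative, not the authors' implementation:

```python
from datetime import timedelta
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_articles(articles, threshold=0.25, window_days=7):
    """Group date-sorted articles (dicts with 'text' and 'date') into story clusters."""
    vectors = TfidfVectorizer().fit_transform([a["text"] for a in articles])
    sims = cosine_similarity(vectors)  # dense pairwise matrix; fine for a sketch

    parent = list(range(len(articles)))  # union-find forest over article indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, article in enumerate(articles):
        # Compare article i against later articles inside a sliding 7-day
        # window; each match pushes the window 7 days past the match date,
        # so a story stays alive as long as similar articles keep appearing.
        window_end = article["date"] + timedelta(days=window_days)
        for j in range(i + 1, len(articles)):
            if articles[j]["date"] > window_end:
                break
            if sims[i, j] >= threshold:
                parent[find(j)] = find(i)  # add edge: merge the two components
                window_end = articles[j]["date"] + timedelta(days=window_days)

    clusters = {}
    for i in range(len(articles)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

At the paper's scale (386,381 articles grouped into 58,577 clusters), a real implementation would avoid the dense pairwise similarity matrix and compare only articles whose windows overlap, but the linking logic is the same.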
The application of our algorithm grouped the 386,381 articles into 58,577 story clusters. In the rest of this paper, an article will refer to a single news report from a single news organization, whereas a story will refer to the set of articles identified by our clustering algorithm as reporting on the same software problem. To assess the accuracy of clustering, the authors independently read all articles in a random sample of 550 story clusters spanning 6,935 articles, judging each article as either relevant to the cluster or off-topic.
We measured the attention a story was given by the press by computing its normalized article count, which was the number of articles in a story cluster, excluding duplicates published in different sources, divided by the number of news sources that were available in the news databases in the year the story was published. This measured how many news organizations committed effort to reporting on the story, while controlling for the confounding factor of full-text availability over time. To measure the geographical location from which a story was reported, we assigned a world region, country, and state (for US stories) to each news source that appeared more than 5 times in our data set. All other aspects of stories were qualitative measurements, which we obtained by coding a sample of story clusters and their articles using content analysis methods. Our sample included the subset of the 58,577 story clusters with 6 or more articles and a random sample of stories with fewer articles, to focus our analysis on the stories that received the most attention from the press. Reading these articles spanned about 1,000 person-hours across 2 months.
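Restated as a formula (the notation is ours, not the paper's):

```latex
\text{normalized article count}(s)
  = \frac{\#\,\{\text{unique articles in story } s\}}
         {\#\,\{\text{news sources available in the databases in } \operatorname{year}(s)\}}
```

For instance, a hypothetical story with 50 unique articles in a year when 500 sources were indexed would score 0.1, the same as a 10-article story in a 100-source year.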
This table shows the variables that we coded. To assess the reliability of each variable, the authors iteratively and redundantly coded random samples of 100 stories, each time computing Krippendorff’s alpha for each variable and discussing disagreements. After three rounds of redundant coding (covering 5% of all story clusters), we reached satisfactory agreement on all variables. The authors then split this sample into three sets and coded each story cluster independently, reading each article in a cluster. For each cluster with more than 100 articles, we read a random sample of at least 10% of its articles to identify the dominant story topic.
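As a sketch of the agreement check, assuming the `krippendorff` PyPI package; the coder matrix and code values below are invented for illustration, not data from the study:

```python
import numpy as np
import krippendorff  # pip install krippendorff

# One row per coder, one column per story cluster; integers encode the
# nominal codes (e.g., 0 = delay, 1 = property, 2 = harm, 3 = death),
# and np.nan marks clusters a coder did not rate.
ratings = np.array([
    [0, 0, 1, 2, 3, 0, np.nan],
    [0, 1, 1, 2, 3, 0, 0],
    [0, 0, 1, 2, np.nan, 0, 0],
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")  # values near 1 indicate agreement
```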
While the notable stories varied greatly in their consequences, the top 100 stories receiving the most press (as measured by each story’s normalized article count) were quite homogeneous. By far the biggest story was the Y2K millennium defect. Another 13 stories covered fears and consequences of problems with e-voting software from 1998 to 2008: tallies were delayed or incorrect, machines froze, digital voter registration records were missing names, and votes were lost.
This figure shows the number of stories reported per month, separated by region. As the plot shows, there were few problems reported in the 1980s, with a rapid rise in the number of stories approaching the year 2000. After the year 2000, however, the frequency of reporting appears to have slowly declined overall. In interpreting these trends, it is important to remember that declines in stories in North America may represent declines in US journalism budgets, as opposed to declines in software problem frequency.
Death was rare. Among our coded sample, 47 stories reported one or more deaths related to software. Extrapolating this to the ~35,000 stories in our data, as shown in the top figure above, about 400 stories over 32 years likely reported deaths (about 1 story per month worldwide). A third of reported deaths were due to collisions of planes and cars, with plane crashes accounting for most death tolls above 50. On land, transit lines and railroads using the same signaling system repeatedly failed to detect trains traveling the wrong way. Toyota recalled its 2010 Prius hybrid after discovering a design flaw that required a software change to fix an anti-lock brake system problem; the LA Times reported in March 2010 that 102 people had died from this and other problems involving sudden acceleration. In the air, plane crashes stemmed from pilots receiving faulty readings on cockpit screens, engines and thrusters reversing mid-flight, and air traffic controllers guiding dangerous landings.
Injury was also rare. In our sample, 39 stories reported physical harm. Extrapolating this rate to all 35,000 stories, the number of people injured since 1980 is likely about 12,500 (or about 30 people per month). Half of the stories reported injuries sustained on boats and ships because of sudden bumps and tilts. In 1997, autopilot problems led a double-decker tourist sightseeing boat to ram into a bridge on the Seine River in Paris, injuring 28 people. Other stories reported injuries in entertainment, health, and manufacturing. On Broadway, a Spider-Man stunt actor was injured after plunging 30 feet when a computer-controlled harness failed. In 2008, a large television screen at the Piccadilly Gardens fan zone failed to show a highly anticipated football match as planned; thousands of fans rioted violently against the police, leading to injuries on both sides.
Threats to food and shelter were even less common than death and physical harm, appearing in only 28 coded stories. Extrapolating this to all 35,000 stories, the number of people reported to have lost access to food and/or shelter since 1980 is likely about 80 million (roughly 200,000 per month). A third of these stories involved problems that delayed government aid to vulnerable populations: welfare recipients who qualified for food stamps, and students waiting for financial aid, were incorrectly denied or backlogged. Five stories reported on faulty natural disaster detection systems, including overlooked tornado warnings, missed wildfire notification dispatches, and 911 emergency calls that never reached an operator. Threats to basic needs were related to region, with losses of basic needs overrepresented in the US.
The loss of money, data, and material goods was much more common, reported in 498 of the coded stories, and therefore likely about 4,400 stories over the 32 years. About 20% of these stories reported data loss, often e-mails, court documents, or data that NASA probes failed to transmit. Another 20% were stories about companies losing revenue because customers had problems with the company’s software. The remaining bulk of property losses included incorrect charges that were never reimbursed, regulatory fines due to software defects, stock market losses, and security breaches. While it was difficult to quantify the loss of data and physical goods, we were able to quantify lost money, as many articles reported it explicitly. Of a random sample of 100 stories, 28 reported an amount lost due to the software problem. We converted all of these amounts to USD using the average exchange rate for the month the story was published. As seen in the top figure, the resulting range was $625 to $7 billion, with a median of $7.7 million lost.
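The conversion itself was mechanical; a minimal sketch, assuming a hypothetical rate_to_usd table of average monthly exchange rates (units of local currency per US dollar):

```python
import statistics

def losses_to_usd(reported_losses, rate_to_usd):
    """Convert (amount, currency, month) records to US dollars and
    summarize the distribution of reported losses."""
    usd = [amount / rate_to_usd[(currency, month)]
           for amount, currency, month in reported_losses]
    return min(usd), statistics.median(usd), max(usd)
```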
Delay was the most common consequence, mentioned in 40.4% of stories (14,160). The most common were outages in government services, forcing people to wait for days. Delays in software projects themselves were also widely reported, typically for high-profile efforts such as new government websites or utility services. Other common delays stemmed from outages in stock market trading, banks and ATMs, flights, voting and elections, NASA launches, phone service, public transportation, and web sites. The shortest delays concerned emergency responses to fires, injuries, and other time-critical services, where a few minutes of delay was a matter of life and death.
Several possible causes could underlie these reporting trends. One may be the public’s growing experience with software and how it fails over the last three decades, making failures less surprising and therefore less newsworthy. A related cause could be that most software problems are becoming harder to observe: most of the stories in our data are based on press releases or investigations of public organizations, so software failures could be more common within private organizations but simply go unreported. It could also be that people pay more attention to the public sector out of distrust that public organizations can provide reliable service. This interpretation aligns with our finding that developing regions of the world, where governments are believed to be less reliable, report a greater number of software failures in the public sector.
Regardless of the underlying causes, our research suggests new directions for software engineering practice. People increasingly rely on software as the only solution; software should instead supplement legacy processes rather than serve as a cure-all. For example, many airports maintained alternative processes, such as manual check-ins and transporting baggage by hand, to keep operating during software failures. Software design practices should also consider the potential for failure in all parts of a system’s functionality: even a near-perfect software system can lead to breakdowns in human-computer communication, making it hard to detect and fix errors before they escalate. One solution would adapt frameworks on universal human needs into a software inspection checklist, focused on the data the software computes and the decisions it makes, to help reviewers assess the potential for delay, loss, or other human risks.
It is impossible to predict all circumstances in which software will run. Delay and property loss, the most common consequences of software failures according to our results, were often caused not by the failure itself but by the time it took support teams to detect, diagnose, and correct an error. Software should therefore be designed with the recovery process in mind. Current software lacks a middle ground between working and not working, tools to quickly isolate and identify causes of failure, and highly informative error messages that speed recovery. Future software should take these factors into consideration.
We sought to understand the consequences of software problems on society and how they are changing over time. Our results indicate wide variation in the seriousness of these consequences, but the majority were brief inconveniences, such as delay or property loss. While the most severe consequences are becoming more common, they are still relatively rare compared to other risks to humanity. Our results call for a clearer definition of software correctness in software engineering research, and for practices that educate end users that software is, in fact, fallible.
Thank you all for being a gracious audience. I’d now like to open the floor for any questions you may have.
We would like to thank Parmit Chilana for her comments on early drafts of this work. This material is based in part upon work supported by the National Science Foundation under Grant Numbers CCF-0952733 and CCF-1153625. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.