SlideShare a Scribd company logo
1 of 6
“What are "Software Analytics" and who should care about them?”
October 2013
Scope of this Report
Everyone has a different definition of Software Analytics and so it is fair to say that this is a broad topic.
Hence, this report defines Software Analytics as much by the examples it gives as by one single formal
definition. In answering the “who should care?” part of the question, the report seeks to identify roles
that should be using Software Analytics already and/or could be using it more. We also look into the
future to see how Software Analytics might change software development in organizations in the way
that business intelligence has impacted some organizations.

What are “Software Analytics”?
This definition of “Software Analytics” comes from a 2011 paper by Zhang et al:
“Software Analytics” – to utilize the data-driven approach to enable software practitioners to
perform data exploration and analysis in order to obtain insightful and actionable information
for completing various tasks around software systems, software users, and software
development process.
So, can we think of Software Analytics as “big data” for software development in that the focus is on
mining the richness of data about software projects for a surprising new insight? In short, yes. But most
practitioners set the bar a little higher than just “new insights.” The goal of software analytics is to
produce real-time and actionable insights where “real-time” is defined to be within the time frame of a
project such that corrective action is feasible more quickly than it would be by just letting the project
fail.
Some examples of Software Analytics include (more examples are provided later in the report):
 Using data from previous projects to size and estimate future projects
 Using defect data to allocate effort in a running project
 Benchmarking projects and project against industry data
 Statistically and/or heuristically analyzing code to look for patterns and anti-patterns.
Menzies and Zimmerman identify one important goal and constraint for Software Analytics – it is about
using lessons from one set of projects to improve results on another set of projects. Consequently, the
value of Software Analytics is undermined if the right data is not available for analysis (because it
doesn’t exist, is not sufficiently structured or the owners are not prepared to share) or if the results
cannot be readily shared (because they are not sufficiently structured or the owners are not prepared to
share).
Current use of the term “Software Analytics” suggests a tipping point. In the past, “Software Analytics”
seems to have been used quite narrowly, more often than not in the context of analyzing defects. More
©2013 David Consulting Group

Page 1 of 6

v1
recently and perhaps due to the growth in business intelligence (“Big Data” if you must), an awareness
has grown that the software development process and the subsequent operation of the software
throws off a lot of data. Historically, some of this data has been analyzed but much has been ignored by
organizations. Going forward, under the more broadly defined “Software Analytics” banner, there is an
expectation that important insights can be uncovered to improve decision-making around software
development and operation.

In his paper, “Searching under the Streetlight for Useful Software Analytics,” Johnson makes an excellent
point about the importance of unobtrusive data collection. One of the reasons that much data goes to
waste is because, in Johnson’s words: “For developers, one of the most frustrating aspects of manual
data collection is the loop of doing some work and then interrupting it to record what they worked on.”
We would go further to add that most developers are trained to question the purpose of any
measurement and analysis of what they do and smart enough to “game” any data collection system that
they don’t value.

Different and Distinct Kinds of Software Analytics
Everyone has a different definition of software analytics and this is justified to some extent by the
variety. For example:
 Internal versus External Analytics
o Live (or directly accessible) Data versus Stale (or exported and sent to a 3rd party) Data –
Internal analysts are much more likely to be able to access live data than external
analysts.
o Privacy – external analysts may necessarily only have access to anonymous data
 Quantitative versus Qualitative Methods
o Typically, quantitative analysis of data is thought of as somehow more “accurate” than
qualitative data but often quantitative data can be misleading or downright wrong
without the benefit of qualitative context.
o DCG’s own ADM Performance Baseline combines quantitative analytics with qualitative
interviews to provide our clients with an assessment of how their software development
group is performing AND why!
 Data mining Tools versus Interactive Tools

©2013 David Consulting Group

Page 2 of 6

v1
o




As the names imply, data mining tools are usually automatic, configured to run regularly
and perform the same analysis every time on a fresh (new, expanded or updated) set of
data each time.
o Today, most Software Analytics falls into the data mining category.
Practitioner Audience versus Management Audience
o Different audiences have different data needs for different decisions.
Exploratory versus Deployment Analytics
o In Exploratory Analytics, the goal is to explore the data to try to find potentially valuable
insights
o In Deployment Analytics, the goal is to automatically and regularly generate the
information from the valuable insight and deploy it out into the organization to inform
decisions.
o An example might be that an exploratory analysis might look for a size of project (in
function points) at which the defect rate deteriorates significantly. If found then
deployment analysis can be used to look for and report on projects that exceed or are
close to exceeding this number of function points.

Applications of Software Analytics
Since the types of Software Analytics vary, it is reasonable to expect the applications of software
analytics to have even greater range. For example:
From Menzies & Zimmerman in IEEE Software (references to the underlying case studies are included in
the Menzies & Zimmerman paper):
 Combining software product information
 Using process data to predict overall
with apps store data
project effort
 Using software process models to learn
 Using operating system logs that predict
effective process change
software power consumption
 Exploring product line models to configure
 Mining natural language requirements to
new applications
find links between components
 Mining performance data
 Using XML descriptions of design patterns
to recommend particular designs
 Using email lists to understand the human
 Linking emails to source code artifacts and
networks inside software teams
classifying their content
 Using execution traces to learn normal
 Using bug databases to learn predictors
interface usage patterns
that guide inspection teams to where the
code is most likely to fail
 Using security data to identify indicators
 Using visualization to support program
for software vulnerabilities
comprehension
 Using software ontologies to enable
 Mining code clones to assess the
natural language queries
implications of cloning and copy/paste in
software
The variety can exist across organizations and within organizations. For example, Microsoft have at least
two systems: Czerwonka at al have reported on the development of CODEMINE at Microsoft; Zhang et
Al have reported on the Stackmine Project. CODEMINE is a software development analytics platform for
©2013 David Consulting Group

Page 3 of 6

v1
collecting and analyzing engineering process data. It includes constraints and pivotal organizational and
technical choices. Examples of uses include:
 Trend monitoring and reports on development health
 Risk evaluation and change impact analysis
 Version control branch structure optimization
 Social-technical data analysis
 Custom search for bugs and debug logs, speeding up investigations of new issues.
StackMine was developed to perform Software Analytics on a very particular set of data in a Microsoft
system called PerfTrack which logs reports of poor end user performance. For example, PerfTrack
measures how long it takes for a user to receive a response from the system when the user clicks on a
Windows Explorer button to create a new folder. If the response time exceeds a satisfactory threshold,
PerfTrack sends execution traces (containing call stacks collected during the preceding time interval)
back to Microsoft for debugging. All the collected call traces in total could contain more than 1 billion
call stacks – much more than performance analysts can expect to analyze without automation.

Who should care about Software Analytics?
Everyone who has to make decisions about software development and software operation should care
about Software Analytics. Not only because data-driven decisions are usually better than other kinds
but also because if Software Analytics are not being used to drive decisions then the collection of that
data is almost certainly a waste of time and money.
Johnson and his colleagues experience with designing Software Analytics led him to suggest three
dimensions for evaluating approaches (we have simplified):
 Data Collection: Automation versus Manual collection
 Adoption Barriers: Comprehensiveness versus Intrusion (social or political)
 Applicability of Approach: Universal (e.g. all projects) versus Narrow (e.g. only java projects)
Optimizing an approach across these dimensions requires the attention of all those who have a stake in
the outcome and/or incur a cost through the implementation.
If Software Analytics is, to some degree, about “Big Data” then is it really relevant to smaller companies?
Our experiences with smaller clients at DCG is absolutely “yes” and, again applying some objectivity,
those experiences correlate very well with the experience of Amisoft reported by Robbes, Vidal and
Bastarrica. While only a few DCG clients introduce Software Analytics as part of a CMMI initiative, this
was one of the drivers for Amisoft but they also reported the following additional drivers:
 Determining whether employees were really following the company’s process
 Measuring actual adherence to the company’s process
 Gathering evidence of whether the process was a net positive for the company
 Increasing the visibility of activities
 Locating opportunities for improvement
Amisoft saw benefits from the Software Analytics program in the following areas:
 Strategic decisions
o Scheduling – analysis of historical data allows for optimization of the contingency buffer
applied to project delivery dates
©2013 David Consulting Group

Page 4 of 6

v1
o



Requirement volatility – measurement showed that requirement volatility was so high
that it was detrimental to project success. Steps were taken to reduce volatility and
continue to monitor it.
o Verification and Validation – the company realized that they were not putting enough
time into this part of the process and quality of the finished product was at risk.
Tactical Decisions
o Personnel – Project managers monitored several aspects of employee performance
actively during projects to allow for immediate correction of problems instead of
retrospective analysis of why the project failed.
o Client Interaction – Project managers used the reports on Requirements volatility and
incidents to manage the client relationship.
o Rescheduling – planning and task status data was used to prompt rescheduling or
reassignment of at-risk tasks.

Menzies and Zimmerman make some predictions about changes we might expect by 2020 due to the
coming impact of Software Analytics. You should care about Software Analytics if any of these predicted
changes might impact you:
 More and different data;
 More algorithms;
 Faster decision making with the availability of more data and faster release cycles;
 More people involved in analytics as it becomes more routine to mine data;
 More education as more people analyze and work with data;
 More roles for data scientists and developers as this field matures with specialized sub-areas;
 More real-time analytics to address the challenges of quickly finding patterns in big data;
 More analytics for software systems such as mobile apps and games;
 More impact of social tools in analytics.

Conclusion
Software Analytics is the process of extracting meaningful insights from the data produced during
software development and operation. Software analytics should be sought after and used by all those
people who have to make decisions about software development and operation. Data-driven decisions
are usually better than decisions based on “gut feel”, imperfect memory or business as usual.

Sources





Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie.
"Software Analytics as a Learning Case in Practice: Approaches and Experiences". In Proceedings
of International Workshop on Machine Learning Technologies in Software Engineering (MALETS
2011), Lawrence, Kansas, November 2011.
“Software Analytics: So What?”, Tim Menzies & Thomas Zimmerman, IEEE Software,
July/August 2013
“Searching under the Streetlight for Useful Software Analytics”, Philip M. Johnson, IEEE
Software, July/August 2013

©2013 David Consulting Group

Page 5 of 6

v1






“CODEMINE: Building a Software Development Data Analytics Platform at Microsoft”, Jacek
Czerwonka, Nachiappan Nagappan, Wolfram Schulte, Brendan Murphy, IEEE Software,
July/August 2013
“Software Analytics in Practice”, Dongmei Zhang, Shi Han, Jian-Guang Lou, Haidong Zhang, Tao
Zie, IEEE Software, September/October 2013
“Are Software Analytics Efforts Worthwhile for Small Companies? The Case of Amisoft”, Romain
Robbes, Rene Vidal, Maria Cecilia Bastarrica, IEEE Software, September/October 2013
“IEEE Software”, July/August 2013.
“IEEE Software”, September/October 2013.

©2013 David Consulting Group

Page 6 of 6

v1

More Related Content

Recently uploaded

Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfUK Journal
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 

Recently uploaded (20)

Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 

Featured

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellSaba Software
 

Featured (20)

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 

What Are Software Analytics - And Who Should Care About Them?

  • 1. “What are "Software Analytics" and who should care about them?” October 2013 Scope of this Report Everyone has a different definition of Software Analytics and so it is fair to say that this is a broad topic. Hence, this report defines Software Analytics as much by the examples it gives as by one single formal definition. In answering the “who should care?” part of the question, the report seeks to identify roles that should be using Software Analytics already and/or could be using it more. We also look into the future to see how Software Analytics might change software development in organizations in the way that business intelligence has impacted some organizations. What are “Software Analytics”? This definition of “Software Analytics” comes from a 2011 paper by Zhang et al: “Software Analytics” – to utilize the data-driven approach to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for completing various tasks around software systems, software users, and software development process. So, can we think of Software Analytics as “big data” for software development in that the focus is on mining the richness of data about software projects for a surprising new insight? In short, yes. But most practitioners set the bar a little higher than just “new insights.” The goal of software analytics is to produce real-time and actionable insights where “real-time” is defined to be within the time frame of a project such that corrective action is feasible more quickly than it would be by just letting the project fail. Some examples of Software Analytics include (more examples are provided later in the report):  Using data from previous projects to size and estimate future projects  Using defect data to allocate effort in a running project  Benchmarking projects and project against industry data  Statistically and/or heuristically analyzing code to look for patterns and anti-patterns. Menzies and Zimmerman identify one important goal and constraint for Software Analytics – it is about using lessons from one set of projects to improve results on another set of projects. Consequently, the value of Software Analytics is undermined if the right data is not available for analysis (because it doesn’t exist, is not sufficiently structured or the owners are not prepared to share) or if the results cannot be readily shared (because they are not sufficiently structured or the owners are not prepared to share). Current use of the term “Software Analytics” suggests a tipping point. In the past, “Software Analytics” seems to have been used quite narrowly, more often than not in the context of analyzing defects. More ©2013 David Consulting Group Page 1 of 6 v1
  • 2. recently and perhaps due to the growth in business intelligence (“Big Data” if you must), an awareness has grown that the software development process and the subsequent operation of the software throws off a lot of data. Historically, some of this data has been analyzed but much has been ignored by organizations. Going forward, under the more broadly defined “Software Analytics” banner, there is an expectation that important insights can be uncovered to improve decision-making around software development and operation. In his paper, “Searching under the Streetlight for Useful Software Analytics,” Johnson makes an excellent point about the importance of unobtrusive data collection. One of the reasons that much data goes to waste is because, in Johnson’s words: “For developers, one of the most frustrating aspects of manual data collection is the loop of doing some work and then interrupting it to record what they worked on.” We would go further to add that most developers are trained to question the purpose of any measurement and analysis of what they do and smart enough to “game” any data collection system that they don’t value. Different and Distinct Kinds of Software Analytics Everyone has a different definition of software analytics and this is justified to some extent by the variety. For example:  Internal versus External Analytics o Live (or directly accessible) Data versus Stale (or exported and sent to a 3rd party) Data – Internal analysts are much more likely to be able to access live data than external analysts. o Privacy – external analysts may necessarily only have access to anonymous data  Quantitative versus Qualitative Methods o Typically, quantitative analysis of data is thought of as somehow more “accurate” than qualitative data but often quantitative data can be misleading or downright wrong without the benefit of qualitative context. o DCG’s own ADM Performance Baseline combines quantitative analytics with qualitative interviews to provide our clients with an assessment of how their software development group is performing AND why!  Data mining Tools versus Interactive Tools ©2013 David Consulting Group Page 2 of 6 v1
  • 3. o   As the names imply, data mining tools are usually automatic, configured to run regularly and perform the same analysis every time on a fresh (new, expanded or updated) set of data each time. o Today, most Software Analytics falls into the data mining category. Practitioner Audience versus Management Audience o Different audiences have different data needs for different decisions. Exploratory versus Deployment Analytics o In Exploratory Analytics, the goal is to explore the data to try to find potentially valuable insights o In Deployment Analytics, the goal is to automatically and regularly generate the information from the valuable insight and deploy it out into the organization to inform decisions. o An example might be that an exploratory analysis might look for a size of project (in function points) at which the defect rate deteriorates significantly. If found then deployment analysis can be used to look for and report on projects that exceed or are close to exceeding this number of function points. Applications of Software Analytics Since the types of Software Analytics vary, it is reasonable to expect the applications of software analytics to have even greater range. For example: From Menzies & Zimmerman in IEEE Software (references to the underlying case studies are included in the Menzies & Zimmerman paper):  Combining software product information  Using process data to predict overall with apps store data project effort  Using software process models to learn  Using operating system logs that predict effective process change software power consumption  Exploring product line models to configure  Mining natural language requirements to new applications find links between components  Mining performance data  Using XML descriptions of design patterns to recommend particular designs  Using email lists to understand the human  Linking emails to source code artifacts and networks inside software teams classifying their content  Using execution traces to learn normal  Using bug databases to learn predictors interface usage patterns that guide inspection teams to where the code is most likely to fail  Using security data to identify indicators  Using visualization to support program for software vulnerabilities comprehension  Using software ontologies to enable  Mining code clones to assess the natural language queries implications of cloning and copy/paste in software The variety can exist across organizations and within organizations. For example, Microsoft have at least two systems: Czerwonka at al have reported on the development of CODEMINE at Microsoft; Zhang et Al have reported on the Stackmine Project. CODEMINE is a software development analytics platform for ©2013 David Consulting Group Page 3 of 6 v1
  • 4. collecting and analyzing engineering process data. It includes constraints and pivotal organizational and technical choices. Examples of uses include:  Trend monitoring and reports on development health  Risk evaluation and change impact analysis  Version control branch structure optimization  Social-technical data analysis  Custom search for bugs and debug logs, speeding up investigations of new issues. StackMine was developed to perform Software Analytics on a very particular set of data in a Microsoft system called PerfTrack which logs reports of poor end user performance. For example, PerfTrack measures how long it takes for a user to receive a response from the system when the user clicks on a Windows Explorer button to create a new folder. If the response time exceeds a satisfactory threshold, PerfTrack sends execution traces (containing call stacks collected during the preceding time interval) back to Microsoft for debugging. All the collected call traces in total could contain more than 1 billion call stacks – much more than performance analysts can expect to analyze without automation. Who should care about Software Analytics? Everyone who has to make decisions about software development and software operation should care about Software Analytics. Not only because data-driven decisions are usually better than other kinds but also because if Software Analytics are not being used to drive decisions then the collection of that data is almost certainly a waste of time and money. Johnson and his colleagues experience with designing Software Analytics led him to suggest three dimensions for evaluating approaches (we have simplified):  Data Collection: Automation versus Manual collection  Adoption Barriers: Comprehensiveness versus Intrusion (social or political)  Applicability of Approach: Universal (e.g. all projects) versus Narrow (e.g. only java projects) Optimizing an approach across these dimensions requires the attention of all those who have a stake in the outcome and/or incur a cost through the implementation. If Software Analytics is, to some degree, about “Big Data” then is it really relevant to smaller companies? Our experiences with smaller clients at DCG is absolutely “yes” and, again applying some objectivity, those experiences correlate very well with the experience of Amisoft reported by Robbes, Vidal and Bastarrica. While only a few DCG clients introduce Software Analytics as part of a CMMI initiative, this was one of the drivers for Amisoft but they also reported the following additional drivers:  Determining whether employees were really following the company’s process  Measuring actual adherence to the company’s process  Gathering evidence of whether the process was a net positive for the company  Increasing the visibility of activities  Locating opportunities for improvement Amisoft saw benefits from the Software Analytics program in the following areas:  Strategic decisions o Scheduling – analysis of historical data allows for optimization of the contingency buffer applied to project delivery dates ©2013 David Consulting Group Page 4 of 6 v1
  • 5. o  Requirement volatility – measurement showed that requirement volatility was so high that it was detrimental to project success. Steps were taken to reduce volatility and continue to monitor it. o Verification and Validation – the company realized that they were not putting enough time into this part of the process and quality of the finished product was at risk. Tactical Decisions o Personnel – Project managers monitored several aspects of employee performance actively during projects to allow for immediate correction of problems instead of retrospective analysis of why the project failed. o Client Interaction – Project managers used the reports on Requirements volatility and incidents to manage the client relationship. o Rescheduling – planning and task status data was used to prompt rescheduling or reassignment of at-risk tasks. Menzies and Zimmerman make some predictions about changes we might expect by 2020 due to the coming impact of Software Analytics. You should care about Software Analytics if any of these predicted changes might impact you:  More and different data;  More algorithms;  Faster decision making with the availability of more data and faster release cycles;  More people involved in analytics as it becomes more routine to mine data;  More education as more people analyze and work with data;  More roles for data scientists and developers as this field matures with specialized sub-areas;  More real-time analytics to address the challenges of quickly finding patterns in big data;  More analytics for software systems such as mobile apps and games;  More impact of social tools in analytics. Conclusion Software Analytics is the process of extracting meaningful insights from the data produced during software development and operation. Software analytics should be sought after and used by all those people who have to make decisions about software development and operation. Data-driven decisions are usually better than decisions based on “gut feel”, imperfect memory or business as usual. Sources    Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. "Software Analytics as a Learning Case in Practice: Approaches and Experiences". In Proceedings of International Workshop on Machine Learning Technologies in Software Engineering (MALETS 2011), Lawrence, Kansas, November 2011. “Software Analytics: So What?”, Tim Menzies & Thomas Zimmerman, IEEE Software, July/August 2013 “Searching under the Streetlight for Useful Software Analytics”, Philip M. Johnson, IEEE Software, July/August 2013 ©2013 David Consulting Group Page 5 of 6 v1
  • 6.      “CODEMINE: Building a Software Development Data Analytics Platform at Microsoft”, Jacek Czerwonka, Nachiappan Nagappan, Wolfram Schulte, Brendan Murphy, IEEE Software, July/August 2013 “Software Analytics in Practice”, Dongmei Zhang, Shi Han, Jian-Guang Lou, Haidong Zhang, Tao Zie, IEEE Software, September/October 2013 “Are Software Analytics Efforts Worthwhile for Small Companies? The Case of Amisoft”, Romain Robbes, Rene Vidal, Maria Cecilia Bastarrica, IEEE Software, September/October 2013 “IEEE Software”, July/August 2013. “IEEE Software”, September/October 2013. ©2013 David Consulting Group Page 6 of 6 v1