Probability and Statistics
1
INTRODUCTION
Statistics is the science of conducting studies to collect,
organize, summarize, analyze, and draw conclusions from data.
It is also described as the mathematics of the collection,
organization, and interpretation of numerical data, especially
the analysis of population characteristics from sample datasets.
2
STATISTICS
Software Measurements
• Establishment of measurements, analysis, and forecasting of
  future events. For example:
  • Measurement of Bug Density
  • Measurement of Invalid Processing of Bugs
  • Measurement of Bugs reported by the client
  • Measurement of Project Issues
  • Measurement of rejected Baseline requests
3
STATISTICAL SOFTWARE ENGINEERING
APPLICATIONS OF STATISTICS IN S.E.
Quality Management
• Statistical methods for quality control
• Analysis of Bugs, NCs and Issues
4
STATISTICAL SOFTWARE ENGINEERING
APPLICATIONS OF STATISTICS IN S.E.
Analysis of results of surveys
Quantitative Research Methodology
• Descriptive Statistics
• Correlation
• Regression
• Hypothesis Testing
Quality Control
• P Charts, Control Charts
5
STATISTICAL SOFTWARE ENGINEERING
APPLICATIONS OF STATISTICS IN GENERAL
Analysis of algorithms
• Complexity and Performance
Analysis of Network traffic
Analysis of CPU and Memory utilization
Progress reporting to top Management
• Graphs and Charts
Prediction and Forecasting of future events
6
STATISTICAL SOFTWARE ENGINEERING
APPLICATIONS OF STATISTICS IN GENERAL
Development of research Instruments
• Reliability Analysis
• Validity Analysis
Analysis of quality of software
7
STATISTICAL SOFTWARE ENGINEERING
APPLICATIONS OF STATISTICS IN GENERAL
Inferential Statistics
• Inferential statistics consists of generalizing from samples to
  populations, performing hypothesis tests, determining
  relationships among variables, and making predictions.
Descriptive Statistics
• Descriptive statistics consists of the collection, organization,
  summarization, and presentation of data.
• Charts, Graphs, Tables, Mean, Median, Mode, etc.
Types of Statistics
8
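A minimal sketch of the two types of statistics, using Python's standard `statistics` module. The task durations are hypothetical and serve only as an illustration. The mean, median, and mode describe the sample itself; the standard error is a first step toward inference about the wider population.

```python
import statistics

# Hypothetical sample: durations (in days) of 10 completed tasks.
durations = [3, 5, 5, 6, 7, 8, 5, 9, 4, 8]

# Descriptive statistics: summarize the data that was collected.
mean_days = statistics.mean(durations)
median_days = statistics.median(durations)
mode_days = statistics.mode(durations)

# Toward inferential statistics: the standard error estimates how far
# the sample mean may lie from the mean of the whole population.
std_error = statistics.stdev(durations) / len(durations) ** 0.5

print("mean:", mean_days, "median:", median_days, "mode:", mode_days)
print("standard error of the mean:", round(std_error, 2))
```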
A variable is a characteristic or attribute that can assume
different values.
• For example, if the durations of 30 activities were measured,
  then duration would be a variable.
9
VARIABLE
10
QUALITATIVE VARIABLES
Qualitative variables are variables that can be placed into
distinct categories, according to some characteristic or attribute.
e.g. Gender, Geographical Location of team, Nature of Project,
Designation of employee
QUANTITATIVE VARIABLES
Quantitative variables are numerical and can be ordered or ranked.
e.g. Professional Experience of Employee (in years), Budget of
Project, No. of bugs in a release.
CLASSIFICATION OF VARIABLES
Quantitative variables can be further classified into two groups:
• Discrete
• Continuous
11
CLASSIFICATION OF VARIABLES
QUANTITATIVE VARIABLES
Discrete variables can be assigned values such as 0, 1, 2, 3
and are said to be countable. e.g.
• No. of software projects completed by a company
• No. of software engineers in a team (in a matrix-based
  organization)
12
CLASSIFICATION OF VARIABLES
1. DISCRETE VARIABLES
Continuous variables can assume an infinite number of values
between any two specific values. They often include fractions
and decimals. e.g.
• Budget of a software project
• Computed Bug Density for a release/build
13
CLASSIFICATION OF VARIABLES
2. CONTINUOUS VARIABLES
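The contrast between the two kinds of quantitative variables can be sketched as follows. The slides do not define how Bug Density is computed; this sketch assumes the common "bugs per thousand lines of code (KLOC)" definition, and the function name and figures are hypothetical.

```python
def bug_density(bug_count: int, size_kloc: float) -> float:
    """Bugs per KLOC. KLOC as the size unit is an assumption."""
    if size_kloc <= 0:
        raise ValueError("size_kloc must be positive")
    return bug_count / size_kloc


bugs_in_release = 42       # discrete: a count (0, 1, 2, 3, ...)
release_size_kloc = 12.5   # continuous: can take fractional values

density = bug_density(bugs_in_release, release_size_kloc)
print(density)  # 3.36, a continuous value derived from a discrete count
```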
In addition to being classified as qualitative or quantitative,
variables can be classified by how they are categorized, counted,
or measured.
There are four levels (scales) of measurement:
• Nominal
• Ordinal
• Interval
• Ratio
14
CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS
The nominal level of measurement classifies data into mutually
exclusive (non-overlapping) categories in which no order or
ranking can be imposed on the data. e.g.
• Gender
• Projects completed by a company (by name)
• Skills of Employee
• Cities
15
CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS
1. NOMINAL LEVEL OF MEASUREMENT
A dichotomous variable is a special type of nominal variable
that has only two possible values. e.g.
• Gender (Male, Female)
• Unit Test Result (Pass, Fail)
• Sanity Testing Result (Pass, Fail)
16
CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS
1.1 DICHOTOMOUS
The ordinal level of measurement classifies data into categories
that can be ranked.
Mutually exclusive groups + order. e.g.
• Severity of Bugs (Level-1, Level-2, Level-3, Level-4)
• Priority of Change Request (High, Medium, Low)
17
CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS
2. ORDINAL LEVEL OF MEASUREMENT
The interval level of measurement ranks data, and precise
differences between units of measure do exist; however, there
is no meaningful zero.
Interval variables have ordered values that are equally spaced. e.g.
• Temperature (73 °F)
• Calculated Bug Density
18
CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS
3. INTERVAL LEVEL OF MEASUREMENT
The ratio level of measurement possesses all the characteristics
of interval measurement, and there exists a true zero. e.g.
• No. of Bugs
• Estimated Effort for new project
• Duration of Project
• Delay of schedule
19
CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS
4. RATIO LEVEL OF MEASUREMENT
Nominal Variable → Nominal
Ordinal Variable → Ordinal
Interval & Ratio Variables → Scale
20
TYPES OF VARIABLES IN SPSS
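The four levels of measurement and their mapping to the three SPSS types listed above can be sketched as a small lookup. The names `Level` and `spss_measure` are hypothetical and are not part of SPSS; the example variables are taken from the preceding slides.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def spss_measure(level: Level) -> str:
    """Map a level of measurement to the SPSS type named on the slide."""
    if level in (Level.INTERVAL, Level.RATIO):
        return "Scale"
    return level.name.capitalize()


examples = {
    "Gender": Level.NOMINAL,
    "Severity of Bugs": Level.ORDINAL,
    "Temperature": Level.INTERVAL,
    "No. of Bugs": Level.RATIO,
}

for variable, level in examples.items():
    print(f"{variable}: {level.name.lower()} -> {spss_measure(level)}")
```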
Data
• Data are the values (measurements or observations) that the
  variables can assume.
Data Set
• A collection of data values forms a data set. Each value in the
  data set is called a data value or a datum.
21
SOME MORE DEFINITIONS
Population
• A population consists of all subjects (human or otherwise) that
  are being studied.
Sample
• A sample is a group of subjects selected from a population.
22
SOME MORE DEFINITIONS
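The relationship between a population and a sample can be sketched with simple random sampling from Python's standard `random` module. The employee IDs, the population size of 200, and the sample size of 20 are hypothetical; the slides do not prescribe a sampling method.

```python
import random

# Hypothetical population: every employee of a software house.
population = [f"EMP-{i:03d}" for i in range(1, 201)]

# A sample is a group of subjects selected from the population.
# A fixed seed keeps the illustration reproducible.
rng = random.Random(42)
sample = rng.sample(population, k=20)  # selection without replacement

print(len(population), "subjects in the population")
print(len(sample), "subjects in the sample")
```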
Read the following HR policy of a software house regarding annual
increments of employees, and answer the questions.
• Employees who meet their deadlines 95-100% of the time usually
  receive Rs. 20k as an increment in their salary. Employees who
  meet their deadlines 80-90% of the time usually receive Rs. 10k,
  and employees who meet their deadlines less than 80% of the time
  usually receive Rs. 5k as an increment in their salary.
23
CASE STUDY NO. 1 (HR POLICY)
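The policy above can be sketched as a lookup from the percentage of deadlines met to the usual increment. The function name is hypothetical. The policy text says "usually", so the returned value is the typical amount, not a guarantee, and it states no amount for percentages above 90 and below 95, so the sketch returns `None` there.

```python
from typing import Optional


def typical_increment_rs_k(on_time_pct: float) -> Optional[int]:
    """Usual annual increment in thousands of rupees, per the policy text."""
    if not 0 <= on_time_pct <= 100:
        raise ValueError("on_time_pct must be between 0 and 100")
    if on_time_pct >= 95:
        return 20
    if 80 <= on_time_pct <= 90:
        return 10
    if on_time_pct < 80:
        return 5
    return None  # above 90 and below 95: not covered by the policy text


for pct in (97, 85, 60, 92):
    print(pct, "->", typical_increment_rs_k(pct))
```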
Based on this information, ‘Meeting deadlines’ and ‘Annual
increments’ are related. The more often you meet deadlines, the
more likely it is that you will receive a higher increment. If you
improve your performance and meet the deadlines for most of your
tasks, your annual increment will probably improve.
24
CASE STUDY NO. 1 (HR POLICY)
1. What are the variables under study?
2. What are the data in the study?
3. Are descriptive, inferential, or both
types of statistics used?
4. What is the population under study?
5. Was a sample collected? If so, from
where?
6. From the information given, comment
on the relationship between the variables.
25
CASE STUDY NO. 1 (HR POLICY)
QUESTIONS
1. The variables are ‘Meeting deadlines’ and ‘Annual increments’.
2. The data consist of the ‘Percentage of deadlines met’ and the
‘Amount of increment’.
3. Both. The figures are descriptive statistics; however, an
inferential statement is also present (i.e. Based on this
information, ‘Meeting deadlines’ and ‘Annual increments’ are
related), so inferential statistics are used as well.
26
CASE STUDY NO. 1 (HR POLICY)
ANSWERS
4. The population under study is the employees of the software house.
5. Not specified.
6. Based on the data, it appears that, in general, the more
consistently you meet deadlines, the higher your annual increment
will be.
27
CASE STUDY NO. 1 (HR POLICY)
ANSWERS
• The Quality Management department of a software house published
  the number of open bugs in five ‘In-Progress’ software projects
  during its Annual Quality Review meeting.
28
CASE STUDY NO. 2 (PROJECTS QUALITY)
Project Name    No. of Open Bugs
Project 1       500
Project 2       600
Project 3       350
Project 4       265
Project 5       1325
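As a small illustration, the published figures can be summarized with descriptive statistics using Python's standard `statistics` module. The variable names are hypothetical; the numbers are those in the table.

```python
import statistics

open_bugs = {
    "Project 1": 500,
    "Project 2": 600,
    "Project 3": 350,
    "Project 4": 265,
    "Project 5": 1325,
}

counts = list(open_bugs.values())
total_bugs = sum(counts)                    # 3040
mean_bugs = statistics.mean(counts)         # 608
median_bugs = statistics.median(counts)     # 500
fewest = min(open_bugs, key=open_bugs.get)  # "Project 4"

print(total_bugs, mean_bugs, median_bugs, fewest)
```

The mean (608) is higher than the median (500) because Project 5 has far more open bugs than the other four projects.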
1. What are the variables under study?
2. Categorize each variable as quantitative or qualitative.
3. Categorize each quantitative variable as discrete or continuous.
4. Identify the level of measurement for each variable.
29
CASE STUDY NO. 2 (PROJECTS QUALITY)
QUESTIONS
5. ‘Project 4’ shows the minimum number of ‘Open Bugs’. Does that
mean ‘Project 4’ is the most successful of the five projects?
30
CASE STUDY NO. 2 (PROJECTS QUALITY)
QUESTIONS
1. The variables are ‘Project Name’ and ‘No. of Open Bugs’.
2. ‘Project Name’ is a qualitative variable, while ‘No. of Open
Bugs’ is a quantitative variable.
3. ‘No. of Open Bugs’ is a discrete variable.
4. ‘Project Name’ is nominal, while ‘No. of Open Bugs’ is ratio.
31
CASE STUDY NO. 2 (PROJECTS QUALITY)
ANSWERS
5. ‘Project 4’ shows the minimum number of ‘Open Bugs’. However,
there may be other things to consider, such as the size of the
project, the schedule of the project, and compliance with client
requirements. Therefore, the project with the fewest open bugs is
not necessarily the most successful project of the company.
32
CASE STUDY NO. 2 (PROJECTS QUALITY)
ANSWERS
