Data Collection: List of Data Points
Gerrit Klaschke
What is covered?
- Suggested Mandatory Data
- Optional Data
- Data for Model Calibrations
- Goal-Question-Metric Approach
Data Collection – What to Collect?
This presentation lists the project data that needs to be collected for different purposes.
- The first part lists mandatory data and suggested optional data.
- The second part lists the data needed for model calibration.
- The last part categorizes the data using the GQM approach. Once a goal and its questions are set, the GQM approach defines the metrics and data points that need to be collected.
These lists are also synchronized with sources such as the ISBSG database, the USC COCOMO project database, and the QSM data collection documents available on the Internet.
Suggested Mandatory Data
- Project name, including version/increment/release if applicable
- Project manager name and contact information
- Company/department information
- Submitter name and contact information
- Submission date
Suggested Mandatory Data
- Estimation model used, including the model parameters used
- Project type/domain
- Project lifecycle
- Project standard, if applicable
- Equivalent SLOC or other sizing, depending on the model
- Counting convention used (e.g. logical or physical SLOC) and counting technique used (e.g. counting tool)
- Estimated effort
- Estimated schedule
Suggested Mandatory Data
- Actual effort (person-months)
- Actual schedule (months)
- Project memo field for free text
- Project start date
- Reference to any additional project files, if applicable
- Team size information
Suggested Mandatory Data
- Final environmental adjustment factor
- Final scaling adjustment factor
- Hours per month
- Product complexity measure or description
- Development tools
- Data quality qualifier (a to f) to rate the quality of the data
- Development type, e.g. new development or enhancement
Suggested Mandatory Data
It is highly recommended to store a complete copy of the project type/domain, lifecycle, and standard definitions, as well as of the estimation model. Definitions and model parameters change over time, so a copy of the definitions must be stored along with each project that used them, for later retrieval.
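The recommendation above can be illustrated with a minimal Python sketch. The record type, its field names, and the sample values are hypothetical and show only a few of the listed data points; the point is that each project keeps its own deep copy of the definitions rather than a reference to a shared, changing set.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ProjectRecord:
    """Hypothetical project record; only a few of the listed fields are shown."""
    name: str
    estimated_effort_pm: float
    actual_effort_pm: float
    # Private copy of the model/lifecycle/standard definitions used for this project.
    definitions: dict = field(default_factory=dict)


def make_record(name, estimated_effort_pm, actual_effort_pm, current_definitions):
    # deepcopy so that later edits to the shared definitions do not alter this record
    return ProjectRecord(name, estimated_effort_pm, actual_effort_pm,
                         copy.deepcopy(current_definitions))


# Illustrative values only.
shared = {"model": {"name": "example-model", "parameters": {"A": 2.94}}}
record = make_record("Demo 1.0", 100.0, 120.0, shared)
shared["model"]["parameters"]["A"] = 3.10  # the shared definitions change later
stored_a = record.definitions["model"]["parameters"]["A"]  # unchanged in the record
```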
Suggested Optional Data
- Estimated total defects and, if possible, a breakdown (requirements, design, coding, documentation, bad-fix defects, etc.)
- Actual total defects and, if possible, a breakdown (requirements, design, coding, documentation, bad-fix defects, etc.)
- Environment factors list with ratings and ratings mapping
- Scaling factors list with ratings and ratings mapping
- Project estimated end date
- Project actual end date
Suggested Optional Data
- Total effort in person-hours
- Effort by phase
- Schedule by phase, e.g. Waterfall: requirements, design, code, test, total
- Number of people (and job functions) working in each phase (average or maximum)
- Sizing data such as FP count, Internet points, bottom-up
- Sizing methodology parameters
Suggested Optional Data
- Volume/size information for all components, with breakdown: new, reused, COTS, and REVL
- Programming language composition per component and unadjusted FP count
- Description of the programming language used (e.g. IDE-heavy?)
- Sizing methodology parameter definitions:
  - Methodology used (name, such as function points)
  - List of factors and weights
Data for Model Calibrations
COCOMO's calibration routine needs the following inputs to calibrate A and C:
- Actual effort and schedule
- Number of projects used
- EAF (overall environmental factor) or all individual factors
- Exponent base (driven by the model)
- Overall scaling factor or all individual factors
- KSLOC
- HPM (hours per month)
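To show how these inputs fit together, here is a hedged Python sketch of one simple calibration approach: averaging in log space. It assumes the COCOMO II form PM = A · KSLOC^E · EAF with E = B + 0.01 · (sum of scale factors), and TDEV = C · PM^F with F = D + 0.2 · (E − B). The default values of B and D, the dictionary keys, and the function name are assumptions made for illustration; an actual calibration routine may work differently.

```python
import math

# Assumed COCOMO II.2000 nominal values; replace with those of the model in use.
B = 0.91  # exponent base for effort
D = 0.28  # exponent base for schedule (not in the input list above; assumed)


def calibrate(projects, b=B, d=D):
    """Return (A, C) as log-space averages over the given projects.

    Each project is a dict with hypothetical keys:
    ksloc, eaf, sf_sum (sum of scale factor ratings),
    effort_pm (actual effort), schedule_months (actual schedule).
    """
    ln_a, ln_c = [], []
    for p in projects:
        e = b + 0.01 * p["sf_sum"]   # effort exponent E
        f = d + 0.2 * (e - b)        # schedule exponent F
        # ln A = ln PM - E * ln KSLOC - ln EAF
        ln_a.append(math.log(p["effort_pm"])
                    - e * math.log(p["ksloc"])
                    - math.log(p["eaf"]))
        # ln C = ln TDEV - F * ln PM
        ln_c.append(math.log(p["schedule_months"])
                    - f * math.log(p["effort_pm"]))
    n = len(projects)  # number of projects used
    return math.exp(sum(ln_a) / n), math.exp(sum(ln_c) / n)
```

If actual effort is recorded in person-hours, it would first be divided by HPM to obtain person-months before being passed in as `effort_pm`.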
Goal/Question/Metric
The Goal/Question/Metric (GQM) paradigm is a mechanism that provides a framework for developing a metrics program. It was developed at the University of Maryland as a mechanism for formalizing the tasks of characterization, planning, construction, analysis, learning, and feedback. The GQM paradigm was developed for all types of studies, particularly studies concerned with improvement issues. The paradigm does not provide specific goals; rather, it provides a framework for stating goals and refining them into questions, which in turn specify the data needed to help achieve the goals.
Goal/Question/Metric
The GQM paradigm consists of three steps:
1. Generate a set of goals
2. Derive a set of questions
3. Develop a set of metrics
The next pages list several goals, questions, and the metrics/data points they need. They can be used to determine what project data needs to be collected.
Goal/Question/Metric
Main Goal: Improve the Software Process
Goal: Improve Productivity
- Question: What are the overall and subgroup productivities?
  Metric: SLOC/person-month (overall and subgroup averages). Productivity should be calculated from actuals.
- Question: What is the productivity per activity?
  Metric: SLOC/person-month per activity.
- Question: Is productivity increasing over time?
  Metric: Overall productivity trends (multiple projects).
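The productivity metric can be computed from the collected actuals roughly as follows. This is a minimal Python sketch; the dictionary keys and the `group` label are hypothetical. It uses the ratio of total SLOC to total person-months, which weights larger projects more heavily; averaging the per-project ratios is an equally valid reading of "average" and gives different numbers.

```python
from collections import defaultdict


def productivity(projects):
    """Return (overall, per_group) productivity in SLOC per person-month.

    Each project is a dict with hypothetical keys:
    sloc, actual_effort_pm, group.
    """
    total_sloc = sum(p["sloc"] for p in projects)
    total_pm = sum(p["actual_effort_pm"] for p in projects)

    # Accumulate [sloc, person-months] per subgroup.
    by_group = defaultdict(lambda: [0, 0.0])
    for p in projects:
        by_group[p["group"]][0] += p["sloc"]
        by_group[p["group"]][1] += p["actual_effort_pm"]

    per_group = {g: sloc / pm for g, (sloc, pm) in by_group.items()}
    return total_sloc / total_pm, per_group
```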
Goal/Question/Metric
Main Goal: Improve the Software Process
Goal: Improve Quality
- Question: What are the overall and subgroup defect densities?
  Metric: Defects/KSLOC.
- Question: How many defects are introduced by type?
  Metric: Percent of defects introduced per type. A project type must be specified.
- Question: What defect types occur the most?
  Metric: Defect type percentages ordered by magnitude.
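The quality metrics above reduce to two small calculations, sketched here in Python. The function names and the representation of defects as a flat list of type labels are assumptions made for illustration.

```python
from collections import Counter


def defect_density(total_defects, sloc):
    """Defects per KSLOC."""
    return total_defects / (sloc / 1000.0)


def defect_type_percentages(defect_types):
    """Return (type, percent) pairs ordered from most to least frequent.

    defect_types is a list with one type label per recorded defect,
    e.g. "requirements", "design", "coding".
    """
    counts = Counter(defect_types)
    total = sum(counts.values())
    return [(t, 100.0 * n / total) for t, n in counts.most_common()]
```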
Goal/Question/Metric
Main Goal: Improve Software Process Predictability
Goal: Improve Effort Predictability
- Question: How accurate are effort estimates?
  Metric: Actual vs. estimated effort.
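The slide does not say how actual and estimated effort should be compared. One common choice, assumed here, is the relative error per project and its mean magnitude across projects. The following Python sketch uses hypothetical function names.

```python
def relative_error(actual, estimated):
    """Signed relative error; negative means the estimate was too low."""
    return (estimated - actual) / actual


def mean_magnitude_relative_error(pairs):
    """Average of |relative error| over (actual, estimated) effort pairs."""
    return sum(abs(relative_error(a, e)) for a, e in pairs) / len(pairs)
```

The signed error shows whether estimates run consistently high or low, while the mean magnitude summarizes overall accuracy across projects.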