This document discusses a study of how the statistical distributions of software metrics relate to software quality. It shows that some metrics, such as file size, follow a double Pareto distribution with a transition point from lognormal to power-law behavior. Files above this transition point are a small minority of all files but account for a large share of the code size. The probability of finding defects is generally higher for files whose metrics lie above the transition point. Overall, the findings indicate that the statistical distributions of metrics are related to defect density and can help reduce the search space for problematic files.
1. Statistical distributions of software metrics: do they matter?
Israel Herraiz
Technical University of Madrid
israel.herraiz@upm.es
Grab these slides from
http://slideshare.net/herraiz/statistical-distributions-of-metrics
Israel Herraiz, UPM Statistical distributions of software metrics: do they matter? 1/17
2. Outline
1 Some background
2 Statistical properties of software metrics
3 Evidence of impact on quality
4 Summary of findings and further work
3. Section 1: Some background
4. A (not so) long time ago...
Statistical distribution of software metrics
Software size follows a double Pareto distribution
Towards a theoretical model for software growth, MSR 2007
More recently
Not only size, but some OO metrics too (and some complexity metrics)
On the Statistical Distribution of Object-Oriented System Properties, WETSoM 2012
5. OK, but what is that double Pareto thing?
[Figure: P[X > x] against SLOC on log-log axes (SLOC from 1 to 10000), comparing the data with a double Pareto fit and a lognormal fit]
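Slide 5 plots P[X > x], the complementary cumulative distribution (CCDF), against SLOC. A minimal sketch of the empirical part of that plot, assuming per-file sizes are available as a plain list of SLOC counts (the function name `empirical_ccdf` is hypothetical); fitting the double Pareto and lognormal curves is not shown.

```python
from collections import Counter


def empirical_ccdf(sizes):
    """Return (x, P[X > x]) pairs for each distinct value x in sizes, ascending."""
    n = len(sizes)
    if n == 0:
        return []
    counts = Counter(sizes)
    points = []
    greater = n  # after subtracting counts[x], this equals the number of values > x
    for x in sorted(counts):
        greater -= counts[x]
        points.append((x, greater / n))
    return points


if __name__ == "__main__":
    # Toy SLOC values, not data from the slides.
    for x, p in empirical_ccdf([10, 20, 20, 50, 1000]):
        print(x, p)
```

The largest value always gets P = 0, so that point has to be dropped before plotting on log-log axes.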
6. But does it matter?
Most of the files are on the lognormal side
[Figure: bar chart of % Files for C, C++, Java, Python and Lisp]
7. But does it matter?
Most of the files are on the lognormal side
But the power law minority matters a lot
[Figure: two bar charts for C, C++, Java, Python and Lisp: % Files (left) and % SLOC (right)]
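The two panels of slide 7 compare the share of files with the share of SLOC on each side of the transition. A sketch of that computation, assuming the threshold xmin is already known and that files with size ≥ xmin count as the power-law side (both the boundary convention and the name `tail_shares` are assumptions):

```python
def tail_shares(sizes, xmin):
    """Return (share of files, share of total SLOC) for files with size >= xmin."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    tail = [s for s in sizes if s >= xmin]
    total = sum(sizes)
    file_share = len(tail) / len(sizes)
    sloc_share = sum(tail) / total if total else 0.0
    return file_share, sloc_share


if __name__ == "__main__":
    # Toy data: 99 small files and one very large one.
    print(tail_shares([100] * 99 + [6600], xmin=1000))  # (0.01, 0.4)
```

Even in this toy case a single file above the threshold carries a large fraction of the code, which is the pattern the slide describes.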
8. Large files have a large impact
Size estimation models
Some software size estimation models are based on the log-normality of size metrics. These models systematically underestimate the size of software.
[Figure: RE (axis from -100 to 50) against SLOC on a log scale, one panel each for C, C++, Java and Python]
On the distribution of source code file sizes, ICSOFT 2011
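The slide does not spell out the estimation model. As a hypothetical illustration only (not the model evaluated in the ICSOFT 2011 paper), the sketch below fits a lognormal to per-file sizes using the mean and variance of log(SLOC), predicts total size as n·exp(μ + σ²/2), and reports the error in percent, assuming RE on the slide means (estimate - actual) / actual × 100. Function names are made up for the example.

```python
import math


def lognormal_total_estimate(sizes):
    """Estimate total SLOC as n * E[X], with X lognormal fitted to log(sizes)."""
    if not sizes or min(sizes) <= 0:
        raise ValueError("sizes must be non-empty and positive")
    logs = [math.log(s) for s in sizes]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # maximum-likelihood variance
    return n * math.exp(mu + var / 2)  # mean of a lognormal is exp(mu + var/2)


def relative_error(estimate, actual):
    """Percent relative error; negative values mean underestimation."""
    return 100.0 * (estimate - actual) / actual


if __name__ == "__main__":
    sizes = [100] * 99 + [100_000]  # toy data with one very large file
    est = lognormal_total_estimate(sizes)
    print(round(relative_error(est, sum(sizes)), 1))
```

With the toy data the estimate falls short by well over 80%: the lognormal fit barely registers the single huge file that dominates the real total.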
9. Section 2: Statistical properties of software metrics
10. Parameters of the statistical distribution
Power law parameters: λ and xmin
Transition from lognormal to power law
[Figure: P[X > x] against SLOC on log-log axes (SLOC from 1 to 10000), comparing the data with a double Pareto fit and a lognormal fit]
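The slide names λ and xmin as the power-law parameters but not how they are estimated. A common approach, sketched here as an assumption rather than the author's procedure, treats λ as the exponent of the density p(x) ∝ x^(-λ) for x ≥ xmin, estimates it with the continuous maximum-likelihood formula λ = 1 + n / Σ ln(xᵢ / xmin), and picks xmin as the candidate that minimizes the Kolmogorov-Smirnov distance between the empirical and fitted tails. All names, and the `min_tail` guard, are hypothetical.

```python
import math


def fit_exponent(data, xmin):
    """Continuous MLE of lam for p(x) ~ x**(-lam), using only x >= xmin."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if len(tail) < 2 or log_sum == 0:
        raise ValueError("not enough distinct tail values above xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(data, xmin, lam):
    """Largest gap between the empirical and fitted power-law CDFs of the tail."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    gap = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - lam)  # fitted CDF at x
        # compare with the empirical CDF just below and just at x
        gap = max(gap, abs(i / n - model), abs((i + 1) / n - model))
    return gap


def choose_xmin(data, min_tail=50):
    """Return (xmin, lam, ks) for the candidate xmin with the smallest KS gap."""
    best = None
    for xmin in sorted(set(data)):
        if sum(1 for x in data if x >= xmin) < min_tail:
            break  # candidates increase, so later tails are even smaller
        try:
            lam = fit_exponent(data, xmin)
        except ValueError:
            continue
        d = ks_distance(data, xmin, lam)
        if best is None or d < best[2]:
            best = (xmin, lam, d)
    return best
```

On real size data, files below the chosen xmin would be the lognormal body and files at or above it the power-law tail.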
11. Section 3: Evidence of impact on quality
12. Probability of finding defects
We have seen that files above xmin account for 40% of the total size while being only about 1% of the files.
What about defects? Probability of finding defects in three software projects (using CYCLO as the metric):
Project        Below xmin   Above xmin
Apache         .4178        .7708
OpenIntents    .2500        .7500
Zxing          .2143        .4161
* Data extracted from “ReLink: Recovering Links between Bugs and Changes”, FSE 2011.
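The slides do not define how these probabilities were computed. One plausible reading, sketched here as an assumption, is the fraction of files in each group that have at least one linked defect. The record layout (`metric` and `defects` keys) and the function name are hypothetical; the same function works for a normalized metric such as cyclomatic complexity per LOC if that ratio is stored as `metric`.

```python
def defect_probability(files, xmin):
    """Share of files with at least one defect, below vs. at/above xmin."""
    groups = {"below": [], "above": []}
    for f in files:
        side = "above" if f["metric"] >= xmin else "below"
        groups[side].append(f["defects"] > 0)
    # nan marks an empty group rather than a zero probability
    return {side: (sum(flags) / len(flags) if flags else float("nan"))
            for side, flags in groups.items()}


if __name__ == "__main__":
    files = [  # toy records, not project data
        {"metric": 5, "defects": 0}, {"metric": 8, "defects": 1},
        {"metric": 3, "defects": 0}, {"metric": 2, "defects": 0},
        {"metric": 40, "defects": 2}, {"metric": 55, "defects": 0},
    ]
    print(defect_probability(files, xmin=30))  # {'below': 0.25, 'above': 0.5}
```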
13. Probability of finding defects
Probability of finding defects (normalized metrics)
Using CYCLO / WMC as the metric (cyclomatic complexity per LOC)
Project        Below xmin   Above xmin
Apache         .4159        .6296
OpenIntents    .2813        .5417
Zxing          .3181        .2389
14. Probability of finding defects
Defect density (pre-release defects only)
Using Number of Methods and the number of pre-release defects per LOC
[Figure: histograms for files below xmin (left) and above xmin (right)]
Avg. dens. below xmin = .2685; avg. dens. above xmin = .4565
* Data obtained from “Predicting Defects for Eclipse”, PROMISE 2007
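A sketch of how an average defect density per group could be computed, assuming density means a file's defects divided by its LOC and that "Avg. dens." is the arithmetic mean of per-file densities (a pooled ratio would give different numbers). Field names and the function name are hypothetical.

```python
from statistics import mean


def average_density(files, xmin):
    """Mean per-file defect density (defects / LOC), below vs. at/above xmin."""
    below, above = [], []
    for f in files:
        if f["loc"] <= 0:
            continue  # density is undefined for empty files
        density = f["defects"] / f["loc"]
        (above if f["metric"] >= xmin else below).append(density)
    return {"below": mean(below) if below else float("nan"),
            "above": mean(above) if above else float("nan")}


if __name__ == "__main__":
    files = [  # toy records, not Eclipse data
        {"metric": 2, "loc": 100, "defects": 1},
        {"metric": 3, "loc": 50, "defects": 0},
        {"metric": 20, "loc": 200, "defects": 4},
        {"metric": 25, "loc": 100, "defects": 4},
    ]
    print(average_density(files, xmin=10))
```

The pre-release, post-release and total variants on the following slides differ only in which defect count goes into `defects`.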
15. Probability of finding defects
Defect density (post-release defects only)
Using Number of Methods and the number of post-release defects per LOC
[Figure: histograms for files below xmin (left) and above xmin (right)]
Avg. dens. below xmin = .1437; avg. dens. above xmin = .2690
16. Probability of finding defects
Defect density (pre- and post-release defects)
Using CYCLO/SLOC and the total number of defects per LOC
[Figure: log-log plots, including Pr(X ≥ x) against x, for files below and above xmin]
Avg. dens. below xmin = .3335 (>9000 files); avg. dens. above xmin = .7747 (364 files)
17. Section 4: Summary of findings and further work
18. Summary and further work
Summary of preliminary findings
Some metrics have a transition from lognormal to power law
Clear relation between normalized metrics and defect density
Although the threshold might not be perfect (e.g., a file below the threshold may still have a high defect density), it greatly reduces the search space for potentially problematic files
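To make the search-space argument concrete, a hypothetical sketch: flag the files at or above xmin and compare the share of files flagged with the share of all defects they contain. This illustrates the trade-off described in the summary; it is not an analysis from the slides, and the record layout and function name are assumptions.

```python
def search_space_tradeoff(files, xmin):
    """Return (share of files flagged, share of all defects in flagged files)."""
    if not files:
        raise ValueError("files must not be empty")
    flagged = [f for f in files if f["metric"] >= xmin]
    total_defects = sum(f["defects"] for f in files)
    file_share = len(flagged) / len(files)
    defect_share = (sum(f["defects"] for f in flagged) / total_defects
                    if total_defects else 0.0)
    return file_share, defect_share


if __name__ == "__main__":
    files = [  # toy records, not project data
        {"metric": 5, "defects": 0}, {"metric": 8, "defects": 1},
        {"metric": 3, "defects": 0}, {"metric": 2, "defects": 0},
        {"metric": 40, "defects": 2}, {"metric": 55, "defects": 0},
    ]
    print(search_space_tradeoff(files, xmin=30))
```

A threshold is useful when the second number is much larger than the first; a defective file below the threshold (the second record above) is the kind of miss the summary warns about.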
Further work
Verify in more projects
Do you have defect data at the file level?
Find an explanation for the transition and its influence on quality
How do the statistical parameters change over time? Do defects evolve accordingly?