Predictive uncertainty of deep models and its applications - NAVER Engineering
Presenter: Kimin Lee (PhD student, KAIST)
Date: April 2018
The predictive uncertainty (e.g., the entropy of the softmax distribution of a deep classifier) is indispensable: it is useful in many machine learning applications (e.g., active learning and ensemble learning) as well as when deploying the trained model in real-world systems. In order to improve the quality of the predictive uncertainty, we proposed a novel loss function for training deep models (ICLR 2018). We showed that confidence-calibrated deep models trained by our method can be very useful in various machine learning applications such as novelty detection (CVPR 2018) and ensemble learning (ICML 2017).
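As a quick illustration of the predictive uncertainty mentioned above, the entropy of a classifier's softmax distribution can be computed directly from its logits (a toy sketch; the logit values are illustrative):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(logits):
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

confident = np.array([8.0, 0.0, 0.0])   # peaked distribution -> low entropy
uncertain = np.array([1.0, 1.0, 1.0])   # flat distribution -> max entropy ln(3)

print(predictive_entropy(confident))
print(predictive_entropy(uncertain))
```

A peaked distribution yields entropy near 0, while a uniform one attains the maximum ln K; thresholding this value is a common way to flag inputs the model is unsure about.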
The ever-increasing number of parameters in deep neural networks poses challenges for memory-limited applications. Regularize-and-prune methods aim at meeting these challenges by sparsifying the network weights. In this context we quantify the output sensitivity to the parameters (i.e., their relevance to the network output) and introduce a regularization term that gradually lowers the absolute value of parameters with low sensitivity. Thus, a very large fraction of the parameters approach zero and are eventually set to zero by simple thresholding. Our method surpasses most of the recent techniques both in terms of sparsity and error rates. In some cases, the method reaches twice the sparsity obtained by other techniques at equal error rates.
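The regularize-and-prune idea can be sketched in a few lines: weights with low output sensitivity are shrunk toward zero and finally thresholded (a toy sketch; the sensitivity values here are random stand-ins, not actual gradients of a network output):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)                 # toy parameter vector
sens = np.abs(rng.normal(size=100))      # stand-in for output sensitivity |dy/dw|
sens /= sens.max()                       # normalize to [0, 1]

lam, thr = 0.05, 1e-2
for _ in range(200):
    # shrink low-sensitivity weights harder; high-sensitivity ones barely move
    w -= lam * (1.0 - sens) * np.sign(w)
    w[np.abs(w) < lam] = 0.0             # stop shrinkage from oscillating through zero

w[np.abs(w) < thr] = 0.0                 # final thresholding
sparsity = float(np.mean(w == 0.0))
print(f"sparsity: {sparsity:.2f}")
```

In the paper's setting the penalty is a regularization term added to the training loss, so shrinkage competes with the task gradient rather than acting alone as it does here.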
http://imatge-upc.github.io/vqa-2016-cvprw/
This thesis studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle text-based Question-Answering. We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved an accuracy of 53.62% on the test dataset. The developed software has followed the best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.
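The merging step described above can be pictured with plain arrays: visual features and a question embedding are concatenated and mapped to answer probabilities (a toy numpy sketch with made-up dimensions, not the VGG-16/K-CNN pipeline itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real extractors: an image feature vector (VGG-16-style)
# and a sentence embedding of the question; dims are toy-sized.
img_feat = rng.normal(size=16)
q_emb = rng.normal(size=8)

merged = np.concatenate([img_feat, q_emb])    # merge by concatenation

n_answers = 5
W = rng.normal(scale=0.1, size=(n_answers, merged.size))
logits = W @ merged
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over candidate answers
answer = int(np.argmax(probs))
print(answer, float(probs.sum()))
```

In the real system the projection W is learned end to end, and the question side may be a per-word embedding fed through an LSTM rather than a single sentence vector.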
Usage of Generative Adversarial Networks (GANs) in Healthcare - GlobalLogic Ukraine
The presentation is devoted to the application of Generative Adversarial Networks (GANs) in healthcare. We will briefly review the basic principles and features of such networks and outline the types of tasks in medical research and practice that can be solved with GANs. Then we will discuss examples of using GANs to solve some medical tasks.
This presentation by Vladyslav Kolbasin (Lead Software Developer, Consultant, GlobalLogic, Kharkiv) was delivered at AI Ukraine 2017 (Kharkiv) on September 24, 2017.
Continual Learning with Deep Architectures - Tutorial ICML 2021 - Vincenzo Lomonaco
Humans have the extraordinary ability to learn continually from experience. Not only can we apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning. One of the grand goals of Artificial Intelligence (AI) is building an artificial "continual learning" agent that constructs a sophisticated understanding of the world from its own experience through the autonomous incremental development of ever more complex knowledge and skills (Parisi, 2019). However, despite early speculations and a few pioneering works (Ring, 1998; Thrun, 1998; Carlson, 2010), very little research and effort has been devoted to addressing this vision. Current AI systems greatly suffer from exposure to new data or environments that differ even slightly from the ones they were trained on (Goodfellow, 2013). Moreover, the learning process is usually constrained to fixed datasets within narrow and isolated tasks, which can hardly lead to the emergence of more complex and autonomous intelligent behaviors. In essence, continual learning and adaptation capabilities, while more often than not thought of as fundamental pillars of every intelligent agent, have been mostly left out of the main AI research focus.
In this tutorial, we propose to summarize the application of these ideas in light of more recent advances in machine learning research and in the context of deep architectures for AI (Lomonaco, 2019). Starting from a motivation and a brief history, we link recent Continual Learning advances to previous research endeavours on related topics, and we summarize the state of the art in terms of major approaches, benchmarks and key results. In the second part of the tutorial we plan to cover more exploratory studies about Continual Learning with weak supervision signals and its relationships with other paradigms such as Unsupervised, Semi-Supervised and Reinforcement Learning. We will also highlight the impact of recent neuroscience discoveries on the design of original continual learning algorithms, as well as their deployment in real-world applications. Finally, we will underline the notion of continual learning as a key technological enabler for Sustainable Machine Learning and its societal impact, and recap interesting research questions and directions worth addressing in the future.
Authors: Vincenzo Lomonaco, Irina Rish
Official Website: https://sites.google.com/view/cltutorial-icml2021
A critical review on Adversarial Attacks on Intrusion Detection Systems: Must... - PhD Assistance
The present article helps students in the USA, the UK, Europe, and Australia pursuing Computer Science postgraduate degrees to identify the right topic in the area of computer science, specifically on deep learning, adversarial attacks, and intrusion detection systems. These topics are researched in-depth at the University of Spain, Cornell University, the University of Modena and Reggio Emilia (Modena, Italy), and many more.
http://www.phdassistance.com/industries/computer-science-information/
PhD Assistance offers UK Dissertation Research Topics Services in Computer Science Engineering Domain. When you Order Computer Science Dissertation Services at PhD Assistance, we promise you the following – Plagiarism free, Always on Time, outstanding customer support, written to Standard, Unlimited Revisions support and High-quality Subject Matter Experts http://www.phdassistance.com/services/phd-literature-review/gap-identification/
For Any Queries : Website: www.phdassistance.com
Phd Research Lab : www.research.phdassistance.com
Email: info@phdassistance.com
Phone : +91-4448137070
Contact Name Ganesh / Vinoth Kumar
Functional-connectome biomarkers to meet clinical needs? - Gael Varoquaux
Extracting Functional-Connectome Biomarkers with Machine Learning: a talk in the symposium on how current predictive connectivity models meet clinicians' needs.
This talk is a bit provocative: it first sets out visions, before offering a few technical suggestions.
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati... - Ian Morgan
Professor Steve Roberts, Machine learning research group and Oxford-Man Institute + Alan Turing Institute. Steve gave this talk on the 24th January at the London Bayes Nets meetup.
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION - ijaia
In natural language processing, attention mechanisms in neural networks are widely utilized. In this paper, the research team explores a new mechanism for extending output attention in recurrent neural networks for dialog systems. The new attention method was compared with the current method in generating dialog sentences using a real dataset. Our architecture exhibits several attractive properties: it better handles long sequences, and it can generate more reasonable replies in many cases.
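A minimal form of attention over RNN outputs looks like this: score each output vector, normalize the scores with a softmax, and take the weighted sum as a context vector (a toy sketch; the scoring vector and the "RNN outputs" are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
T, H = 6, 4                        # sequence length, hidden size
outputs = rng.normal(size=(T, H))  # stand-in for RNN outputs at each time step

# Toy dot-product scoring vector; real models learn this (or an MLP scorer)
v = rng.normal(size=H)
scores = outputs @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()           # softmax attention weights over the T steps
context = weights @ outputs        # attention-weighted summary of the outputs

print(weights.round(3), context.shape)
```

The context vector then conditions the decoder when generating the reply, letting it draw on all time steps instead of only the final hidden state.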
Recommendation system using collaborative deep learning - Ritesh Sawant
Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content information may be utilized. Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
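The coupling CDL describes, ratings factorization tied to a representation learned from content, can be caricatured with a one-layer "encoder" in numpy (a toy sketch; CDL itself uses a stacked denoising autoencoder inside a hierarchical Bayesian model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, d = 20, 15, 4, 10
R = (rng.random((n_users, n_items)) < 0.2).astype(float)  # sparse implicit feedback
X = rng.normal(size=(n_items, d))                         # item content features

U = rng.normal(scale=0.1, size=(n_users, k))              # user factors
V = rng.normal(scale=0.1, size=(n_items, k))              # item factors
W = rng.normal(scale=0.1, size=(d, k))                    # one-layer "encoder" (toy)

lr, lam = 0.05, 0.1
for _ in range(300):
    E = U @ V.T - R                                # rating reconstruction error
    U -= lr * (E @ V)
    V -= lr * (E.T @ U + lam * (V - X @ W))        # pull V toward the content encoding
    W -= lr * (lam * X.T @ (X @ W - V))            # train encoder to match V

loss = float(np.mean((U @ V.T - R) ** 2))
print(f"final rating MSE: {loss:.4f}")
```

The lam-weighted term is what ties the item factors V to the content encoding; setting lam = 0 recovers plain matrix factorization, which is exactly what suffers under rating sparsity.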
A Deep Dive into Classification with Naive Bayes. Along the way we take a look at some basics from Ian Witten's Data Mining book and dig into the algorithm.
Presented on Wed Apr 27 2011 at SeaHUG in Seattle, WA.
Imputation techniques for missing data in clinical trials - Nitin George
Missing data are unavoidable in clinical and epidemiological research. Missing data lead to bias and loss of information in analyses. Often we are unaware of missing-data techniques because we rely on software defaults. The objective of this seminar is to introduce different missing-data mechanisms and imputation techniques for missing data, with the help of examples.
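Two of the simplest imputation techniques the seminar alludes to, mean imputation and last observation carried forward, are easy to show on a toy series (the values are made up):

```python
import numpy as np

# Toy series of lab values with two missing entries
x = np.array([5.1, np.nan, 4.8, 6.0, np.nan, 5.5])

mean_imp = np.where(np.isnan(x), np.nanmean(x), x)      # mean of observed = 5.35
median_imp = np.where(np.isnan(x), np.nanmedian(x), x)  # median of observed = 5.3

# Last observation carried forward (LOCF), common in longitudinal trials
locf = x.copy()
for i in range(1, len(locf)):
    if np.isnan(locf[i]):
        locf[i] = locf[i - 1]

print(mean_imp)
print(locf)
```

Both are single-value imputations and understate uncertainty; multiple imputation, which the seminar topic covers, draws several plausible values instead of one.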
Analysis of crop yield prediction using data mining techniques - eSAT Journals
Abstract
The agrarian sector in India is facing a rigorous problem in maximizing crop productivity. More than 60 percent of crops still depend on monsoon rainfall. Recent developments in information technology for agriculture have made crop yield prediction an interesting research area. The problem of yield prediction is a major problem that remains to be solved using available data, and data mining techniques are a good choice for this purpose. Different data mining techniques are used and evaluated in agriculture for estimating the coming year's crop production. This paper presents a brief analysis of crop yield prediction using the Multiple Linear Regression (MLR) technique and a density-based clustering technique for a selected region, the East Godavari district of Andhra Pradesh in India.
Keywords: Agrarian Sector, Crop Production, Data Mining, Density based clustering, Information Technology, Multiple Linear Regression, Yield Prediction.
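A multiple linear regression of the kind used for yield prediction can be fit with ordinary least squares (a toy sketch; the rainfall/area/yield numbers are invented, not data from East Godavari):

```python
import numpy as np

# Hypothetical predictors: rainfall (mm) and area sown (ha), with yield (t/ha)
rain = np.array([900.0, 1100.0, 1000.0, 950.0, 1200.0])
area = np.array([50.0, 55.0, 52.0, 51.0, 60.0])
crop_yield = np.array([2.1, 2.8, 2.5, 2.3, 3.1])

X = np.column_stack([np.ones_like(rain), rain, area])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, crop_yield, rcond=None)  # least-squares coefficients

pred = X @ beta
print(beta.round(4), pred.round(2))
```

The fitted beta gives the intercept and one slope per predictor; predicting next year's yield is then a single matrix-vector product with that year's rainfall and area.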
A FUZZY LOGIC BASED SCHEME FOR THE PARAMETERIZATION OF THE INTER-TROPICAL DIS... - ijfls
In this paper, a fuzzy logic based scheme for the parameterization of the Inter-Tropical Discontinuity (ITD) over Nigeria is presented. The scheme was developed in order to provide a computational basis for Numerical Weather Prediction (NWP) modeling over Nigeria. The scheme uses a fuzzified 2.5° by 5° resolution grid box, a 10-rows-by-4-columns (10×4) matrix with the rows classified into 10 zones. The two extreme zones, represented by the five (5) boundary points or two-dimensional (2-D) lattice nodes (O1 – O5), define the matrix boundaries or lattice edges, and hence the meridional limits of the ITD position. The scheme is simple enough to be included as an ITD parameterization by NWP modelers over West Africa.
The Naive Bayes classifier is based on Bayes' theorem. In statistics and probability theory, Bayes' theorem describes the probability of an event based on conditions related to that event. For more information on Naive Bayes, see: http://www.transtutors.com/homework-help/statistics/naive-bayes.aspx
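Bayes' theorem in action on a toy screening example (all the probabilities are illustrative):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Toy screening example: 1% prevalence, 95% sensitivity, 5% false-positive rate
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Total probability of a positive test (law of total probability)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
# Posterior probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # 0.161
```

Despite the accurate test, the low prior drags the posterior down to about 16%, which is exactly the prior-times-likelihood trade-off the theorem expresses.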
Sentiment analysis using naive bayes classifier - Dev Sahu
This presentation contains a short description of the naive Bayes classifier algorithm, a machine learning approach to sentiment detection and text classification.
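The core of a naive Bayes sentiment classifier fits in a few lines: class priors plus smoothed per-word likelihoods (a toy sketch on a four-document corpus):

```python
from collections import Counter
import math

train = [("good great fun", "pos"), ("great movie", "pos"),
         ("bad boring", "neg"), ("bad awful movie", "neg")]

counts = {"pos": Counter(), "neg": Counter()}
docs = Counter()
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = {w for c in counts.values() for w in c}

def predict(text):
    scores = {}
    for label in counts:
        # log prior + sum of log likelihoods with Laplace (add-one) smoothing
        score = math.log(docs[label] / sum(docs.values()))
        total = sum(counts[label].values())
        for w in text.split():
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great fun movie"))   # pos
print(predict("boring awful"))      # neg
```

The add-one smoothing keeps unseen words from zeroing out a class, and working in log space avoids underflow on longer documents.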
To present at the seminar in DASH Lab, SKKU, I chose the paper "Transferable GAN-generated Images Detection" (ICML 2020).
For more detail, see: https://arxiv.org/abs/2008.04115
ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH - cscpconf
Due to the intangible nature of "software", accurate and reliable software effort estimation is a challenge in the software industry. It is unlikely to expect very accurate estimates of software development effort because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in software engineering datasets because data is made available from diverse sources. This can be reduced by defining certain relationships between the data values by classifying them into different clusters. This study focuses on how the combination of clustering and regression techniques can reduce the potential problems in predictive effectiveness due to heterogeneity of the data. Using a clustered approach creates subsets of data having a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than the other regression techniques used in the analysis.
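The cluster-then-regress idea can be sketched end to end: split heterogeneous data with k-means, fit a ridge model per cluster, and compare against a single global ridge fit (a toy sketch with synthetic "project" data, not the study's datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical project populations with different effort dynamics
x1 = rng.normal(10, 1, size=30); y1 = 3 * x1 + rng.normal(0, 0.5, 30)
x2 = rng.normal(30, 1, size=30); y2 = 8 * x2 + rng.normal(0, 0.5, 30)
x = np.concatenate([x1, x2]); y = np.concatenate([y1, y2])

# Tiny 1-D k-means (2 clusters) to split the heterogeneous data
c = np.array([x.min(), x.max()])
for _ in range(10):
    assign = np.argmin(np.abs(x[:, None] - c), axis=1)
    c = np.array([x[assign == j].mean() for j in (0, 1)])

def ridge(xs, ys, lam=1.0):
    A = np.column_stack([np.ones_like(xs), xs])
    return np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ ys)

def mse(beta, xs, ys):
    A = np.column_stack([np.ones_like(xs), xs])
    return float(np.mean((A @ beta - ys) ** 2))

models = [ridge(x[assign == j], y[assign == j]) for j in (0, 1)]
clustered_mse = np.mean([mse(models[j], x[assign == j], y[assign == j]) for j in (0, 1)])
global_mse = mse(ridge(x, y), x, y)
print(f"clustered: {clustered_mse:.3f}  global: {global_mse:.3f}")
```

Because each cluster is internally homogeneous, the per-cluster fits track their own slopes, while the single global line has to compromise between the two populations.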
In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in eCommerce, and networks when we go online, to integrated circuits when using our PCs and smartphones, security-critical software when accessing our bank accounts, and spreadsheets for financial planning and decision making. The complexity of these systems coupled with our high dependency on them implies both a non-negligible likelihood of system failures and a high potential that such failures have significant negative effects on our everyday life. For that reason, it is a vital requirement to keep the harm of emerging failures to a minimum, which means minimizing the system downtime as well as the cost of system repair. This is where model-based diagnosis comes into play.
Model-based diagnosis is a principled, domain-independent approach that can be generally applied to troubleshoot systems of a wide variety of types, including all the ones mentioned above. It exploits and orchestrates techniques for knowledge representation, automated reasoning, heuristic problem solving, intelligent search, learning, stochastics, statistics, decision making under uncertainty, as well as combinatorics and set theory to detect, localize, and fix faults in abnormally behaving systems.
In this talk, we will give an introduction to the topic of model-based diagnosis, point out the major challenges in the field, and discuss a selection of approaches from our research addressing these challenges. For instance, we will present methods for the optimization of the time and memory performance of diagnosis systems, show efficient techniques for a semi-automatic debugging by interacting with a user or expert, and demonstrate how our algorithms can be effectively leveraged in important application domains such as scheduling or the Semantic Web.
Software Reliability Growth Model with Logistic-Exponential Testing-Effort F... - IDES Editor
Software reliability is one of the important factors of software quality. Before software is delivered to market, it is thoroughly checked and errors are removed. Every software company wants to develop software that is error free. Software reliability growth models help the software industry to develop software which is error free and reliable. In this paper an analysis is done by incorporating a logistic-exponential testing-effort function into an NHPP software reliability growth model, and its release policy is also observed. Experiments are performed on real datasets. Parameters are estimated, and it is observed that our model fits the datasets best.
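The general shape of a testing-effort-dependent NHPP growth model is m(t) = a(1 - exp(-b W(t))), where W(t) is the cumulative testing effort. Below is a sketch with a plain logistic effort curve standing in for the paper's logistic-exponential function (all parameter values are assumed, not fitted):

```python
import math

# Generic effort-dependent NHPP sketch: m(t) = a * (1 - exp(-b * W(t)))
a, b = 100.0, 0.05           # total expected faults, fault detection rate (assumed)
N, k, t0 = 40.0, 0.3, 10.0   # effort-curve parameters (assumed)

def W(t):
    return N / (1.0 + math.exp(-k * (t - t0)))    # cumulative testing effort

def m(t):
    return a * (1.0 - math.exp(-b * W(t)))        # expected faults found by time t

for t in (0, 10, 20, 40):
    print(t, round(m(t), 2))
```

m(t) rises monotonically and saturates below a, which is what makes such models usable for release-policy decisions: testing can stop once the expected remaining faults a - m(t) fall under a target.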
Adversarial Variational Autoencoders to extend and improve generative model -... - Loc Nguyen
Generative artificial intelligence (GenAI) has been developing with many incredible achievements like ChatGPT and Bard. The deep generative model (DGM) is a branch of GenAI which is preeminent in generating raster data such as images and sound, due to the strong points of deep neural networks (DNNs) in inference and recognition. The built-in inference mechanism of the DNN, which simulates the synaptic plasticity of the human neural network, fosters the generation ability of the DGM, which produces surprising results with the support of statistical flexibility. Two popular approaches in DGM are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). Both VAE and GAN have their own strong points, although they share an underlying theory of statistics as well as incredible complexity via the hidden layers of the DNN, where the DNN becomes an effective encoding/decoding function without concrete specifications. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA), in which VAE and GAN complement each other: for instance, VAE is a good data generator, encoding data via the excellent ideology of Kullback-Leibler divergence, and GAN is a significantly important method to assess the reliability of data as realistic or fake. In other words, AVA aims to improve the accuracy of generative models; besides, AVA extends the function of simple generative models. In methodology, this research focuses on a combination of applied mathematical concepts and skillful techniques of computer programming in order to implement and solve complicated problems as simply as possible.
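One way to picture the unification (the notation below is generic, not necessarily the paper's exact formulation): the VAE contributes an evidence lower bound and the GAN contributes a minimax game, and an AVA-style model trains the two jointly.

```latex
% VAE term: reconstruction plus KL regularization of the encoder q(z|x)
\mathcal{L}_{\mathrm{VAE}} = \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]
  - \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)

% GAN term: discriminator D judges real data x against decoded samples G(z)
\mathcal{L}_{\mathrm{GAN}} = \mathbb{E}_{x}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]

% Joint training: maximize the ELBO while playing the adversarial game,
% with the VAE decoder reused as the generator G
\max_{q,\,p}\;\mathcal{L}_{\mathrm{VAE}}
  \qquad \min_{G}\max_{D}\;\mathcal{L}_{\mathrm{GAN}}
```

The key structural point is that the decoder appears in both terms: the ELBO pushes it to reconstruct faithfully, while the discriminator pushes its samples toward realism.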
Image De-Noising Using Deep Neural Network - aciijournal
A deep neural network, as part of a deep learning algorithm, is a state-of-the-art approach to finding higher-level representations of input data, and it has been introduced successfully to many practical and challenging learning problems. The primary goal of deep learning is to use large data to help solve a given machine learning task. We propose a methodology for an image de-noising project defined by this model and train on a large image database to obtain the experimental output. The result shows the robustness and efficiency of our algorithm.
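The supervised de-noising setup, noisy input in and clean target out, can be shown with a single linear layer trained by gradient descent (a drastically simplified numpy stand-in for a deep network; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": clean 16-pixel patches plus Gaussian noise
clean = rng.random((200, 16))
noisy = clean + rng.normal(0, 0.3, clean.shape)

# One linear layer trained by gradient descent to map noisy -> clean
W = np.zeros((16, 16))
lr = 0.1
for _ in range(1000):
    grad = noisy.T @ (noisy @ W - clean) / len(clean)  # MSE gradient
    W -= lr * grad

before = float(np.mean((noisy - clean) ** 2))
after = float(np.mean((noisy @ W - clean) ** 2))
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

A deep network replaces the single matrix with stacked nonlinear layers, but the training signal is the same: minimize reconstruction error against clean targets.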
Science has escaped the lab and is roaming free in the world. People use software to understand the world. What tools are needed to support that work?
GALE: Geometric active learning for Search-Based Software Engineering - CS, NcState
Multi-objective evolutionary algorithms (MOEAs) help software engineers find novel solutions to complex problems. When automatic tools explore too many options, they are slow to use and hard to comprehend. GALE is a near-linear time MOEA that builds a piecewise approximation to the surface of best solutions along the Pareto frontier. For each piece, GALE mutates solutions towards the better end. In numerous case studies, GALE finds comparable solutions to standard methods (NSGA-II, SPEA2) using far fewer evaluations (e.g. 20 evaluations, not 1,000). GALE is recommended when a model is expensive to evaluate, or when some audience needs to browse and understand how an MOEA has made its conclusions.
Three Laws of Trusted Data Sharing: Building a Better Business Case for Dat... (CS, NcState)
Discussions about sharing:
- Too much fear
- Not enough about benefits
Can we learn more from sharing than hoarding?
- Yes (results from SE)
Three laws of trusted data sharing:
- For SE quality prediction...
- Better models from shared privatized data than from all raw data
Q: does this work for other kinds of data?
A: don’t know… yet
Ken and Tim: Software Assurance Research at West Virginia (CS, NcState)
SA @ WV(software assurance research at West Virginia)
Kenneth McGill
NASA IV&V Facility Research Lead
304.367.8300
Kenneth.McGill@ivv.nasa.gov
Dr. Tim Menzies Ph.D. (WVU)
Software Engineering Research Chair
tim@menzies.us
Next Generation “Treatment Learning” (finding the diamonds in the dust) (CS, NcState)
Q: How have dummies (like me) managed to gain (some) control over a (seemingly) complex world?
A: The world is simpler than we think.
◆ Models contain clumps
◆ A few collar variables decide which clumps to use.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that lead to closing the deal.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
The Metaverse and AI: how can decision-makers harness the Metaverse for their... (Jen Stirrup)
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Quantum Computing: Current Landscape and the Future Role of APIs
PROMISE 2011: "Handling missing data in software effort prediction with naive Bayes and EM"
1. Introduction
Naive Bayes and EM for software effort prediction
Missing data handling strategies
Experiments
Threats.
Conclusion and future work
Handling missing data in software effort prediction with naive Bayes and EM algorithm
Wen Zhang, Ye Yang, Qing Wang
Laboratory for Internet Software Technologies
Institute of Software, Chinese Academy of Sciences
Beijing 100190, P.R. China
{zhangwen,ye,wq}@itechs.iscas.ac.cn
7th International Conference on Predictive Models in Software Engineering (PROMISE), 2011
Wen Zhang, Ye Yang, Qing Wang Software effort prediction with naive Bayes and EM algorithm
Outline
1 Introduction
2 Naive Bayes and EM for software effort prediction
3 Missing data handling strategies
Missing data toleration strategy.
Missing data imputation strategy
4 Experiments
The datasets
Experiment setup
Experimental results
5 Threats.
6 Conclusion and future work
Effort prediction with missing data.
The knowledge on software project effort stored in historical datasets can be used to develop predictive models, e.g. by statistical methods such as linear regression and correlation analysis, to predict the effort of new incoming projects.
Usually, most historical effort datasets contain a large amount of missing data.
Effort prediction with missing data.
Due to the small sizes of most historical databases, the common practice of ignoring projects with missing data will lead to biased and inaccurate prediction models.
For these reasons, how to handle missing data in software effort datasets is becoming an important problem.
Sample data
The historical effort data of projects is organized as shown in the following table.

Table: The sample data in a historical project dataset.

D     X1    ...  Xj    ...  Xn    H
D1    x11   ...  x1j   ...  x1n   h1
...   ...   ...  ...   ...  ...   ...
Di    xi1   ...  xij   ...  xin   hi
...   ...   ...  ...   ...  ...   ...
Dm    xm1   ...  xmj   ...  xmn   hm

X_j (1 ≤ j ≤ n) denotes an attribute of project D_i (1 ≤ i ≤ m). h_i is the effort class label of D_i, and it is derived from the real effort of project D_i.
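As a concrete illustration of this layout, the table can be held as a binary matrix with a mask for the missing entries discussed later. This is only a sketch: the toy values and the names X, h, and observed are illustrative, not taken from the paper or its datasets.

```python
import numpy as np

# A toy stand-in for the historical project table above: rows are projects
# D_i, columns are Boolean attributes X_j, and h holds the effort class
# label of each project. np.nan marks an unobserved (missing) value.
X = np.array([
    [1.0, 0.0, 1.0],       # D1: fully observed
    [0.0, np.nan, 1.0],    # D2: X2 missing
    [1.0, 1.0, np.nan],    # D3: X3 missing
    [np.nan, 0.0, 0.0],    # D4: X1 missing
])
h = np.array([0, 1, 0, 1])   # effort class labels c_t, encoded as 0..l-1

m, n = X.shape               # m projects, n attributes
observed = ~np.isnan(X)      # mask: True where x_ij is observed

print(m, n, int(observed.sum()))
```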
Sample data.
There are l effort classes over all the projects in a dataset; that is, h_i is equal to one of the elements of {c_1, ..., c_l}.
The attributes X_j are independent of each other and take Boolean values without missing data, i.e. x_ij ∈ {0, 1}.
Formulation of the problem.
An effort dataset Y_com contains m historical projects, Y_com = (D_1, ..., D_i, ..., D_m)^T, where D_i (1 ≤ i ≤ m) is a historical project and D_i = (x_i1, ..., x_ij, ..., x_in)^T is represented by n attributes X_j (1 ≤ j ≤ n).
h_i denotes the effort class label of project D_i. Each x_ij, the value of attribute X_j (1 ≤ j ≤ n) on D_i, may be observed or missing.
Cross validation on effort prediction is used to evaluate the performance of the missing data handling techniques.
Motivation.
The EM (Expectation Maximization) algorithm is a method for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models.
The motivation of applying EM to naive Bayes is to augment the unlabeled projects, with their estimated effort class labels, into the labeled data sets.
Thus, the performance of classification would be improved by using more data to train the prediction model.
Labeled projects and unlabeled projects.
For a labeled project D_i^L, its effort class P(h_i = c_t | D_i^L) ∈ {0, 1} is determinate.
For an unlabeled project D_i^U, its label P(h_i = c_t | D_i^U) is unknown.
However, if we can assign a predicted effort class to D_i^U, then D_i^U can also be used to update the estimates P(X_j = 0 | c_t), P(X_j = 1 | c_t) and P(c_t), and further to refine the effort prediction model P(c_t | D_i). This process is described in Equations 1, 2, 3 and 4.
Estimating P^{(τ+1)}(X_j = 1 | c_t).
The likelihood of occurrence of X_j with respect to c_t at the (τ+1)-th iteration is updated by Equation 1, using the estimates at the τ-th iteration:

\[
P^{(\tau+1)}(X_j = 1 \mid c_t)
  = \frac{1 + \sum_{i=1}^{m} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}
         {n + \sum_{j=1}^{n} \sum_{i=1}^{m} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)} \qquad (1)
\]

In practice, we interpret P^{(τ+1)}(X_j = 1 | c_t) as the probability of attribute X_j appearing in a project whose effort class is c_t.
Estimating P^{(τ+1)}(X_j = 0 | c_t).
Accordingly, the likelihood of non-occurrence of X_j with respect to c_t at the (τ+1)-th iteration, P^{(τ+1)}(X_j = 0 | c_t), is estimated by Equation 2:

\[
P^{(\tau+1)}(X_j = 0 \mid c_t) = 1 - P^{(\tau+1)}(X_j = 1 \mid c_t) \qquad (2)
\]
Estimating P^{(τ+1)}(c_t).
Second, the effort class prior probability, P^{(τ+1)}(c_t), is updated in the same manner by Equation 3, using the estimates at the τ-th iteration. In practice, we may regard P^{(τ+1)}(c_t) as the prior probability of class label c_t appearing in all the software projects.

\[
P^{(\tau+1)}(c_t) = \frac{1 + \sum_{i=1}^{m} P^{(\tau)}(h_i = c_t \mid D_i)}{l + m} \qquad (3)
\]
Estimating P^{(τ+1)}(h_i' = c_t | D_i').
Third, the posterior probability of an unlabeled project D_i' belonging to an effort class c_t at the (τ+1)-th iteration, P^{(τ+1)}(h_i' = c_t | D_i'), is updated using Equation 4:

\[
P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})
  = \frac{P^{(\tau)}(c_t)\, P^{(\tau)}(D_{i'} \mid c_t)}{P^{(\tau)}(D_{i'})}
  = \frac{P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)}
         {\sum_{t=1}^{l} P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)} \qquad (4)
\]
Estimating P^{(τ+1)}(h_i' = c_t | D_i') (continued).
Hereafter,
for labeled projects, if x_ij = 1, then P^{(τ)}(x_ij | c_t) = P^{(τ)}(X_j = 1 | c_t); otherwise x_ij = 0, and P^{(τ)}(x_ij | c_t) = P^{(τ)}(X_j = 0 | c_t).
for unlabeled projects, if x_i'j = 1, then P^{(τ)}(x_i'j | c_t) = P^{(τ)}(X_j = 1 | c_t); otherwise x_i'j = 0, and P^{(τ)}(x_i'j | c_t) = P^{(τ)}(X_j = 0 | c_t).
Here, P^{(0)}(X_j = 1 | c_t) and P^{(0)}(c_t) are initially estimated from the labeled projects alone at the first iteration, and the unlabeled projects are appended into the learning process after they have been assigned probabilistic effort classes by P^{(1)}(h_i' = c_t | D_i').
Predicting the effort class of unlabeled projects.
We iterate Equations 1, 2, 3 and 4 until their estimates converge to stable values.
Then, P^{(τ+1)}(h_i' = c_t | D_i') is used to predict the effort class of D_i'.
The c_t ∈ {c_1, ..., c_l} that maximizes P^{(τ+1)}(h_i' = c_t | D_i') is regarded as the effort class of D_i'.
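The iterative procedure over Equations 1–4 can be sketched in Python. This is a minimal sketch under stated assumptions: fully observed Boolean attributes, labels encoded 0..l-1 with -1 marking unlabeled projects, and illustrative names (nb_em, post, lik1) that are not from the paper.

```python
import numpy as np

def nb_em(X, h, l, iters=50):
    """Naive Bayes + EM over Equations 1-4 (sketch).
    X: (m, n) Boolean attribute matrix, fully observed.
    h: (m,) effort class labels in 0..l-1, or -1 for unlabeled projects.
    Returns (prior, lik1, post)."""
    m, n = X.shape
    # P(h_i = c_t | D_i): one-hot for labeled projects, uniform otherwise.
    post = np.full((m, l), 1.0 / l)
    labeled = h >= 0
    post[labeled] = np.eye(l)[h[labeled]]

    for _ in range(iters):
        s = X.T @ post                              # sum_i x_ij P(h_i=c_t|D_i), shape (n, l)
        lik1 = (1.0 + s) / (n + s.sum(axis=0))      # Equation 1: P(X_j=1|c_t)
        # Equation 2 is implicit: P(X_j=0|c_t) = 1 - lik1.
        prior = (1.0 + post.sum(axis=0)) / (l + m)  # Equation 3: P(c_t)
        # Equation 4: class posteriors, computed in log space for stability.
        logp = (np.log(prior)
                + X @ np.log(lik1)
                + (1.0 - X) @ np.log(1.0 - lik1))   # shape (m, l)
        new = np.exp(logp - logp.max(axis=1, keepdims=True))
        new /= new.sum(axis=1, keepdims=True)
        post[~labeled] = new[~labeled]              # labeled posteriors stay fixed
    return prior, lik1, post

# Two labeled projects per class plus two unlabeled ones.
X = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0],
              [1, 0, 1], [0, 1, 1]], dtype=float)
h = np.array([0, 0, 1, 1, -1, -1])
prior, lik1, post = nb_em(X, h, l=2)
print(post.argmax(axis=1))   # predicted effort classes per project
```

Note that, as in Equation 1, the smoothed likelihoods of each class sum to one over the n attributes, and the labeled posteriors are never overwritten during the loop.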
Initial setting.
When we use Equation 1 to estimate the likelihood of X_j with respect to c_t, P(X_j = 1 | c_t) or P(X_j = 0 | c_t), we do not consider missing values involved in x_ij (1 ≤ i ≤ m).
For each X_j, we can divide the whole historical dataset D into two subsets, D = {D_obs,j | D_mis,j}, where D_obs,j is the set of projects whose values on attribute X_j are observed and D_mis,j is the set of projects whose values on X_j are unobserved.
We may also divide the attributes of a project D_i into two subsets, D_i = {X_obs,i | X_mis,i}, where X_obs,i is the set of attributes whose values are observed in project D_i and X_mis,i is the set of attributes whose values are unobserved in D_i.
Missing data toleration strategy.
This strategy is very similar to the method adopted by C4.5 to handle missing data: we ignore missing values when training the prediction model.
To estimate P^{(τ+1)}(X_j = 1 | c_t) under this strategy, we rewrite Equation 1 as Equation 5:

\[
P^{(\tau+1)}(X_j = 1 \mid c_t)
  = \frac{1 + \sum_{i=1}^{|D_{\mathrm{obs},j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}
         {n + \sum_{j=1}^{n} \sum_{i=1}^{|D_{\mathrm{obs},j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)} \qquad (5)
\]
Missing data toleration strategy.
The difference between Equations 1 and 5 lies in that only the projects observed on attribute X_j, i.e. D_obs,j, are used to estimate P^{(τ+1)}(X_j = 1 | c_t).
Equation 2 can still be used here to estimate P^{(τ+1)}(X_j = 0 | c_t), and Equation 3 to estimate P^{(τ+1)}(c_t).
Missing data toleration strategy.
Accordingly, the prediction model should be adapted from Equation 4 to Equation 6:

\[
P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})
  = \frac{P^{(\tau)}(c_t)\, P^{(\tau)}(D_{i'} \mid c_t)}{P^{(\tau)}(D_{i'})}
  = \frac{P^{(\tau)}(c_t) \prod_{j=1}^{|X_{\mathrm{obs},i}|} P^{(\tau)}(x_{i'j} \mid c_t)}
         {\sum_{t=1}^{l} P^{(\tau)}(c_t) \prod_{j=1}^{|X_{\mathrm{obs},i}|} P^{(\tau)}(x_{i'j} \mid c_t)} \qquad (6)
\]
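The toleration strategy of Equations 5 and 6 can be sketched as follows, under stated assumptions: missing entries are encoded as np.nan, and the function names (tolerate_lik, tolerate_posterior) are illustrative, not from the paper.

```python
import numpy as np

def tolerate_lik(X, post, n):
    """Equation 5 sketch: estimate P(X_j = 1 | c_t) using only the projects
    whose value on X_j is observed. X: (m, n) with np.nan for missing;
    post: (m, l) current class posteriors P(h_i = c_t | D_i)."""
    obs = ~np.isnan(X)                   # membership of D_obs,j per attribute
    Xo = np.where(obs, X, 0.0)           # missing entries contribute nothing
    s = Xo.T @ post                      # sums over D_obs,j only, shape (n, l)
    return (1.0 + s) / (n + s.sum(axis=0))

def tolerate_posterior(x, prior, lik1):
    """Equation 6 sketch: class posterior for one project x, multiplying
    only over its observed attributes X_obs,i."""
    obs = ~np.isnan(x)
    term = np.where(x[obs][:, None] == 1.0,
                    np.log(lik1[obs]),
                    np.log(1.0 - lik1[obs])).sum(axis=0)
    logp = np.log(prior) + term
    p = np.exp(logp - logp.max())
    return p / p.sum()                   # normalized over the l classes
```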
Missing data imputation strategy.
The basic idea of this strategy is that unobserved values of
attributes can be imputed using the observed values.
Then, both observed values and imputed values are used
to construct the prediction model.
Missing data imputation strategy.
This strategy is embedded in the naive Bayes and EM procedure, and we may rewrite Equation 1 as Equation 7 to estimate P^{(τ+1)}(X_j = 1 | c_t):

\[
P^{(\tau+1)}(X_j = 1 \mid c_t)
  = \frac{1 + \sum_{i=1}^{|D_{\mathrm{obs},j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)
            + \sum_{s=1}^{|D_{\mathrm{mis},j}|} \tilde{x}_{sj}\, P^{(\tau)}(h_s = c_t \mid D_s)}
         {n + \sum_{j=1}^{n} \Big\{ \sum_{i=1}^{|D_{\mathrm{obs},j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)
            + \sum_{s=1}^{|D_{\mathrm{mis},j}|} \tilde{x}_{sj}\, P^{(\tau)}(h_s = c_t \mid D_s) \Big\}} \qquad (7)
\]
Missing data imputation strategy.
The missing value x_sj, i.e. the value of attribute X_j on project D_s, is imputed as x̃_sj by Equation 8:

\[
\tilde{x}_{sj}
  = \frac{\sum_{i=1}^{|D_{\mathrm{obs},j}|} x_{ij}\, P^{(\tau)}(h_i = c_t \mid D_i)}
         {\sum_{i=1}^{|D_{\mathrm{obs},j}|} P^{(\tau)}(h_i = c_t \mid D_i)} \qquad (8)
\]

x̃_sj is a constant independent of D_s given c_t.
We stipulate that x̃_sj is rounded to 1 if x̃_sj ≥ 0.5; otherwise, x̃_sj is rounded to 0.
Here, we also use Equation 3 to estimate P^{(τ+1)}(c_t).
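This imputation step (Equation 8 plus the 0.5 rounding rule) can be sketched as below. Assumptions are stated in the code: np.nan marks missing values, the name impute_xtilde is illustrative, and the sketch omits degenerate-case handling (e.g. an attribute with no observed values).

```python
import numpy as np

def impute_xtilde(X, post, t):
    """Equation 8 sketch: class-conditional imputation of missing x_sj.
    Returns a copy of X with each np.nan in column j replaced by the
    rounded x~_sj computed from the projects observed on X_j, given
    class index t. X: (m, n) with np.nan for missing; post: (m, l)."""
    Xf = X.copy()
    w = post[:, t]                                   # P(h_i = c_t | D_i)
    for j in range(X.shape[1]):
        obs = ~np.isnan(X[:, j])                     # D_obs,j for attribute j
        xt = (X[obs, j] * w[obs]).sum() / w[obs].sum()
        Xf[~obs, j] = 1.0 if xt >= 0.5 else 0.0      # round as stipulated
    return Xf
```

After this step, both the observed and the imputed values feed the Equation 7 update, which is why the strategy is described as embedded in the naive Bayes and EM procedure.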
Missing data imputation strategy.
The prediction model P^{(τ+1)}(c_t | D_i) can then be constructed by Equation 9, taking the missing values into account:

\[
P^{(\tau+1)}(h_{i'} = c_t \mid D_{i'})
  = \frac{P^{(\tau)}(c_t)\, P^{(\tau)}(D_{i'} \mid c_t)}{P^{(\tau)}(D_{i'})}
  = \frac{P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)}
         {\sum_{t=1}^{l} P^{(\tau)}(c_t) \prod_{j=1}^{n} P^{(\tau)}(x_{i'j} \mid c_t)} \qquad (9)
\]

Note that if x_i'j is unobserved, its value is substituted with x̃_i'j given by Equation 8.
The ISBSG dataset.
The ISBSG data set (http://www.isbsg.org) has 70 attributes, and many attributes have missing values.
We extract 188 projects with 16 attributes, with the criterion that each project has at least 2/3 of its attribute values observed and, for each attribute, its values are observed in at least 2/3 of the projects.
13 attributes are nominal and 3 attributes are continuous.
The ISBSG dataset.
We use Equation 10 to normalize the efforts of projects into l (= 3) classes:

\[
c_t = \left\lfloor \frac{l \times (\mathit{effort}_{D_i} - \mathit{effort}_{\min})}{\mathit{effort}_{\max} - \mathit{effort}_{\min}} \right\rfloor + 1 \qquad (10)
\]

Table: The effort classes in the ISBSG data set.

Class No.   # of projects   Label
1           85              Low
2           76              Medium
3           27              High
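Equation 10 can be sketched directly. One detail is an assumption of this sketch, not stated in the slides: at effort = effort_max the floor yields l + 1, so the maximum-effort project is clamped into class l here.

```python
import math

def effort_class(effort, e_min, e_max, l=3):
    """Equation 10 sketch: normalize a project's real effort into one of
    l classes. The clamp keeps the maximum-effort project in class l."""
    c = math.floor(l * (effort - e_min) / (e_max - e_min)) + 1
    return min(c, l)  # the floor reaches l exactly at effort == e_max

print(effort_class(100, 100, 1000))   # 1 (Low)
print(effort_class(550, 100, 1000))   # 2 (Medium)
print(effort_class(1000, 100, 1000))  # 3 (High, clamped)
```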
The CSBSG dataset.
The CSBSG data set contains 1103 projects collected from 140
organizations and 15 regions across China by the Chinese
association of software industry.
We extract 94 projects and 21 attributes (15 nominal and 6
continuous) with the same selection criterion as for the ISBSG
data set, and use Equation 10 to normalize the project efforts
into l (= 3) classes.
Table: The effort classes in CSBSG data set.
Class No. # of projects Label
1 27 Low
2 31 Medium
3 36 High
Experiment setup.
To evaluate the proposed method comparatively, we adopt
MI and MINI to impute the missing values in the ISBSG and
CSBSG data sets.
BPNN is used to classify the projects in the data sets after
imputation.
Our experiments are conducted with the 10-fold
cross-validation technique.
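The 10-fold cross-validation used above can be sketched as a generic splitter. This is not the authors' code; the fold assignment by shuffled round-robin is my own choice:

```python
import random

def kfold(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation:
    shuffle indices once, then rotate each fold out as the test set."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 188 ISBSG projects split into 10 folds:
splits = list(kfold(188, k=10))
```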
EM-T and EM-I on ISBSG dataset.
The following figure illustrates the performance of the
missing data toleration strategy (hereafter called EM-T)
and the missing data imputation strategy (hereafter called
EM-I) in handling missing data for effort prediction on the
ISBSG data set.
EM-T and EM-I on ISBSG dataset.
[Plot: accuracy (0.6–0.8) against the number of unlabeled projects
(0–20) for EM-I, EM-T, BPNN+MI, and BPNN+MINI.]
Figure: Performances of naive Bayes with EM-I and EM-T in
comparison with BPNN on effort prediction using ISBSG data set.
EM-T and EM-I on ISBSG dataset.
What we can see from the figure.
Both EM-I and EM-T perform better than BPNN with either
MI or MINI at classifying the projects in the ISBSG data set.
The performance of naive Bayes with EM improves as
unlabeled projects are appended. This outcome illustrates
that semi-supervised learning can improve the prediction of
software effort.
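The semi-supervised scheme referred to above, naive Bayes whose parameters are re-estimated by EM over unlabeled projects, can be sketched as follows. This is a generic illustration with invented toy data, handling nominal attributes only and assuming Laplace smoothing; it is not the authors' implementation:

```python
import math

def nb_em(labeled, unlabeled, n_classes, n_iters=10):
    """Semi-supervised naive Bayes via EM over nominal attributes.
    labeled: list of (features, class); unlabeled: list of features.
    Returns a function mapping features -> class posterior."""
    n_attrs = len(labeled[0][0])
    # Attribute value sets, used for Laplace smoothing.
    vals = [{x[j] for x, _ in labeled} | {x[j] for x in unlabeled}
            for j in range(n_attrs)]

    def m_step(resp):
        # Fractional counts: labeled projects count fully for their class;
        # unlabeled projects count by their current responsibilities.
        prior = [1.0] * n_classes
        cnt = [[{v: 1.0 for v in vals[j]} for j in range(n_attrs)]
               for _ in range(n_classes)]
        for x, c in labeled:
            prior[c] += 1.0
            for j in range(n_attrs):
                cnt[c][j][x[j]] += 1.0
        for x, r in zip(unlabeled, resp):
            for c in range(n_classes):
                prior[c] += r[c]
                for j in range(n_attrs):
                    cnt[c][j][x[j]] += r[c]
        z = sum(prior)
        like = [[{v: n / sum(d.values()) for v, n in d.items()}
                 for d in row] for row in cnt]
        return [p / z for p in prior], like

    def e_step(x, prior, like):
        # Class posterior under the naive (conditional independence) assumption.
        p = [prior[c] * math.prod(like[c][j][x[j]] for j in range(n_attrs))
             for c in range(n_classes)]
        z = sum(p)
        return [pi / z for pi in p]

    resp = [[1.0 / n_classes] * n_classes for _ in unlabeled]
    for _ in range(n_iters):
        prior, like = m_step(resp)
        resp = [e_step(x, prior, like) for x in unlabeled]
    return lambda x: e_step(x, prior, like)

# Invented toy projects: (size, language) -> effort class 0 or 1.
labeled = [(("small", "java"), 0), (("large", "cobol"), 1)]
unlabeled = [("small", "java"), ("large", "cobol")]
predict = nb_em(labeled, unlabeled, n_classes=2)
probs = predict(("small", "java"))
```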
EM-T and EM-I on ISBSG dataset.
What we can see from the figure.
If supervised learning were used for software effort
prediction, the MINI method would be favorable for imputing
the missing values, but the missing data toleration strategy
may not be desirable for handling them.
The imputation strategy for missing data is more effective
than the toleration strategy when naive Bayes and EM are
used to predict ISBSG software efforts.
EM-T and EM-I on CSBSG dataset.
The following figure illustrates EM-T and EM-I in handling
missing data for effort prediction on the CSBSG data set.
[Plot: accuracy (0.5–0.8) against the number of unlabeled projects
(0–8) for EM-I, EM-T, BPNN+MI, and BPNN+MINI.]
Figure: Performances of EM-I and EM-T in comparison with BPNN on predicting effort with different
number of unlabeled projects using CSBSG dataset.
EM-T and EM-I on CSBSG dataset.
What we can see from the above figure.
The better performance of EM-I over EM-T is also observed
on the CSBSG data set, just as on the ISBSG data set. This
further validates our conjecture that EM-I outperforms EM-T
in software effort prediction.
EM-T outperforms EM-I once the number of unlabeled
projects grows beyond the point of maximum accuracy,
which differs from the ISBSG result. We suggest this is
caused by the relatively small size of the CSBSG data set,
where the imputation strategy is more prone than the
toleration strategy to introduce bias into the prediction.
More experiments and hypothesis testing.
More experimental results, with explanations, are detailed in the
paper. We also conducted hypothesis testing to examine the
significance of the conclusions drawn from our experiments.
Interested readers may refer to the paper.
Threats to validity
The threat to external validity is primarily the degree to
which the attributes we used to describe the projects, and
the ISBSG and CSBSG samples, are representative.
The threat to internal validity concerns measurement and
data effects that can bias our results, caused by using
accuracy as the performance measure.
The threat to construct validity is that our experiments clip
attributes and project data from both the ISBSG and
CSBSG data sets.
Conclusion
Semi-supervised learning, in the form of naive Bayes with
EM, is employed to predict software effort.
We propose two strategies, embedded in naive Bayes and
EM, to handle missing data.
Future work
We plan to compare the proposed techniques with other
missing data imputation techniques, such as FIML and
MSWR.
We will develop more missing data techniques embedded
with naive Bayes and EM for software effort prediction.
We have already investigated the underlying mechanism of
missingness (structural or unstructured missing) in software
effort data. Building on this, we will tailor the missing data
handling strategies to the underlying missing mechanism of
software effort data.
Thanks
Any further questions about the content of the slides and the
paper can be sent to Mr. Wen Zhang.
Email: zhangwen@itechs.iscas.ac.cn