Dear Student,
DREAMWEB TECHNO SOLUTIONS is one of the Hardware Training and Software Development centres in
Trichy. A pioneer in corporate training, DREAMWEB TECHNO SOLUTIONS provides training in all software
development and IT-related courses, such as Embedded Systems, VLSI, MATLAB, JAVA, J2EE, CIVIL,
Power Electronics, and Power Systems. Its certified and experienced faculty members have the
competence to train students, provide consultancy to organizations, and develop strategic
solutions for clients by integrating existing and emerging technologies.
Address: No:73/5, 3rd Floor, Sri Kamatchi Complex, Opp City Hospital, Salai Road, Trichy-18
Contact: 7200021403/04
Phone: 0431-4050403
We propose a model for deep learning based multimodal sentiment analysis. The MOUD dataset is used for experimentation. We developed two parallel models, one text based and one audio based, and then fused their heterogeneous feature maps, taken from intermediate layers, to complete the architecture. The performance measures, namely accuracy, precision, recall, and F1-score, are observed to outperform the existing models.
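The fusion step described above can be sketched briefly. This is a hedged illustration only: the branch dimensions, class count, and random weights below are assumptions for the sketch, not values from the paper; intermediate feature maps from the two branches are concatenated and passed through one joint classification layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intermediate feature maps from the two parallel branches
# (sizes are illustrative, not taken from the paper).
text_features = rng.standard_normal(128)   # e.g. output of a text sub-network layer
audio_features = rng.standard_normal(64)   # e.g. output of an audio sub-network layer

# Fusion by concatenation, followed by one dense layer and a softmax
fused = np.concatenate([text_features, audio_features])   # shape (192,)
W = rng.standard_normal((3, fused.size)) * 0.01           # 3 sentiment classes (assumed)
logits = W @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(fused.shape, probs.sum())
```

The key design point is that fusion happens on intermediate representations rather than on final predictions, so the joint layer can learn cross-modal interactions.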
The peer-reviewed International Journal of Engineering Inventions (IJEI) was started with a mission to encourage contributions to research in science and technology, and to encourage and motivate researchers working in challenging areas of the sciences and technology.
ScalaItaly 2015 - Your Microservice as a Function (Phil Calçado)
SoundCloud's microservice architecture is built mostly in Scala, using Finagle as its distributed systems workhorse. Finagle is an RPC system for the JVM, and it is based on a pipes-and-filters architecture that maps very nicely to functional programming concepts of higher-order functions and combinators. Over the past few years we have found that it is extremely useful to go even a step further and think of microservices as functions themselves. In this talk let's explore how SoundCloud uses Scala and Finagle, and how we started thinking of a microservices architecture as a special case of a functional system.
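The "microservice as a function" idea can be sketched outside Scala as well. Below is a minimal Python analogue of Finagle's Service/Filter composition; the names (`echo_service`, `logging_filter`) are hypothetical, and this is a synchronous toy, not Finagle's actual `Future`-based API.

```python
from typing import Callable

# A "service" is just a function from request to response; a "filter"
# is a higher-order function that wraps one service and returns another.
Service = Callable[[str], str]

def echo_service(request: str) -> str:
    return f"echo: {request}"

def logging_filter(service: Service) -> Service:
    def wrapped(request: str) -> str:
        # a real filter might record metrics, retry, or time out here
        return service(request)
    return wrapped

def uppercase_filter(service: Service) -> Service:
    def wrapped(request: str) -> str:
        return service(request).upper()
    return wrapped

# Filters compose like ordinary functions, pipes-and-filters style
stack = logging_filter(uppercase_filter(echo_service))
print(stack("hello"))  # ECHO: HELLO
```

Because every stage has the same shape (service in, service out), cross-cutting concerns compose without any special framework machinery, which is the point of the talk.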
A MULTI-LAYER HYBRID TEXT STEGANOGRAPHY FOR SECRET COMMUNICATION USING WORD T... (IJNSA Journal)
This paper introduces a multi-layer hybrid text steganography approach that utilizes word tagging and recoloring. Existing approaches are designed to achieve either imperceptibility, a high hiding capacity, or robustness. The proposed approach does not use the ordinary sequential embedding process; it overcomes the issues of current approaches by pursuing imperceptibility, a high hiding capacity, and robustness together through its hybrid combination of a linguistic technique and a format-based technique. The linguistic technique divides the cover text into embedding layers, where each layer consists of a sequence of words sharing a single part of speech detected by a POS tagger. The format-based technique recolors the letters of the cover text with a near RGB color coding to embed 12 bits of the secret message in each letter, which yields a high hiding capacity and blinds the embedding. Robustness is accomplished through the multi-layer embedding process, and the generated stego key significantly strengthens the security of the embedded messages and their size. The experimental comparison shows that the proposed approach is better than currently developed approaches at providing an ideal balance between the imperceptibility, hiding-capacity, and robustness criteria.
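The 12-bits-per-letter recoloring can be illustrated with a small sketch. The 4-bits-per-RGB-channel split below is an assumption made for the illustration, not necessarily the paper's exact coding; the idea is that overwriting only the low nibble of each channel keeps the stego color visually near the base color.

```python
# Hide 12 bits in one letter's color by storing 4 message bits in the
# low nibble of each RGB channel (an assumed coding for illustration).

def embed_12_bits(base_rgb, bits12):
    """Replace the low 4 bits of each channel with 4 message bits."""
    r, g, b = base_rgb
    return (
        (r & 0xF0) | ((bits12 >> 8) & 0xF),
        (g & 0xF0) | ((bits12 >> 4) & 0xF),
        (b & 0xF0) | (bits12 & 0xF),
    )

def extract_12_bits(rgb):
    """Recover the 12 message bits from the low nibbles."""
    r, g, b = rgb
    return ((r & 0xF) << 8) | ((g & 0xF) << 4) | (b & 0xF)

stego = embed_12_bits((0x20, 0x40, 0x60), 0xABC)
print(stego, hex(extract_12_bits(stego)))  # (0x2A, 0x4B, 0x6C) round-trips to 0xabc
```

Each channel shifts by at most 15 out of 255 levels, which is why the recolored letter stays close to the original color while still carrying 12 bits.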
Bayesian distance metric learning and its application in automatic speaker re... (IJECEIAES)
This paper proposes a state-of-the-art automatic speaker recognition (ASR) system based on a Bayesian distance learning metric as a feature extractor. In this modeling, I explored the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix built from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker tag, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, this method is insensitive to normalization compared with cosine scoring, and it is very effective when training data is limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, an EER of 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayes_dml outperforms the combined normalized cosine scores and is the best result reported for the short2-short3 condition on the NIST SRE 2008 data.
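The cosine score that the paper uses as its comparison baseline is simple to state concretely. A minimal sketch, with toy vectors standing in for i-vectors (real i-vectors have a few hundred dimensions):

```python
import math

def cosine_score(u, v):
    """Cosine similarity between two vectors, the baseline speaker score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "i-vectors": a same-speaker pair should score higher than a
# different-speaker pair.
same_speaker = cosine_score([1.0, 2.0, 3.0], [1.1, 2.1, 2.9])
diff_speaker = cosine_score([1.0, 2.0, 3.0], [-3.0, 0.5, 1.0])
print(same_speaker > diff_speaker)  # True
```

In a verification system, the score is compared against a threshold calibrated to trade off false accepts against false rejects, which is where the reported EER figures come from.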
A mix network by Wikstrom fails in correctness, provable privacy, and soundness. Its claimed advantages in security and efficiency are compromised. The analysis in this paper illustrates that although the first two failures may be fixed by modifying the shuffling protocol, the last is too serious to fix at a tolerable cost. In particular, an attack is proposed to show how easily the soundness of the shuffling scheme can be compromised. Moreover, the most surprising discovery in this paper is the formal demonstration that, in practice, it is impossible to fix the soundness of Wikstrom's shuffling scheme.
Finding Bad Code Smells with Neural Network Models (IJECEIAES)
Code smell refers to any symptom introduced in the design or implementation phases in the source code of a program. Such a code smell can potentially cause deeper and more serious problems during software maintenance. The existing approaches to detecting bad smells use detection rules or standards based on a combination of different object-oriented metrics. Although a variety of detection tools have been developed, they still have limitations and constraints in their capabilities. In this paper, a code smell detection system is presented with a neural network model that captures the relationship between bad smells and object-oriented metrics, taking a corpus of Java projects as the experimental dataset. The most well-known object-oriented metrics are considered to identify the presence of bad smells. The code smell detection system uses twenty Java projects that are shared by many users in GitHub repositories. The dataset of these Java projects is partitioned into mutually exclusive training and test sets. The training dataset is used to learn the network model, which predicts smelly classes in this study. The optimized network model is then evaluated on the test dataset. The experimental results show that when the model is trained with more data, the prediction outcomes improve. In addition, the accuracy of the model increases when it is trained for more epochs and with more hidden layers.
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ... (ijaia)
Network intrusion detection often runs into difficulty in creating classifiers that can handle unequally distributed attack categories. Attacks such as Remote to Local (R2L) and User to Root (U2R) are very rare; even in the KDD dataset, these attacks make up only 2% of the data. As a result, models cannot efficiently learn the characteristics of the rare categories, which leads to poor detection rates for rare attack categories like R2L and U2R. We also compared the accuracy on the KDD and NSL-KDD datasets using different classifiers in WEKA.
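The imbalance problem described above is easy to demonstrate numerically. In this hedged toy example (label proportions loosely mimic the KDD-style 2% rarity; the classifier and counts are invented for illustration), a model that never predicts the rare classes still reports high overall accuracy:

```python
# Toy labels mimicking KDD-style imbalance: R2L/U2R are ~2% of the data.
actual    = ["normal"] * 60 + ["dos"] * 36 + ["r2l"] * 3 + ["u2r"] * 1
# A classifier that never predicts the rare classes still looks accurate:
predicted = ["normal"] * 60 + ["dos"] * 36 + ["normal"] * 3 + ["dos"] * 1

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(label):
    """Per-class detection rate: fraction of true `label` instances caught."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    total = sum(1 for a in actual if a == label)
    return hits / total

print(accuracy)                       # 0.96 overall...
print(recall("r2l"), recall("u2r"))   # ...but 0.0 detection of rare attacks
```

This is why per-class recall (detection rate), not overall accuracy, is the meaningful metric for R2L and U2R.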
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Performance analysis of linkage learning techniques in genetic algorithms (eSAT Journals)
Abstract One variant of the Genetic Algorithm, the Linkage Learning Genetic Algorithm (LLGA), enhances the efficiency of the Simple Genetic Algorithm (SGA) in solving NP-hard problems. Discovering a linkage learning technique is an important task in GA research. Almost all existing linkage learning techniques follow either a random or a probabilistic approach, making repeated passes over the population to determine the relationships between individuals. An SGA with a random linkage technique is simple but may take a long time to converge to the optimal solution. This paper uses a linkage learning operator called Gene Silencing, a mechanism inspired by biological systems. The Gene Silencing mechanism improves the linkages by preserving the building blocks in an individual from disruption by recombination processes such as crossover and mutation. It converges quickly to the optimal solution without compromising diversification of the search space. To demonstrate this, the Travelling Salesperson Problem (TSP) was chosen, since it requires retaining the order of cities in a tour. Experiments were carried out on different TSP benchmark instances taken from TSPLIB, a standard library of TSP problems. These benchmark instances were also run with various linkage learning techniques, and the performance of those techniques is analysed against the Gene Silencing (GS) mechanism. The performance analysis considers the experimental results with respect to the optimal solution and convergence speed. Index Terms: Linkage Learning, Gene Silencing, Building Blocks, Genetic Algorithm, TSPLIB, Performance Analysis
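The idea of protecting building blocks from crossover disruption can be sketched as follows. This is a hedged illustration of a silencing-style operator on a TSP tour, not the paper's exact Gene Silencing operator: "silenced" positions are frozen from one parent, and the remaining cities are filled in the order they appear in the other parent, so the child is always a valid permutation.

```python
def silenced_crossover(parent1, parent2, silenced):
    """Order-style crossover that keeps `silenced` positions of parent1
    fixed, protecting those building blocks from disruption (a sketch;
    the paper's actual operator may differ)."""
    child = [None] * len(parent1)
    for i in silenced:
        child[i] = parent1[i]
    # Remaining cities, in parent2's order, fill the open slots
    fill = iter(c for c in parent2 if c not in child)
    for i, gene in enumerate(child):
        if gene is None:
            child[i] = next(fill)
    return child

p1 = [0, 1, 2, 3, 4, 5]
p2 = [5, 3, 1, 0, 2, 4]
child = silenced_crossover(p1, p2, silenced={1, 2})
print(child)  # [5, 1, 2, 3, 0, 4]: positions 1 and 2 keep cities 1 and 2 from p1
```

Freezing good sub-tours while still mixing in genetic material from the second parent is exactly the trade-off between building-block preservation and diversification that the abstract describes.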
This presentation outlines what makes a better landing page to enable your business to make more money. If your business doesn't create online revenue then the same methods and tactics will help improve stickiness and engagement.
Sydney's main source of water was the Tank Stream in Sydney Cove until it became polluted in 1826. Pipes were then laid throughout Sydney, and the city became more dependent on bore water. The Upper Nepean scheme and Warragamba Dam were considered the solution. The government has built primary, secondary, and tertiary wastewater treatment plants, which help remove solids, inorganic material, organics, metals, pathogens, nitrogen, and phosphorus. Now the government has proposed a desalination plant at Kurnell to help with Sydney's water crisis.
Implementation of reducing features to improve code change based bug predicti... (eSAT Journals)
Abstract Today we see plenty of bugs in software because of variations in software and hardware technologies. Bugs are software faults, and they pose a severe challenge to system reliability and dependability. Bug prediction is a convenient approach to identifying bugs in software. Recently, machine learning classifier approaches have been developed to flag the presence of a bug in a source-code file. Because of the huge number of machine-learned features, current classifier-based bug prediction has two major problems: (i) inadequate precision for practical usage, and (ii) considerable prediction time. In this paper we use two techniques: first, the CosTriage algorithm, which attempts to enhance the accuracy and also lower the cost of bug prediction; and second, feature selection methods, which eliminate less significant features. Reducing features improves the quality of the knowledge extracted and also boosts the speed of computation. Keywords: Efficiency, Bug Prediction, Classification, Feature Selection, Accuracy
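One simple form of the feature-reduction step can be sketched as a variance threshold: columns that barely vary carry little signal for a classifier and can be dropped. This is a generic illustration under assumed data and threshold, not the paper's specific selection method.

```python
def reduce_features(rows, min_variance=0.001):
    """Drop feature columns whose (population) variance falls below a
    threshold -- one simple feature-selection scheme for illustration."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var >= min_variance:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows], keep

# Hypothetical change metrics: column 0 is constant across all samples
data = [
    [1.0, 0.5, 3.2],
    [1.0, 0.7, 1.1],
    [1.0, 0.6, 2.8],
]
reduced, kept = reduce_features(data)
print(kept)  # [1, 2]: the constant column 0 is eliminated
```

Fewer, more informative features shrink both training and prediction time, which addresses problem (ii) above directly.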
Abstract
Researchers in the fields of software engineering, business process improvement, and information engineering all want to drastically modernize software life-cycle processes and technologies to correct existing problems and to improve the quality of software. Research goals have included ancillary issues, such as improving user services through conversion to new platforms and facilitating software processes by adopting automated tools. Automated tools for software development, understanding, maintenance, and documentation add to process maturity, leading to better quality and reliability of computer services and greater customer satisfaction. This paper focuses on critical issues of legacy program improvement. Program improvement requires estimating the program from various perspectives. The paper highlights various elements of legacy program complexity which can be taken into account for further program development.
Keywords: Legacy, Program, Software complexity, Code, Integration
A review paper: optimal test cases for regression testing using artificial in... (IJECEIAES)
The goal of the testing process is to find errors and defects in the software being developed so that they can be fixed before delivery to the customer. Regression testing is an essential quality testing technique during the maintenance phase of a program, as it is performed to ensure the integrity of the program after modifications have been made. As software evolves, the test suite becomes too large to be fully executed within the given test budget and time. Therefore, the cost of regression testing should be reduced using different techniques; here we discuss several, such as the retest-all technique, regression test selection (RTS), and test case prioritization (TCP). The efficiency of these techniques is evaluated through metrics such as the average percentage of faults detected (APFD), average percentage of block coverage (APBC), and average percentage of decision coverage (APDC). In this paper we survey these techniques for test case selection and prioritization, the metrics used to evaluate their efficiency, and the different artificial intelligence techniques applied to them, and we describe the best of each.
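The APFD metric mentioned above has a standard closed form: APFD = 1 - (TF1 + ... + TFm)/(n*m) + 1/(2n), where n is the number of tests, m the number of faults, and TFi the 1-based position of the first test that reveals fault i. A small sketch with a hypothetical fault matrix:

```python
def apfd(ordering, faults_detected_by):
    """Average Percentage of Faults Detected for a test-case ordering.
    APFD = 1 - (TF1 + ... + TFm) / (n * m) + 1 / (2n)."""
    n = len(ordering)
    faults = sorted({f for fs in faults_detected_by.values() for f in fs})
    m = len(faults)
    first_pos = {}
    for pos, test in enumerate(ordering, start=1):
        for f in faults_detected_by.get(test, ()):
            first_pos.setdefault(f, pos)  # record first detection only
    return 1 - sum(first_pos[f] for f in faults) / (n * m) + 1 / (2 * n)

# Hypothetical suite: which faults each test reveals
detects = {"t1": {1}, "t2": {2, 3}, "t3": set()}
good = apfd(["t2", "t1", "t3"], detects)  # fault-revealing tests run first
poor = apfd(["t3", "t1", "t2"], detects)
print(good, poor)  # the prioritized ordering scores higher
```

A TCP technique is judged by how much it pushes APFD toward 1, i.e., how early in the run the faults surface.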
The International Journal of Engineering and Science (IJES) (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Fusion of data from multiple sources generates new information from existing data. Users can now access information from inside or outside the organization very easily, which increases user productivity and the knowledge shared within the organization. But this leads to a new area of network security threat: the insider threat. A user who has access to critical organizational information can now share it outside the organization, and current network security tools cannot prevent this new threat. In this paper, we address this issue by building a real-time anomaly detection system based on users' current and previous behavior.
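One minimal way to compare current behavior against previous behavior is a deviation score over a per-user baseline. This sketch is an assumed, simplified stand-in for the paper's detection system: it flags a day whose sensitive-file access count sits far above the user's historical mean.

```python
import statistics

# Baseline: a hypothetical user's past daily counts of sensitive-file accesses
history = [3, 5, 4, 6, 4, 5, 3, 4]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomalous(todays_count, threshold=3.0):
    """Flag behavior more than `threshold` standard deviations above baseline."""
    return (todays_count - mean) / stdev > threshold

print(is_anomalous(4))   # normal day
print(is_anomalous(40))  # sudden bulk access: possible insider threat
```

A real system would maintain such baselines per user and per action type, and update them online; the z-score here only shows the shape of the comparison.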
Study of Software Defect Prediction using Forward Pass RNN with Hyperbolic Ta... (ijtsrd)
For the IT sector and software specialists, software failure prediction and fault proneness have long been seen as crucial issues. Conventional methods need prior knowledge of errors or malfunctioning modules in order to identify software flaws within an application. By using machine learning approaches, automated software fault recovery models allow a program to forecast and recover from software problems, which helps the program operate more efficiently and lowers errors, time, and expense. A software fault prediction model using machine learning methods was presented that can allow the program to continue working on its intended mission. Additionally, we assessed the model's performance using a variety of evaluation benchmarks, including accuracy, F1 measure, precision, recall, and specificity. The deep learning prediction model, the FPRNN-HTF (Forward Pass RNN with Hyperbolic Tangent Function) technique, is built on neural networks with hyperbolic tangent activation functions. The assessment procedure demonstrated a high accuracy rate and effective application of the algorithms. Moreover, a comparative measure is used to evaluate the suggested prediction model against other methodologies. The gathered data demonstrated the superior performance of the FPRNN-HTF technique. Swati Rai | Dr. Kirti Jain "Study of Software Defect Prediction using Forward Pass RNN with Hyperbolic Tangent Function" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-6, December 2023, URL: https://www.ijtsrd.com/papers/ijtsrd60159.pdf Paper Url: https://www.ijtsrd.com/humanities-and-the-arts/education/60159/study-of-software-defect-prediction-using-forward-pass-rnn-with-hyperbolic-tangent-function/swati-rai
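The forward pass of an RNN with a hyperbolic tangent activation, the core of the FPRNN-HTF name, is compact enough to sketch. The single-unit version below uses toy weights (the paper's architecture and parameters are not specified here); the recurrence is h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

```python
import math

def rnn_forward(inputs, w_x, w_h, b):
    """Single-unit forward-pass RNN with a tanh activation:
    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). Weights are toy values."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # tanh squashes into (-1, 1)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.5, -1.0], w_x=0.8, w_h=0.5, b=0.1)
print(states)
```

The bounded tanh output keeps hidden states in (-1, 1), which stabilizes the recurrence; for defect prediction the final state would feed a classification layer.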
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE (ijesajournal)
Automation is everywhere; applications are no longer developed without it. In the semiconductor
industry, artificial intelligence plays a vital role in implementing chip-based designs through
automation. The main advantage of applying machine learning and deep learning techniques is to
improve the implementation rate. The main objective of the proposed system is to apply deep
learning with a data-driven approach to control the system. This leads to improvements in design,
delay, speed of operation, and cost. Through this system, the huge volume of data generated by
the system can also be brought under control.
0-knowledge fuzzing
Vincenzo Iozzo
vincenzo.iozzo@zynamics.com
February 9, 2010
Abstract and widely established methodology employed in
COTS software vulnerability discovery process.
Nowadays fuzzing is a pretty common technique The first appearance of fuzzing in software test-
used both by attackers and software developers. ing dates back to 1988 by Professor Barton
Currently known techniques usually involve Miller[1]; since then the technique has evolved
knowing the protocol/format that needs to be a lot and it is not only used by attackers to dis-
fuzzed and having a basic understanding of how cover vulnerabilities but also internally by many
the user input is processed inside the binary. companies to find bugs in their software.
In the past since fuzzing was little-used obtain- Over the course of time a lot of different imple-
ing good results with a small amount of effort mentations of fuzz testing have been researched,
was possible. nonetheless it is commonly believed that there
Today finding bugs requires digging a lot inside are two predominant approaches to fuzzing:
the code and the user-input as common vulner- Mutation-based and Generation-based.
abilies are already identified and fixed by devel- The former is based on random mutations of
opers. This paper will present an idea on how known well-formed data, whereas the latter cre-
to effectively fuzz with no knowledge of the user- ates testing samples using templates describing
input and the binary. the format of the software input.
Specifically the paper will demonstrate how tech- Both approaches have their advantages and pit-
niques like code coverage, data tainting and in- falls. The former requires little effort to be im-
memory fuzzing allow to build a smart fuzzer plemented and it is reusable across different soft-
with no need to instrument it. ware. Nonetheless given the raising interest com-
panies have shown in properly testing and devel-
oping products this approach will generally yield
1 Introduction worse results than generation-based fuzzers.
The second approach has the advantage of ob-
Fuzzing, or fuzz testing, is a software test-
taining better results in terms of bugs found, al-
ing methodology whose aim is to provide in-
though it requires knowledge of the input format
valid, unexpected or random inputs to a pro-
the binary expects and its reusability is bounded
gram. Although the idea behind this technique
to binaries that deal with the same input format.
is conceptually very simple it is a well known
2
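The mutation-based approach described above is simple enough to sketch in a few lines. The following is an illustrative toy, not part of any fuzzer discussed in this paper; the function and seed names are made up for the example. It corrupts a known well-formed sample while never changing its length or layout.

```python
import random

def mutate(sample: bytes, ratio: float = 0.01) -> bytes:
    """Mutation-based fuzzing in its simplest form: randomly overwrite
    a small fraction of the bytes of a known well-formed sample."""
    data = bytearray(sample)
    # Corrupt at least one byte, up to `ratio` of the sample size.
    for _ in range(max(1, int(len(data) * ratio))):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

seed = b"GIF89a\x01\x00\x01\x00"   # a tiny well-formed header used as the seed
testcase = mutate(seed)
assert len(testcase) == len(seed)  # mutation never changes the sample length
```

Because the mutations are blind, such a fuzzer is trivially reusable across formats, which is exactly the trade-off discussed above: low implementation effort, but poor coverage of well-validated inputs.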
The difficulty of creating input models can range from low for public data formats to almost infeasible for proprietary formats. In order to ease the process of creating input templates, various approaches have been studied, most notably evolutionary fuzzers and in-memory fuzzers. Both are derived from mutation-based fuzzers, but for different purposes. The first type of fuzzer, in fact, employs genetic algorithms in an attempt to generate sets of data which resemble the input format as precisely as possible. The latter, instead, first requires a human to manually identify specific functions inside the binary, then mutates the input in memory in order to bypass data validation which could lead to different code paths, thus resulting in crucial pieces of an application not being fuzzed.

Evolutionary based fuzzers suffer from the difficulty of identifying proper scoring and mutation functions, and for this approach to be effective it usually requires more time than the generation-based one. In-memory fuzzing, on the other hand, has a high rate of false positives and negatives, and it requires an expert reverse engineer in order to identify proper test cases.

In this paper the author presents an approach to fuzz testing based on in-memory fuzzing, aiming at limiting human intervention and minimizing the number of false positives and negatives that currently affect this technique. The proposed methodology employs a range of known metrics from both static and dynamic program analysis, together with a new technique for in-memory fuzzing. Specifically, we will use data tainting for tracking user input, thus being able to identify locations in memory suitable for testing; we will also employ static analysis metrics in order to identify functions in the binary that can be interesting from a security testing point of view.

To the best of the author's knowledge there are no public attempts at combining these techniques together for fuzz-testing purposes. A notable exception is Flayer, which nonetheless only focuses on dynamic analysis and program path manipulation in order to discover software defects.

The rest of this paper is organized as follows. In Section 2 we provide basic background information on the metrics used. Section 3 discusses related work. Section 4 presents our approach and implementation. Finally, we conclude and discuss future work directions in Section 5.

2 Background

In this section we present background information on static analysis metrics, data tainting, and in-memory fuzzing.

In our implementation we primarily use two static analysis techniques: cyclomatic complexity and loop detection. Cyclomatic complexity is a software metric used to determine how complex a function is in terms of code paths. The computation is done on the number of edges and nodes a function contains. Intuitively, the more complicated the structure of the function, the more complex the function is. In [2] the connection between function complexity and the presence of bugs has been discussed. Although there is not always a correlation between the two, it is reasonable to assume that more complex functions are prone to contain bugs, given the amount of code they contain.

Another metric employed is loop detection. This algorithm takes advantage of some properties of a function's flowgraph and its dominator tree in order to detect loops present in compiled code.
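Both metrics can be sketched on a toy flowgraph. The following is an illustrative sketch, not the BinNavi-based implementation described later in the paper: the iterative dominator computation stands in for the Lengauer-Tarjan algorithm, and all function names are made up for the example.

```python
def cyclomatic_complexity(edges, nodes):
    """M = E - N + 2 for a single-component flowgraph: the number of
    independent paths through the function."""
    return len(edges) - len(nodes) + 2

def dominators(nodes, edges, entry):
    """Iteratively compute dominator sets: d dominates n if every path
    from the entry node to n passes through d."""
    preds = {n: {a for (a, b) in edges if b == n} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n}
            if preds[n]:
                new |= set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def loops(nodes, edges, entry):
    """An edge from a node to one of its dominators is a back edge,
    i.e. it closes a loop in the flowgraph."""
    dom = dominators(nodes, edges, entry)
    return [(a, b) for (a, b) in edges if b in dom[a]]

# A function with one loop: entry -> body, body -> body, body -> exit.
nodes = {"entry", "body", "exit"}
edges = {("entry", "body"), ("body", "body"), ("body", "exit")}
print(cyclomatic_complexity(edges, nodes))  # 2
print(loops(nodes, edges, "entry"))         # [('body', 'body')]
```

A function scoring high on both measures (many independent paths, at least one loop) is exactly the kind of target the fuzzer prioritizes, as described in Section 4.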
This technique is widely used in compilers for optimization purposes, and it has some interesting aspects from a security perspective as well. It is commonly known, in fact, that memory writes often happen inside loops and that most compilers usually inline functions like memcpy, so that the function will effectively result in a loop.

Another crucial piece of infrastructure for the proposed fuzzer is the data tainting engine. The goal of data tainting is to gather information on how user input is propagated through a binary. The concept of data tainting is intuitively very simple: one or more markings are associated with some data supposedly representing the user input, and those markings are propagated following the program flow. Although it is possible to perform data tainting using static analysis, the complexity of the task and the possibly incomplete set of information led the author to choose a dynamic analysis approach to the problem, taking advantage of an existing dynamic data tainting framework called Dytan [6]. Using dynamic data tainting has the benefit of obtaining more precise and richer information on data propagation, although it will not be able to explore program paths that are not executed at run-time. Given the nature of the fuzzer, obtaining information on non-executed code paths is of no interest, as in-memory fuzzing relies on the ability to reach code paths by mutating a set of known good data.

Finally, in order to monitor the effectiveness of our fuzzer, we employ a software testing measure known as code coverage. This technique verifies the degree to which the code of a program has been tested by tracing the execution of the binary. Although there are many different implementations of code coverage, all using different criteria in terms of the kind of information to record, the author decided to implement the technique so that basic block execution is traced. This implementation does not take code paths into account and therefore might be imprecise in some circumstances; nonetheless we consider this trade-off to be acceptable, as it avoids overly complicating the implementation and improves the fuzzer's performance.

3 Related work

In this section we will briefly describe existing approaches to data tainting and in-memory fuzzing, together with a brief description of Flayer [3], it being the closest work to the one described in this paper.

3.1 Existing in-memory fuzzing implementations

Figure 1: Known implementations of in-memory fuzzing. (a) Mutation loop insertion. (b) Snapshot restoration mutation.

To the best of the author's knowledge, in-memory fuzzing was first introduced to the public by Greg Hoglund of HBGary in [4] and later further developed by Amini et al. [5]. Currently there are two public methods: mutation loop insertion and snapshot restoration mutation.

The first method works by inserting an unconditional jump from the function being tested to a function responsible for mutating the data residing in the process address space of the fuzzed binary. At the end of the mutation function, another unconditional jump to the beginning of the currently tested function is inserted. The control flow graph of this approach is shown in Figure 1(a). This approach suffers from a number of drawbacks, with a high rate of false negatives and stack consumption being the two major ones. Another disadvantage of this method is the general instability of the memory after a few fuzzing iterations.

The second approach works by inserting an unconditional jump from the beginning of the function being tested to a function responsible for taking a memory snapshot. This function will later call the tested function again. At the end of the analyzed function another unconditional jump is inserted. The jump points to a function responsible for restoring the memory, fuzzing the data and executing the fuzzed function again. A control flow diagram employing this approach is shown in Figure 1(b). Although this method has some advantages with respect to the first one described, it still suffers from a high false positive rate, and it is also slower given the need to continuously restore process memory.

3.2 Existing data tainting implementations

Dynamic data tainting has gained momentum in the last few years given the increased complexity of software. A lot of implementations of data tainting frameworks exist; for this reason the author decided to use a framework previously created by James Clause and Alessandro Orso of Gatech called Dytan [6]. The decision was made based on a number of requirements. First and foremost, the ability to instrument binaries without any recompilation or access to the source code. Another very important requirement was portability: most of the existing implementations are based on Valgrind [7], which does not support the Windows platform. The two most appealing candidates were Temu [9] and Dytan. The first one is built on top of a modified version of Qemu [8]. Although this would have respected both of the initial requirements, we think that a data tainting framework based on a virtual machine emulator is overkill for our goals. Besides, the implementation in the author's opinion is not yet robust enough. Dytan is implemented as a pintool [16]. It is a flexible framework and can run on both Linux and Windows.

3.3 Additional related work

As already mentioned in the previous section, Flayer [3] is the work most similar to the approach discussed in this paper. The software combines data tainting and the ability to force code paths. Differently from many other data tainting tools, Flayer has bit-precision markings. Although this grants a higher degree of precision in obtaining information on data propagation, for the purpose of our work byte-precision markings are detailed enough.

Another limitation is the software the tool is based on; as already mentioned, Valgrind does not support Windows, which severely impairs the
usefulness of the tool.

Finally, even if the main aim of the tool is not fuzzing, it has the ability to force code paths and therefore it can be used to test various code paths. This method has three main drawbacks: the first one is a high number of false positives, the second one is the absence of a sample which can later be used by the attacker to reproduce the bug, and finally a problem known as code-path explosion. This problem arises because the number of code paths to force increases exponentially with the complexity of the software.

4 Proposed approach and implementation

Figure 2: Fuzzer components.

In this section we will present the idea and implementation of our work. As shown in Figure 2, our fuzzer can be divided into 4 parts.

4.1 Static analysis metrics

Static analysis algorithms are used to determine which functions could potentially be of interest for our fuzzer. We assign a higher score to functions that have a high cyclomatic complexity score and at least one loop in them; we then consider all the functions that have loops but a low cyclomatic complexity score, and finally we take into account the remaining functions. Ideally we will add more metrics to the implementation; therefore this rather trivial scoring system should be replaced by a more sophisticated approach which takes into account scores coming from various metrics and weighs them with respect to their relevance from a security perspective.

Figure 3: The edge in red is missed by the approximative cyclomatic complexity formula.

Cyclomatic complexity. Cyclomatic complexity was first described by Thomas McCabe in [10]. The purpose of this metric is to calculate the number of independent paths in a code section. Many formulations of this metric have been given; we briefly explain the ones that are relevant to our fuzzer.

Definition. Let G be a flowgraph, E the number of edges in G, N the number of nodes in G and P the number of connected components in G. Cyclomatic complexity is defined as:

    M = E - N + 2P    (1)

A connected component is a subgraph in which any two vertices are connected to each other by paths. This formula originates from the cyclomatic number:

Definition. Let G be a strongly connected graph, E the number of edges in G, N the number of nodes in G and P the number of connected components in G.
The cyclomatic number is defined as:

    V(G) = E - N + P    (2)

It should be noticed that the cyclomatic number can be calculated only on strongly connected graphs, that is, graphs in which for every pair of vertices there is a direct path connecting them in both directions. McCabe proved that the flowgraph of a function with a single entry point and a single exit point can be considered a strongly connected graph, that therefore the cyclomatic number theorem applies, and that P = 1; thus the resulting simplified formula is:

    M = E - N + 2    (3)

Intuitively, when a flowgraph has multiple exit points the aforementioned formula doesn't hold true anymore. Another one should therefore be used:

Definition. Let G be a flowgraph, π the number of decision points in G and s the number of exit points in G. Cyclomatic complexity is defined as:

    M = π - s + 2    (4)

Applying (3) to functions with multiple exit points we will, in fact, obtain lower cyclomatic complexity values, by a minimum factor of 2. Figure 3 shows typical edges and connected components missed by using (3). Nonetheless, the author believes that the less precise measurement can be used without impairing the results.

We implemented cyclomatic complexity calculation for each function in a module by using the BinNavi API. A detailed explanation of the implementation can be found in [11].

Figure 4: Graphs used in the loop detection algorithm. (a) A function flowgraph; nodes in blue belong to a loop. (b) Dominator tree of the previous function; nodes in green correspond to the blue ones highlighted in picture (a). (c) The nodes in green dominate the node in red in the dominator tree.

Loop detection algorithm. As previously mentioned, another metric, loop detection, is used to select functions. The first required step is to extract the dominator tree of a function. Formally:

Definition. A dominator tree is a tree where each node's children are the nodes it immediately dominates. A node d is said to dominate node k if every path from the start node s to node k must go through node d.

To give a visual example of the dominator tree of a function, please refer to Figure 4. Nodes in blue in Figure 4(a) are highlighted in the dominator tree in green in Figure 4(b).

There are two known algorithms used to calculate the dominator tree of a flowgraph. It is out of the scope of this paper to discuss them. It should be noticed, though, that the
tool upon which we built our loop detection algorithm, BinNavi [12], implements the Lengauer-Tarjan [13] dominator tree algorithm, which is almost linear, thus granting us a higher computational speed.

The second step is to calculate the dominators of each node. In Figure 4(c) the dominators of the node in red are the ones in green. The last step is to search for edges from a node to one of its dominators. Recalling the definition of domination, it is trivial to show that if there is an edge from a node to one of its dominators, a loop is present.

Most complex assembly instruction sets have what are called implicit loop instructions, for instance rep movs in the x86 ISA. Applying this algorithm to a flowgraph will therefore miss this type of loop. In order to overcome this problem we translate the function to an intermediate language called REIL [14], implemented in BinNavi. This intermediate language provides a very small set of instructions, which helps in the process of unfolding implicit loops. A detailed implementation of this algorithm can be found in [15].

4.2 Data tainting

As stated before, the author did not implement the data tainting framework employed by the fuzzer; nonetheless, given the critical importance of data tainting for this project, the author thinks it is important to briefly describe how Dytan works and how we use this framework for our purposes.

We previously mentioned that data tainting is a technique to track user input inside a binary. Tracking is usually performed by assigning markings to data while executing the binary. Each data tainting implementation can choose the type of markings to use; more precisely, it is possible to determine the granularity of those markings. Dytan is able to either assign a single marking to each piece of input or have byte-level markings. We chose to use the second type of marking, as it is more precise but at the same time does not cause excessive overhead during the execution.

In order to make data tainting work it is important to define what data needs to be tracked. In Dytan it is possible to track user input coming from network operations, file access and command line arguments passed to the main() function. That is, system calls and functions responsible for the aforementioned input sources are monitored and their output is tracked through the binary.

Another important factor to take into account while implementing a data tainting tool is the propagation policy. A propagation policy is a set of rules followed while taint markings are assigned during program execution. Dytan is currently able to perform control- and data-flow analysis, or data-flow only analysis. The former tracks direct and transitive data assignments as well as indirect propagation due to control flow dependencies upon user input. The latter instead can only track direct and transitive data assignments. In our fuzzer we use the second approach, as control flow analysis does not add any useful information on the data locations to be tested.

Another problem to tackle while creating a propagation policy is how to deal with multiple markings assigned to the same input. Dytan currently assigns to the resulting taint marking the union of all the taint markings related to it. Although
for our fuzzer a different approach might grant better results, we currently use the default Dytan policy.

Finally, we make Dytan provide information on every instruction that assigns taint markings. That is, for each of those instructions we obtain the state of taint markings on the machine registers and on the memory locations that are tainted at that specific program point.

4.3 In-memory fuzzing

We presented in Section 3 the two known approaches to in-memory fuzzing. In this section we are going to present two slightly different approaches which we believe obtain better results, given the amount of information we can gather from data tainting analysis.

We implemented our in-memory fuzzer on top of PIN [16]. PIN has the ability to add instrumentation functions before and after a binary is loaded in memory, and around functions and instructions. Recalling that for each instruction that assigns taint markings we retrieve from data tainting analysis the markings associated with machine registers and memory locations, we are able to precisely identify program points during binary execution that are suitable for fuzzing.

For both approaches we perform a number of steps:

1. Install an analysis function on image loading.

2. Install an analysis function before the function we are interested in fuzzing is executed.

3. Install an analysis function before each instruction that assigns taint markings.

At point 1 we search for the address of the function we are interested in fuzzing and install the analysis function for that function. At point 2 we iterate through the function's instructions, locating the ones that are of interest in order to install an analysis function as described in 3.

The first approach consists of mutating memory locations and registers in place. That is, instead of allocating new memory and pointing instruction operands to it, we modify the content of both memory locations and registers within their length boundaries. We then continue the program execution until the program quits or new data is obtained from a tainted source. This approach is more conservative than all the others, as it does not change the memory layout; thus the number of false positives is reduced, but at the expense of an increased number of false negatives.

The second approach works very similarly to SRM, Figure 1(b). In addition to the first three steps, we also add an instrumentation function at the end of the tested function. This function will be responsible for restoring memory after fuzzing has been performed. With the second approach the memory layout is changed, as the fuzzer will allocate chunks of memory to be used during the fuzzing phase. As for the first approach, the program execution is continued until the application quits or new data is obtained from a tainted source.

Although our second approach is similar to SRM, there are a few notable differences that have to be considered. First, we do not take a full snapshot of the process memory; we only track modifications that occurred due to fuzzing during the execution of the tested function. The second difference is that memory is not totally restored after the function has been fuzzed; this can
allow us to reduce the number of false negatives, since possible bugs caused by a faulty execution of the function are not missed by restoring the full process memory.

It has to be noted that both approaches described here, although more effective, cannot be used without a proper amount of information gathered by means of data tainting analysis or some similar technique.

4.4 Code coverage

The combination of code coverage with fuzz testing has long been used in order to measure the effectiveness of fuzzing. We implemented code coverage on top of the BinNavi debugging API. The choice of using the BinNavi debugger serves a double purpose: not only are we able to implement code coverage using lightweight breakpoints, which greatly reduce execution overhead, but we are also able to monitor the execution for possible faults. We decided to implement code coverage at the basic block level; that is, a breakpoint is set at the beginning of each basic block in the tested binary. We perform code coverage first when the binary is executed with a known good sample; later it is calculated again every time the program is fuzzed. We require the fuzzing sample to perform at least as well as the known good sample; we also set a threshold defining the upper bound after which the sample reaches the "halting point". The "halting point" is the point where the fuzzing process is re-initialized with a new known good sample, as shown in Figure 2. Formally:

Definition. Let C be the code coverage score of a known good sample, C1 the code coverage score of a fuzzing sample, and t a user supplied delta.

The following must hold true:

    C1 ≤ C + t    (5)

The halting point is defined as:

    C1 = C + t    (6)

The code coverage score is calculated as follows:

Definition. Let BBt be the totality of basic blocks in a binary, and BBf the number of basic blocks touched in a single execution. The code coverage score is defined as:

    C = BBf / BBt    (7)

A detailed implementation of code coverage using the BinNavi API can be found in [17].

5 Results and future work

In this paper we have described a new approach to fuzz testing which greatly reduces instrumentation costs, thus proving very useful when dealing with large proprietary applications. We have also shown how it is possible to combine static and dynamic analysis techniques to triage interesting functions from a security testing point of view. Finally, we have proposed a new approach to in-memory fuzzing which is more precise and less prone to false negatives than previously known techniques.

We do not have enough data to determine whether this approach has better results compared to other fuzzing techniques. The author believes that, compared to other mutation-based and evolutionary-based methodologies, the one proposed in this paper will have
better results. In comparison to generation-based fuzzers, our technique will have better results when dealing with complex software but worse results when the software input is simple.

The main direction of future work will be focused on reducing false positives by employing constraint reasoners to determine whether a given bug is reproducible with valid but unexpected input. Another important challenge is to implement more static analysis metrics to triage functions with a higher degree of precision.

Acknowledgments

The author would like to thank Thomas Dullien, Dino Dai Zovi and Shauvik Roy Choudhary for their suggestions and help while researching the topic. The author would also like to thank James Clause and Alessandro Orso for having provided access to the Dytan source code and for their help while testing and improving the original code base. Finally, we want to thank all the people who have reviewed the paper.

References

[1] B. P. Miller, L. Fredriksen, and B. So: "An Empirical Study of the Reliability of UNIX Utilities", Communications of the ACM 33, 12 (December 1990)

[2] S. H. Kan: Metrics and Models in Software Quality Engineering. Addison-Wesley. pp. 316-317.

[3] W. Drewry, T. Ormandy: Flayer: exposing application internals, Proceedings of the First USENIX Workshop on Offensive Technologies

[4] G. Hoglund: Runtime Decompilation: The GreyBox Process for Exploiting Software, Black Hat DC 2003

[5] M. Sutton, A. Greene, P. Amini: Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley.

[6] J. Clause, W. Li, A. Orso: Dytan: a generic dynamic taint analysis framework, Proceedings of the 2007 International Symposium on Software Testing and Analysis

[7] Valgrind: http://www.valgrind.org

[8] Qemu: http://www.qemu.org

[9] Temu: http://bitblaze.cs.berkeley.edu/temu.html

[10] T. J. McCabe: A Complexity Measure, IEEE Transactions on Software Engineering, vol. SE-2, no. 4, December 1976

[11] V. Iozzo: Scripting with BinNavi - Cyclomatic Complexity

[12] BinNavi: http://www.zynamics.com/binnavi.html

[13] T. Lengauer and R. E. Tarjan: A fast algorithm for finding dominators in a flowgraph, ACM Transactions on Programming Languages and Systems

[14] T. Dullien, S. Porst: REIL: A platform-independent intermediate representation of disassembled code for static code analysis, CanSecWest 2009

[15] V. Iozzo: Finding interesting loops using (Mono)REIL

[16] PIN: http://www.pintool.org