The document discusses software birthmarks, which are characteristics of a program that uniquely identify it. A birthmark can be used to detect software theft by searching for the birthmark of a plaintiff program in a suspected program. Specifically, the document discusses heap graph-based birthmarks, which are generated from a program's runtime heap structure and object references. A subgraph of the heap graph forms the birthmark. Subgraph monomorphism is used to search for the birthmark in a suspected program's heap graph to detect copying of code. Heap graph-based birthmarks are robust against attacks like code obfuscation that aim to disguise stolen code.
This paper presents a machine learning approach to identify malicious URLs. The researchers use URL lexical features, JavaScript source code features, and payload size as inputs to an SVM classifier. They achieve an accuracy of 0.81 and an F1 score of 0.74 when combining all feature types. Future work could involve testing on more malicious URLs and incorporating additional JavaScript and network features to improve detection of evolving attacks. The goal is to develop a real-time system for classifying URLs on mobile devices.
This document discusses using machine learning to classify malware into families based on the DREBIN dataset. It covers:
1. Preprocessing the dataset, including integer encoding and one-hot encoding to convert categorical data to numeric form for modeling.
2. Addressing overfitting by splitting the data into training and test sets and using cross-validation.
3. Using classifiers like Random Forest and SVM with strategies like one-vs-all and one-vs-one to perform multiclass classification of malware families.
4. The process of using binary classifiers for each family first, then combining the results to classify malware into the appropriate family.
This document describes CrowdSource, a system that uses natural language processing to infer high-level malware capabilities based on low-level strings extracted from malware binaries. It trains a machine learning model on millions of technical documents from StackExchange to map low-level strings to high-level capabilities. The system was evaluated on 1,457 malware samples and shown to detect 14 capabilities with an average F1-score of 0.86 and can analyze tens of thousands of samples per day.
We present a novel code search approach for answering queries focused on
API-usage with code showing how the API should be used.
To construct a search index, we develop new techniques for statically mining
and consolidating temporal API specifications from code snippets. In
contrast to existing semantic-based techniques, our approach handles partial
programs in the form of code snippets. Handling snippets allows us to consume
code from various sources such as parts of open source projects,
educational resources (e.g. tutorials), and expert code sites. To handle code
snippets, our approach (i) extracts a possibly \emph{partial} temporal
specification from each snippet using a relatively precise static analysis
tracking a generalized notion of typestate, and (ii) consolidates the
partial temporal specifications, combining consistent partial information to
yield consolidated temporal specifications, each of which captures a full(er)
usage scenario.
To answer a search query, we define a notion of relaxed inclusion
matching a query against temporal specifications and their corresponding code
snippets.
We have implemented our approach in a tool called PRIME and applied it to
search for API usage of several challenging APIs. PRIME was able to analyze
and consolidate thousands of snippets per tested API, and our results
indicate that the combination of a relatively precise analysis and
consolidation allowed PRIME to answer challenging queries effectively.
The document proposes a framework called TOB (Thread-Oblivious dynamic Birthmark) that can revive existing dynamic birthmark techniques like SCSSB, DYKIS, and JB to handle plagiarism detection for multithreaded programs. Existing birthmark techniques become ineffective for multithreaded programs due to non-determinism from thread scheduling. The TOB framework applies the var-gram algorithm to generate birthmarks that are oblivious to thread scheduling. Experiments on 418 versions of 35 multithreaded programs show the TOB-based tools are effective at detecting plagiarism and resilient to code obfuscation techniques. The framework addresses challenges in applying dynamic birthmarks to detect whole program plagiarism of
The document describes using a VGG model for image classification of Venice boat types from the MarDCT dataset. It discusses:
1. Using the VGG16 and VGG19 pre-trained models from Keras to extract features from images in the MarDCT training and test sets.
2. Training linear SVM and Random Forest classifiers on the extracted features to classify images into 24 boat types.
3. Evaluating the classifiers using techniques like k-fold cross-validation, and calculating accuracy, precision, recall, and F1 scores.
The document discusses techniques for analyzing unstructured text data from software repositories. It describes using textual analysis on code identifiers, comments, commit messages, issue trackers, emails, and forums to perform tasks like traceability link recovery, feature location, clone detection, and bug prediction. Different techniques are discussed, including pattern matching, island parsers, information retrieval methods, and natural language parsing. Choosing the right technique depends on the type of unstructured data and needs of the analysis.
‘CodeAliker’ - Plagiarism Detection on the Cloud acijjournal
Plagiarism is a burning problem that academics have been facing in all of the varied levels of the educational system. With the advent of digital content, the challenge to ensure the integrity of academic work has been amplified. This paper discusses on defining a precise definition of plagiarized computer code, various solutions available for detecting plagiarism and building a cloud platform for plagiarism disclosure.
‘CodeAliker’, our application thus developed automates the submission of assignments and the review process associated for essay text as well as computer code. It has been made available under the GNU’s General Public License as a Free and Open Source Software.
This paper presents a machine learning approach to identify malicious URLs. The researchers use URL lexical features, JavaScript source code features, and payload size as inputs to an SVM classifier. They achieve an accuracy of 0.81 and an F1 score of 0.74 when combining all feature types. Future work could involve testing on more malicious URLs and incorporating additional JavaScript and network features to improve detection of evolving attacks. The goal is to develop a real-time system for classifying URLs on mobile devices.
This document discusses using machine learning to classify malware into families based on the DREBIN dataset. It covers:
1. Preprocessing the dataset, including integer encoding and one-hot encoding to convert categorical data to numeric form for modeling.
2. Addressing overfitting by splitting the data into training and test sets and using cross-validation.
3. Using classifiers like Random Forest and SVM with strategies like one-vs-all and one-vs-one to perform multiclass classification of malware families.
4. The process of using binary classifiers for each family first, then combining the results to classify malware into the appropriate family.
This document describes CrowdSource, a system that uses natural language processing to infer high-level malware capabilities based on low-level strings extracted from malware binaries. It trains a machine learning model on millions of technical documents from StackExchange to map low-level strings to high-level capabilities. The system was evaluated on 1,457 malware samples and shown to detect 14 capabilities with an average F1-score of 0.86 and can analyze tens of thousands of samples per day.
We present a novel code search approach for answering queries focused on
API-usage with code showing how the API should be used.
To construct a search index, we develop new techniques for statically mining
and consolidating temporal API specifications from code snippets. In
contrast to existing semantic-based techniques, our approach handles partial
programs in the form of code snippets. Handling snippets allows us to consume
code from various sources such as parts of open source projects,
educational resources (e.g. tutorials), and expert code sites. To handle code
snippets, our approach (i) extracts a possibly \emph{partial} temporal
specification from each snippet using a relatively precise static analysis
tracking a generalized notion of typestate, and (ii) consolidates the
partial temporal specifications, combining consistent partial information to
yield consolidated temporal specifications, each of which captures a full(er)
usage scenario.
To answer a search query, we define a notion of relaxed inclusion
matching a query against temporal specifications and their corresponding code
snippets.
We have implemented our approach in a tool called PRIME and applied it to
search for API usage of several challenging APIs. PRIME was able to analyze
and consolidate thousands of snippets per tested API, and our results
indicate that the combination of a relatively precise analysis and
consolidation allowed PRIME to answer challenging queries effectively.
The document proposes a framework called TOB (Thread-Oblivious dynamic Birthmark) that can revive existing dynamic birthmark techniques like SCSSB, DYKIS, and JB to handle plagiarism detection for multithreaded programs. Existing birthmark techniques become ineffective for multithreaded programs due to non-determinism from thread scheduling. The TOB framework applies the var-gram algorithm to generate birthmarks that are oblivious to thread scheduling. Experiments on 418 versions of 35 multithreaded programs show the TOB-based tools are effective at detecting plagiarism and resilient to code obfuscation techniques. The framework addresses challenges in applying dynamic birthmarks to detect whole program plagiarism of
The document describes using a VGG model for image classification of Venice boat types from the MarDCT dataset. It discusses:
1. Using the VGG16 and VGG19 pre-trained models from Keras to extract features from images in the MarDCT training and test sets.
2. Training linear SVM and Random Forest classifiers on the extracted features to classify images into 24 boat types.
3. Evaluating the classifiers using techniques like k-fold cross-validation, and calculating accuracy, precision, recall, and F1 scores.
The document discusses techniques for analyzing unstructured text data from software repositories. It describes using textual analysis on code identifiers, comments, commit messages, issue trackers, emails, and forums to perform tasks like traceability link recovery, feature location, clone detection, and bug prediction. Different techniques are discussed, including pattern matching, island parsers, information retrieval methods, and natural language parsing. Choosing the right technique depends on the type of unstructured data and needs of the analysis.
‘CodeAliker’ - Plagiarism Detection on the Cloud acijjournal
Plagiarism is a burning problem that academics have been facing in all of the varied levels of the educational system. With the advent of digital content, the challenge to ensure the integrity of academic work has been amplified. This paper discusses on defining a precise definition of plagiarized computer code, various solutions available for detecting plagiarism and building a cloud platform for plagiarism disclosure.
‘CodeAliker’, our application thus developed automates the submission of assignments and the review process associated for essay text as well as computer code. It has been made available under the GNU’s General Public License as a Free and Open Source Software.
Architecture of a morphological malware detectorUltraUploader
This document proposes an architecture for a morphological malware detector that combines syntactic and semantic analysis. It builds an efficient signature matching engine using tree automata techniques to represent control flow graphs (CFG). It also describes a graph rewriting engine to handle common malware mutations. The detector extracts CFGs from malware binaries to generate signatures, which are compiled into a minimal automaton database for efficient matching. Experiments showed promising results with a low false positive rate.
In 2003 Dave et al. have coined the term “opinion mining” to refer to “processing a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good)”. Nine years later, in 2012 Brooks and Swigger have applied sentiment analysis in the context of software engineering. Today another nine years have passed and it is time to look back: what have we achieved as a research community and where should we go next?
To answer this question we conducted a systematic literature review involving 185 papers. Based on the literature review we present 1) well-defined categories of opinion mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4) concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques. The results of our study serve as references to choose suitable opinion mining tools for SE tasks, and provide critical insights for the further development of opinion mining techniques in the SE domain.
This work has been done together with Bin Lin, Gabriele Bavota and Michele Lanza from Università della Svizzera italiana, Switzerland, Nathan Cassee from Eindhoven University of Technology, The Netherlands and Nicole Novielli from University of Bari, Italy.
Multi step automated refactoring for code smelleSAT Journals
Abstract
Brain MR Image can detect many abnormalities like tumor, cysts, bleeding, infection etc. Analysis of brain MRI using image
processing techniques has been an active research in the field of medical imaging. In this work, it is shown that MR image of brain
represent a multi fractal system which is described a continuous spectrum of exponents rather than a single exponent (fractal
dimension). Multi fractal analysis has been performed on number of images from OASIS database are analyzed. The properties of
multi fractal spectrum of a system have been exploited to prove the results. Multi fractal spectra are determined using the modified
box-counting method of fractal dimension estimation.
Keywords: Brain MR Image, Multi fractal, Box-counting
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Binary code obfuscation through c++ template meta programmingnong_dan
This document presents methods for obfuscating C++ code through the use of template metaprogramming. It introduces template metaprogramming in C++ and how it can be used to perform computations at compile time. It then describes how to add randomization to template metaprograms using a linear congruential pseudorandom number generator. The document outlines various techniques for obfuscating integer arithmetic operations, including through the use of identities from boolean logic and two's complement arithmetic. It also discusses the insertion of opaque predicates to increase program complexity and generation of dead code. Finally, it covers obfuscating data values through encryption and masking techniques.
Automatic reverse engineering of malware emulatorsUltraUploader
This document proposes techniques for automatically reverse engineering malware emulators. It presents an algorithm using dynamic analysis to execute emulated malware, record the x86 instruction trace, and use data flow and taint analysis to identify the bytecode program and extract syntactic and semantic information about the bytecode instruction set. The authors implemented a proof-of-concept system called Rotalumé, which accurately revealed the syntax and semantics of emulated instruction sets for programs obfuscated by VMProtect and Code Virtualizer.
IRJET- An Effective Analysis of Anti Troll System using Artificial Intell...IRJET Journal
This document discusses various techniques for detecting trolls using artificial intelligence and machine learning. It first reviews related work on sentiment analysis, supervised machine learning for troll detection, real-time sentiment analysis, and analyzing vulnerabilities in social networks. It then analyzes the limitations of current troll detection systems and how AI/ML solutions can help overcome these. The literature survey covers key approaches used for troll detection, including sentiment analysis, supervised learning models, and analyzing post vulnerabilities.
This document summarizes a research paper on a new automatic mechanism called Packet-Deployment payload (P-DPL) for extracting signatures from malware files to detect unknown malware. P-DPL aims to generate signatures comprised of multiple byte-strings that can be used by high-speed network devices to filter malware, while minimizing false positives. It works by removing common code segments from executables to isolate the malware's unique code for signature generation. The researchers believe P-DPL can help detect zero-day malware more quickly until security software is updated with the new signature.
Thrift is a software library and code generation tool developed at Facebook to facilitate the development of efficient and scalable backend services across multiple programming languages. It defines a common data representation and service interface definition that can be used to generate code for easily building RPC clients and servers. The key components that Thrift addresses are common data types, an abstracted transport layer, a flexible protocol, and support for backwards compatible data evolution through versioning.
An automated approach to fix buffer overflows IJECEIAES
Buffer overflows are one of the most common software vulnerabilities that occur when more data is inserted into a buffer than it can hold. Various manual and automated techniques for detecting and fixing specific types of buffer overflow vulnerability have been proposed, but the solution to fix Unicode buffer overflow has not been proposed yet. Public security vulnerability repository e.g., Common Weakness Enumeration (CWE) holds useful articles about software security vulnerabilities. Mitigation strategies listed in CWE may be useful for fixing the specified software security vulnerabilities. This research contributes by developing a prototype that automatically fixes different types of buffer overflows by using the strategies suggested in CWE articles and existing research. A static analysis tool has been used to evaluate the performance of the developed prototype tools. The results suggest that the proposed approach can automatically fix buffer overflows without inducing errors.
In the software development life cycle (SDLC), testing is an important step to reveal and fix the vulnerabilities and flaws in the software. Testing commercial off-the-shelf applications for security has never been easy, and this is exacerbated when their source code is not accessible. Without access to source code, binary executables of such applications are employed for testing. Binary analysis is commonly used to analyze on the binary executable of an application to discover vulnerabilities. Various means, such as symbolic execution, concolic execution, taint analysis, can be used in binary analysis to help collect control flow information, execution path information, etc. This paper presents the basics of the symbolic execution approach and studies the common tools which utilize symbolic execution in them. With the review, we identified that there are a number of challenges that are associated with the symbolic values fed to the programs as well as the performance and space consumption of the tools. Different tools approached the challenges in different ways, therefore the strengths and weaknesses of each tool are summarized in a table to make it available to interested researchers.
ava and Python are two of the most popular and powerful programming languages of present time. Both
of them are Object-Oriented programming languages with unique advantages for developers and end
users. Given the features of Python and how it is related to emerging fields in computer science such as
Internet of Things, Python is considered a strong candidate of becoming the main programming language
for academia and industry in the near future. In this paper, we develop JPT, which is a translator that
converts Java code into Python. Our desktop application takes Java code as an input and translates it to
Python code using XML as an intermediate language. The translator enables this conversion instead of
having to rewrite the whole Python program from start. We address a number of cases where the
translation process is challenging and highlight cases where manual inspection is recommended.
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021. Sebastiano Panichella
The document describes an NLP-based tool called NEON that automatically identifies natural language rules for analyzing informal software artifacts. NEON trains on a set of artifacts to infer rules for identifying recurrent patterns. It was evaluated on its ability to classify app reviews using rules identified from a training set. Over a third of NEON's recommended rules were judged useful by human evaluators for classifying reviews into feature requests and problem discoveries. Future work involves using NEON to develop recommender systems for various software engineering tasks.
IRJET - Event Notifier on Scraped Mails using NLPIRJET Journal
This document describes a system that uses natural language processing and APIs to automatically add events from a user's emails to their Google Calendar. It scrapes a user's Gmail inbox using the Gmail API to find emails containing event details. Natural language processing techniques are used to extract information like the date, time, location and event name from the email text. The extracted details are then used to automatically create calendar reminders by integrating with the Google Calendar API. This allows users to have all their email-based events in one place on their calendar without having to manually add them. The system was implemented and tested on sample emails, successfully extracting event details and adding reminders to the calendar.
Software Birthmark Based Theft/Similarity Comparisons of JavaScript ProgramsSwati Patel
A birthmark is a set of characteristic possessed by a program that uniquely recognizes a program. Birthmark of the software is based on Heap Graph. It is generated by using Google Chrome Developer Tools when the program is in execution. Software’s behavioural structure is demonstrated in the heap graph. It describes how the objects are related to each other to deliver the desired functionality of the website. Our aim is to develop and evaluate a system that can find theft/similarity between websites by using Agglomerative Clustering and Improved Frequent Subgraph Mining. To identify if a website is using the original program’s code or its module, birthmark of the original program is explored in the suspected program’s heap graph.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Automated server-side model for recognition of security vulnerabilities in sc...IJECEIAES
With the increase of global accessibility of web applications, maintaining a reasonable security level for both user data and server resources has become an extremely challenging issue. Therefore, static code analysis systems can help web developers to reduce time and cost. In this paper, a new static analysis model is proposed. This model is designed to discover the security problems in scripting languages. The proposed model is implemented in a prototype SCAT, which is a static code analysis tool. SCAT applies the phases of the proposed model to catch security vulnerabilities in PHP 5.3. Empirical results attest that the proposed prototype is feasible and is able to contribute to the security of real-world web applications. SCAT managed to detect 94% of security vulnerabilities found in the testing benchmarks; this clearly indicates that the proposed model is able to provide an effective solution to complicated web systems by offering benefits of securing private data for users and maintaining web application stability for web applications providers.
Malware detection and pattern classification using NPLIRJET Journal
This document discusses methods for detecting malware and classifying patterns using natural language processing (NLP). It describes how artificial intelligence techniques can be used to analyze URLs and classify them as malicious or benign based on static and dynamic features. It also discusses using malware blacklists to recognize known malicious URLs but notes the limitations since new URLs are constantly being created. The document provides an overview of malware detection and analyzes state-of-the-art machine learning methods that have been used for identifying malicious URLs.
This document proposes a technique for detecting software theft of JavaScript programs based on dynamic birthmarks extracted from the runtime heap graph. It involves extracting a birthmark from the merged runtime heap graph of an original program using frequent subgraph mining. The same birthmark is then searched for in the heap graph of a suspected stolen program. If found, it indicates the suspected program is a copy. The document outlines the proposed algorithm involving graph generation from heap snapshots, filtering, merging, frequent subgraph selection as the birthmark, and detection. It provides results of implementing the first three modules on sample Google.com heap data, showing generated, filtered and merged graphs, as well as a selected frequent subgraph birthmark.
“Software Theft Detection for JavaScript programs based on dynamic birthmark ...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
The document describes a proposed system called Code-a-Maze that aims to obfuscate source code through various transformations to deter software reverse engineering. Code-a-Maze works by taking source code as input and applying different obfuscation techniques depending on the path taken through an abstract "maze" represented by the code. The transformations include pointless allocation and deallocation of memory, insertion of dummy method calls, addition of bogus code and trampoline functions, flipping of conditional branches, and modification of variable and function names. The goal is to complicate the code and transfer control flow in unpredictable ways, making it difficult for attackers to analyze the code without affecting its functionality or performance.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Architecture of a morphological malware detectorUltraUploader
This document proposes an architecture for a morphological malware detector that combines syntactic and semantic analysis. It builds an efficient signature matching engine using tree automata techniques to represent control flow graphs (CFG). It also describes a graph rewriting engine to handle common malware mutations. The detector extracts CFGs from malware binaries to generate signatures, which are compiled into a minimal automaton database for efficient matching. Experiments showed promising results with a low false positive rate.
In 2003 Dave et al. have coined the term “opinion mining” to refer to “processing a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good)”. Nine years later, in 2012 Brooks and Swigger have applied sentiment analysis in the context of software engineering. Today another nine years have passed and it is time to look back: what have we achieved as a research community and where should we go next?
To answer this question we conducted a systematic literature review involving 185 papers. Based on the literature review we present 1) well-defined categories of opinion mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4) concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques. The results of our study serve as references to choose suitable opinion mining tools for SE tasks, and provide critical insights for the further development of opinion mining techniques in the SE domain.
This work has been done together with Bin Lin, Gabriele Bavota and Michele Lanza from Università della Svizzera italiana, Switzerland, Nathan Cassee from Eindhoven University of Technology, The Netherlands and Nicole Novielli from University of Bari, Italy.
Multi step automated refactoring for code smelleSAT Journals
Abstract
Brain MR Image can detect many abnormalities like tumor, cysts, bleeding, infection etc. Analysis of brain MRI using image
processing techniques has been an active research in the field of medical imaging. In this work, it is shown that MR image of brain
represent a multi fractal system which is described a continuous spectrum of exponents rather than a single exponent (fractal
dimension). Multi fractal analysis has been performed on number of images from OASIS database are analyzed. The properties of
multi fractal spectrum of a system have been exploited to prove the results. Multi fractal spectra are determined using the modified
box-counting method of fractal dimension estimation.
Keywords: Brain MR Image, Multi fractal, Box-counting
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Binary code obfuscation through c++ template meta programmingnong_dan
This document presents methods for obfuscating C++ code through the use of template metaprogramming. It introduces template metaprogramming in C++ and how it can be used to perform computations at compile time. It then describes how to add randomization to template metaprograms using a linear congruential pseudorandom number generator. The document outlines various techniques for obfuscating integer arithmetic operations, including through the use of identities from boolean logic and two's complement arithmetic. It also discusses the insertion of opaque predicates to increase program complexity and generation of dead code. Finally, it covers obfuscating data values through encryption and masking techniques.
Automatic reverse engineering of malware emulatorsUltraUploader
This document proposes techniques for automatically reverse engineering malware emulators. It presents an algorithm using dynamic analysis to execute emulated malware, record the x86 instruction trace, and use data flow and taint analysis to identify the bytecode program and extract syntactic and semantic information about the bytecode instruction set. The authors implemented a proof-of-concept system called Rotalumé, which accurately revealed the syntax and semantics of emulated instruction sets for programs obfuscated by VMProtect and Code Virtualizer.
IRJET- An Effective Analysis of Anti Troll System using Artificial Intell...IRJET Journal
This document discusses various techniques for detecting trolls using artificial intelligence and machine learning. It first reviews related work on sentiment analysis, supervised machine learning for troll detection, real-time sentiment analysis, and analyzing vulnerabilities in social networks. It then analyzes the limitations of current troll detection systems and how AI/ML solutions can help overcome these. The literature survey covers key approaches used for troll detection, including sentiment analysis, supervised learning models, and analyzing post vulnerabilities.
This document summarizes a research paper on a new automatic mechanism called Packet-Deployment payload (P-DPL) for extracting signatures from malware files to detect unknown malware. P-DPL aims to generate signatures comprised of multiple byte-strings that can be used by high-speed network devices to filter malware, while minimizing false positives. It works by removing common code segments from executables to isolate the malware's unique code for signature generation. The researchers believe P-DPL can help detect zero-day malware more quickly until security software is updated with the new signature.
Thrift is a software library and code generation tool developed at Facebook to facilitate the development of efficient and scalable backend services across multiple programming languages. It defines a common data representation and service interface definition that can be used to generate code for easily building RPC clients and servers. The key components that Thrift addresses are common data types, an abstracted transport layer, a flexible protocol, and support for backwards compatible data evolution through versioning.
An automated approach to fix buffer overflows IJECEIAES
Buffer overflows are one of the most common software vulnerabilities that occur when more data is inserted into a buffer than it can hold. Various manual and automated techniques for detecting and fixing specific types of buffer overflow vulnerability have been proposed, but the solution to fix Unicode buffer overflow has not been proposed yet. Public security vulnerability repository e.g., Common Weakness Enumeration (CWE) holds useful articles about software security vulnerabilities. Mitigation strategies listed in CWE may be useful for fixing the specified software security vulnerabilities. This research contributes by developing a prototype that automatically fixes different types of buffer overflows by using the strategies suggested in CWE articles and existing research. A static analysis tool has been used to evaluate the performance of the developed prototype tools. The results suggest that the proposed approach can automatically fix buffer overflows without inducing errors.
In the software development life cycle (SDLC), testing is an important step to reveal and fix the vulnerabilities and flaws in the software. Testing commercial off-the-shelf applications for security has never been easy, and this is exacerbated when their source code is not accessible. Without access to source code, binary executables of such applications are employed for testing. Binary analysis is commonly used to analyze on the binary executable of an application to discover vulnerabilities. Various means, such as symbolic execution, concolic execution, taint analysis, can be used in binary analysis to help collect control flow information, execution path information, etc. This paper presents the basics of the symbolic execution approach and studies the common tools which utilize symbolic execution in them. With the review, we identified that there are a number of challenges that are associated with the symbolic values fed to the programs as well as the performance and space consumption of the tools. Different tools approached the challenges in different ways, therefore the strengths and weaknesses of each tool are summarized in a table to make it available to interested researchers.
ava and Python are two of the most popular and powerful programming languages of present time. Both
of them are Object-Oriented programming languages with unique advantages for developers and end
users. Given the features of Python and how it is related to emerging fields in computer science such as
Internet of Things, Python is considered a strong candidate of becoming the main programming language
for academia and industry in the near future. In this paper, we develop JPT, which is a translator that
converts Java code into Python. Our desktop application takes Java code as an input and translates it to
Python code using XML as an intermediate language. The translator enables this conversion instead of
having to rewrite the whole Python program from start. We address a number of cases where the
translation process is challenging and highlight cases where manual inspection is recommended.
"An NLP-based Tool for Software Artifacts Analysis" at @ICSME2021. Sebastiano Panichella
The document describes an NLP-based tool called NEON that automatically identifies natural language rules for analyzing informal software artifacts. NEON trains on a set of artifacts to infer rules for identifying recurrent patterns. It was evaluated on its ability to classify app reviews using rules identified from a training set. Over a third of NEON's recommended rules were judged useful by human evaluators for classifying reviews into feature requests and problem discoveries. Future work involves using NEON to develop recommender systems for various software engineering tasks.
IRJET - Event Notifier on Scraped Mails using NLPIRJET Journal
This document describes a system that uses natural language processing and APIs to automatically add events from a user's emails to their Google Calendar. It scrapes a user's Gmail inbox using the Gmail API to find emails containing event details. Natural language processing techniques are used to extract information like the date, time, location and event name from the email text. The extracted details are then used to automatically create calendar reminders by integrating with the Google Calendar API. This allows users to have all their email-based events in one place on their calendar without having to manually add them. The system was implemented and tested on sample emails, successfully extracting event details and adding reminders to the calendar.
Software Birthmark Based Theft/Similarity Comparisons of JavaScript ProgramsSwati Patel
A birthmark is a set of characteristic possessed by a program that uniquely recognizes a program. Birthmark of the software is based on Heap Graph. It is generated by using Google Chrome Developer Tools when the program is in execution. Software’s behavioural structure is demonstrated in the heap graph. It describes how the objects are related to each other to deliver the desired functionality of the website. Our aim is to develop and evaluate a system that can find theft/similarity between websites by using Agglomerative Clustering and Improved Frequent Subgraph Mining. To identify if a website is using the original program’s code or its module, birthmark of the original program is explored in the suspected program’s heap graph.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Automated server-side model for recognition of security vulnerabilities in sc...IJECEIAES
With the increase of global accessibility of web applications, maintaining a reasonable security level for both user data and server resources has become an extremely challenging issue. Therefore, static code analysis systems can help web developers to reduce time and cost. In this paper, a new static analysis model is proposed. This model is designed to discover the security problems in scripting languages. The proposed model is implemented in a prototype SCAT, which is a static code analysis tool. SCAT applies the phases of the proposed model to catch security vulnerabilities in PHP 5.3. Empirical results attest that the proposed prototype is feasible and is able to contribute to the security of real-world web applications. SCAT managed to detect 94% of security vulnerabilities found in the testing benchmarks; this clearly indicates that the proposed model is able to provide an effective solution to complicated web systems by offering benefits of securing private data for users and maintaining web application stability for web applications providers.
Malware detection and pattern classification using NPLIRJET Journal
This document discusses methods for detecting malware and classifying patterns using natural language processing (NLP). It describes how artificial intelligence techniques can be used to analyze URLs and classify them as malicious or benign based on static and dynamic features. It also discusses using malware blacklists to recognize known malicious URLs but notes the limitations since new URLs are constantly being created. The document provides an overview of malware detection and analyzes state-of-the-art machine learning methods that have been used for identifying malicious URLs.
This document proposes a technique for detecting software theft of JavaScript programs based on dynamic birthmarks extracted from the runtime heap graph. It involves extracting a birthmark from the merged runtime heap graph of an original program using frequent subgraph mining. The same birthmark is then searched for in the heap graph of a suspected stolen program. If found, it indicates the suspected program is a copy. The document outlines the proposed algorithm involving graph generation from heap snapshots, filtering, merging, frequent subgraph selection as the birthmark, and detection. It provides results of implementing the first three modules on sample Google.com heap data, showing generated, filtered and merged graphs, as well as a selected frequent subgraph birthmark.
“Software Theft Detection for JavaScript programs based on dynamic birthmark ...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
The document describes a proposed system called Code-a-Maze that aims to obfuscate source code through various transformations to deter software reverse engineering. Code-a-Maze works by taking source code as input and applying different obfuscation techniques depending on the path taken through an abstract "maze" represented by the code. The transformations include pointless allocation and deallocation of memory, insertion of dummy method calls, addition of bogus code and trampoline functions, flipping of conditional branches, and modification of variable and function names. The goal is to complicate the code and transfer control flow in unpredictable ways, making it difficult for attackers to analyze the code without affecting its functionality or performance.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Abstract A key element of any software is to protect the software from various attacks and in case if any theft has occurred then the developer must be able to prove their ownership. Software Watermarking is one of the technique which helps in proving the authentication of the developer. A lot of research was done in Software watermarking but all this work is discussing on various embedding techniques of a software watermark into a program and the software watermark which was used for embedding is a serial number or a unique identification of the developer. In this paper a new methodology to obtain the software watermark, which is known as Version Based Software Watermark, is given. The remaining part of the paper discusses on the importance of the Version Based Software Watermark, properties of VBSW and evaluation results of the VBSW. Keywords: Software Watermark, VBSW – Version Based Software Watermark, LOC, version number
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case StudyDevOps.com
Graph databases offer security teams a new and more efficient way to find zero day vulnerabilities. As software development increases its reliance on open source libraries and release cycles get faster and faster application security is becoming more and more difficult. AppSec still has the same charter -- to find vulnerabilities in dev, before they reach prod, but now with more complexity and less time. Graphing source code, and traversing it to identify technical and business logic vulnerabilities, gives AppSec teams a much needed leg up identify zero days and stay ahead of attackers.
As numerous famous examples demonstrate, open source libraries are a common attack vector. Hence, AppSec teams must secure 3rd party dependencies just as vigorously as custom code. While much of the emphasis for securing open source libraries (OSS) has been on identifying and eliminating known CVEs, because OSS is widely used, zero-day vulnerabilities are often more likely to be found in popular OSS than custom code.
This webinar will cover the following:
An introduction to the emerging graph landscape and why it matters for AppSec
How a Fortune 500 company is using graphs to find zero days
Technical demo of finding technical and business logic vulnerabilities in source code
Online java compiler with security editorIRJET Journal
This document describes an online Java compiler with a security editor. The system allows users to write, compile, and debug Java programs online without needing to install a Java development kit locally. The system also includes a security editor that can encrypt and decrypt files using the MD5 algorithm. The goals of the project are to make Java programming more accessible and provide security for files. It uses a client-server architecture where the server runs the Java compiler and encryption/decryption and the client can access these features through a web interface.
The document discusses an improved method for storing feature vectors to detect Android malware. It proposes using a compressed row storage format to efficiently store the statistical features that represent malware families. This involves storing only the non-zero elements of sparse feature matrices in three vectors, which reduces storage needs by 79% compared to conventional methods. This improved storage technique leads to reduced processing time for feature vector generation and malware detection overall. The proposed method aims to enhance Android malware analysis by making feature vector searches and classification faster.
This document discusses the importance of mobile application security and penetration testing. It describes penetration testing as discovering vulnerabilities before attackers through vulnerability detection, comprehensive penetration attempts, and analysis/reporting. The document outlines static and dynamic analysis methods used for Android application security assessments. These include code review, function hooking, runtime debugging, and analyzing data at rest and in transit. It promotes understanding how applications work through reverse engineering, decompilation, and deobfuscation. The methodology uses tools like MARA, MobSF, Xposed, Frida, and BurpSuite.
A Tool to Detect Plagiarism in Java Source Code.pdfKayla Smith
This paper proposes a tool to detect plagiarism in Java source code. It first normalizes the input and original codes by removing whitespace, comments, keywords, operators and standardizing identifiers. It then uses the Levenshtein distance algorithm to calculate the distance between the normalized codes. Based on this distance and the code lengths, it calculates a plagiarism percentage. The tool was tested on sample code pairs, finding lower plagiarism percentages than existing tools. It is concluded to be more suitable for detecting plagiarism in Java codes.
Knowledge and Data Engineering IEEE 2015 ProjectsVijay Karan
List of Knowledge and Data Engineering IEEE 2015 Projects. It Contains the IEEE Projects in the Domain Knowledge and Data Engineering for the year 2015
This document discusses code obfuscation techniques for protecting software from reverse engineering. It begins with an abstract discussing the use of code obfuscation to protect proprietary algorithms and keys from extraction during reverse engineering. It then provides definitions of code obfuscation and discusses classifications of obfuscation techniques including layout, data, control, and preventive obfuscations. The document surveys various code obfuscation techniques from literature and evaluates them based on criteria like potency, resilience, cost, and resistance to static and dynamic attacks. It concludes with a discussion of empirical evaluation of obfuscation techniques.
This document discusses applying privacy preserving data mining techniques to code profiling data. Code profiling generates metrics about software attributes and performance. The author applies encryption to code profiling data from 140 Java codes to preserve privacy. K-means clustering and k-NN classification are performed on the actual and encrypted data, showing similar results while preserving privacy. Correlation analysis identifies weakly correlated attributes that are removed to improve clustering accuracy, though this decreases classifier accuracy. The paper concludes privacy preserving data mining of code profiling data is an emerging area that could benefit from additional encryption and classification techniques.
MACHINE LEARNING APPROACH TO LEARN AND DETECT MALWARE IN ANDROIDIRJET Journal
This document discusses machine learning approaches for detecting malware in Android apps. It first classifies different types of malware like viruses, trojans, worms, spyware, adware, and ransomware. It then discusses important features for malware detection like n-grams, opcodes, strings, memory access, and API calls. The document reviews several papers on machine learning techniques for Android malware detection using methods like random forest, SVM, decision trees, and evaluating accuracy and efficiency. It proposes using ANN and SVM models to identify malicious and benign apps and providing a category-based machine learning approach to improve detection accuracy.
Similar to Software Birthmark for Theft Detection of JavaScript Programs: A Survey (20)
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.