Feature extraction and feature selection are the first pre-processing tasks applied to input logs when machine learning is used to detect cyber security threats and attacks. When heterogeneous data derived from different sources must be analyzed, these tasks become time-consuming and difficult to manage efficiently. In this paper, we present an approach for handling feature extraction and feature selection for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its Python API, PySpark.
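No code accompanies this abstract; as a minimal sketch of what such a pipeline could look like in PySpark (the column names, the input file logs.csv, and the chi-squared selector are illustrative assumptions, not the authors' actual pipeline):

```python
# Minimal PySpark sketch: assemble raw log fields into a feature vector,
# then select the most informative features. Column names ("bytes", "duration",
# "pkt_count", "label") and the chi-squared selector are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, ChiSqSelector

spark = SparkSession.builder.appName("security-analytics").getOrCreate()

logs = spark.read.csv("logs.csv", header=True, inferSchema=True)  # assumed input

# Feature extraction: combine numeric log fields into a single vector column.
assembler = VectorAssembler(inputCols=["bytes", "duration", "pkt_count"],
                            outputCol="features")
assembled = assembler.transform(logs)

# Feature selection: keep the top-k features most dependent on the label.
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         outputCol="selected", labelCol="label")
selected = selector.fit(assembled).transform(assembled)
selected.select("selected", "label").show(5)
```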
A predictive model for network intrusion detection using stacking approach (IJECEIAES)
This document presents a predictive stacking model for network intrusion detection using two machine learning datasets - UNSW NB-15 and UGR'16. The stacking approach uses logistic regression, K nearest neighbor, decision tree, and random forest classifiers at level 0, with random forest as the meta classifier. Feature selection is performed on UNSW NB-15 to identify the most important 16 features out of 47 using permutation feature importance. Hyperparameters of the classifiers are tuned to improve performance. The stacking model is implemented using the Graphlab Create machine learning platform and evaluated on the two datasets, showing reasonably good results with fewer misclassifications of attack types.
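The stacking design above maps naturally onto common tooling; since GraphLab Create is no longer generally available, here is a hedged sketch of the same level-0/meta layout in scikit-learn with permutation feature importance (the synthetic data and hyperparameters are placeholders, not the paper's setup):

```python
# Sketch of the described stacking design in scikit-learn (the paper itself
# uses GraphLab Create): LR, kNN, DT and RF at level 0, RF as meta-classifier,
# with permutation feature importance for feature selection. Data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=47, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier()),
                ("rf", RandomForestClassifier())],
    final_estimator=RandomForestClassifier())  # meta-classifier
stack.fit(X_tr, y_tr)

# Permutation feature importance: rank features, keep the top 16 of 47.
imp = permutation_importance(stack, X_te, y_te, n_repeats=5, random_state=0)
top16 = np.argsort(imp.importances_mean)[::-1][:16]
print("top-16 feature indices:", top16)
```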
Data Search in Cloud using the Encrypted Keywords (IRJET Journal)
This document presents a proposed system for searching encrypted data stored in the cloud without decrypting it. The system would allow users to perform expressive boolean keyword searches using encrypted trapdoors. It aims to improve efficiency and security over existing methods by supporting boolean expressions, hiding keyword values from servers, and proving security under a formal model. The system design involves data owners encrypting documents and keywords before outsourcing to the cloud. Users generate trapdoors from a trusted center and send them to the cloud server to retrieve matching encrypted files. The goals are to enable expressive searching, efficiency, privacy of keyword values, and provable security.
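The summary does not spell out the trapdoor construction; as a toy illustration of the general idea, keyed hashing via HMAC stands in for the scheme's actual cryptography below, and the key and index layout are assumptions:

```python
# Toy illustration of the trapdoor idea (NOT the paper's construction):
# keywords are stored as keyed hashes, so the server can match a trapdoor
# against the index without ever seeing keyword values.
import hmac, hashlib

KEY = b"shared-secret-from-trusted-center"  # hypothetical key

def trapdoor(keyword: str) -> str:
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

# Data owner indexes encrypted documents under hashed keywords.
index = {trapdoor("intrusion"): ["doc1.enc"], trapdoor("cloud"): ["doc2.enc"]}

# User sends only the trapdoor; the server matches it blindly.
query = trapdoor("cloud")
print(index.get(query, []))  # -> ['doc2.enc']
```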
A SECURE AND DYNAMIC MULTI-KEYWORD RANKED SEARCH SCHEME OVER ENCRYPTED CLOUD... (Nexgen Technology)
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud... (1crore projects)
IRJET- Proficient Recovery Over Records using Encryption in Cloud Computing (IRJET Journal)
This document proposes a scheme for securely storing and retrieving encrypted documents in cloud computing based on attributes. It first designs a hierarchical attribute-based encryption scheme to encrypt document collections such that documents with shared attributes can be encrypted together efficiently. It then constructs an Attribute-based Retrieval Features (ARF) tree index structure based on document vectors incorporating term frequency-inverse document frequency and document attributes. A depth-first search algorithm is designed for efficient retrieval from the encrypted index. The scheme aims to allow fine-grained access control of documents while supporting accurate and efficient searches over the encrypted collection.
Survey on Privacy-Preserving Multi keyword Ranked Search over Encrypted Clou... (Editor IJMTER)
With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial public cloud for greater flexibility and economic savings. To protect data privacy, however, sensitive data has to be encrypted before outsourcing. Considering the large number of data users and documents in the cloud, it is crucial for the search service to allow multi-keyword queries and to rank results by similarity in order to meet the need for effective data retrieval. Related works on searchable encryption focus on single-keyword or Boolean keyword search, and rarely differentiate the search results. We first propose a basic MRSE scheme using secure inner product computation, and then significantly improve it to meet different privacy requirements in two levels of threat models. The Incremental High Utility Pattern Transaction Frequency Tree (IHUPTF-Tree) is designed according to the transaction frequency (descending order) of items to obtain a compact tree.

By using high utility patterns, the items can be arranged efficiently. A tree structure is used to sort the items; once sorted, the frequent patterns are obtained. The frequent pattern items are retrieved from the database using a hybrid tree (H-Tree) structure, so execution time becomes faster. Finally, the frequent pattern items that satisfy the threshold value are displayed.
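The MRSE summary above rests on secure inner product computation over keyword vectors; as a plaintext sketch of that "coordinate matching" similarity (the encryption layer is omitted and the vocabulary is invented):

```python
# Sketch of the coordinate-matching similarity underlying MRSE: documents
# and queries become binary keyword vectors, and the inner product counts
# matched keywords. (The actual scheme computes this on *encrypted* vectors;
# the encryption layer is omitted here.)
VOCAB = ["cloud", "encryption", "search", "ranking", "privacy"]

def to_vector(words):
    return [1 if w in words else 0 for w in VOCAB]

docs = {"d1": {"cloud", "search", "privacy"}, "d2": {"encryption"}}
query = to_vector({"cloud", "privacy"})

scores = {name: sum(q * d for q, d in zip(query, to_vector(words)))
          for name, words in docs.items()}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # ranked results
```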
IRJET- An Approach for Implemented Secure Proxy Server for Multi-User Searcha... (IRJET Journal)
This document proposes a secure proxy server approach for multi-user searchable encryption without query transformation. The approach allows a proxy server to perform search operations on an encrypted database without decrypting user queries. It returns the top-k most relevant documents to user queries using Euclidean distance similarity. The proposed approach aims to improve search efficiency and accuracy compared to prior works, while maintaining security. It outlines the key concepts, notation, and structure of the proposed scheme.
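As a small sketch of the top-k ranking step described above, assuming plaintext placeholder vectors in place of whatever the proxy actually computes over ciphertexts:

```python
# Sketch of the top-k step: rank documents by Euclidean distance between
# feature vectors and return the k closest. Vectors here are plaintext
# placeholders for the proxy's encrypted-domain computation.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

docs = {"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.85, 0.2]}
query = [1.0, 0.0]

top_k = sorted(docs, key=lambda d: euclidean(query, docs[d]))[:2]
print(top_k)  # the 2 most relevant documents: ['d1', 'd3']
```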
A secure and dynamic multi keyword ranked search scheme over encrypted cloud... (LeMeniz Infotech)
The document proposes a system for multi-keyword ranked search over encrypted cloud data while preserving privacy. It addresses limitations in previous systems that allowed single keyword search or did not consider privacy. The proposed system uses asymmetric key encryption, a block-max index, and dynamic key generation to allow efficient retrieval of relevant encrypted data from the cloud without security breaches. It involves three parts: (1) a server that encrypts and stores data in the cloud and sends decryption keys; (2) a cloud server that handles search requests, ranks results, and responds; and (3) users that request data from the cloud server.
Efficient Similarity Search Over Encrypted Data (IRJET Journal)
1) The document discusses efficient similarity search over encrypted data stored in the cloud. It proposes using Locality Sensitive Hashing (LSH) to enable fast similarity searches of encrypted data without decrypting it first.
2) When a user uploads data, features are extracted and hashed using LSH to group similar documents into buckets. When performing a search, the user's query is hashed to identify matching buckets. Matches are identified by finding correlations between stored documents and the query.
3) The method allows similarity searches of encrypted cloud data efficiently by indexing and hashing documents during upload and generating query hashes to match documents during search, without decrypting the actual data. This addresses privacy and security issues of sensitive data stored
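As a rough sketch of the LSH bucketing idea summarized above (random-hyperplane hashing over toy feature vectors; the real system hashes extracted document features):

```python
# Sketch of the LSH bucketing step: similar documents hash to the same
# bucket, so a query only has to be compared against one bucket rather
# than the whole collection. Random-hyperplane LSH over toy vectors.
import random

random.seed(0)
DIM, BITS = 4, 3
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_key(vec):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in planes)

buckets = {}
for name, vec in {"a": [1, 0, 1, 0], "b": [1, 0, 0.9, 0.1],
                  "c": [0, 1, 0, 1]}.items():
    buckets.setdefault(lsh_key(vec), []).append(name)

query = [1, 0, 1, 0.2]
print(buckets.get(lsh_key(query), []))  # candidates: likely ['a', 'b']
```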
IRJET- Data Security in Cloud Computing using Cryptographic Algorithms (IRJET Journal)
This document discusses data security in cloud computing using cryptographic algorithms. It begins by introducing cloud computing and cryptography. Cryptography is used to securely store and transmit data in the cloud since the data is no longer under the user's direct control. The document then discusses how AES (Advanced Encryption Standard) can be used to encrypt data for secure storage and transmission in cloud computing. It provides an overview of the AES algorithm, including the encryption process which involves sub-processes like byte substitution, shift rows, mix columns and adding round keys over multiple rounds. The document also provides pseudocode for the AES encryption process and discusses how AES encryption provides stronger security than other algorithms like DES.
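As a brief sketch of applying AES for cloud storage as discussed above, using the widely available `cryptography` package; AES-GCM is chosen here for authenticated encryption, whereas the document itself walks through the raw AES round structure:

```python
# Sketch of AES use for cloud storage (pip install cryptography). The key
# stays with the data owner; only the ciphertext goes to the cloud.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # kept by the data owner
aesgcm = AESGCM(key)

nonce = os.urandom(12)                          # unique per message
ciphertext = aesgcm.encrypt(nonce, b"patient record #42", None)
# ... ciphertext (plus nonce) is what gets stored in the cloud ...
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
assert plaintext == b"patient record #42"
```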
A secure and dynamic multi keyword ranked search scheme over encrypted cloud... (CloudTechnologies)
Efficient Privacy Preserving Clustering Based Multi Keyword Search (IRJET Journal)
This document proposes an efficient privacy-preserving clustering-based multi-keyword search system. It uses hierarchical clustering to generate clusters of encrypted documents in the cloud. The system aims to improve search efficiency while maintaining security. It utilizes EM clustering, SHA-1 hashing for deduplication, and a user revocation method. Experimental results show the framework has advantages such as efficient memory and time utilization, secure search over encrypted data, secure data storage, and deduplication.
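The SHA-1 deduplication step mentioned above can be sketched in a few lines (the storage dictionary stands in for the cloud back end):

```python
# Sketch of SHA-1 deduplication: identical files hash to the same digest,
# so duplicates can be detected and skipped before upload.
import hashlib

store = {}

def upload(name: str, data: bytes):
    digest = hashlib.sha1(data).hexdigest()
    if digest in store:
        print(f"{name}: duplicate of {store[digest]}, skipping upload")
    else:
        store[digest] = name  # upload the (encrypted) blob once

upload("report-v1.doc", b"quarterly numbers")
upload("report-copy.doc", b"quarterly numbers")  # detected as duplicate
```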
IRJET-Implementation of Threshold based Cryptographic Technique over Cloud Co... (IRJET Journal)
This document discusses implementing a threshold-based cryptographic technique for data and key storage security over cloud computing. It proposes a system that encrypts data stored on the cloud to prevent unauthorized access and data attacks by the cloud service provider. The system uses a threshold-based cryptographic approach that distributes encryption keys among multiple users, requiring a threshold number of keys to decrypt the data. This prevents collusion attacks and ensures data remains secure even if some user keys are compromised. The implementation results show the system can effectively secure data on the cloud and protect legitimate users from cheating or attacks from the cloud service provider or other users.
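As an illustrative sketch of the k-of-n threshold idea, here is a toy Shamir-style sharing over a prime field; the paper's exact construction is not given in this summary:

```python
# Toy Shamir-style (k-of-n) threshold sharing: any k of the n shares
# recover the key, so no single user (or small collusion) can decrypt.
import random

P = 2**127 - 1  # a Mersenne prime as the field modulus

def split(secret, k, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = split(secret=123456789, k=3, n=5)
assert recover(shares[:3]) == 123456789   # any 3 of 5 shares suffice
```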
A data estimation for failing nodes using fuzzy logic with integrated microco... (IJECEIAES)
Continuous data transmission in wireless sensor networks (WSNs) is one of their most important characteristics, and it makes sensors prone to failure. A backup strategy needs to co-exist with the network infrastructure to ensure that no data is missing. The proposed system relies on a backup strategy of building a history file that stores all data collected from the nodes. This file is later used by fuzzy logic to estimate missing data in case of failure. An easily programmable microcontroller unit equipped with a data storage mechanism serves as cost-effective storage media for these data. An estimation error is calculated continuously and used to update a reference "optimal table" that is used in the estimation of missing data. The error values also ensure that the system does not enter an incremental error state. This paper presents a system integrating an optimal data table, a microcontroller, and fuzzy logic to estimate the missing data of failing sensors. The approach is guided by the minimum error calculated from previously collected data. Experimental findings show that the system has great potential to continue functioning with a failing node, with very low processing and storage requirements.
IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud (IRJET Journal)
This document proposes a scheme for efficient and verifiable queries over encrypted data stored in the cloud. It aims to allow an authorized user to query encrypted documents of interest while maintaining privacy. The scheme provides a verification mechanism to allow users to check the correctness of query results and identify any valid results omitted by a potentially untrustworthy cloud server. The document reviews related work on searchable encryption and verifiable queries. It then outlines the proposed approach to build secure verifiable queries for encrypted cloud data.
Privacy preserving multi-keyword ranked search over encrypted cloud data (IGEEKS TECHNOLOGIES)
This document proposes a system called privacy-preserving multi-keyword ranked search over encrypted cloud data (MRSE). Existing searchable encryption systems only support single-keyword or boolean keyword search without result ranking. The proposed MRSE system allows a user to search for multiple keywords and returns documents ranked by relevance. It establishes privacy requirements and uses an efficient "coordinate matching" semantic to quantify document similarity based on keyword matches. The system architecture includes modules for data owners to encrypt and upload files, for users to search and download encrypted files, and for ranking search results.
Accurate and Efficient Secured Dynamic Multi-keyword Ranked Search (Dakshineshwar Swain)
The document presents a project on developing an "accurate and efficient dynamic multi-keyword ranked search scheme". It aims to address issues with searching encrypted data outsourced to the cloud. The proposed system uses a tree-based index structure and "Greedy Depth-first Search" algorithm to improve search efficiency. It also utilizes TF-IDF scores to rank documents by relevance to search terms. The project aims to provide a searchable encryption scheme that supports accurate, efficient multi-keyword searches and dynamic operations on document collections.
A secure and dynamic multi keyword ranked search scheme over encrypted cloud... (Pvrtechnologies Nellore)
This document proposes a secure multi-keyword ranked search scheme over encrypted cloud data that allows for dynamic operations like deletion and insertion of documents. It constructs a special tree-based index structure and uses a "Greedy Depth-first Search" algorithm to provide efficient search. The scheme utilizes a secure kNN algorithm to encrypt index and query vectors while still allowing accurate relevance scoring between encrypted vectors. It aims to achieve sub-linear search time and flexible handling of document updates.
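As a sketch in the spirit of the tree index and "Greedy Depth-first Search" described in the two entries above (scores, tree shape, and pruning rule are illustrative; the real schemes do this over encrypted vectors):

```python
# Sketch of a tree-based ranked search: each internal node stores the
# maximum relevance score in its subtree, so low-scoring branches can be
# pruned while greedily descending the highest-scoring child first.
import heapq

class Node:
    def __init__(self, score, doc=None, children=()):
        self.score, self.doc, self.children = score, doc, list(children)

def greedy_dfs(node, k, top):
    # Prune: nothing below can beat the current k-th best score.
    if len(top) == k and node.score <= top[0][0]:
        return
    if node.doc is not None:
        heapq.heappush(top, (node.score, node.doc))
        if len(top) > k:
            heapq.heappop(top)
        return
    for child in sorted(node.children, key=lambda c: -c.score):
        greedy_dfs(child, k, top)

leaves = [Node(s, doc=f"d{i}") for i, s in enumerate([0.2, 0.9, 0.5, 0.7])]
root = Node(0.9, children=[Node(0.9, children=leaves[:2]),
                           Node(0.7, children=leaves[2:])])
top = []
greedy_dfs(root, k=2, top=top)
print(sorted(top, reverse=True))  # [(0.9, 'd1'), (0.7, 'd3')]
```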
APPLICATION OF CLASSICAL ENCRYPTION TECHNIQUES FOR SECURING DATA... (IJCI JOURNAL)
The process of protecting information by transforming (encrypting) it into an unreadable format is called cryptography. Only those who possess the secret key can decipher (decrypt) the message into plain text. Encrypted messages can sometimes be broken by cryptanalysis, also called code breaking, so there is a need for strong and fast cryptographic methods for securing data from attackers. Although modern cryptography techniques are virtually unbreakable, they too are sometimes subject to attack. As the Internet, big data, cloud data storage, and other forms of electronic communication become more prevalent, electronic security is becoming increasingly important. Cryptography is used to protect e-mail messages, credit card information, corporate data, cloud data, big data, and so on, so there is a need for fast and effective cryptographic methods for protecting the data. In this paper a method is proposed to protect the data faster by using classical cryptography. Encryption and decryption are done in parallel using threads, with the help of the underlying hardware, and the time taken by the sequential and parallel methods is analysed.
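As a minimal sketch of the thread-parallel encryption idea, with a simple shift cipher standing in for the paper's classical ciphers (note the GIL caveat in the comments):

```python
# Sketch of the parallel idea: split the plaintext into chunks and encrypt
# them concurrently with a classical cipher (a simple shift cipher here).
# Note that CPython's GIL limits the speedup of pure-Python CPU work in
# threads; a process pool behaves closer to a hardware-parallel setting.
from concurrent.futures import ThreadPoolExecutor

SHIFT = 3  # Caesar-style key

def encrypt_chunk(chunk: bytes) -> bytes:
    return bytes((b + SHIFT) % 256 for b in chunk)

def parallel_encrypt(data: bytes, n_chunks: int = 4) -> bytes:
    size = -(-len(data) // n_chunks)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(encrypt_chunk, chunks))

ct = parallel_encrypt(b"attack at dawn" * 1000)
print(len(ct))
```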
IRJET- An Efficient Solitude Securing Ranked Keyword Search Technique (IRJET Journal)
This document proposes a new efficient solitude securing ranked keyword search technique for cloud computing. It presents a Multi-keyword Ranked Search over Encrypted Hierarchical Clustering Index (MRSE-HCI) architecture that uses quality hierarchical clustering to organize documents and support fast keyword searches in an encrypted cloud environment. The technique uses a minimum hash subtree to verify the correctness and freshness of search results to address integrity and security issues. Experimental results demonstrate that the proposed technique improves search efficiency, accuracy, and rank privacy compared to traditional methods.
Automatic Insider Threat Detection in E-mail System using N-gram Technique (IRJET Journal)
This document summarizes research on automatic insider threat detection in email systems using n-gram techniques. It discusses how n-grams can be used to classify documents and detect potential data leakage through email. The system would verify emails sent outside the network using SHA, n-grams and thresholds. If a threat is detected, the user would be blocked. The document also provides a literature review on 10 other papers related to insider threat detection using methods like user profiling, activity logs, data mining and visualization techniques. It describes how n-grams work by breaking words into character sequences and creating profiles based on frequency to classify documents.
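As a small sketch of the character n-gram profiling idea described above (the similarity measure and threshold are assumptions, not the paper's exact method):

```python
# Sketch of n-gram profiling: break text into character trigrams, build a
# frequency profile, and compare profiles to flag outgoing mail that looks
# too similar to a sensitive document. Threshold is a hypothetical value.
from collections import Counter

def ngram_profile(text: str, n: int = 3) -> Counter:
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(a: Counter, b: Counter) -> float:
    shared = sum((a & b).values())          # overlapping trigram counts
    return shared / max(1, sum(a.values()))

sensitive = ngram_profile("project falcon quarterly revenue forecast")
outgoing = ngram_profile("fwd: falcon quarterly revenue numbers attached")
if similarity(outgoing, sensitive) > 0.3:   # hypothetical threshold
    print("potential leak: block and alert")
```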
This document discusses 6 different thesis abstracts on topics related to IT security:
1) The design and implementation of an environment to support security assessment method development. This includes a database solution to assist developers.
2) A risk analysis of an RFID system used for logistics that identifies vehicles. The analysis examines the RFID communication and database transmission security and risks.
3) Key topics for a database security course, including technologies, access control, vulnerabilities, privacy, and secure database models.
4) A case-based reasoning approach to understand constraints in information models written in EXPRESS, representing constraints at a higher level of abstraction.
5) The benefits of a consolidated network security solution over point
Applicability of Network Logs for Securing Computer Systems (IDES Editor)
Logging the events occurring on a network has become essential, playing a major role in monitoring events so that they do not harm any resources of the system or the system itself. The analysis of network logs is becoming a beneficial, security-oriented research field that will be in demand in the computer era. Organizations are reluctant to expose their logs due to the risk of attackers stealing sensitive information from them. In this paper we define an architecture and the security measures that can be applied for a particular network log.
Role Based Access Control Model (RBACM) With Efficient Genetic Algorithm (GA)... (dbpublications)
This document summarizes a research paper that proposes a new cloud data security model using role-based access control, encryption, and genetic algorithms. The model uses Token Based Data Security Algorithm (TBDSA) combined with RSA and AES encryption to securely encode, encrypt, and forward cloud data. A genetic algorithm is used to generate encrypted passwords for cloud users. Role managers are assigned to control user roles and data access. The aim is to integrate encoding, encrypting, and forwarding for secure cloud storage while minimizing processing time.
With the tremendous growth in Internet-based applications, there has been a steady rise in cyber services such as websites, email, web portals, portlets, and APIs. With this growth, the threat landscape has increased manifold, and the number of attacks on IT infrastructure has risen sharply. The expanding infrastructure makes the assessment of malicious intrusions a major challenge. To improve the security ecosystem, it is desirable to have a complete security solution that covers all verticals and horizontals of the threat landscape. This paper proposes a wholesome security ecosystem that protects a large number of websites from malicious attacks and threats, increases knowledge about traffic patterns and trends, and supports real-time decisions on malicious traffic. Log analysis is the art of making sense of computer-generated records (i.e. logs). A technique is developed for log aggregation and analysis in real time through a dashboard and terminal display. It is performed with the help of user-interactive displays, real-time alerts are generated based on conditions, and preventive actions can be taken based on those alerts.
IRJET- Providing Privacy in Healthcare Cloud for Medical Data using Fog Compu... (IRJET Journal)
This document proposes a method to improve healthcare data security when stored in the cloud. It discusses how currently, healthcare data stored entirely in the cloud loses user control and faces privacy risks. The proposed method uses "Split and Combine" (SaC) technique to split data between the cloud and a local server. 80% of the split data is encrypted and uploaded to the cloud, while the remaining 20% is stored locally. When a user requests data, it is downloaded from both sources, combined and decrypted. The document evaluates this method and finds it increases security by partially storing data locally compared to traditional cloud-only storage. It concludes the SaC technique enhances privacy but the local server must always be online to combine data.
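The 80/20 Split and Combine (SaC) flow can be sketched directly; the XOR keystream below stands in for whatever encryption the paper applies to the cloud share:

```python
# Sketch of Split-and-Combine (SaC): 80% of the record is encrypted and
# sent to the cloud, 20% stays on the local server; retrieval recombines
# the two parts. XOR keystream stands in for the unspecified cipher.
import os

def split_and_store(record: bytes):
    cut = int(len(record) * 0.8)
    cloud_part, local_part = record[:cut], record[cut:]
    key = os.urandom(len(cloud_part))
    encrypted = bytes(a ^ b for a, b in zip(cloud_part, key))
    return encrypted, key, local_part  # encrypted -> cloud; rest -> local

def retrieve(encrypted: bytes, key: bytes, local_part: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(encrypted, key)) + local_part

parts = split_and_store(b"patient: jane doe; diagnosis: hypertension")
assert retrieve(*parts) == b"patient: jane doe; diagnosis: hypertension"
```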
Effective Information Flow Control as a Service: EIFCaaS (IRJET Journal)
This document presents a framework called Effective Information Flow Control as a Service (EIFCaaS) to detect vulnerabilities in Software as a Service (SaaS) applications in cloud computing environments. EIFCaaS analyzes application bytecode using static taint analysis to identify insecure information flows that could violate data confidentiality or integrity. The framework consists of four main components: a model generator, an information flow control engine, a vulnerability detector, and a result publisher. The framework was implemented as a prototype and evaluated on six open source applications, detecting SQL injection and NoSQL injection vulnerabilities. EIFCaaS aims to provide third-party security analysis and monitoring of SaaS applications as a cloud-based service.
Collecting and analyzing network-based evidence (CSITiaesprime)
Since nearly the beginning of the Internet, malware has been a significant deterrent to productivity for end users, both personal and business-related. Due to the pervasiveness of digital technologies in all aspects of human life, it is increasingly likely that a digital device is involved as the goal, medium, or simply a 'witness' of a criminal event. Forensic investigations include the collection, recovery, analysis, and presentation of information stored on network devices and related to network crimes. These activities often involve a wide range of analysis tools and the application of different methods. This work presents methods that help digital investigators correlate and present information acquired from forensic data, with the aim of producing more valuable reconstructions of events or actions to reach case conclusions. The main aim of network forensics is to gather evidence. Additionally, the evidence obtained during the investigation must be produced through a rigorous investigation procedure in a legal context.
Implementation of Secured Network Based Intrusion Detection System Using SVM... (IRJET Journal)
This document discusses the implementation of a secured network-based intrusion detection system using the support vector machine (SVM) algorithm. It begins with an abstract outlining the hardening of different intrusion detection implementations and proposals. The paper then discusses using naive Bayes, a classification method, for intrusion detection, analyzing transmitted data for malicious content and blocking transmissions from corrupted hosts. It also discusses using flow correlation information to improve classification accuracy while minimizing the effect on network performance.
IRJET- An Intrusion Detection and Protection System by using Data Mining... (IRJET Journal)
This document proposes an Internal Intrusion Detection and Protection System (IIDPS) to detect insider attacks by analyzing system calls (SCs) using data mining and forensic techniques. The IIDPS creates personal profiles for each user to track their computer usage behaviors over time. When a user logs in, the IIDPS compares their current behaviors to the patterns in their personal profile to determine if they are the legitimate account holder or an unauthorized insider attacker. The IIDPS aims to more accurately authenticate users and detect insider threats compared to existing systems that rely only on usernames and passwords.
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM... (IJNSA Journal)
Building practical and efficient intrusion detection systems for computer networks is important in industrial settings today, and machine learning provides a set of effective algorithms for detecting network intrusions. To find appropriate algorithms for building such systems, it is necessary to evaluate various types of machine learning algorithms against specific criteria. In this paper, we propose a novel evaluation formula which incorporates 6 indexes into a comprehensive measurement: precision, recall, root mean square error, training time, sample complexity, and practicability. The goal is to find algorithms that have a high detection rate and low training time, need fewer training samples, and are easy to use in terms of constructing, understanding, and analyzing models. A detailed evaluation process is designed to obtain all the necessary assessment indicators, and 6 kinds of machine learning algorithms are evaluated. Experimental results illustrate that Logistic Regression shows the best overall performance.
This document proposes a novel evaluation approach to find lightweight machine learning algorithms for intrusion detection. It incorporates 6 evaluation indexes: precision, recall, root mean square error, training time, sample complexity, and practicability. The evaluation formula calculates a score for each algorithm based on F1 score and penalty values. The document defines penalty values for the practicability of 6 machine learning algorithms (decision tree, naive bayes, multilayer perceptron, radial basis function network, logistic regression, support vector machine). Experimental results on intrusion detection datasets will evaluate the algorithms based on the proposed approach.
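The exact evaluation formula and penalty values are not reproduced in these summaries; as a generic sketch of folding the six indexes into one comparable score (weights, normalization, and sample numbers are invented):

```python
# Generic sketch of combining the six indexes into one comparable score.
# The paper's exact formula and penalty values are not reproduced here;
# weights, normalization and sample numbers below are illustrative only.
def score(precision, recall, rmse, train_time_s, n_samples, penalty):
    f1 = 2 * precision * recall / (precision + recall)
    # Lower is better for the cost terms, so they subtract from F1.
    cost = 0.1 * rmse + 0.05 * (train_time_s / 60) \
         + 0.05 * (n_samples / 100_000) + penalty
    return f1 - cost

algs = {
    "logistic_regression": score(0.95, 0.93, 0.21, 30, 50_000, 0.02),
    "svm":                 score(0.96, 0.92, 0.20, 600, 50_000, 0.08),
}
print(max(algs, key=algs.get))  # algorithm with the best overall score
```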
Feature Selection using the Concept of Peafowl Mating in IDS (IJCNCJournal)
Cloud computing has high applicability as an Internet-based service that relies on sharing computing resources. Cloud computing provides services that are infrastructure-based, platform-based, and software-based. The popularity of this technology is due to its superb performance, high level of computing ability, low cost of services, scalability, availability, and flexibility. The availability and openness of data in the cloud environment make it vulnerable to cyber-attacks. To detect attacks, an Intrusion Detection System is used that can identify the attacks and ensure information security. Such a coherent and proficient Intrusion Detection System is proposed in this paper to achieve higher certainty levels regarding safety in the cloud environment. In this paper, the mating behavior of peafowl is incorporated into an optimization algorithm, which in turn is used as a feature selection algorithm. The algorithm is used to reduce the huge size of cloud data so that the IDS can work efficiently on the cloud to detect intrusions. The proposed model has been tested with the NSL-KDD dataset as well as the Kyoto dataset and has proved to be a better and more efficient IDS.
The document summarizes various technologies used for cloud computing security. It discusses three main methods: data splitting, data anonymization, and cryptographic techniques.
Data splitting involves separating confidential data into fragments that are stored in different locations. Data anonymization irreversibly hides data to protect sensitive information while still allowing analysis. Cryptographic techniques like encryption can be used to encrypt data before outsourcing, but limit cloud capabilities unless advanced encryption methods are used.
The document compares the advantages and disadvantages of each method for security, overhead, functionality, and key criteria. It provides an overview of approaches for maintaining data security in cloud computing.
IRJET- Efficient Privacy-Preserving using Novel Based Secure Protocol in SVM (IRJET Journal)
This document presents a novel framework for improving privacy and efficiency in support vector machine (SVM) classification. The framework uses a lightweight multiparty random masking protocol to encrypt user data before it is sent to a server for SVM classification. The classification results are then stored in the cloud. A polynomial aggregation protocol is also used to prevent data leakage while maintaining privacy. The proposed approach is evaluated using two real datasets and is shown to achieve higher accuracy and efficiency compared to conventional methods, while ensuring user data privacy.
Intrusion Detection Systems By Anomaly-Based Using Neural Network (IOSR Journals)
To improve network security, different steps have been taken as the size and importance of networks increase day by day, and with them the chances of network attacks. A network is mainly attacked by intrusions, which are identified by a network intrusion detection system. These intrusions are mainly present in data packets, and each packet has to be scanned to detect them. This paper develops an intrusion detection system which uses the identity and signature of an intrusion to identify different kinds of intrusions. A network intrusion detection system needs to be efficient enough that the chance of false alarm generation is low, i.e. identifying something as an intrusion when it actually is not. Results obtained after analyzing this system are good enough that nearly 90% of the alarms generated are true alarms. It detects intrusions for various services, such as DoS and SSH, using a neural network.
CONTEXT-AWARE SECURITY MECHANISM FOR MOBILE CLOUD COMPUTING (IJNSA Journal)
The use of mobile devices is common and essential these days. These devices have limited resources, which makes it critical to provide security without compromising user ergonomics, given the large number of cyberattacks that occur. This work proposes a context-aware security mechanism for Mobile Cloud Computing that provides a level of device data privacy based on analysis of the attributes of the connected network and the levels of RAM, CPU, and battery available at the time of data communication with the cloud. In addition, Transport Layer Security (TLS) is used to create a secure channel for sending data between the client and the server, and the analysis of the mobile device context is implemented using fuzzy logic. The impact of the proposed mechanism on mobile device performance was measured through stress tests. The proposed mechanism performed better by 38% in the number of executions, 10% in memory, and 0.6% in CPU compared to the use of a single predefined symmetric algorithm in a private network environment.
IRJET- Improving Cyber Security using Artificial Intelligence (IRJET Journal)
This document discusses using artificial intelligence techniques like machine learning algorithms to improve cyber security. It proposes a methodology that uses Splunk to extract relevant fields from cybersecurity data, feeds that into a K-means clustering algorithm to form attack clusters, then sends those clusters to individual artificial neural networks (ANNs). The aggregated ANN results are then fed into a support vector machine (SVM) which classifies attacks as malicious, non-malicious, or benign. Testing this approach on a dataset achieved a classification accuracy of over 92% when using Splunk, K-means, ANNs, and SVM together.
DIVISION AND REPLICATION OF DATA IN GRID FOR OPTIMAL PERFORMANCE AND SECURITYijgca
Using Grid Storage, users can remotely store their data and enjoy on-demand high-quality applications and services from a shared network of configurable computing resources, without the burden of local data storage and maintenance. In this project, based on dynamic secrets, we design an encryption scheme for SG wireless communication, named dynamic secret-based encryption (DSE). The basic idea of dynamic secrets is to generate a series of secrets from unavoidable transmission errors and other random factors in wireless communications. In DSE, previous packets are coded as binary values 0 and 1 according to whether they were retransmitted due to channel error. This 0/1 sequence is called the retransmission sequence (RS), which is applied to generate the dynamic secret (DS). The dynamic encryption key (DEK) is updated by XORing the previous DEK with the current DS.
Similar to FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACHE SPARK (20)
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share these foundational concepts to build on.
Presentation of the OECD Artificial Intelligence Review of Germany
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACHE SPARK
International Journal of Network Security & Its Applications (IJNSA) Vol.9, No.6, November 2017. DOI: 10.5121/ijnsa.2017.9604
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACHE SPARK
Dimitrios Sisiaridis and Olivier Markowitch
QualSec Research Group, Departement d’Informatique, Université Libre de Bruxelles
ABSTRACT
Feature extraction and feature selection are the first tasks in pre-processing of input logs in order to detect
cyber security threats and attacks while utilizing machine learning. When it comes to the analysis of
heterogeneous data derived from different sources, these tasks are found to be time-consuming and difficult
to be managed efficiently. In this paper, we present an approach for handling feature extraction and
feature selection for security analytics of heterogeneous data derived from different network sensors. The
approach is implemented in Apache Spark, using its python API, named pyspark.
KEYWORDS
Machine learning, feature extraction, feature selection, security analytics, Apache Spark
1. INTRODUCTION
Today, a perimeter-only security model in communication systems is insufficient. With Bring Your Own Device (BYOD) and IoT, data now move beyond the perimeter. For example, threats to the intellectual property, and generally to the sensitive data of an organization, are related either to insider attacks, outsider targeted attacks, combined forms of internal and external attacks, or attacks performed over a long period. Adversaries can be criminal organizations, careless employees, compromised employees, leaving employees or state-sponsored cyber espionage [6].
The increase of cyber security attacks during recent years has created the need for automated traffic log analysis, over long periods of time, at every level of an enterprise's or organization's information system. Unstructured, semi-structured or structured data in time-series, with respect to security-related events from users, services and the underlying network infrastructure, usually present a high level of dimensionality and non-stationarity.
There is a plethora of examples in the literature as well as in open-source or commercial threat
detection tools where machine learning algorithms are used to correlate events and to apply
predictive analytics in the cyber security landscape.
Incident correlation refers to the process of comparing different events, often from multiple data sources, in order to identify patterns and relationships, enabling the identification of events belonging to one attack or indicative of broader malicious activity. It allows us to better understand the nature of an event, to reduce the workload needed to handle incidents, to automate the classification and forwarding of incidents only to the relevant constituency, and to allow analysts to identify and reduce potential false positives.
Predictive Analytics, using pattern analysis, deals with the prediction of future events based on previously observed historical data, by applying methods such as Machine Learning [9]. For example, a supervised learning method can build a predictive model from training data to make predictions about new observations, as presented in [7].
We need to build autonomous systems that could act in response to an attack at an early stage.
Intelligent machines could implement algorithms designed to identify patterns and behaviors
related to cyber threats in real time and provide an instantaneous response with respect to their
reliability, privacy, trust and overall security policy framework.
By utilizing Artificial Intelligence (AI) techniques leveraged by machine learning and data mining methods, a learning engine enables the consumption of seemingly unrelated, disparate datasets to discover correlated patterns that result in consistent outcomes with respect to the access behavior of users, network devices and applications involved in risky abnormal actions, thus reducing the amount of security noise and false positives. Machine learning algorithms can be used to examine, for example, statistical features or domain and IP reputation, as proposed in [1] and [12].
Along with history- and user-related data, network log data are exploited to identify abnormal
behavior concerning targeted attacks against the underlying network infrastructure as well as
attack forms such as man-in-the-middle and DDoS attacks [3] [9].
Data acquisition and data mining methods, with respect to different types of attacks such as targeted and indiscriminate attacks, provide a perspective of the threat landscape. It is crucial to extract and select the right data for our analysis, among the plethora of information produced daily by the information system of a company, enterprise or organization [8]. Enhanced log data are then analyzed for new attack patterns, and the outcome, e.g. in the form of behavioral risk scores and historical baseline profiles of normal behavior, is forwarded to update the learning engine. Any unusual or suspected behavior can then be identified as an anomaly or an outlier in real or near real-time [11].
We propose an approach to automate the tasks of feature extraction and feature selection using
machine learning methods, as the first stages of a modular approach for the detection and/or
prediction of cybersecurity attacks. For the needs of our experiments we employed the Spark
framework and more specifically its python API, pyspark.
Section 2 explains the difficulties of these tasks, especially while working with heterogeneous data taken from different sources and in different formats. In Section 3 we explain the difference between two common approaches to feature extraction utilizing machine learning techniques. Section 4 deals with the task of extracting data from logs of increased data complexity. In Section 5 we propose methods for the task of feature selection, while our conclusions are presented in Section 6.
2. EXTRACTING AND SELECTING FEATURES FROM HETEROGENEOUS DATA
In our experiments, we examine the case where we have logs of records derived as the result of an integration of logs produced by different network tools and sensors (heterogeneous data from different sources). Each one of them monitors and records a view of the system in the form of records of different attributes and/or of different structure, implying an increased level of interoperability problems in a multi-level, multi-dimensional feature space; each network monitoring tool produces its own schema of attributes.
In such cases, it is typical that the number of attributes is not constant across the records, while the number of complex attributes varies as well [8]. On the other hand, there are attributes, e.g. dates, expressed in several formats, or other attributes referring to the same piece of information by using slightly different attribute names. Most of them are categorical, in a string format, while the inner datatype varies from nested dictionaries to linked lists or arrays of further complex structure; each one of them may present its own multi-level structure, which increases the level of complexity. In such cases, a clear strategy has to be followed for feature extraction. Therefore, we have to deal with flattening and interoperability-solving processes (Figure 1).
In our experiments, we used as input data an integrated log of recorded events produced by a number of different network tools applied on a telco network. For the pre-processing analysis stage, a single server with 2 CPUs, 8 cores/CPU and 64GB RAM was used, running an Apache Hadoop v2.7 installation with Apache Spark v2.1.0 in Standalone Cluster Mode, which can be regarded as a low-cost configuration for handling data exploration when dealing with a huge amount of inputs; raw data volumes, for batch analysis, were approximately 16 TBytes.
Figure 1. Logs from different input sources
3. GLOBAL FLATTENING VS. LOCAL FLATTENING
The first question to be answered is related to the ability to define an optimal way to handle such complex inputs. Potential solutions may include:
• use the full number of dimensions (i.e. all the available features in each record), defined as global flattening
• decompose initial logs into distinct baseline structures derived from each sensor/tool, defined as local flattening
3.1. LOOKING FOR THREATS AND ATTACKS IN A KILL CHAIN
In order to answer these questions, we should also take into account the rationale behind the next steps. While working with the analysis of heterogeneous data taken from different sources, pre-processing procedures, such as feature extraction, feature selection and feature transformation, need to be carefully designed in order not to miss any significant security-related events. These tasks are usually time-consuming, thus producing significant delays in the overall time of the data analysis.
That is our main motivation in this work: to reduce the time needed for feature extraction in data exploration analysis by automating the process. In order to achieve this, we utilize the data model
abstractions and we keep to a minimum any access to the actual data. The key characteristics of
data inputs follow:
• logs derived from different sources
• heterogeneous data
• high-level of complexity
• information is usually hidden in multi-level complex structures
In the next stage, features will be transformed, indexed and scaled to overcome skewness, usually following a normal distribution under a common metric space, in the form of vectors. As in our experiments we processed mainly un-labelled data (i.e. lacking any labels or any indication of a suspicious threat/attack), clustering techniques will be used to define baseline behavioural profiles and to detect outliers [1]. The latter may correspond to rare, sparse anomalies that can be found by one-class detection of novelties, n-gram analysis of nested attributes, or pattern analysis using Indicators of Compromise (IoCs) [8] [6]. A survey on unsupervised outlier detection algorithms is presented in [14]. Finally, semi-supervised and/or supervised analysis can be further employed, using cluster labels, anomalous clusters or expert feedback (via active learning methods), in order to detect and/or predict threats and attacks in near- and real-time analysis [3].
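As an illustration of this clustering stage, the following minimal pyspark sketch fits K-means on already-vectorized events and scores each event by its distance to the assigned cluster centre; the toy input dataframe and the choice of k are hypothetical assumptions for illustration, not the paper's configuration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType
from pyspark.ml.clustering import KMeans
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("baseline-outlier-sketch").getOrCreate()

# Hypothetical events already transformed, indexed and scaled into vectors
events_vec = spark.createDataFrame(
    [(Vectors.dense([0.1, 0.2]),), (Vectors.dense([0.2, 0.1]),),
     (Vectors.dense([8.0, 9.0]),)],                # an obvious outlier
    ["features"])

# K-means defines the baseline behavioural profiles (clusters)
model = KMeans(k=2, seed=42).fit(events_vec)
clustered = model.transform(events_vec)            # adds a 'prediction' column

# Score each event by its squared distance to its cluster centre
centers = model.clusterCenters()
dist = F.udf(lambda v, c: float(Vectors.dense(centers[c]).squared_distance(v)),
             DoubleType())
scored = clustered.withColumn("score", dist("features", "prediction"))

# The highest-scoring events are candidate outliers/anomalies
scored.orderBy(F.desc("score")).show()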
Outliers in time-series are expected to be found for:
• a single network sensor or pen-tester
• a subset of those, or
• the complete available set of sensors and network monitoring tools
These time-series are defined in terms of either:
• time spaces as the contextual attributes
  • date attributes will be decomposed into time windows such as year, month, day of the week, hour and minute, following the approach proposed in [13] (a minimal sketch follows this list)
  • statistics will be calculated either in batch or online mode and then stored in HIVE tables, or in temporary views for ad-hoc temporal real-time analysis
• a single time space (e.g. a specific day)
• a stable window time space (e.g. all days of a specific month)
• a user-defined variable window time space
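The date decomposition above can be expressed directly with Spark SQL functions. The following minimal pyspark sketch assumes a hypothetical event_time string column; it is illustrative only, not the paper's exact code.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("time-windows-sketch").getOrCreate()

# Hypothetical log with a single timestamp attribute named 'event_time'
df = spark.createDataFrame([("2017-11-06 09:15:00",)], ["event_time"])

# Decompose the date attribute into contextual time windows
df = (df.withColumn("ts", F.unix_timestamp("event_time").cast("timestamp"))
        .withColumn("year", F.year("ts"))
        .withColumn("month", F.month("ts"))
        .withColumn("day_of_week", F.date_format("ts", "E"))
        .withColumn("hour", F.hour("ts"))
        .withColumn("minute", F.minute("ts")))

# Register as a temporary view for ad-hoc temporal real-time analysis
df.createOrReplaceTempView("events_by_time")
spark.sql("SELECT hour, COUNT(*) AS n FROM events_by_time GROUP BY hour").show()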
Our approach serves as an adaptation of the kill-chain model. The kill chain model [5] is an intelligence-driven, threat-focused approach to studying intrusions from the adversaries' perspective. The fundamental element is the indicator, which corresponds to any piece of information that can describe a threat or an attack. Indicators can be atomic, such as IP or email addresses, computed, such as hash values or regular expressions, or behavioral, which are collections of computed and atomic indicators, such as statements.
Thus, in our proposal, contextual attributes represent either time-spaces in time-series, as the first level of interest, single attributes (e.g. a specific network protocol, a user, or any other atomic indicator), computed attributes (e.g. hash values or regular expressions), or even behavioral attributes of inner structure (e.g. collections of single and computed attributes in the form of statements or nested records). Then, outliers can be defined at multiple levels of interest for the remaining behavioral attributes, by looking into single vector values [4], or by looking at the covariance and pairwise correlation (e.g. Pearson correlation) of a subset of the selected features or the complete set of features [5].
Experiments with data received from different network monitoring tools, regarding the system of a telco enterprise, revealed at the exploratory data stage that the number of single feature attributes in this log ranged from 7 (the smallest number of attributes of a distinct feature space) up to 99 (corresponding to the total number of the overall available feature space): a fact that led us to carry on with feature extraction by focusing on flattening multi-nested records separately for each different structure (with a total of 13 different baseline structures).
Thus, the main keys in the proposed approach for feature extraction are:
• extract the right data
• correlation of the 'right data' can reveal long-term APTs
• re-usable patterns and trend lines, as probabilities, are indications of zero-day attacks
• trend lines may also be used to detect DDoS attacks
• handle interoperability issues
• handle time inconsistencies, date formats and different names for the same piece of information, by extending the NLP python library [2]
3.2. GLOBAL FLATTENING OF INPUT DATA
By following this approach, we achieve a full view of entities' behavior, as each row is represented by the full set of dimensions. On the other hand, the majority of the columns do not have a value or it is set to Null. A candidate solution would be to use sparse vectors in the next stage of feature transformation, which in turn demands special care for NaN and Null values (for example, replacing them either with the mean, the median, or with a special value). Most of the data in this stage are categorical. We need to convert them into numerical values in the next stages, as in Spark, statistical analytics are available only for data in the form of a Vector or of the DoubleType.
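A minimal pyspark sketch of these two preparations follows: Null values of a numerical column are replaced with the column mean, and a categorical string column is converted to a numerical index with StringIndexer. Column names and values are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import StringIndexer

spark = SparkSession.builder.appName("null-handling-sketch").getOrCreate()

# Hypothetical globally-flattened rows: many attributes end up Null
df = spark.createDataFrame(
    [(1.0, "tcp"), (None, "udp"), (3.0, None)],
    ["bytes_out", "proto"])

# Replace missing numerical values with the column mean (the median or a
# special value are equally valid choices, as noted above)
mean_val = df.select(F.avg("bytes_out")).first()[0]
df = df.na.fill({"bytes_out": mean_val, "proto": "unknown"})

# Convert the categorical attribute into a numerical index for later
# vectorization; Spark statistics require Vector/DoubleType data
indexer = StringIndexer(inputCol="proto", outputCol="proto_idx")
df = indexer.fit(df).transform(df)
df.show()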
This solution performs efficiently for a rather small number of dimensions, while it suffers from the well-known phenomenon of the curse of dimensionality for a high number of dimensions, where data appear to be sparse and dissimilar in several ways, which prevents common data modelling strategies from being efficient.
3.3. LOCAL FLATTENING OF INPUT DATA
By following this approach, we identify all the different schemas in the input data. First, it is a bottom-up analysis, re-synthesizing results to answer either simple or complex questions. At the same time, we can define hypotheses on the full set of our input data (i.e. top-down analysis); thus, it is a complete approach to data analytics, allowing data to tell their story, in a concrete way, following a minimum number of steps. In this way, we are able to:
• keep the number of assumptions to a minimum
• look for misconfigurations and data correlations in the abstract dataframe definitions
• keep access to the actual data to a minimum
• provide solutions to interoperability problems, such as:
  • different representations of date attributes
  • namespace inconsistencies (e.g. attributes with names such as prot, protocol, connectionProtocol)
• cope with complex structures with different numbers of inner levels
• deal with event ordering and time-inconsistencies (as described in [10])
4. FEATURE EXTRACTION IN APACHE SPARK
In Apache Spark, data are organized in the form of dataframes, which resemble the well-known relational tables: there are columns (aka attributes, features or dimensions) and rows (i.e. events recorded, for example, by a network sensor or a specific device). The list of columns and their corresponding datatypes define the schema of a dataframe. In each dataframe, its columns and rows, i.e. its schema, are unchangeable. Thus, an example of a schema could be the following:
DataFrame[id: string, @timestamp: string, honeypot: string, payloadCommand: string]
A sample of recorded events of this dataframe schema is shown in Figure 2.
Figure 2. A sample of recorded events, having 4 columns/dimensions
The following steps refer to the case in which logs/datasets are ingested in json format. Our approach examines the data structures at their top level, focusing on abstract schemas and the re-synthesis of previous and new dataframes, in an automatic way. Access to the actual data only takes place when there is a need to find schemas in dictionaries, and only by retrieving just one of the records (thus, even if we have a dataframe of millions/billions of events, we only examine the schema of the first record/event). The words field, attribute and column refer to the same piece of data: a dataframe column.
1) load the log file in a Spark dataframe, in json format
2) find and remove all single-valued attributes (this step also applies to the feature selection section)
3) flatten complex structures
a) find and flatten all columns of complex structure (the steps are run recursively, down to the lowest complex attribute of the hierarchy of complex attributes)
i) e.g. struct, nested dictionaries, linked lists, arrays, etc. (i.e. currently those whose value is of RowType)
b) remove all the original columns of complex structure
4) convert all time-columns into timestamps, using distinct time features in the dataframes
5) integrate similar features in the list of dataframes
Each attribute of a complex structure, such as a struct, nested dictionary, linked list or an array, is handled in such a way as to ensure that all single attributes of the lowest data level (i.e. the elements of an array, a dictionary, a list, etc.) will be flattened and expressed in 2-D. In this way, we manage to transform the schema of the original dataframe into a number of dataframes, each one corresponding to a schema that refers to a single network sensor or other input data source, as illustrated in the following figures (Figures 3, 4, 5 and 6). The steps for these different cases follow:
• struct – RowType:
  • use the leaf column at the last level of this struct-column to add it as a new column
• list: add list elements as new columns
• array: split the array's elements and add the relevant new columns
• dictionary - steps:
  • find all inner schemas for attributes of type Dict, as a list
  • add the schemaType as an index to the original dataframe
  • create a list of dataframes, where each one has its own distinct schema
  • flatten all dictionary attributes, according to their schemas, in each dataframe of the list of dataframes by adding them as new columns
In Figure 3, in the left-hand schema, Schema#1, attribute _id is of the datatype struct. The actual value is given by the inner-level attribute $oid. The same stands for the outer attribute timestamp: the actual date value can be searched for in the inner-level attribute $date. In both cases, attributes $oid and $date are extracted in the form of two new columns, named _id_ and dateOut; the original attributes _id and timestamp are then deleted, leaving a new schema on the right side, Schema#2. In this way, we managed to reduce the complexity of the original input schema to a new one of lower complexity.
Figure 3. Transforming complex fields (i) – attributes _id and timestamp are of the datatype struct
In Figure 3, the exploratory analysis has revealed that the payload attribute actually represents a dictionary as a list of multi-nested dictionaries; each one of the latter presents a complex structure with further levels. The different schemas found in payload are presented in Figure 4.
Figure 4. Different schemas in the payload attribute
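To make the struct case above concrete, here is a minimal pyspark sketch of schema-only recursive flattening. It inspects df.schema rather than the data, promotes every leaf of a struct column to a new top-level column named after its path, and drops the original complex column; the json sample is hypothetical (inner keys oid/date stand in for the $oid/$date attributes of Figure 3), and the array and dictionary cases described above are omitted for brevity.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType

spark = SparkSession.builder.appName("flatten-sketch").getOrCreate()

# Hypothetical honeypot-style event with multi-level nested attributes
df = spark.read.json(spark.sparkContext.parallelize(
    ['{"_id": {"oid": "a1"}, "timestamp": {"date": "2017-11-06"}, "honeypot": "dionaea"}']))

def flatten(df):
    # Recursively flatten all struct columns, touching only the schema
    while True:
        struct_fields = [f for f in df.schema.fields
                         if isinstance(f.dataType, StructType)]
        if not struct_fields:
            return df
        cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                # promote each inner attribute, naming it after its path
                cols += [F.col(f.name + "." + c.name).alias(f.name + "_" + c.name)
                         for c in f.dataType.fields]
            else:
                cols.append(F.col(f.name))
        # the select drops the original columns of complex structure
        df = df.select(cols)

flat_df = flatten(df)
flat_df.printSchema()   # _id_oid, timestamp_date, honeypot (all flattened to 2-D)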
In Figure 5, we illustrate the newly created dataframe schemas which correspond to the different schemas of the payload attribute in Figure 4. By following this approach, data are easier to handle: in the next stages, they will be cleaned, transformed from categorical to numerical and then further analyzed in order to detect anomalies in entities' behavior.
The dataframe schema in Figure 6 is the second of the dataframes derived by flattening the payload attribute (Figure 5) into its inner-level attributes. Here, feature raw_sig is in the form of an array. By applying consecutive transformations automatically, we manage to extract all inner attributes, which simplifies the process of correlating data in the next stage. Thus, by looking into the raw_sig column, we identify inner values separated by ':', which are further decomposed into new features derived from the inner levels, as depicted e.g. for column attsCol5; the latter could be further split, leading to two new columns (e.g. with values 1024 and 0, respectively), as this process is recursive and automated; special care is given to how we name the new columns, in order to follow the different paths of attribute decomposition.
Figure 5. The newly created dataframes which correspond to the different schemas in payload
Figure 6. Transforming complex fields (iii) – attribute raw_sig is of the datatype array
5. FEATURE SELECTION
The process of feature selection (FS) is crucial for the next analysis steps. As explained in 3.1, the motivation of our approach is to reduce data complexity in parallel with a significant reduction of the time needed for applying security analytics to un-labelled data. As we ultimately aim to detect anomalies, as strong forms of outliers, in order to improve quantitative metrics, i.e. to increase accuracy and detection rates or to decrease security noise to a minimum, we need to select the data that are most related to our questions. Dimensionality reduction can play a significant role in complex event processing, especially when data are coming from different sources and in different forms.
We present four methods to achieve this goal:
1. Leave out single-valued attributes
2. Namespace correlation
3. Data correlation using the actual values
4. FS in case of having a relatively small number of categories
5.1. LEAVE OUT SINGLE-VALUED ATTRIBUTES
The first method is quite simple: all single-valued attributes are removed from the original dataframe. For example, consider the dataframe schema in Figure 7. Attribute normalized of datatype Boolean takes the value True for all the events in our integrated log, and therefore we drop the relevant column, which leads to a new dataframe schema.
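A minimal pyspark sketch of this first method follows; the toy dataframe is hypothetical, with a constant normalized column standing in for the attribute of Figure 7.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("single-valued-sketch").getOrCreate()

# Hypothetical integrated log: 'normalized' is True for every event
df = spark.createDataFrame(
    [("dionaea", True, 445), ("kippo", True, 22), ("dionaea", True, 445)],
    ["honeypot", "normalized", "port"])

# One aggregation over the data: (approximate) distinct counts per column
counts = df.agg(*[F.approx_count_distinct(c).alias(c) for c in df.columns]).first()

# Drop all single-valued attributes, which leads to a new dataframe schema
single_valued = [c for c in df.columns if counts[c] <= 1]
reduced = df.drop(*single_valued)

print("dropped:", single_valued)   # -> ['normalized']
reduced.printSchema()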
Figure 7. Attribute normalized is left out as it presents a constant value of True in all records
5.2. NAMESPACE CORRELATION
It is quite common, when data inputs are coming from different sources, to deal with entity attributes which refer to the same piece of information although their names are slightly different. For example, attributes proto and connection protocol refer to the actual protocol used in a communication channel. The different tools used by experts to monitor network traffic do not follow a unified namespace scheme. This fact could lead to misinterpretations, information redundancy and misconfigurations in data modelling, among other obstacles in the data exploration stage; all these refer mainly to problems of interoperability, as can be seen in Figure 8. By solving such inconsistencies, we manage to further reduce data complexity as well as the overall time for data analysis. In [10] we have presented an approach to handle such interoperability issues by utilizing means derived from the theory of categories.
Figure 8. Attributes proto and connection_protocol refer to the same piece of data
5.3. USING PEARSON CORRELATION TO REDUCE THE NUMBER OF DIMENSIONS
Once data inputs, in the form of dataframes, are cleaned, transformed, indexed and scaled into their corresponding numerical values, and before the process of forming the actual feature vectors that will be used in clustering, by using data correlation we are able to achieve a further reduction of the dimensions that will be used for the actual security analytics.
The outcome of applying this technique, using Pearson correlation, is presented in Figure 9. Attributes that are highly correlated may be omitted while defining the relevant clusters; the choice of the particular attribute to be left out is strongly related to the actual research interest; for example, we may be interested in monitoring the behavior of local hosts and in detecting any anomalies that deviate from patterns of normal behavior.
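A minimal pyspark sketch of this correlation-based reduction follows. It assumes features that are already indexed and scaled, and uses pyspark.ml.stat.Correlation (available from Spark 2.2 onwards; older versions expose a similar method in spark.mllib). The 0.9 threshold and the greedy drop rule are illustrative choices since, as noted above, which attribute to leave out depends on the research interest.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.stat import Correlation

spark = SparkSession.builder.appName("pearson-sketch").getOrCreate()

# Hypothetical indexed/scaled numerical features
cols = ["f1", "f2", "f3"]
df = spark.createDataFrame(
    [(1.0, 2.0, 10.0), (2.0, 4.1, 3.0), (3.0, 6.2, 7.0)], cols)

# Assemble the candidate features into a single vector column
vec = VectorAssembler(inputCols=cols, outputCol="features").transform(df)

# Pairwise Pearson correlation matrix, computed by Spark ML
corr = Correlation.corr(vec, "features", "pearson").first()[0].toArray()

# Greedily drop one attribute from every highly-correlated pair
to_drop = set()
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if abs(corr[i][j]) > 0.9 and cols[j] not in to_drop:
            to_drop.add(cols[j])       # keep cols[i], drop its correlate

print("candidate attributes to leave out:", to_drop)   # e.g. {'f2'}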
Figure 9. Applying Pearson correlation to indexed and scaled data for feature selection
5.4. FEATURE SELECTION IN CASE OF HAVING A RELATIVELY SMALL NUMBER OF CATEGORIES
In the case where we deal with categorical attributes presenting a relatively small number of categories, i.e. numberOfCategories <= 4, we propose the following steps in order to achieve a further feature reduction. We distinguish the case where data are un-labelled (lack of any indication for a security-related event) and the case where some or all of the labels are available. We need to mention that in real scenarios we usually need to cope with either fully un-labelled data or highly-unbalanced data (i.e. where only a few instances of the rare/anomalous class are available).
Working with un-labelled data:
• For the set of these features, select each one of them as the feature-label attribute and then either:
  • use a decision tree with a multi-class classification evaluator to further reduce the number of the dimensions (by following one or more of the aforementioned techniques)
  • create 2^n sub-dataframes with respect to the number of categories
  • calculate feature importances using a Random Forest classifier (see the sketch after this list)
  • use an ensemble technique in the form of a Combiner, e.g. a neural network or a Bayes classifier, running a combination of the above techniques to optimize results in the next levels of the analysis (e.g. to further optimize detection rates)
Working with labelled data:
• Select features using the Chi-Square test of independence. In our experiments, with respect to the input data, we have used the four different selection methods available in the Spark ML library (a minimal sketch follows this list):
  • the number of top features
  • a fraction of the top features
  • p-values below a threshold, to control the false positive rate
  • p-values with false discovery rate below a threshold
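These four options map onto the selectorType parameter of ChiSqSelector in Spark ML; below is a minimal sketch with a hypothetical labelled toy dataset (fdr support arrived in later Spark releases than the other types).

from pyspark.sql import SparkSession
from pyspark.ml.feature import ChiSqSelector
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("chisq-sketch").getOrCreate()

# Hypothetical labelled events: a feature vector plus a class label
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.0, 3.0]), 1.0),
     (Vectors.dense([0.0, 2.0, 1.0]), 0.0),
     (Vectors.dense([1.0, 2.0, 4.0]), 1.0)],
    ["features", "label"])

# The four statistical methods map to the selectorType parameter:
#   numTopFeatures - the number of top features
#   percentile     - a fraction of the top features
#   fpr            - p-values below a threshold (false positive rate)
#   fdr            - p-values with false discovery rate below a threshold
selector = ChiSqSelector(selectorType="numTopFeatures", numTopFeatures=2,
                         featuresCol="features", labelCol="label",
                         outputCol="selected")
df_sel = selector.fit(df).transform(df)
df_sel.select("selected").show()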
6. CONCLUSIONS
We have presented an approach to efficiently handle the tasks of feature extraction and feature selection while working on security analytics utilizing machine learning techniques. It is an automated solution to handle interoperability problems. It is based on a continuous transformation of the abstract definitions of the data inputs.
In the case of feature extraction, access to the actual data is limited to minimal read actions on the first record of a dataframe, and only when it is needed to extract the inner schema of a dictionary-based attribute. In the case of feature selection, the actual data are accessed only to find correlations between them, before we apply clustering or any other method for threat detection.
By following the proposed approach, we managed to achieve our primary objectives: reduce computational time, reduce data complexity and provide solutions to interoperability issues, while analyzing vast amounts of heterogeneous data from different sources.
The approach can be formalized in the next steps by utilizing novel structures derived from the theory of categories, as presented in [10], towards an overall optimization in terms of quantitative and qualitative metrics.
REFERENCES
[1] C. C. Aggarwal and C. K. Reddy, Data clustering: algorithms and applications. CRC press, 2013.
[2] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc., 2009.
[3] P. Duessel, C. Gehl, U. Flegel, S. Dietrich, and M. Meier, "Detecting zero-day attacks using context-aware anomaly detection at the application-layer," International Journal of Information Security, pp. 1–16, 2016.
[4] N. Goix, “How to evaluate the quality of unsupervised anomaly detection algorithms?” arXiv preprint
arXiv:1607.01152, 2016.
[5] E. M. Hutchins, M. J. Cloppert, and R. M. Amin, "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," Leading Issues in Information Warfare & Security Research, vol. 1, no. 1, p. 80, 2011.
[6] V. N. Inukollu, S. Arsi, and S. R. Ravuri, “Security issues associated with big data in cloud
computing,” International Journal of Network Security & Its Applications, vol. 6, no. 3, p. 45, 2014.
[7] S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani, “Emerging topic detection using
dictionary learning,” in Proceedings of the 20th ACM international conference on Information and
knowledge management, pp. 745–754, ACM, 2011.
[8] M. Lange, F. Kuhr, and R. Möller, "Using a deep understanding of network activities for security event management," International Journal of Network Security & Its Applications (IJNSA), 2016.
[9] M.-L. Shyu, Z. Huang, and H. Luo, “Efficient mining and detection of sequential intrusion patterns
for network intrusion detection systems,” in Machine Learning in Cyber Trust, pp. 133–154, Springer,
2009.
[10] D. Sisiaridis, V. Kuchta, and O. Markowitch, “A categorical approach in handling event-ordering in
distributed systems,” in Parallel and Distributed Systems (ICPADS), 2016 IEEE 22nd International
Conference on, pp. 1145–1150, IEEE, 2016.
[11] D. Sisiaridis, F. Carcillo, and O. Markowitch, “A framework for threat detection in communication
systems,” in Proceedings of the 20th Pan-Hellenic Conference on Informatics, p. 68, ACM, 2016.
[12] M. Talabis, R. McPherson, I. Miyamoto, and J. Martin, Information Security Analytics: Finding Security Insights, Patterns, and Anomalies in Big Data. Syngress, 2014.
[13] K. Veeramachaneni, I. Arnaldo, V. Korrapati, C. Bassias, and K. Li, "AI^2: training a big data machine to defend," in Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 2016 IEEE 2nd International Conference on, pp. 49–54, IEEE, 2016.
[14] A. Zimek, E. Schubert, and H.-P. Kriegel, “A survey on unsupervised outlier detection in high-
dimensional numerical data,” Statistical Analysis and Data Mining: The ASA Data Science Journal,
vol. 5, no. 5, pp. 363–387, 2012.
AUTHORS
Dimitrios Sisiaridis received his BSc in Information Engineering from the ATEI of Thessaloniki. He received his MSc in Computing (advanced databases) and his PhD in applied mathematics and security from Northumbria University in Newcastle. He is a member of the QualSec Research Group at the Université Libre de Bruxelles, working as a researcher in projects related to big data security analytics.
Prof. Olivier Markowitch is an associate Professor of the Departement d'Informatique at the Université Libre de Bruxelles. He is a member of the QualSec Research Group, working on the design and analysis of security protocols and digital signature schemes.