This document summarizes a lecture on verifying computations in cloud computing. It presents the RunTest approach, which randomly sends data along multiple processing paths and matches intermediate results to build an "attestation graph" capturing node agreement. The Bron-Kerbosch algorithm finds the largest consistency clique in this graph; nodes outside it are identified as malicious. The approach was evaluated on an extended IBM System S, detecting different attack patterns and assessing data quality. Issues discussed include the algorithm's complexity and scalability.
Taxonomy of Optimization Approaches of Resource Brokers in Data Grids (ijcsit)
A novel taxonomy of replica selection techniques is proposed. We studied several data grid approaches whose data management selection strategies differ. The aim of the study is to identify the common concepts, observe the performance of these approaches, and compare their performance with our strategy.
Active Content-Based Crowdsourcing Task Selection (Carsten Eickhoff)
Crowdsourcing has long established itself as a viable alternative to corpus annotation by domain experts for tasks such as document relevance assessment. The crowdsourcing process traditionally relies on high degrees of label redundancy in order to mitigate the detrimental effects of individually noisy worker submissions. Such redundancy comes at the cost of increased label volume, and, subsequently, monetary requirements. In practice, especially as the size of datasets increases, this is undesirable.
In this paper, we focus on an alternate method that exploits document information instead, to infer relevance labels for unjudged documents. We present an active learning scheme for document selection that aims at maximising the overall relevance label prediction accuracy, for a given budget of available relevance judgements by exploiting system-wide estimates of label variance and mutual information. Our experiments are based on TREC 2011 Crowdsourcing Track data and show that our method is able to achieve state-of-the-art performance while requiring 17 – 25% less budget.
This paper has been accepted for presentation at the 25th ACM International Conference on Information and Knowledge Management (CIKM).
Analysis of different similarity measures: SimRank (Abhishek Mungoli)
SimRank exploits object-to-object relationships to compute the similarity between two objects.
We used it in our project to find similar research papers in the DBLP dataset (DBLP provides a comprehensive list of research papers in the computer science domain).
SimRank is a generic approach, and its basic idea can be applied to other domains of interest as well.
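For illustration, here is a minimal SimRank sketch over a toy citation graph; the graph, decay factor, and iteration count are illustrative assumptions, not the DBLP setup used in the project:

```python
# Minimal SimRank sketch: s(a, b) = C / (|I(a)||I(b)|) * sum of s(i, j) over
# in-neighbor pairs (i, j), with s(a, a) = 1 (Jeh-Widom formulation).
import itertools

def simrank(in_neighbors, C=0.8, iters=10):
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a, b in itertools.product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim

# Toy citation graph: papers p1 and p2 are both cited by p3 and p4.
graph = {"p1": ["p3", "p4"], "p2": ["p3", "p4"], "p3": [], "p4": []}
print(simrank(graph)[("p1", "p2")])  # high similarity via shared citers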
PEARC17: A real-time machine learning and visualization framework for scientif... (Feng Li)
High-performance computing resources are widely used in science and engineering. Typical post-hoc approaches persist simulation output to storage, so analysis tasks must read it back from storage into memory. For large-scale scientific simulations, such I/O produces significant overhead. In-situ/in-transit approaches bypass I/O by accessing and processing in-memory simulation results directly, which suggests simulations and analysis applications should be more closely coupled. This paper constructs a flexible and extensible framework that connects scientific simulations with multi-step machine learning processes and in-situ visualization tools, providing plug-in analysis and visualization functionality over complex workflows in real time. A distributed simulation-time clustering method is proposed to detect anomalies in real turbulence flows.
IEEE 2014 DOTNET DATA MINING PROJECTS Similarity preserving snippet based vis... (IEEEMEMTECHSTUDENTPROJECTS)
Visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualizing high-volume time series data struggle to meet the hard latency requirements of interactive visualizations and waste a great deal of expensive network bandwidth. Current solutions for reducing the volume of time series data disregard the properties of the resulting visualization and achieve only poor visualization quality.
In this work, we introduce M4, a simple aggregation-based time series dimensionality reduction technique that is superior to existing approaches in that it provides lower visualization errors at higher data reduction ratios. Focusing on the semantics of line charts, the predominant form of time series visualization, we explain in detail why current data reduction techniques fail and how our approach achieves superiority by respecting the process of line rasterization. We describe how to incorporate the proposed aggregation model at the query level in a visualization-driven query-rewriting system. Our approach is generic and applicable to any visualization system that relies on relational data sources. Using real-world datasets from high-tech manufacturing, stock markets, and engineering domains, we demonstrate that our visualization-oriented data aggregation can reduce data volumes by up to two orders of magnitude while preserving perfect visualizations.
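As a rough illustration of the aggregation idea described above (group a series into pixel-column buckets and keep the first, last, minimum, and maximum point of each bucket), here is a minimal sketch; the bucket count and sample data are illustrative assumptions, not the paper's query-rewriting implementation:

```python
# Minimal sketch of M4-style aggregation: split a series into w pixel-column
# buckets and keep the first, last, min, and max point of each bucket.
def m4_reduce(points, w):
    """points: list of (t, v) sorted by t; w: number of pixel columns."""
    t0, t1 = points[0][0], points[-1][0]
    width = (t1 - t0) / w or 1
    buckets = {}
    for t, v in points:
        k = min(int((t - t0) / width), w - 1)
        buckets.setdefault(k, []).append((t, v))
    out = set()
    for pts in buckets.values():
        out.add(pts[0])                        # first point in the bucket
        out.add(pts[-1])                       # last point in the bucket
        out.add(min(pts, key=lambda p: p[1]))  # minimum value
        out.add(max(pts, key=lambda p: p[1]))  # maximum value
    return sorted(out)

series = [(t, (t * 7919) % 101) for t in range(1000)]  # synthetic toy series
print(len(m4_reduce(series, w=50)))  # at most 4 points per pixel column
```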
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014) (Konstantinos Zagoris)
H-KWS 2014 is the Handwritten Keyword Spotting Competition organized in conjunction with the ICFHR 2014 conference. The main objective of the competition is to record current advances in keyword spotting algorithms using established performance evaluation measures frequently encountered in the information retrieval literature. The competition comprises two distinct tracks: a segmentation-based and a segmentation-free track. Five distinct research groups participated in the competition, with three methods for the segmentation-based track and four methods for the segmentation-free track. The benchmarking datasets used in the contest contain both historical and modern documents from multiple writers. In this paper, the contest details are reported, including the evaluation measures and the performance of the submitted methods, along with a short description of each method.
Indexing data on the web: a comparison of schema-level indices for data search (Till Blume)
Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various efforts have been conducted to develop specific index models for a given task. With each index model designed, implemented, and evaluated independently, it remains difficult to judge whether an approach generalizes well to another task, set of queries, or dataset. In this work, we empirically evaluated six representative index models independent of their original application for a data search scenario.
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases (Konstantinos Zagoris)
Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimodal databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary modalities. We perform retrieval in a two-stage fashion: first rank by a secondary modality, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that such dynamic two-stage setups can be significantly more effective and robust than similar setups with static thresholds previously proposed.
COLOCATION MINING IN UNCERTAIN DATA SETS: A PROBABILISTIC APPROACH (IJCI JOURNAL)
In this paper we investigate the colocation mining problem in the context of uncertain data. Uncertain data is partially complete data; much real-world data is uncertain, for example demographic data, sensor network data, and GIS data. Handling such data is a challenge for knowledge discovery, particularly in colocation mining. One straightforward method is to find the Probabilistic Prevalent Colocations (PPCs), i.e., the colocations likely to be generated by a random world. To do this, we first apply an approximation error to find all the PPCs, which reduces the computation. Next, we enumerate the possible worlds, split them into two different worlds, and compute the prevalence probability. These probabilities are compared against a minimum probability threshold to decide whether a candidate is a Probabilistic Prevalent Colocation (PPC) or not. Experimental results on the selected dataset show a significant improvement in computation time compared with some of the existing methods used in colocation mining.
An Improved Differential Evolution Algorithm for Data Stream Clustering (IJECEIAES)
A few algorithms have been implemented by researchers for clustering data streams. Most of these algorithms require the number of clusters (K) to be fixed by the user based on the input data and kept fixed throughout the clustering process, and choosing K is a known difficulty in stream clustering. In this paper, we propose an efficient approach for data stream clustering based on an Improved Differential Evolution (IDE) algorithm, a fast, robust, and efficient global optimization approach for automatic clustering. In our proposed approach, we additionally apply an entropy-based method for detecting concept drift in the data stream and thereby updating the clustering procedure online. We compare the proposed method with a Genetic Algorithm and find it to be the more efficient optimizer. The proposed technique achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30%, and an F-measure of 88.60%.
Text extraction using document structure features and support vector machines (Konstantinos Zagoris)
In order to successfully locate and retrieve document images such as technical articles and newspapers, a text localization technique must be employed. The proposed method detects and extracts homogeneous text areas in document images, regardless of font type and size, by using connected-components analysis to detect blocks of foreground objects. Next, a descriptor consisting of a set of structural features is extracted from the merged blocks and used as input to a trained Support Vector Machine (SVM). Finally, the SVM output classifies each block as text or non-text.
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit... (Punit Sharnagat)
OSMnx is a Python package to retrieve, model, analyze, and visualize street networks from OpenStreetMap.
OpenStreetMap (OSM) is a collaborative mapping project that provides a free and publicly editable map of the world.
OpenStreetMap provides a valuable crowd-sourced database of raw geospatial data for constructing models of urban street networks for scientific analysis.
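A typical OSMnx usage sketch looks like the following; the place name is an illustrative assumption, and the snippet assumes the osmnx package is installed, network access is available, and module paths as in recent releases:

```python
# Sketch: download and summarize a drivable street network with OSMnx.
import osmnx as ox

# Place name is illustrative; any OSM-resolvable place works.
G = ox.graph_from_place("Anand, Gujarat, India", network_type="drive")

stats = ox.stats.basic_stats(G)   # node/edge counts, street lengths, etc.
print(stats["n"], stats["m"])     # number of nodes and edges
ox.plot.plot_graph(G)             # quick visual check of the network
```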
K-means Clustering Method for the Analysis of Log Dataidescitation
Clustering analysis method is one of the main
analytical methods in data mining; the method of clustering
algorithm will influence the clustering results directly. This
paper discusses the standard k-means clustering algorithm
and analyzes the shortcomings of standard k-means
algorithm. This paper also focuses on web usage mining to
analyze the data for pattern recognition. With the help of k-
means algorithm, pattern is identified.
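For reference, a minimal standard k-means sketch follows; the toy 2-D data and the hand-picked K are illustrative assumptions (and sensitivity to both is exactly the shortcoming the paper discusses):

```python
# Minimal standard k-means: assign points to the nearest center, then
# recompute each center as the mean of its cluster, until convergence.
import random

def kmeans(points, k, iters=100):
    centers = random.sample(points, k)  # random initialization (a known weakness)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        new_centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters

pts = [(1, 1), (1.2, 0.8), (5, 5), (5.1, 4.9), (9, 1)]
centers, clusters = kmeans(pts, k=2)
print(centers)
```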
SOAP is a simple and flexible messaging framework for transferring information specified in the form of an XML infoset between an initial SOAP sender and ultimate SOAP receiver.
SOAP IS:
Lightweight communication protocol
For communication between applications: one-way, request/response, multicast, etc.
Designed to communicate via HTTP
Not tied to any component technology
Not tied to any programming language
Based on XML
Simple and extensible
The slides provide a broad overview of the SOAP protocol and demonstrate a working example that uses SOAP for RPC, implemented with WCF/Visual Studio and Apache Axis.
Overview of web services, SOAP, WSDL and UDDI.
A web service provides a defined set of functionality on a machine-processable interface.
The web service interface is described in a formal language like WSDL, which allows generating code to access the service, thus simplifying development of both web service consumers (clients) and providers (servers).
In big web services, the interface is typically described in WSDL while the access to the service makes use of the SOAP message protocol.
SOAP has its roots in remote object access but is now a general message-based, asynchronous transport mechanism.
SOAP is typically carried over HTTP (Hypertext Transfer Protocol), but other message-based protocols like SMTP (email) or plain TCP could be used as well.
WSDL provides a formalized description of an interface, coarsely separated into an abstract service interface definition containing operations and data types, a transport binding that describes how the web service is accessed, and finally a description of the location (address) at which the web service is accessible.
UDDI (Universal Description, Discovery and Integration) was meant to become the standard protocol for a kind of public yellow pages in which publicly accessible web services would be listed. Lack of industry interest, however, prevented UDDI from gaining widespread use.
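To make the message structure concrete, here is a minimal sketch of a raw SOAP 1.1 call over HTTP using only the Python standard library; the endpoint, namespace, operation, and SOAPAction value are hypothetical placeholders of the kind a WSDL binding would supply:

```python
# Sketch of a raw SOAP 1.1 request over HTTP with the standard library only.
import urllib.request

envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetQuote xmlns="http://example.com/stocks">
      <symbol>IBM</symbol>
    </GetQuote>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    "http://example.com/stockservice",   # hypothetical endpoint
    data=envelope.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        # SOAPAction comes from the WSDL binding; placeholder here.
        "SOAPAction": "http://example.com/stocks/GetQuote",
    },
)
with urllib.request.urlopen(req) as resp:  # the response is another envelope
    print(resp.read().decode("utf-8"))
```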
Performance evaluation of botnet detection using machine learning techniques (IJECEIAES)
Cybersecurity is seriously threatened by botnets, which are controlled networks of compromised computers. The evolving techniques used by botnet operators make it difficult for traditional methods of botnet identification to keep up. In recent years, machine learning has become increasingly effective as a means of identifying and mitigating these threats. This study presents a machine learning-based method for botnet detection using the CTU-13 dataset, a frequently used dataset in the cybersecurity field, which consists of real network traffic recorded in a network environment under botnet attack. The dataset is used to train a variety of machine learning algorithms to classify network traffic as botnet-related or benign, including a decision tree, a regression model, naïve Bayes, and a neural network model. We employ a number of criteria, such as accuracy, precision, and sensitivity, to measure how well each model classifies both known and unknown botnet traffic patterns. Experimental results show that the machine learning-based approach detects botnets accurately, and the system's high detection rates and low false-positive rates demonstrate its potential for real-world use.
An Examination of the Bloom Filter and its Application in Preventing Weak Pas... (Editor IJCATR)
Choosing weak passwords is a common issue when a system uses username and password as authentication credentials; indeed, weak passwords may lead to system compromise. Many approaches have been proposed to prevent users from selecting weak or guessable passwords. The common approach is to compare a selected password against a list of unacceptable passwords. In this paper we explain a space-efficient method, called the Bloom filter, for storing such a dictionary of passwords. Lookup in this approach takes constant time, no matter how many passwords the dictionary holds.
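A minimal Bloom filter sketch for this use case might look as follows; the bit-array size, hash count, and sample passwords are illustrative assumptions (a real deployment sizes the filter from the target false-positive rate):

```python
# Minimal Bloom filter for a weak-password dictionary: k hash positions per
# item set bits in an m-bit array; membership tests check all k bits.
import hashlib

class BloomFilter:
    def __init__(self, m=2 ** 20, k=5):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k positions from salted SHA-256 digests (illustrative choice).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

weak = BloomFilter()
for pw in ["123456", "password", "qwerty"]:  # toy dictionary
    weak.add(pw)

print("password" in weak)   # True: reject this choice
print("xK9#mQ2!" in weak)   # False (up to a small false-positive probability)
```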
Development of 3D convolutional neural network to recognize human activities ... (journalBEEI)
Human activity recognition (HAR) has recently been used in numerous applications, including smart homes, to monitor human behavior, automate homes according to human activities, and support entertainment, fall detection, violence detection, and people care. Vision-based recognition is the most powerful method widely used in implementing HAR systems, owing to its ability to recognize complex human activities. This paper addresses the design of a 3D convolutional neural network (3D-CNN) model that can be used in smart homes to identify a number of activities. The model is trained on the KTH dataset, which contains activities such as walking, running, jogging, handwaving, handclapping, and boxing. Despite the challenges of this method arising from illumination changes, background variation, and human body variety, the proposed model reached an accuracy of 93.33%. The model was implemented, trained, and tested on a machine with moderate computational resources, and the results show that the proposal successfully recognizes human activities with reasonable computation.
A database application differs from regular applications in that some of its inputs may be database queries. The program will execute the queries on a database and may use any result values in its subsequent program logic. This means that a user-supplied query may determine the values that the application will use in subsequent branching conditions. At the same time, a new database application is often required to work well on a body of existing data stored in some large database. For systematic testing of database applications, recent techniques replace the existing database with carefully crafted mock databases. Mock databases return values that will trigger as many execution paths in the application as possible and thereby maximize overall code coverage of the database application.
In this paper we offer an alternative approach to database application testing. Our goal is to support software engineers in focusing testing on the existing body of data the application is required to work well on. For that, we propose to side-step mock database generation and instead generate queries for the existing database. Our key insight is that we can use the information collected during previous program executions to systematically generate new queries that will maximize the coverage of the application under test, while guaranteeing that the generated test cases focus on the existing data.
Wireless data broadcast is an efficient way of disseminating data to users in mobile computing environments. From the server’s point of view, how to place the data items on channels is a crucial issue, with the objective of minimizing the average access time and tuning time. Similarly, how to schedule the data retrieval process for a given request at the client side, so that all the requested items can be downloaded in a short time, is also an important problem. In this paper, we investigate multi-item data retrieval scheduling in push-based multichannel broadcast environments. The most important issues in mobile computing are energy efficiency and query response efficiency; in data broadcast, however, the objectives of reducing access latency and energy cost can conflict with each other. Consequently, we define a new problem named the Minimum Cost Data Retrieval (MCDR) problem, along with the Large Number Data Retrieval (LNDR) problem, and develop a heuristic algorithm to download a large number of items efficiently. When there is no replicated item in a broadcast cycle, we show that an optimal retrieval schedule can be obtained in polynomial time.
Scalable Rough C-Means clustering using Firefly algorithm (Abhilash Namdev and B.K. Tripathy)
Significance of Embedded Systems to IoT (P. R. S. M. Lakshmi, P. Lakshmi Narayanamma and K. Santhi Sri)
Cognitive Abilities, Information Literacy Knowledge and Retrieval Skills of Undergraduates: A Comparison of Public and Private Universities in Nigeria (Janet O. Adekannbi and Testimony Morenike Oluwayinka)
Risk Assessment in Constructing Horseshoe Vault Tunnels using Fuzzy Technique (Erfan Shafaghat and Mostafa Yousefi Rad)
Evaluating the Adoption of Deductive Database Technology in Augmenting Criminal Intelligence in Zimbabwe: Case of Zimbabwe Republic Police (Mahlangu Gilbert, Furusa Samuel Simbarashe, Chikonye Musafare and Mugoniwa Beauty)
Analysis of Petrol Pumps Reachability in Anand District of Gujarat (Nidhi Arora)
Real time intrusion detection in network traffic using adaptive and auto-scal... (Gobinath Loganathan)
Oral presentation of "Real-time Intrusion Detection in Network Traffic Using Adaptive and Auto-scaling Stream Processor" at the IEEE Global Communications Conference (Globecom 2018).
Abstract:
Advanced intrusion detection systems are beginning to utilize the power and flexibility offered by Complex Event Processing (CEP) engines. Adapting to new attacks and optimizing CEP rules are two challenges in this domain. Optimizing CEP rules requires a complete framework which can be ported to stream processors because a CEP rule cannot run without a stream processor. External dependencies of stream processors make CEP rule a black box which is hard to optimize. In this paper, we present a novel adaptive and functionally auto-scaling stream processor: "Wisdom" with a built-in hybrid optimizer developed using Particle Swarm Optimization, and Bisection algorithms to optimize CEP rule parameters. We show that an adaptive "Wisdom" rule tuned by the proposed optimization algorithm is able to detect selected attacks in CICIDS 2017 dataset with an average precision of 99.98% and an average recall of 93.42% while processing over 2.5 million events per second. The proposed distributed functionally auto-scaling deployment mode consumes significantly fewer system resources than the monolithic deployment of CEP rules.
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH) (Eswar Publications)
Recently, machine learning has been introduced into the area of adaptive video streaming. This paper explores a novel taxonomy that includes six state-of-the-art machine learning techniques that have been applied to Dynamic Adaptive Streaming over HTTP (DASH): (1) Q-learning, (2) Reinforcement learning, (3) Regression, (4) Classification, (5) Decision Tree learning, and (6) Neural networks.
Effective Multi-Stage Training Model for Edge Computing Devices in Intrusion ...IJCNCJournal
Intrusion detection poses a significant challenge within expansive and persistently interconnected environments. As malicious code continues to advance and sophisticated attack methodologies proliferate, various advanced deep learning-based detection approaches have been proposed. Nevertheless, the complexity and accuracy of intrusion detection models still need further enhancement to render them more adaptable to diverse system categories, particularly within resource-constrained devices, such as those embedded in edge computing systems. This research introduces a three-stage training paradigm, augmented by an enhanced pruning methodology and model compression techniques. The objective is to elevate the system's effectiveness, concurrently maintaining a high level of accuracy for intrusion detection. Empirical assessments conducted on the UNSW-NB15 dataset evince that this solution notably reduces the model's dimensions, while upholding accuracy levels equivalent to similar proposals.
Lecture01: Introduction to Security and Privacy in Cloud Computing
600.412.Lecture06
1. Security and Privacy in Cloud Computing. Ragib Hasan, Johns Hopkins University, en.600.412 Spring 2010, Lecture 6, 03/22/2010
2. Verifying Computations in a Cloud. Scenario: a user sends her data processing job to the cloud, and clouds provide dataflow operation as a service (e.g., MapReduce, Hadoop, etc.). Problem: users have no way of evaluating the correctness of the results.
3. DataFlow Operations. Properties: high-performance, in-memory data processing; each node performs a particular function; nodes are mostly independent of each other. Examples: MapReduce, Hadoop, System S, Dryad.
6. To evaluate the quality of output data. (Du et al., RunTest: Assuring Integrity of Dataflow Processing in Cloud Computing Infrastructures, AsiaCCS 2010)
7. Possible Approaches: re-do the computation; check the memory footprint of code execution; majority voting; hardware-based attestation; run-time attestation.
8. RunTest: Randomized Data Attestation. Idea: for some data inputs, send them along multiple dataflow paths; record and match all intermediate results from the matching nodes in the paths; build an attestation graph from node agreement. Over time, the graph shows which nodes misbehave (always or from time to time).
9. Attack Model. Data model: input-deterministic dataflow (i.e., the same input to a function always produces the same output); data processing is stateless (e.g., selection, filtering). Attacker: malicious or compromised cloud nodes; can produce bad results always or only some of the time; can collude with other malicious nodes to provide the same bad result.
10. Attack Model (scenarios). Parameters: b_i = probability of providing a bad result; c_i = probability of providing the same bad result as another malicious node. Attack scenarios: NCAM: b_i = 1, c_i = 0; NCPM: 0 < b_i < 1, c_i = 0; FTFC: b_i = 1, c_i = 1; PTFC: 0 < b_i < 1, c_i = 1; PTPC: 0 < b_i < 1, 0 < c_i < 1.
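The (b_i, c_i) behavior is easy to picture with a toy simulation; the sketch below is illustrative only, with made-up output values:

```python
# Toy model of a node under the (b, c) attack parameters: with probability b
# it returns a bad result, and with probability c that bad result equals the
# value shared among colluding nodes.
import random

def node_output(good, b, c, colluded_bad="BAD_SHARED"):
    if random.random() < b:  # misbehave on this data item?
        return colluded_bad if random.random() < c else f"BAD_{random.random()}"
    return good

# PTPC example: misbehaves 30% of the time, colludes on half of those outputs.
print([node_output("OK", b=0.3, c=0.5) for _ in range(10)])
```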
11. Integrity Attestation Graph. Definition: vertices are the nodes on the dataflow paths; edges denote consistency relationships; an edge weight is the fraction of consistent outputs among all outputs generated from the same data items.
12. Consistency Clique. A complete subgraph of the attestation graph with 2 or more nodes in which all nodes always agree with each other (i.e., all edge weights are 1). (Figure: example attestation graph over nodes 1-5.)
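A minimal sketch of how such edge weights and the clique test could be computed from duplicated-input results; the node names, data items, and output values are toy assumptions:

```python
# Build attestation-graph edge weights from duplicated-input results and test
# whether a node set is a consistency clique (all pairwise weights equal 1).
# results[node][item] holds the node's output for a duplicated data item.
from itertools import combinations

def edge_weight(results, a, b):
    shared = set(results[a]) & set(results[b])
    if not shared:
        return None  # no common attested items, no edge
    agree = sum(results[a][d] == results[b][d] for d in shared)
    return agree / len(shared)

def is_consistency_clique(results, nodes):
    return all(edge_weight(results, a, b) == 1.0
               for a, b in combinations(nodes, 2))

# Toy run: node n3 disagrees on item d2.
results = {
    "n1": {"d1": 10, "d2": 20},
    "n2": {"d1": 10, "d2": 20},
    "n3": {"d1": 10, "d2": 99},
}
print(is_consistency_clique(results, ["n1", "n2"]))        # True
print(is_consistency_clique(results, ["n1", "n2", "n3"]))  # False
```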
13. How to Find Malicious Nodes. Intuitions: honest nodes always agree with each other, producing the same outputs given the same data; the number of malicious nodes is less than half of all nodes.
14. Finding Consistency Cliques: the BK Algorithm. Goal: find the maximal clique in the attestation graph. Technique: apply the Bron-Kerbosch algorithm to find the maximal clique(s) (see a worked example on Wikipedia). Any node not in a maximal clique of size at least k/2 is flagged as malicious. Note: maximum-clique search is NP-hard, and Bron-Kerbosch takes O(3^{n/3}) time in the worst case; the authors proposed two optimizations to make it run quicker.
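A minimal Bron-Kerbosch sketch (without pivoting or the authors' optimizations) over the weight-1 edges of an attestation graph; the toy adjacency below is an illustrative assumption:

```python
# Classic Bron-Kerbosch: R is the growing clique, P the candidates, X the
# already-explored vertices; a maximal clique is reported when P and X empty.
def bron_kerbosch(R, P, X, adj, cliques):
    if not P and not X:
        cliques.append(set(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)
        X.add(v)

# Adjacency over edges with weight 1 (toy graph: {1,2,3} fully consistent).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)

largest = max(cliques, key=len)
suspects = set(adj) - largest  # honest-majority assumption: |clique| > n/2
print(largest, suspects)       # {1, 2, 3} {4, 5}
```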
15. Identifying Attack Patterns. (Figure: example attestation graphs illustrating the NCAM, PTFC/NCPM, PTFC, and FTFC attack patterns.)
16. Inferring Data Quality. Quality = 1 - (c/n), where n = total number of unique data items and c = total number of duplicated data items with inconsistent results.
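A worked example of the formula (the counts are made up): with n = 100 unique items, of which c = 5 duplicated items came back inconsistent, the estimated quality is 0.95:

```python
# Data-quality estimate from the slide: Quality = 1 - c/n.
def data_quality(n_unique, n_inconsistent):
    return 1 - n_inconsistent / n_unique

print(data_quality(100, 5))  # 0.95
```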
17. Evaluation. Setup: extended IBM System S. Experiments: detection rate; sensitivity to parameters; comparison with majority voting.
18. Evaluation. (Figure: detection results for NCPM with b = 0.2, c = 0, and under different misbehavior probabilities.)
19. Discussion: the threat model; the high cost of the Bron-Kerbosch algorithm (O(3^{n/3})); results hold for attestation graphs built per function; scalability; the experimental evaluation.
20. Further Reading: TPM Reset Attack, http://www.cs.dartmouth.edu/~pkilab/sparks/; Halderman et al., Lest We Remember: Cold Boot Attacks on Encryption Keys, USENIX Security 08, http://citp.princeton.edu/memory/