This paper discusses some results of an industrial project focused on the integration of data mining tools into an Enterprise Service Bus (ESB) platform. WSO2 ESB has been implemented for data transactions and to interface a client web service connected to a KNIME workflow behaving as a flexible data mining engine. In order to validate the implementation, two tests have been performed: the first concerns the data management of two relational database management systems (RDBMSs) merged into one database, whose data have been processed by a KNIME statistical dashboard tool, thus proving the data transfer capability of the prototype system; the second concerns a simulation of sensor data from two distinct production lines connected to the same ESB. Specifically, in the second example a practical case has been developed by processing, with a Multilayer Perceptron (MLP) neural network, the temperatures of two milk production lines, and by providing information about predictive maintenance. The platform prototype is suitable for data automation and Internet of Things (IoT) scenarios related to Industry 4.0, and for innovative hybrid systems embedding different hardware and software technologies integrated with the ESB, the data mining engine and client web services.
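The paper implements the MLP as a KNIME workflow fed through WSO2 ESB; purely as an illustration of the idea behind the second test, here is a minimal scikit-learn sketch that trains an MLP on windows of line temperatures to flag a drifting line for maintenance. The window width, network size and synthetic data are assumptions, not the paper's configuration.

```python
# Minimal sketch of the second validation test: an MLP trained on line
# temperatures to flag readings that call for predictive maintenance.
# Window size, network shape, and synthetic data are illustrative only;
# the paper implements this as a KNIME workflow fed through WSO2 ESB.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def make_windows(temps, width=5):
    """Slice a temperature series into overlapping feature windows."""
    return np.array([temps[i:i + width] for i in range(len(temps) - width)])

# Synthetic temperatures for two production lines: line 2 drifts upward,
# emulating a component that is starting to fail.
line1 = 20 + rng.normal(0, 0.5, 500)                           # healthy line
line2 = 20 + rng.normal(0, 0.5, 500) + np.linspace(0, 5, 500)  # drifting line

X = np.vstack([make_windows(line1), make_windows(line2)])
y = np.array([0] * (len(line1) - 5) + [1] * (len(line2) - 5))  # 1 = maintenance

clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
clf.fit(X, y)
print("flag new window:", clf.predict([[24.8, 25.1, 25.0, 25.3, 25.2]])[0])
```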
The document discusses big data analytics workflows. It defines big data and describes challenges in big data analysis including volume, velocity, variety, and veracity of data. It then discusses cloud computing models and analytic workflow management systems. The key points are: (1) Big data analytics requires scalable hardware and software tools to handle different data types and applications. (2) Analytic workflows consist of sequences of data integration and analysis tasks. (3) Workflow management systems are needed to efficiently define and execute analytic workflows in scientific applications and on distributed cloud resources.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Service oriented cloud architecture for improved performance of smart grid ap... - eSAT Journals
Abstract: An effective and flexible computational platform is needed for the data coordination and processing associated with real-time operational and application services in a smart grid. A server environment where multiple applications are hosted by a common pool of virtualized server resources demands an open source structure to ensure operational flexibility. In this paper, an open source architecture is proposed for real-time services that involve data coordination and processing. The architecture enables secure and reliable exchange of information and transactions with users over the internet to support various services. Prioritizing the applications based on complexity enhances the efficiency of resource allocation in such situations. A priority-based scheduling algorithm is proposed for application-level performance management in the structure. An analytical model based on queuing theory is developed to evaluate the performance of the test bed. The implementation is done using an OpenStack cloud, and the test results show a significant gain of 8% with the algorithm. Index Terms: Service Oriented Architecture, Smart grid, Mean response time, OpenStack, Queuing model
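The paper's priority function and queuing parameters are not reproduced in this summary; the following toy Python sketch only illustrates the general priority-queue dispatch idea, with complexity scores as assumed inputs.

```python
# Generic sketch of priority-based scheduling of application requests,
# in the spirit of the paper's algorithm (the actual priority function
# and queuing model parameters are not reproduced here).
import heapq
import itertools

counter = itertools.count()  # tie-breaker keeps insertion order stable

def submit(queue, app_name, complexity):
    # Lower complexity score = higher priority = served first.
    heapq.heappush(queue, (complexity, next(counter), app_name))

def dispatch(queue):
    complexity, _, app_name = heapq.heappop(queue)
    return app_name

ready = []
submit(ready, "meter-data-aggregation", complexity=3)
submit(ready, "outage-alert", complexity=1)
submit(ready, "billing-report", complexity=5)

while ready:
    print("dispatching:", dispatch(ready))  # outage-alert first
```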
Fast Range Aggregate Queries for Big Data Analysis - IRJET Journal
The document proposes a fast range aggregate query (Fast RAQ) method to efficiently analyze large banking transaction datasets for the purpose of identifying tax violators. It divides data into partitions and generates local estimates for each partition. When a query is received, results are obtained by aggregating the local estimates from all partitions. The method is tested on banking transaction data from multiple banks partitioned and stored in Hadoop. It aims to track transactions across banks for a user using their unique ID to find individuals depositing over 50,000 rupees annually in 3 or more banks. The Fast RAQ method provides accurate results for large datasets more efficiently than existing approaches.
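As a toy illustration of the partition-and-aggregate idea (not the actual Fast RAQ estimator), the following Python sketch keeps a local per-user estimate in each partition and answers the query by combining them; the field names and the 50,000-rupee / 3-bank rule follow the summary above.

```python
# Toy illustration of the partition-and-aggregate idea behind Fast RAQ:
# each partition keeps a local per-user estimate, and a query is answered
# by combining the local results. The real system runs over Hadoop
# partitions with a proper range-aggregate estimator.
from collections import defaultdict

def local_estimate(partition):
    """Per-partition aggregate: total deposits and banks seen per user ID."""
    totals, banks = defaultdict(float), defaultdict(set)
    for user_id, bank, amount in partition:
        totals[user_id] += amount
        banks[user_id].add(bank)
    return totals, banks

def range_aggregate(partitions, min_amount=50_000, min_banks=3):
    totals, banks = defaultdict(float), defaultdict(set)
    for part in partitions:                 # aggregate the local estimates
        t, b = local_estimate(part)
        for uid in t:
            totals[uid] += t[uid]
            banks[uid] |= b[uid]
    return [uid for uid in totals
            if totals[uid] > min_amount and len(banks[uid]) >= min_banks]

parts = [[("U1", "bankA", 30_000), ("U2", "bankA", 10_000)],
         [("U1", "bankB", 15_000), ("U1", "bankC", 20_000)]]
print(range_aggregate(parts))  # ['U1']
```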
Redefining Smart Grid Architectural Thinking Using Stream Computing - Cognizant
Using stream computing, power utilities can capture and analyze data generated by smart meters to achieve new thresholds of performance, while building better consumer relationships.
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED... - ijcsit
3D reconstruction is a technique used in computer vision which has a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, performing a 3D reconstruction required specialized hardware. Such systems were often very expensive and were only available for industrial or research purposes. With the rising availability of high-quality, low cost 3D sensors, it is now possible to design inexpensive complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the goal of this work also included making the 3D scanning process fully automated by building and integrating a turntable alongside the software. This means the user can perform a full 3D scan with only a press of a few buttons in our dedicated graphical user interface. Three main steps take the process from acquisition of point clouds to the finished reconstructed 3D model. First, our system acquires point cloud data of a person or object using an inexpensive camera sensor. Second, it aligns and converts the acquired point cloud data into a watertight mesh of good quality. Third, it exports the reconstructed model to a 3D printer to obtain a proper 3D print of the model.
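As a hedged illustration of the alignment step only, here is a minimal numpy sketch of rigid registration of two corresponding point clouds via the Kabsch/SVD method; the real pipeline must also estimate correspondences (e.g. with ICP) and fuse many turntable views into a watertight mesh.

```python
# Minimal numpy sketch of the alignment step: rigid registration of two
# corresponding point clouds via the Kabsch/SVD method. Correspondences
# are assumed known here, which a real pipeline must estimate.
import numpy as np

def rigid_align(source, target):
    """Return rotation R and translation t minimizing ||R @ source + t - target||."""
    src_c, tgt_c = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # avoid reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

# A cloud rotated 30 degrees about Z and shifted, then recovered.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(100, 3))
theta = np.radians(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
moved = cloud @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = rigid_align(cloud, moved)
print("max alignment error:", np.abs((cloud @ R.T + t) - moved).max())
```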
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACH... - IJNSA Journal
Feature extraction and feature selection are the first tasks in the pre-processing of input logs in order to detect cyber security threats and attacks while utilizing machine learning. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are time-consuming and difficult to manage efficiently. In this paper, we present an approach for handling feature extraction and feature selection for security analytics of heterogeneous data derived from different network sensors. The approach is implemented in Apache Spark, using its Python API, PySpark.
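A minimal PySpark sketch of the two pre-processing tasks; the column names and toy rows are invented, and the paper's actual sensor schemas and selection criteria are not reproduced here.

```python
# Hedged sketch of the two pre-processing tasks in PySpark: assembling raw
# log fields into a feature vector, then chi-squared feature selection.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, ChiSqSelector

spark = SparkSession.builder.appName("feature-selection-sketch").getOrCreate()

rows = [(0.0, 120.0, 3.0, 1.0), (1.0, 9800.0, 45.0, 0.0),
        (0.0, 150.0, 2.0, 1.0), (1.0, 8700.0, 60.0, 0.0)]
df = spark.createDataFrame(rows, ["label", "bytes", "conn_rate", "tls_flag"])

# Feature extraction: pack the raw columns into a single vector column.
assembler = VectorAssembler(inputCols=["bytes", "conn_rate", "tls_flag"],
                            outputCol="features")
features = assembler.transform(df)

# Feature selection: keep the 2 columns most associated with the label.
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         outputCol="selected", labelCol="label")
selector.fit(features).transform(features).select("selected").show()
```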
This document describes a proposed scalable, cloud-based data aggregation and analysis platform for Internet of Things applications. The system would allow for collecting, storing, and analyzing large amounts of sensor data from various sources in a centralized, cloud-based manner. It discusses existing solutions and their limitations, as well as potential technologies that could be used, including Node.js, JavaScript, MongoDB, cloud computing, and reactive programming. The proposed solution involves collecting sensor data via decentralized sensor stations, transmitting it to centralized servers in the cloud for storage and analysis, and making it available via a web interface.
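Assuming a MongoDB backend, as the summary suggests, here is a minimal pymongo sketch of the ingestion path; the connection string, database and field names are illustrative.

```python
# Sketch of the ingestion path described above: a sensor station posts a
# reading and the cloud side stores it in MongoDB. Connection string,
# database, and field names are assumptions for illustration.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed endpoint
readings = client["iot_platform"]["sensor_readings"]

def store_reading(station_id, sensor_type, value):
    """Persist one decentralized-station reading with a server timestamp."""
    readings.insert_one({
        "station": station_id,
        "type": sensor_type,
        "value": value,
        "received_at": datetime.now(timezone.utc),
    })

store_reading("station-07", "temperature", 21.4)
# The web interface would then query, e.g., the latest reading per station:
latest = readings.find({"station": "station-07"}).sort("received_at", -1).limit(1)
print(list(latest))
```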
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENT - IJwest
In real environments there are collections of noisy and vague data, called Big Data. On the other hand, middleware has been developed to work on such data and is now very widely used. The challenge of working on Big Data is its processing and management. Here, an integrated management system is required to provide a solution for integrating data from multiple sensors and maximizing the target success. This is in a situation where the system has constant time constraints for processing and real-time decision-making processes. A reliable data fusion model must meet this requirement and steadily let the user monitor the data stream. With the widespread use of workflow interfaces, this requirement can be addressed. But the work with Big Data is still challenging. We provide a multi-agent cloud-based architecture as a higher-level vision to solve this problem. This architecture provides the ability to perform Big Data fusion using a workflow management interface. The proposed system is capable of self-repair in the presence of risks, and its risk is low.
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S... - ijcsit
In supporting its large scale, multidisciplinary scientific research efforts across all the university campuses and by the research personnel spread over literally every corner of the state, the state of Nevada needs to build and leverage its own Cyberinfrastructure. Following the well-established as-a-service model, this state-wide Cyberinfrastructure that consists of data acquisition, data storage, advanced instruments, visualization, computing and information processing systems, and people, all seamlessly linked together through a high-speed network, is designed and operated to deliver the benefits of Cyberinfrastructure-as-a-Service (CaaS). There are three major service groups in this CaaS, namely (i) supporting infrastructural services that comprise sensors, computing/storage/networking hardware, operating system, management tools, virtualization and message passing interface (MPI); (ii) data transmission and storage services that provide connectivity to various big data sources, as well as cached and stored datasets in a distributed storage backend; and (iii) processing and visualization services that provide user access to rich processing and visualization tools and packages essential to various scientific research workflows. Built on commodity hardware and open source software packages, the Southern Nevada Research Cloud (SNRC) and a data repository in a separate location constitute a low cost solution to deliver all these services around CaaS. The service-oriented architecture and implementation of the SNRC are geared to encapsulate as much detail of big data processing and cloud computing as possible away from end users; rather, scientists only need to learn and access an interactive web-based interface to conduct their collaborative, multidisciplinary, data-intensive research. The capability and easy-to-use features of the SNRC are demonstrated through a use case that attempts to derive a solar radiation model from a large data set by regression analysis.
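As a rough sketch of the demonstrated use case, here is a scikit-learn regression on synthetic weather features; the predictor variables and data are assumptions, since the SNRC runs this on its own stored dataset through its web interface.

```python
# Sketch of the demonstrated use case: fitting a solar radiation model by
# regression. The predictor variables and synthetic data are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1000
temperature = rng.uniform(5, 40, n)          # deg C
humidity = rng.uniform(10, 90, n)            # percent
cloud_cover = rng.uniform(0, 1, n)           # fraction

# Synthetic "ground truth" radiation with noise, for illustration only.
radiation = (800 - 400 * cloud_cover + 4 * temperature - 1.5 * humidity
             + rng.normal(0, 20, n))

X = np.column_stack([temperature, humidity, cloud_cover])
model = LinearRegression().fit(X, radiation)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("R^2:", model.score(X, radiation))
```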
An optimization framework for cloud based data management model in smart grid - eSAT Journals
Abstract
Smart Grid (SG) is an intelligent electricity network that incorporates advanced information, control and communication technologies to increase the reliability of the existing power grid. With advanced communication and information technologies, the smart grid deploys a complex information management model. This paper presents a cloud service based information management model which opens up the issues and benefits from the perspective of both the smart grid domain and the cloud domain of the system model. The overall cost of data management includes storage, computation, upload, download and communication costs, which need to be optimized. This paper provides an optimization framework for reducing the overall cost of data management and integration in the smart grid model. The optimization model focuses on optimizing the size of data items to be stored in the clouds under consideration. The types of data items to be stored in the clouds are customer behavior data and Phasor Measurement Units (PMU) data in the smart grid environment. The management model usually comprises four domains, viz. the smart grid domain, cloud domain, broker domain and network domain. The present work focuses mainly on the smart grid and cloud domains and the optimization of cost related to these domains, for simplicity of the model considered. The proposed model is optimized using various evolutionary optimization techniques such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Differential Evolution (DE). The results of the various techniques when implemented for the proposed model are compared in terms of performance measures, and the most suitable technique is identified for cloud based data management.
Keywords: Smart grid, Information management, Optimization, Cloud Computing.
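As a rough illustration of the DE option mentioned above, a minimal SciPy sketch that minimizes a toy storage-plus-transfer cost over the sizes of the two data types; the cost coefficients and bounds are invented, and the paper's actual cost model is considerably richer.

```python
# Sketch of the DE variant of the optimization: choose the sizes of two
# data items (customer behavior data, PMU data) stored in the cloud so
# that a combined storage + communication cost is minimized.
import numpy as np
from scipy.optimize import differential_evolution

def total_cost(x):
    behavior_gb, pmu_gb = x
    storage = 0.023 * (behavior_gb + pmu_gb)          # per-GB storage cost
    upload = 0.010 * behavior_gb + 0.015 * pmu_gb     # per-GB transfer cost
    penalty = 50.0 / (behavior_gb + 1) + 80.0 / (pmu_gb + 1)  # too little data hurts
    return storage + upload + penalty

bounds = [(1, 500), (1, 500)]  # GB stored per data type
result = differential_evolution(total_cost, bounds, seed=0)
print("optimal sizes (GB):", result.x, "cost:", result.fun)
```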
The advent of Big Data has seen the emergence of new processing and storage challenges. These challenges are often solved by distributed processing. Distributed systems are inherently dynamic and unstable, so it is realistic to expect that some resources will fail during use. Load balancing and task scheduling are an important step in determining the performance of parallel applications, hence the need to design load balancing algorithms adapted to grid computing. In this paper, we propose a dynamic and hierarchical load balancing strategy at two levels: intra-scheduler load balancing, in order to avoid the use of the large-scale communication network, and inter-scheduler load balancing, for a load regulation of the whole system. The strategy improves the average response time of CLOAK-Reduce application tasks with minimal communication. We first focus on three performance indicators, namely response time, process latency and running time of MapReduce tasks.
IRJET- Recommendation System based on Graph Database Techniques - IRJET Journal
This document proposes a recommendation system based on graph database techniques. It uses Neo4j to develop a recommendation approach using content-based filtering, collaborative filtering, and hybrid filtering. The system recommends restaurants and meals to customers based on reviews and friend recommendations. It stores data about restaurants, meals, customers and their reviews in a graph database to allow for complex queries and recommendations. The implementation and results of the proposed recommendation system are also discussed.
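Assuming an invented Neo4j schema (labels Customer/Restaurant, relationships FRIEND/REVIEWED), here is a minimal Python sketch of the collaborative-filtering style of query such a system might run; the paper's actual graph model is not reproduced.

```python
# Sketch of a collaborative-filtering query in the spirit of the system
# above: recommend restaurants that a customer's friends reviewed highly.
# URI, credentials, labels, and relationship names are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

RECOMMEND = """
MATCH (me:Customer {name: $name})-[:FRIEND]->(f:Customer)
      -[r:REVIEWED]->(rest:Restaurant)
WHERE r.rating >= 4 AND NOT (me)-[:REVIEWED]->(rest)
RETURN rest.name AS restaurant, avg(r.rating) AS score
ORDER BY score DESC LIMIT 5
"""

with driver.session() as session:
    for record in session.run(RECOMMEND, name="Alice"):
        print(record["restaurant"], record["score"])
driver.close()
```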
Performing initiative data prefetching - Kamal Spring
Abstract—This paper presents an initiative data prefetching scheme on the storage servers in distributed file systems for cloud computing. In this prefetching technique, the client machines are not substantially involved in the process of data prefetching; instead, the storage servers can directly prefetch the data after analyzing the history of disk I/O access events, and then send the prefetched data to the relevant client machines proactively. To put this technique to work, the information about client nodes is piggybacked onto the real client I/O requests and then forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations, directing what data should be fetched on storage servers in advance. Finally, the prefetched data can be pushed to the relevant client machine from the storage server. Through a series of evaluation experiments with a collection of application benchmarks, we have demonstrated that the presented initiative prefetching technique can help distributed file systems for cloud environments achieve better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which contributes to preferable system performance on them.
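The paper proposes two specific prediction algorithms that are not reproduced in this summary; as a simplified stand-in, here is a first-order Python sketch that learns which block tends to follow which in the observed history and nominates a prefetch candidate.

```python
# Toy sketch of the server-side prediction idea: learn which block tends
# to follow which in the observed I/O history, then prefetch the most
# likely successor. This first-order model is only a simplified stand-in
# for the paper's two prediction algorithms.
from collections import Counter, defaultdict

class BlockPredictor:
    def __init__(self):
        self.successors = defaultdict(Counter)
        self.last_block = None

    def observe(self, block):
        """Record one block access from the piggybacked client I/O info."""
        if self.last_block is not None:
            self.successors[self.last_block][block] += 1
        self.last_block = block

    def predict_next(self):
        """Block to prefetch and push to the client, or None if unknown."""
        seen = self.successors.get(self.last_block)
        return seen.most_common(1)[0][0] if seen else None

p = BlockPredictor()
for b in [10, 11, 12, 10, 11, 12, 10, 11]:
    p.observe(b)
print("prefetch candidate:", p.predict_next())  # 12, after seeing ...10, 11
```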
This document summarizes techniques for ensuring data integrity in cloud storage. It discusses Provable Data Possession (PDP) and Proof of Retrievability (PoR) as the two main schemes. PDP allows a client to check that a cloud server possesses their file correctly, while PoR guarantees file retrievability and addresses data corruption concerns using error correcting codes. The document also examines other methods like naive hashing, signature-based approaches, and their limitations regarding public auditing and dynamic operations. Overall, the document provides an overview of the key challenges and state-of-the-art solutions for verifying data integrity in cloud computing.
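Real PDP/PoR schemes use homomorphic tags and error-correcting codes so the client need not keep one tag per block; purely to illustrate the challenge-response flow, here is a minimal keyed-tag sketch in Python.

```python
# Simplified challenge-response sketch in the spirit of PDP: the client
# keeps per-block HMAC tags and later challenges the server on random
# blocks. This is only a minimal illustration of the verification flow.
import hmac, hashlib, secrets

KEY = secrets.token_bytes(32)

def tag_blocks(blocks):
    """Client side: compute a keyed tag per block before upload."""
    return [hmac.new(KEY, i.to_bytes(8, "big") + b, hashlib.sha256).digest()
            for i, b in enumerate(blocks)]

def prove(blocks, index):
    """Server side: respond to a challenge by returning the block."""
    return blocks[index]

def verify(tags, index, block):
    """Client side: recompute the tag and compare."""
    expected = hmac.new(KEY, index.to_bytes(8, "big") + block,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, tags[index])

blocks = [b"block-%d" % i for i in range(16)]
tags = tag_blocks(blocks)
challenge = secrets.randbelow(len(blocks))
print("intact:", verify(tags, challenge, prove(blocks, challenge)))
```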
Stream Processing Environmental Applications in Jordan Valley - CSCJournals
This document discusses stream processing applications for environmental monitoring in Jordan Valley. It presents statistical data collected from weather stations in different Jordan Valley locations. Stream processing is important for continuous monitoring systems to detect events in real-time. The document outlines considerations for stream processing engine design like communication, computation, and flexibility. It also describes Jordan's Irrigation Management Information System, which uses real-time meteorological data from weather stations to optimize water usage for agriculture.
IRJET-Implementation of Threshold based Cryptographic Technique over Cloud Co... - IRJET Journal
This document discusses implementing a threshold-based cryptographic technique for data and key storage security over cloud computing. It proposes a system that encrypts data stored on the cloud to prevent unauthorized access and data attacks by the cloud service provider. The system uses a threshold-based cryptographic approach that distributes encryption keys among multiple users, requiring a threshold number of keys to decrypt the data. This prevents collusion attacks and ensures data remains secure even if some user keys are compromised. The implementation results show the system can effectively secure data on the cloud and protect legitimate users from cheating or attacks from the cloud service provider or other users.
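A minimal Shamir secret-sharing sketch of the threshold idea described above, assuming a small demo field; a production system would use a vetted library and the paper's own key-distribution scheme.

```python
# Minimal Shamir secret-sharing sketch of the threshold idea: the key is
# split among n users and any t of them can reconstruct it, while fewer
# than t learn nothing. Parameters here are illustrative only.
import secrets

PRIME = 2**127 - 1  # Mersenne prime, large enough for a small demo key

def split(secret, n, t):
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbelow(PRIME)
shares = split(key, n=5, t=3)          # 5 users, any 3 can decrypt
print("recovered:", reconstruct(shares[:3]) == key)   # True
print("recovered:", reconstruct(shares[1:4]) == key)  # any 3 shares work
```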
Kx for Telco is a platform that helps telecommunications companies harness massive volumes of real-time and historical data from their networks and customer usage for improved operations, enhanced customer engagement, and new product and service offerings. It enables the rapid ingestion, processing, correlation and analysis of data across the entire telecom network at unprecedented speed and scale. With Kx's real-time processing capabilities and development environment, new analytic applications can be quickly developed, tested and deployed to support network operations, service management, fraud detection and billing.
MataNui - Building a Grid Data Infrastructure that "doesn't suck!" - Guy K. Kloss
This document discusses the development of a grid data infrastructure called MataNui to manage large amounts of observational astronomical data and metadata from a collaboration between researchers in New Zealand and Japan. The infrastructure uses existing open-source tools like MongoDB, GridFTP, and the DataFinder GUI client to allow distributed storage and access of data while meeting requirements like handling large data volumes, metadata, and remote access. This approach provides a robust, reusable, and user-friendly system to address common data management challenges in scientific collaborations.
Iaetsd enhancement of performance and security in bigdata processing - Iaetsd
This document discusses enhancing performance and security in big data processing. It proposes collecting sensitive data and encrypting it using proxy re-encryption before storing it in a NoSQL database for increased security. The encrypted data can then be decrypted and accessed by authorized external users. MapReduce is used to filter duplicate data during access.
Cloud storage allows users to store data in the cloud without managing local hardware. It provides on-demand access to cloud applications and pay-per-use services. The document discusses different cloud service models including SaaS, PaaS, and IaaS. It proposes a system to ensure correctness of user data in the cloud with dynamic data support and distributed storage. The system features include auditing by a third party, file retrieval and error recovery, and cloud operations like update, delete, and append.
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING - ijcsit
Cloud Computing is the revolution in the current generation of IT enterprise. Cloud computing displaces databases and application software to large data centres, where the management of services and data may not be predictable, whereas conventional solutions for IT services are under proper logical, physical and personnel controls. This attribute, however, brings different security challenges which have not been well understood. This paper concentrates on cloud data storage security, which has always been an important aspect of quality of service (QoS). In this paper, we designed and simulated an adaptable and efficient scheme to guarantee the correctness of user data stored in the cloud, with some prominent features. A homomorphic token is used for distributed verification of erasure-coded data. By using this scheme, we can identify misbehaving servers. Unlike past works, our scheme supports effective and secure dynamic operations on data blocks such as data insertion, deletion and modification. In contrast to traditional solutions, where the IT services are under proper physical, logical and personnel controls, cloud computing moves the application software and databases to large data centres, where the data management and services may not be absolutely trustworthy. The security and performance analysis shows that the proposed scheme is highly resilient against malicious data modification, convoluted failures and server colluding attacks.
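As a simplified stand-in for the scheme (not its homomorphic tokens, which are far more compact), here is a Python sketch where blocks are striped across servers with an XOR parity block, and keyed per-block tokens let the client spot which server returned a bad block.

```python
# Simplified stand-in for the erasure-coded verification idea: data blocks
# are striped across servers with an XOR parity block, and a keyed token
# per block lets the client identify a misbehaving server. This only
# illustrates the "identify the misbehaving server" property.
import hmac, hashlib, secrets
from functools import reduce

KEY = secrets.token_bytes(32)

def token(server_id, block):
    return hmac.new(KEY, bytes([server_id]) + block, hashlib.sha256).digest()

# Stripe: 3 data servers + 1 parity server (XOR of the data blocks).
data = [b"block-A!", b"block-B!", b"block-C!"]
parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), data)
servers = data + [parity]
tokens = [token(i, blk) for i, blk in enumerate(servers)]

servers[1] = b"tampered"  # server 1 misbehaves

bad = [i for i, blk in enumerate(servers)
       if not hmac.compare_digest(token(i, blk), tokens[i])]
print("misbehaving servers:", bad)  # [1]
```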
TeleCAD-GIS is a scalable, Autodesk-based, CAD/GIS solution for both national and regional telecommunications infrastructure networks planning, design, documenting and maintenance. It provides valuable tools targeting customer-owned outside plant design; OSP right-of-way and route design; OSP space design; underground, direct-buried, and aerial plant design; OSP cabling hardware and OSP grounding, bonding and electrical protection systems; automated switching and support systems design, etc.
IRJET- Providing In-Database Analytic Functionalities to Mysql : A Proposed S... - IRJET Journal
The document proposes a system to provide in-database analytic functionalities to MySQL by implementing machine learning algorithms like linear regression within the MySQL database server. This would eliminate the need to migrate data to external analytic tools for processing, reducing time and network load. Specifically, it aims to develop user-defined functions in MySQL using the linear regression algorithm to predict numeric values. This in-database processing approach could improve performance for large-scale analytics compared to conventional methods that require data movement.
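Here is a sketch of the computation such an aggregate UDF would perform: single-pass ordinary least squares for one predictor, which is exactly the shape of work that fits a row-by-row UDF; the paper's UDF names and table layout are not reproduced.

```python
# Sketch of the computation an in-database linear regression UDF would
# perform: one-pass OLS over (x, y) rows, accumulated row by row.
def linreg_fit(rows):
    """One-pass ordinary least squares, as an aggregate UDF would do."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:               # UDF accumulation step, row by row
        n += 1
        sx, sy = sx + x, sy + y
        sxx, sxy = sxx + x * x, sxy + x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

rows = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1)]  # y ~ 2x
slope, intercept = linreg_fit(rows)
print(f"predict x=5: {slope * 5 + intercept:.2f}")
```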
Cloud computing is internet-based computing, also known as the "pay per use" model: we pay only for the resources that are in use. The key barrier to widespread uptake of cloud computing is the lack of trust in clouds by potential customers. While preventive controls for security and privacy measures are actively being researched, there is still little focus on detective controls related to cloud accountability and auditability. The complexity resulting from the sheer amount of virtualization and data distribution carried out in current clouds has also revealed an urgent need for research in cloud accountability, as has the shift in focus of customer concerns from server health and utilization to the integrity and safety of end-users' data. In this paper we propose a method to store data provenance using Amazon S3 and SimpleDB.
In the wake of IoT becoming ubiquitous, there has been large interest in the industry to develop novel techniques for anomaly detection (AD) at the Edge. Example applications include, but are not limited to, smart cities/grids of sensors, industrial process control in manufacturing, smart homes, wearables, connected vehicles, and agriculture (sensing for soil moisture and nutrients). What makes anomaly detection at the Edge different? The following constraints, whether due to the sensors or the applications, necessitate the development of new algorithms for AD:
* Very low power and low compute/memory resources
* High data volume making centralized AD infeasible owing to the communication overhead
* Need for low latency to drive fast action taking
* Guaranteeing privacy
In this talk we shall throw light on the above in detail. Subsequently, we shall walk through the algorithm design process for anomaly detection at the Edge. Specifically, we shall dive into the need to build small models/ensembles owing to limited memory on the sensors, and into how to train on data in an online fashion, as long-term historical data is not available due to limited storage. Given the need for data compression to contain the communication overhead, can one carry out anomaly detection on compressed data? We shall throw light on building small models, sequential and one-shot learning algorithms, compressing the data with the models, and limiting the communication to only the data corresponding to the anomalies and the model description. We shall illustrate the above with concrete examples from the wild!
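As one concrete example of an edge-friendly detector meeting these constraints (an assumption for illustration, not the talk's algorithm), here is a constant-memory online z-score model in Python using Welford updates, so only the anomalous readings need to leave the device.

```python
# Sketch of an edge-friendly detector matching the constraints above: a
# constant-memory, online z-score model (Welford updates). The threshold
# and warm-up length are assumptions.
class OnlineZScoreDetector:
    """O(1) memory, one update per reading: suits low-power sensors."""
    def __init__(self, threshold=4.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold = threshold

    def update(self, x):
        """Return True if x is anomalous, then fold it into the model."""
        anomalous = False
        if self.n >= 10:  # wait for a minimal history before flagging
            std = (self.m2 / (self.n - 1)) ** 0.5
            anomalous = std > 0 and abs(x - self.mean) / std > self.threshold
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = OnlineZScoreDetector()
stream = [20.1, 20.3, 19.8, 20.0, 20.2, 19.9, 20.1, 20.0, 20.2, 19.9, 35.0]
print([x for x in stream if det.update(x)])  # [35.0] is reported upstream
```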
A Case Study of Innovation of an Information Communication System and Upgrade... - gerogepatton
In this paper, a case study is analyzed. The case study is about an upgrade of an industry communication system developed by following the Frascati research guidelines. The Knowledge Base (KB) of the industry is gained by means of different tools that are able to provide data and information having different formats and structures into a unique bus system connected to a Big Data facility. The initial part of the research is focused on the implementation of strategic tools able to upgrade the KB. The second part of the proposed study is related to the implementation of innovative algorithms based on a KNIME (Konstanz Information Miner) Gradient Boosted Trees workflow processing data of the communication system which travel into an Enterprise Service Bus (ESB) infrastructure. The goal of the paper is to prove that all the new KB collected into a Cassandra big data system can be processed through the ESB by predictive algorithms, solving possible conflicts between hardware and software. The conflicts are due to the integration of different database technologies and data structures. In order to check the outputs of the Gradient Boosted Trees algorithm, an experimental dataset suitable for machine learning testing has been used. The test has been performed on a prototype network system modeling a part of the whole communication system. The paper shows how to validate industrial research by following a complete design and development of a whole communication system network improving business intelligence (BI).
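Purely as a rough stand-in for the KNIME Gradient Boosted Trees workflow, here is a scikit-learn sketch trained on a synthetic dataset; the feature layout and data are assumptions, since the paper's workflow runs inside KNIME against the Cassandra-backed KB.

```python
# Rough scikit-learn stand-in for the KNIME Gradient Boosted Trees
# workflow: train on an experimental dataset and score held-out records
# of the kind that would flow through the ESB.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in for "an experimental dataset suitable for machine learning testing".
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)
print("holdout accuracy:", gbt.score(X_te, y_te))
```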
The document outlines 19 potential project titles for a Cisco summer internship in 2011. The projects cover a wide range of topics including network performance testing, automation, monitoring, management, and security tools.
This document discusses performance analysis of cloud computing services. It begins by defining cloud computing and describing its key characteristics like on-demand access to computing resources and pay-per-use models. It then reviews several studies on using virtualization technologies and frameworks for evaluating cloud performance and workload generation. The document concludes that tools are needed for comprehensive performance analysis of large scientific clouds to evaluate metrics like response time, cost and scalability across different cloud vendors.
IRJET - Analysis of Virtual Machine in Digital ForensicsIRJET Journal
This document discusses analyzing virtual machines for digital forensics purposes. It proposes a methodology for acquiring and analyzing files from VMware and Oracle VirtualBox virtual machines. The methodology has three phases: detection and acquisition of virtual machine files from the host system, analysis of virtual disk images and log files, and reporting the conclusions. The analysis phase examines virtual disk files in detail, looking at the file structure and metadata that could provide evidence. The system is implemented using Python scripts to perform the virtual machine analysis.
DETECTION METHOD FOR CLASSIFYING MALICIOUS FIRMWAREIJNSA Journal
This document proposes using deep learning to classify firmware as malicious or benign by converting firmware binaries into small grayscale images and training convolutional neural networks (CNNs) on these images. Testing of a MobileNetV2 CNN achieved over 90% accuracy on the image classifications. Decision trees and random forests were also applied and achieved similar or higher accuracy than the CNN. The document concludes that machine learning methods show promise for detecting malicious firmware updates through analyzing the patterns in converted firmware image files.
Bhadale group of companies data center products catalogueVijayananda Mohire
This is the first draft version of the product offering in the areas of nano computing, DNA computing, nano wireless networks and nano communication. We offer industry-standard base images of solutions for emerging computing platforms and for edge and fog computing.
Scheduling IoT application on cloud computingEman Ahmed
Resource scheduling considers the execution time of every distinct workload, but most importantly, the overall performance is also based on the type of workload, i.e. workloads with different QoS requirements (heterogeneous workloads) and with similar QoS requirements (homogeneous workloads).
On-line IDACS for Embedded Real Time ApplicationAM Publications
Design of an on-line embedded web server is a challenging part of many embedded and real-time data acquisition and control system applications. The World Wide Web is a global system of interconnected computer networks that use the standard Internet Protocol Suite (TCP/IP) to serve billions of users worldwide, and it allows the user to interface many real-time embedded applications such as data acquisition, industrial automation and safety measures. This paper approaches the design and development of an on-line Interactive Data Acquisition and Control System (IDACS) using an ARM-based embedded web server. It can be a networked, intelligent and digital distributed control system. The single-chip IDACS method improves the processing capability of the system and overcomes problems of poor real-time behaviour and reliability. The system uses an ARM9 processor with the Real Time Linux operating system (RTLinux RTOS), which makes the system more real-time and handles various processes based on multi-tasking and reliable scheduling mechanisms. The web server application is ported onto the ARM processor using the embedded C language. Web pages are written in Hypertext Markup Language (HTML); this is beneficial for real-time IDACS, mission-critical applications, ATM networks and more.
On-line IDACS for Embedded Real Time ApplicationAM Publications
This document presents the design and development of an online Interactive Data Acquisition and Control System (IDACS) using an ARM-based embedded web server. The system uses an ARM9 processor running a Real-Time Linux operating system to handle data acquisition and control functions as well as hosting an embedded web server. Sensors are interfaced to the ARM9 to measure parameters like temperature, humidity, gas levels etc. The embedded web server allows clients to access the sensor data and send control instructions through a web browser interface. The system aims to provide a low-cost, low-power solution for real-time data acquisition and control in industrial applications.
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 DataBench
The document discusses big data benchmarking and outlines the goals of the DataBench project. It aims to develop a toolbox for both technical and business benchmarks following a holistic benchmarking approach. The toolbox will integrate existing benchmarking initiatives and identify gaps to contribute new benchmarks. It will provide a way to derive metrics and key performance indicators from benchmarks in a homogenized way. The toolbox will include a web interface for users to specify benchmarking requirements.
DATA MINING APPLIED IN FOOD TRADE NETWORKgerogepatton
The proposed study deals with the design and the development of a Decision Support System (DSS) platform suitable for the global distribution system (GDS). Precisely, the prototype platform combines artificial intelligence and data mining algorithms to process data collected into a Cassandra Big Data system. In the first part of the paper platform architectures together with all the adopted frameworks including Key Performance Indicators (KPIs) definitions and risk mapping design have been discussed. In the second part data mining algorithms have been applied in order to predict main KPIs. The adopted artificial neural networks architectures are Long Short-Term Memory (LSTM), standard Recurrent Neural Network (RNN) and Gated Recurrent Units (GRU). A dataset with KPIs has been generated in order to test the algorithms. All performed algorithms show a good matching with the generated dataset, thus proving to be the correct approach to predict KPIs. The best performances in terms of Accuracy and Loss are reached by using the standard RNN. The proposed platform represents a solution to increase the Knowledge Base (KB) for a strategic marketing and advanced business intelligence operations.
Tiarrah Computing: The Next Generation of ComputingIJECEIAES
The evolution of the Internet of Things (IoT) brought several challenges for existing hardware, network and application development, such as handling real-time streaming and batch big data, real-time event handling, dynamic cluster resource allocation for computation, and wired and wireless networks of things. In order to combat these technicalities, many new technologies and strategies are being developed. Tiarrah Computing integrates the concepts of Cloud Computing, Fog Computing and Edge Computing. Its main objectives are to decouple application deployment and achieve high performance, flexible application development, high availability, ease of development and ease of maintenance. Tiarrah Computing focuses on using existing open source technologies to overcome the challenges that evolve along with IoT. This paper gives an overview of the technologies, shows how to design applications with them, and elaborates how to overcome most of the existing challenges.
International Journal of Computational Engineering Research (IJCER) is an international online monthly journal published in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
ADVANCED MULTIMEDIA PLATFORM BASED ON BIG DATA AND ARTIFICIAL INTELLIGENCE IM...IJNSA Journal
The proposed work describes the design of a multimedia platform managing users and implementing cybersecurity. The paper describes in detail the use cases of the whole platform, embedding a Big Data and artificial intelligence (AI) engine predicting network attacks. The platform has been tested with a Tree Ensemble algorithm classifying and predicting anomalous server logs of possible attacks. The data logs are collected in a Cassandra Big Data system enabling the AI training model. The work has been developed within the framework of a research industry project.
The document describes lessons learned from developing protocols to enable data sharing in a virtual enterprise. It discusses protocols selected by the NIIIP Consortium that build on STEP to allow engineering organizations to share technical product data over the Internet. The protocols included SDAI Java/IDL bindings, EXPRESS-X for data mapping, and STEP Services for data integration. These were used to implement a Virtual Enterprise Product Data Repository (VEPR) demonstrated in the last of three cycles to integrate product data from multiple sources. Key lessons included the need for standards to contribute and access controlled data in a VEPR as well as for applications to operate on data from different repositories.
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...Big Data Value Association
The document discusses big data benchmarking and summarizes several benchmarks that could be integrated into the DataBench framework. It describes benchmarks like HiBench, SparkBench, YCSB, BigBench, and ABench that evaluate different aspects of big data systems like micro-benchmarks, streaming, and end-to-end workflows. The goal of DataBench is to provide a methodology and tools for benchmarking, including accessing multiple benchmarks, homogenizing metrics, and deriving business KPIs to help practitioners evaluate big data platforms and technologies.
Field service application (FSA) refers to a cloud-based system that combines a robust web application and a dynamic mobile application to support field engineers. FSA most commonly caters to customers who need service or repair of equipment. The application targets the service industry and is intended for field engineers. Various service businesses register with it for cloud computing services for the effective management of services such as painting, plumbing, electrical and carpentry work. The system complements the software services provided by the cloud with a mobile-based client application specially designed for field engineers. The solution shares a single workflow among the registered tenants, ensuring efficient sharing of infrastructure as well as the security and integrity of tenant data. A multi-tenant application is an approach to sharing an application instance among different customers to reduce overhead.
ESB PLATFORM INTEGRATING KNIME DATA MINING TOOL ORIENTED ON INDUSTRY 4.0 BASED ON ARTIFICIAL NEURAL NETWORK PREDICTIVE MAINTENANCE
International Journal of Artificial Intelligence and Applications (IJAIA), Vol.9, No.3, May 2018
DOI: 10.5121/ijaia.2018.9301
ESB PLATFORM INTEGRATING KNIME DATA MINING TOOL ORIENTED ON INDUSTRY 4.0 BASED ON ARTIFICIAL NEURAL NETWORK PREDICTIVE MAINTENANCE
Alessandro Massaro, Vincenzo Maritati, Angelo Galiano, Vitangelo Birardi, Leonardo Pellicani
Dyrecta Lab, IT Research Laboratory, via Vescovo Simplicio, 45, 70014 Conversano (BA), Italy
ABSTRACT
In this paper are discussed some results related to an industrial project oriented on the integration of data
mining tools into Enterprise Service Bus (ESB) platform. WSO2 ESB has been implemented for data
transaction and to interface a client web service connected to a KNIME workflow behaving as a flexible
data mining engine. In order to validate the implementation two test have been performed: the first one is
related to the data management of two relational database management system (RDBMS) merged into one
database whose data have been processed by KNIME dashboard statistical tool thus proving the data
transfer of the prototype system; the second one is related to a simulation of two sensor data belonging to
two distinct production lines connected to the same ESB. Specifically in the second example has been
developed a practical case by processing by a Multilayered Perceptron (MLP) neural networks the
temperatures of two milk production lines and by providing information about predictive maintenance. The
platform prototype system is suitable for data automatism and Internet of Thing (IoT) related to Industry
4.0, and it is suitable for innovative hybrid system embedding different hardware and software technologies
integrated with ESB, data mining engine and client web-services.
KEYWORDS
ESB, Data Mining, KNIME, Industry 4.0, Predictive Maintenance, Artificial Neural Networks (ANN), MLP
neural network.
1. INTRODUCTION
Open source Enterprise Service Bus (ESB) platforms have been discussed in the scientific research as important topics for IT architects working on Service Oriented Architecture (SOA) and on Enterprise Application Integration (EAI) [1]-[3]. Particular attention has been focused on the performance of the WSO2 open source ESB, which is able to manage 50 threads simultaneously [3] and to integrate different functionalities such as enterprise integration patterns, delivery of all ESB features, completion of SOA, SOA governance, graphical ESB development, composable architecture, a cloud integration platform, availability of cloud connectors and legacy adapters, ultrafast performance (low computational cost), security and identity management, and an open business model [4]. Concerning Industry 4.0, information digitization and the integration of different technologies are important elements for the development of connected and adaptive production [5]. In this direction, the ESB could support Enterprise Integration Patterns (EIPs) concerning the use of enterprise application integration and message-oriented middleware in pattern language form [1],[6]. Data mining engines could in turn select the best pattern, as in pattern-based manufacturing process optimization [7]. Interesting applications of data mining and artificial intelligence (A.I.) in industrial production processes are in maintenance [8],
in predictive maintenance reading Internet of Things (IoT) sensor data [9],[10],[11], and more generally in predictive analytics [12]. Open source data mining tools oriented on research include the R language, RapidMiner, Weka, KNIME, Orange Canvas, and Tanagra [13],[14]. In particular, KNIME is suited for web service connection [15], thus representing a good candidate for integration in an ESB network. Machine to Machine (M2M) systems could also be integrated in an ESB network, improving big data storage systems such as the Cassandra technology [16]. According to the requirements found in the state of the art, the authors have developed in this work the information system infrastructure reported in Fig. 1 (a), related to an innovative integrated ESB system suitable for data transfer in Industry 4.0 processes. The layout has been designed with Aris Express, a graphical tool useful for process simulation [17]. The proposed scheme includes the following modules:
- WSO2 Complex Event Processor (CEP): this module is suitable for the efficient management of processes and for event scheduling; it helps to identify events and patterns from multiple data sources, analyze their impact, and act on them in real time;
- WSO2 Analytics: this module collects events through multiple transports and messaging formats; using streaming SQL to process streams, it detects complex events, performs prediction using machine learning models, and generates and notifies alerts in real time by visualizing them on real-time dashboards;
- WSO2 Machine Learner: this module allows processing data with data mining algorithms using predefined parameters; for the best control of algorithm outputs, and for the choice of different algorithm typologies, an external data mining tool is required; the algorithms implemented by this module concern numerical prediction (linear regression, ridge regression, lasso regression, random forest regression), binary classification (Logistic Regression SGD, Support Vector Machines -SVM-), multiclass classification (logistic regression L-BFGS, Decision Tree, Random Forest Classification, Naive Bayes), clustering and anomaly detection (k-Means), anomaly detection, and deep learning (Stacked Autoencoders);
- WSO2 IoT Server: this module is useful for direct communication between sensors of different technologies and the ESB network; it exposes an API to power a mobile app allowing users to monitor and control different devices or sensors;
- External data mining tool (KNIME): this external tool is able to improve advanced analytics related to the predictive maintenance of the production lines; the external tool is necessary to control data processing, to set the best parameters, and to choose new machine learning algorithms different from those implemented in the WSO2 Machine Learner module, such as Artificial Neural Networks (ANN) [18],[19], well suited for predictive maintenance.
Starting from the infrastructure of Fig. 1 (a), the authors focused their attention on the application of the external data mining tool working by means of a client web service interfaced with the prototype ESB network (see the dashed part sketched in Fig. 1 (b)). In this way the authors apply the most important functionalities of the prototype infrastructure, enhancing the innovative aspects of the research. The paper is structured as follows:
- Development of the prototype of Fig. 1 (b) by performing a basic first test checking the data transfer between the client web service, KNIME and the ESB; the outputs of this test prove the correct functionality of a KNIME workflow managed by the ESB; this preliminary test is fundamental in order to understand the data flow process in the prototype system;
- Development of the prototype of Fig. 1 (b) by implementing an advanced algorithm based on ANN, simulating the predictive maintenance of two production lines monitored by temperature sensors.
[Figure 1 diagram omitted: (a) WSO2 data flow process (Industry 4.0) with Event Processor, DSS, IoT, Analytics, Machine Learner, datasources DS1/DS2, connector, Database1/Database2, data mining tools, dashboards and Big Data; (b) script, KNIME, client web service, WSO2 ESB, DB 1 and DB 2.]
Figure 1. (a) ESB prototype adopted in the industrial project. (b) Part of the developed prototype
concerning interconnection with KNIME tool.
2. FIRST AND SECOND TEST OF THE PROTOTYPE PLATFORM
Figure 1 (b) illustrates the prototype part tested for checking data transfer and applied for the implementation of artificial neural network algorithms. Specifically, in Fig. 1 (b) the WSO2 ESB platform manages two different relational database management systems (RDBMS), named DB 1 and DB 2, which are datasources processed by a KNIME workflow through a web service. The following paragraphs discuss the tests performed by applying the network prototype.
2.1. First test checking data flow in ESB infrastructure and main functionalities
The first implementation of the data flow of Fig. 1 (b) is a preliminary check useful to verify the ESB data management. Specifically, two databases have been created with the MySQL tool. The first RDBMS has been structured on localhost by the following SQL script:
CREATE DATABASE PIMA;
DROP TABLE IF EXISTS `data`;
CREATE TABLE IF NOT EXISTS `data` (
`DiaID` bigint(20) NOT NULL,
`outcome` tinyint(4) NOT NULL,
PRIMARY KEY (`DiaID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
A second RDBMS has been created on another machine reachable via an IP address. The script used for the creation of the second RDBMS is the following one (768 patient records in a medical application):
CREATE DATABASE PIMA;
CREATE TABLE IF NOT EXISTS `data` (
`MedID` bigint(20) DEFAULT NULL,
`Pregnancies` decimal(8,2) DEFAULT NULL,
`Glucose` decimal(8,2) DEFAULT NULL,
`BloodPressure` decimal(8,2) DEFAULT NULL,
`SkinThickness` decimal(8,2) DEFAULT NULL,
`Insulin` decimal(8,2) DEFAULT NULL,
`BMI` decimal(8,2) DEFAULT NULL,
`DiabetesPedigreeFunction` decimal(8,2) DEFAULT NULL,
`Age` decimal(8,2) DEFAULT NULL,
PRIMARY KEY (`MedID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The databases have been populated using a Python environment with the open source Jupyter Notebook editor, a script editor that allows executing parts of a script independently of the others and inserting documentation directly into the code (a population sketch is given after this paragraph). Subsequently, a data service has been created on the WSO2 DSS module, able to connect to the two different RDBMSs, to aggregate data (in this test a merge is performed between the personal data of the first database and the medical measurements of the second one), and to provide the processed data at the ESB output. The WSO2 DSS platform has been installed locally, like one of the two MySQL servers. In the WSO2 panel windows the two datasources (named DS 1 and DS 2 in Fig. 1 (a)) have been created by means of the "Add Datasource" command. Fig. 2 reports the screenshot related to the final creation of both datasources.
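Since the population step is only described in the text, a minimal sketch is reported below; the CSV file name, the credentials and the hosts are assumptions, while the table layouts follow the two SQL scripts above.

import csv
import mysql.connector  # pip install mysql-connector-python

# Load a local CSV export of the Pima Indians dataset (hypothetical file name);
# columns: Pregnancies, ..., Age, Outcome.
with open('pima.csv', newline='') as f:
    rows = list(csv.reader(f))[1:]  # skip the header row

# First RDBMS (localhost): identifier and outcome only.
db1 = mysql.connector.connect(host='localhost', user='root',
                              password='secret', database='PIMA')
db1.cursor().executemany(
    "INSERT INTO data (DiaID, outcome) VALUES (%s, %s)",
    [(i, int(r[8])) for i, r in enumerate(rows)])
db1.commit(); db1.close()

# Second RDBMS (remote machine, IP assumed): the eight medical measurements.
db2 = mysql.connector.connect(host='192.168.0.102', user='root',
                              password='secret', database='PIMA')
db2.cursor().executemany(
    "INSERT INTO data (MedID, Pregnancies, Glucose, BloodPressure, SkinThickness, "
    "Insulin, BMI, DiabetesPedigreeFunction, Age) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)",
    [(i, *r[0:8]) for i, r in enumerate(rows)])
db2.commit(); db2.close()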
In order to check the datasource management of the WSO2 DSS ESB, the following SQL query has been executed (following the steps "Next" and "Add new query"):
Select MEDID, Pregnancies, Glucose, BloodPressure,Skinthickness, Insulin, BMI,
DiabetesPedigreeFunction, Age from data where MEDID = :DIAID
The response generated by the SQL script is illustrated in Fig. 3, confirming the correct datasource management of the ESB.
Figure 2. Screenshot of the correct creation of the two datasources.
Figure 3. Screenshot of the test.
The second datasource has been checked by the following script:
Select id as diad, outcome from data limit 5
After checking the datasource management in the ESB, the data service has been tested by creating a Python script in a KNIME node. The Zeep library has been adopted for the client web service, and the Pandas library has been used for the KNIME interaction. The script used to import the libraries is the following:
from zeep import Client
from zeep.wsse.username import UsernameToken
import pandas as pd
import zeep
Below is the script that activates the client web service (WSDL 1.0 and WSDL 2.0):
user = 'admin'
password = 'admin'
wsdl_url = 'http://192.168.0.102:9763/services/PimaIndians?wsdl'
client = Client(wsdl_url, wsse=UsernameToken(user, password))
After the web-service call, the data have been converted into an ordered list and then into a dataframe by the following script (my_data is built from the service response, as shown in the complete node script below):

df = pd.DataFrame(my_data)
df.columns = ['id', 'outcome', 'Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
              'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']
print(df)
thus providing the following output (the first rows are shown):
   id  outcome  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin    BMI  DiabetesPedigreeFunction    Age
0   0        0         1.00    85.00          66.00          29.00     0.00  26.60                      0.35  31.00
1   1        1         8.00   183.00          64.00           0.00     0.00  23.30                      0.67  32.00
2   2        0         1.00    89.00          66.00          23.00    94.00  28.10                      0.17  21.00
3   3        1         0.00   137.00          40.00          35.00   168.00  43.10                      2.29  33.00
4   4        0         5.00   116.00          74.00           0.00     0.00  25.60                      0.20  30.00
5   5        1         3.00    78.00          50.00          32.00    88.00  31.00                      0.25  26.00
The created dataframe is then passed to the KNIME nodes that process the data. Fig. 4 shows the KNIME workflow designed for the first test; its first node, named "Python Source", executes the following Python script:
from zeep import Client
from zeep.wsse.username import UsernameToken
import pandas as pd
import zeep
user = 'admin'
password = 'admin'
wsdl_url = 'http://192.168.0.102:9763/services/PimaIndians?wsdl'
client = Client(wsdl_url, wsse=UsernameToken(user, password))
d = client.service.GetAllDia()
my_dict = zeep.helpers.serialize_object(d)
my_data = []
for d in my_dict:
    med = d['MedEntries']['MedEntry']
    new_record = [d['diaid'], d['outcome'], med[0]['Pregnancies'], med[0]['Glucose'],
                  med[0]['BloodPressure'], med[0]['SkinThickness'], med[0]['Insulin'],
                  med[0]['BMI'], med[0]['DiabetesPedigreeFunction'], med[0]['Age']]
    my_data.append(new_record)
output_table = pd.DataFrame(my_data)
output_table.columns = ['id', 'outcome', 'Pregnancies', 'Glucose', 'BloodPressure',
                        'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']
Figure 4. KNIME workflow of the first test.
The node "String to Number" converts the strings into numbers to be processed, and the node "Domain Calculator" selects the attributes to process (in this case all the attributes are considered). Fig. 5 illustrates the output of the "Statistic" node, proving the ESB-WSO2-KNIME data transfer and data processing.
[Figure 5 table omitted: Min, Max, Mean, Std deviation and Variance computed for the columns Outcome, Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, DiabetesPedigree and Age.]
Figure 5. KNIME: output of the “Statistic Node” (Node 6).
2.2. Second Test oriented on Industry 4.0 application: application of KNIME neural
networks for predictive maintenance of two production lines
The first example is useful to understand the basic principle of the data management of the prototype system. In this section, after an introduction to MLP neural network theory, the predictive maintenance test is implemented.
2.2.1. MLP Neural Network Theory
An MLP (or Artificial Neural Network - ANN) with a single hidden layer can be represented graphically as in Fig. 6.
[Figure 6 diagram omitted: input layer, hidden layer and output layer of the MLP.]
Figure 6. MLP neural network.
An MLP consists of multiple layers of nodes in a directed graph, with each layer fully
connected to the next one. Except for the input nodes, each node is a neuron (or processing
element) with a nonlinear activation function.
Formally, a one-hidden-layer MLP is a function $f: \mathbb{R}^D \rightarrow \mathbb{R}^L$, where $D$ is the size of the input vector $x$ and $L$ is the size of the output vector $f(x)$, defined as:

$$f(x) = G\big(v^{(2)} + W^{(2)}\, s\big(v^{(1)} + W^{(1)} x\big)\big) \quad (1)$$

where $v^{(1)}$ and $v^{(2)}$ are the bias vectors, $W^{(1)}$ and $W^{(2)}$ the weight matrices, and $G$ and $s$ the activation functions. The vector

$$h(x) = s\big(v^{(1)} + W^{(1)} x\big) \quad (2)$$

defines the hidden layer, and $W^{(1)} \in \mathbb{R}^{D \times D_h}$ is the weight matrix connecting the input vector to the hidden layer; $W_i^{(1)}$ represents the weights from the input units to the $i$-th hidden unit. Typically the activation function $s$ assumes one of the following forms:

$$\tanh(a) = (e^{a} - e^{-a})/(e^{a} + e^{-a}) \quad (3)$$

$$\mathrm{sigmoid}(a) = 1/(1 + e^{-a}) \quad (4)$$

The output vector is defined by:

$$o(x) = G\big(v^{(2)} + W^{(2)} h(x)\big) \quad (5)$$

To train an MLP, the set of parameters to learn is

$$\theta = \{W^{(2)}, v^{(2)}, W^{(1)}, v^{(1)}\} \quad (6)$$

and the gradients

$$\partial \ell / \partial \theta \quad (7)$$

are obtained by means
of the back propagation algorithm adopted for the model training. Back propagation algorithm
is a supervised learning method which can be divided into two phases: propagation and weight
update. The two phases are repeated until the performance of the network is good enough. In
back propagation algorithms, the output values are compared with the correct answer to
compute the value of some predefined error-function. By various techniques, the error is then
fed back through the network. Using this information, the algorithm adjusts the weights of each
connection in order to reduce the value of the error function by some small amount. After
repeating this process for a sufficiently large number of training cycles, the network will usually
converge to some state where the error of the calculations is small. In this case, one would say
that the network has learned a certain target function.
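As a companion to equations (1)-(5), a minimal NumPy sketch of the forward pass is reported below; it is an illustration only, with arbitrary dimensions and random weights, and does not reproduce the trained KNIME model.

import numpy as np

# Illustrative sizes: D inputs, Dh hidden units, L outputs (assumptions).
D, Dh, L = 3, 10, 2
rng = np.random.default_rng(0)
W1 = rng.standard_normal((Dh, D))   # W(1), stored as Dh x D so that W1 @ x works
v1 = np.zeros(Dh)                   # bias vector v(1)
W2 = rng.standard_normal((L, Dh))   # W(2)
v2 = np.zeros(L)                    # bias vector v(2)

s = np.tanh                          # hidden activation, Eq. (3)

def G(a):
    return 1.0 / (1.0 + np.exp(-a))  # sigmoid output activation, Eq. (4)

def f(x):
    h = s(v1 + W1 @ x)               # hidden layer h(x), Eq. (2)
    return G(v2 + W2 @ h)            # output o(x) = f(x), Eqs. (1) and (5)

print(f(np.array([0.5, -0.2, 0.1])))  # two output activations in (0, 1)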
2.2.2. Second test oriented on predictive maintenance
Temperature measurement is one approach to controlling production and industrial machines. For this purpose, the authors analyse in this section two parts of two milk production
lines related to the pasteurization process. In this process a key parameter to monitor is the temperature: a good pasteurization process works within a range of 70 °C ÷ 75 °C, and for this reason it is important to control the temperature trend over time. Fig. 7 (a) illustrates the simplified prototype infrastructure adapted to the predictive maintenance of the milk pasteurization process of two distinct production lines placed in different locations. The main function of the ESB is the simultaneous control of the pasteurization process in both lines by predicting possible anomalies and alerting conditions. Some researchers suggested the idea of creating a reference model based on the use of Multilayered Perceptron (MLP) neural networks and on temperature information useful to classify thermal defects in predictive maintenance [20]. In this direction, the authors applied the same MLP algorithm of the KNIME tool by considering the client web service related to the prototype of Fig. 1 (b), where the first datasource is defined by the temperature dataset of the attribute T1 (temperature of production line 1), and the second datasource is defined by the temperature dataset of the attribute T2 (temperature of production line 2). The MLP algorithm can be viewed as a logistic regression classifier where the input is first transformed using a learnt non-linear transformation. This transformation projects the input data into a space where it becomes linearly separable, applying the theory discussed in section 2.2.1, with a single hidden layer as intermediate layer being sufficient to make the MLP a universal approximator.
For the training of the MLP neural network model (the learner of the model), the efficient RProp algorithm for multilayer feed-forward networks defined in [21],[22] has been applied: RProp performs a local adaptation of the weight updates according to the behaviour of the error function. In this test case the neural network parameters are set as follows: the maximum number of iterations is 100, the number of hidden layers is 1, the number of hidden neurons per layer is 10, and the class to predict is Col1 (the temperature of each line).
The predictive maintenance is performed by comparing the predicted values with the real temperature trends of both datasources: in the case of convergence the production line machines work correctly, otherwise potential anomalies may occur.
[Figure 7 diagram omitted: (a) ESB connected to production lines 1 and 2 via temperature sensors T1 and T2; (b) integration architecture.]
Figure 7. (a) Simplified model related to the predictive maintenance process concerning the working simulation of milk pasteurization machines. (b) Architecture integrating the simplified model.
Fig. 7 (b) illustrates the architecture integrating the web service into the prototype ESB infrastructure interfaced with the KNIME MLP neural network algorithm. Below is the Python code used to query the web service "GetLinesData", instantiated on the WSO2 DSS, which constructs a dataset from the data retrieved from the two MySQL databases storing the data collected on the two production lines:
from zeep import Client
from zeep.wsse.username import UsernameToken
import pandas as pd
import zeep

user = 'admin'
password = 'admin'
wsdl_url = 'http://192.168.0.102:9763/services/GetLinesData?wsdl'
client = Client(wsdl_url, wsse=UsernameToken(user, password))

lines_data = client.service.GetLinesData()
# serialize the Zeep response into plain Python dictionaries
my_dict = zeep.helpers.serialize_object(lines_data)

my_data = []
for d in my_dict:
    new_record = [d['id'], d['Line1'], d['Line2'], d['Timestamp']]
    my_data.append(new_record)

output_table = pd.DataFrame(my_data)
output_table.columns = ['id', 'Line1', 'Line2', 'Timestamp']
Fig. 8 illustrates the KNIME workflow enabling the simultaneous pasteurization control, executing the RProp Learner and the MLP Predictor. The input datasets of the proposed model are made of 168 values corresponding to the temperature measurements of seven days (one value per hour) for both production lines. The "Normalizer" object is necessary to normalize the temperatures into the range between 0 and 1. The "Partitioning" block splits the data into 80 rows for the learner module (first partition) and 88 rows for the MLP predictor (second partition); a plain-Python equivalent of these two steps is sketched after Fig. 8.
Figure 8. KNIME: MLP workflow model of predictive maintenance applied for each production line.
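The "Normalizer" and "Partitioning" steps can be mimicked in plain Python as in the sketch below; the hourly temperature series is simulated here (an assumption), while the 168-value length and the 80/88 split follow the text.

import numpy as np

rng = np.random.default_rng(42)
temps = rng.uniform(70.0, 75.0, 168)            # simulated hourly data, 7 days

t_min, t_max = temps.min(), temps.max()
normalized = (temps - t_min) / (t_max - t_min)  # "Normalizer": values in [0, 1]

learn_part = normalized[:80]                    # first partition -> RProp Learner
pred_part = normalized[80:]                     # second partition -> MLP Predictor
print(len(learn_part), len(pred_part))          # 80 88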
Figure 9 shows that the Python code listed above has been loaded correctly into the KNIME engine.
Figure 9. KNIME: configuration panel of the “Python Source” node 1.
As an example, Fig. 10 illustrates the first ten values of the "Partitioning" of the first production line, where the attribute Col0 indicates the hour and the attribute Col1 the normalized measured temperature. It is important to note that the two partitions are characterized by different datasets.
Figure 10. KNIME "Partitioning" object: first and second partition of production line 1 (first 10 values).
In order to check the alarm condition, a gap of amplitude 0.2 has been defined (see Fig. 11) as an error bar of ± 0.1 centered on the average amplitude of the prediction line. If the real temperature values cross the threshold lines, a potential anomaly occurs during the pasteurization process. If the real temperature trend stays within the gap region the system works correctly, as shown in the theoretical case of Fig. 11; a sketch of this check is given below.
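A minimal sketch of this gap test follows; it is an assumption of how the check can be coded, not the authors' implementation, and the two short normalized series are illustrative.

import numpy as np

predicted = np.array([0.50, 0.52, 0.55, 0.53, 0.51, 0.54])  # MLP predictions
real      = np.array([0.48, 0.51, 0.70, 0.52, 0.50, 0.41])  # measured values

band = 0.1                                  # half-amplitude of the 0.2 gap
center = predicted.mean()                   # average amplitude of the prediction line
upper, lower = center + band, center - band

alerts = (real > upper) | (real < lower)    # True where a potential anomaly occurs
mae = np.mean(np.abs(real - predicted))     # scoring metric quoted in the text

print(np.where(alerts)[0])                  # -> [2 5]
print(round(mae, 3))                        # -> 0.055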
[Figure 11 plot omitted: normalised temperature versus time, with predicted and real values, the two threshold lines and the gap region.]
Figure 11. Theoretical good matching between predicted and real temperature values.
According to the tests performed on both production lines, the MLP neural network outputs of Fig. 12 and Fig. 13 are obtained, comparing the predicted temperatures with the real ones. Figure 12 exhibits only one alert condition: after the first transient related to the switching-on condition, the temperature values are mainly within the gap region, indicating that the production line works correctly. In any case it is important to check the model over the next seven days in order to detect further alert conditions: a recurrence of alert conditions would represent a potential risk of anomaly. Fig. 13 illustrates the test performed on the second production line. In this second case more alert conditions are detected, indicating that the line could be working badly.
Figure 12. KNIME MLP model output: comparison between measured data and predicted ones of the
production line 1.
Figure 13. KNIME MLP model output: comparison between measured data and predicted ones of the
production line 2.
According to the scoring values it is possible to confirm that the applied model is reliable (a mean absolute error of 0.04 is estimated). The proposed procedure can be adopted in different applications involving healthcare [23] and the mapping of industrial processes [24]. The ANN could be implemented by means of a tailored hardware architecture [25].
3. CONCLUSION
The goal of this paper is to define a data infrastructure model applicable in Industry 4.0 and suitable for control rooms. The performed tests prove that the WSO2 ESB can be adopted to transfer data coming from different databases and processed by the external KNIME data mining engine through client web services. The first example has been applied to medical data, confirming that the prototype model is also oriented to different fields including healthcare applications such as the prevention of heart problems. Concerning industrial applications, the ESB can connect different information systems related to different production lines. As proved in this work, predictive maintenance can provide important information about risks and production anomalies, and could be applied to historical measured data in order to predict product quality by means of provisional Xbar-charts and R-charts, which are useful for the ISO 9001:2015 standard (a sketch of their control limits is given below). The development of an ANN requires a proper network architecture. In this direction the ESB infrastructure represents a good tool, compatible with different operating systems and different hardware machines, while at the same time providing web-service facilities. The shown results prove the implementation feasibility of an ANN in an ESB connected to a web-service system. The practical test of monitoring the predictive maintenance of two milk production lines by an MLP ANN directs the research towards innovative solutions for information technologies oriented on industrial applications and industrial research.
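As a hedged sketch of how the provisional Xbar- and R-chart limits could be derived from historical temperature data, the code below uses the standard control-chart constants for subgroups of size 5 (A2 = 0.577, D3 = 0, D4 = 2.114); the samples are simulated, not taken from the paper.

import numpy as np

rng = np.random.default_rng(1)
samples = 72.0 + rng.normal(0.0, 0.8, size=(20, 5))  # 20 subgroups of 5 readings

xbar = samples.mean(axis=1)                          # subgroup means
ranges = samples.max(axis=1) - samples.min(axis=1)   # subgroup ranges
xbarbar, rbar = xbar.mean(), ranges.mean()

A2, D3, D4 = 0.577, 0.0, 2.114                       # constants for n = 5
print("Xbar chart limits:", xbarbar - A2 * rbar, xbarbar + A2 * rbar)
print("R chart limits:", D3 * rbar, D4 * rbar)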
ACKNOWLEDGEMENTS
The work has been developed in the framework of the Italian project: “Sistemi software interfacciabili e modulari orientati alla comunicazione di Industria 4.0 e all’analisi avanzata dei dati: -ModularIndustry4.0-” [Interfaceable and modular software systems oriented on Industry 4.0 communication and on advanced data analysis: -ModularIndustry4.0-]. The authors would like to
thank the following researchers and collaborators: D. Barbuzzi, G. Birardi, B. Boussahel, V.
Calati, D. Carella, A. Colonna, R. Cosmo, V. Custodero, L. D’Alessandro, G. Fanelli, M. Le
Grottaglie, A. Leogrande, A. Lombardi, G. Lonigro, A. Lorusso, L. Maffei, S. Maggio, N.
Malfettone, S. F. Massari, G. Meuli, R. Porfido, O. Rizzo, D. D. Romagno, N. Savino, P.
Scagliusi, S. Selicato, G. Sicolo, M. Solazzo, M. M. Sorbo, D. Suma, F. Tarulli, E. Valenzano, V.
Vitti and M. Dal Checco.
REFERENCES
[1] Hohpe, G., & Woolf, B. (2004) “Enterprise Integration Patterns Designing, Building, and Deploying
Messaging Solutions”, Addison-Wesley.
[2] Polgar, J. (2009) “Open Source ESB in Action”, IGI Publishing.
[3] Górski, T., & Pietrasik, K. (2016) “Performance analysis of Enterprise Service Buses”, Journal of
Theoretical and Applied Computer Science, Vol. 10, No. 2, pp 16-32.
[4] Yenlo (2016) “ESB Comparison How to choose a reliable and fast ESB that fits your business
needs”, white paper.
[5] Fraunhofer Institute report: “Industry 4.0 – Connected, Adaptive Production”, white paper available online:
https://www.ipt.fraunhofer.de/content/dam/ipt/en/documents/broschures/Industry%2040-
Connected%20Adaptive%20Production.pdf
[6] Theorin, A., Bengtsson, K., Provost, J., Lieder, M., Johnsson, C., Lundholm, T., Lennartson, B.
(2017) “An Event-Driven Manufacturing Information System Architecture for Industry 4.0”,
International Journal of Production Research, Vol. 55, No.5, pp1297-1311.
[7] Gröger, C., Niedermann, F., Mitschang, B. (2012) “Data Mining-Driven Manufacturing Process”,
Proceedings of the World Congress on Engineering (WCE 2012), volume III, London, U.K..
[8] Bastos, P., Lopes, I., Pire, L. (2014) “Application of Data Mining in a Maintenance System for
Failure Prediction. Safety, Reliability and Risk Analysis: Beyond the Horizon”, Steenbergen et al.
(Eds), Taylor & Francis, pp933-940.
[9] Massaro, A., Galiano, A., Meuli, G., Massari, S. F. (2018) “Overview and Application of Enabling
Technologies Oriented on Energy Routing Monitoring, on Network Installation and on Predictive
Maintenance”, International Journal of Artificial Intelligence and Applications (IJAIA), Vol. 9, No.
2, pp1-20.
[10] Winters, P., Adae, I., Silipo, R. (2014) “Anomaly Detection in Predictive Maintenance Anomaly
Detection with Time Series Analysis”, KNIME white paper.
[11] Winters, P., Silipo, R. (2015) “Anomaly Detection in Predictive Maintenance Time Alignment and
Visualization”, KNIME white paper.
[12] Kotu, V., Deshpande, B. (2015) “Predictive Analytics and Data Mining”, Elsevier book.
[13] Wimmer, H., Powell, L. M. (2015) “A Comparison of Open Source Tools for Data Science”,
Proceedings of the Conference on Information Systems Applied Research. Wilmington, North
Carolina USA.
[14] Al-Khoder, A., Harmouch, H., “Evaluating Four Of The most Popular Open Source and Free Data
Mining Tools”, International Journal of Academic Scientific Research, Vol. 3, No. 1, pp13-23.
[15] “Generic Web Service Client Node.” 2018. [Online]. Available:
https://www.knime.com/webservice-client
[16] A. Galiano, A. Massaro, D. Barbuzzi, L. Pellicani, G. Birardi, B. Boussahel, F. De Carlo, V. Calati,
G. Lofano, L. Maffei, M. Solazzo, V. Custodero, G. Frulli, E. Frulli, F. Mancini, L. D’Alessandro, F.
Crudele, (2016) “Machine to Machine (M2M) Open Data System for Business Intelligence in
Products Massive Distribution oriented on Big Data”, International Journal of Computer Science
and Information Technologies, Vol. 7, No. 3, pp. 1332-1336, 2016.
[17] Grzegorz, J., Bartosz, A., (2015) “The Use Of IT Tools For The Simulation of Economic Processes”,
Information Systems in Management, Vol. 4, No. 2, pp 87-98.
[18] Dongare A.D., Kharde, R.R., Kachare, A.D. (2012) “Introduction to Artificial Neural Network”,
International Journal of Engineering and Innovative Technology, Vol. 2, No. 1, pp189-194.
[19] El-Khamisy, N., Ahmed Shawky Morsi El-Bhrawy, M. (2016) “Artificial Neural Networks in Data
Mining”, Journal of Computer Engineering (IOSR-JCE), Vol. 18, No.6, pp. 55-59.
[20] Irfan, U., Fan, Y., Rehanullah, K., Ling, L., Haisheng, Y., Bing, G. (2017) “Predictive Maintenance
of Power Substation Equipment by Infrared Thermography Using a Machine-Learning Approach”,
Energies, Vol. 10, No. 1987, pp1-13.
[21] Braun, H. (1993) “A Direct Adaptive Method for Faster Backpropagation Learning: the RPROP
algorithm”, Proceedings of the IEEE International Conference on Neural Networks (ICNN), Vol. 16,
pp586-591.
[22] Igel, C., Toussaint, M., Weishui, W. (2005) “RProp Using the Natural Gradient. International Series
of Numerical Mathematics” Trends and Applications in Constructive Approximation. ISNM
International Series of Numerical Mathematics, Vol. 151, pp1-15.
[23] Adhikari, N. C. D. (2018) “Prevention of Heart Problem Using Artificial Intelligence”, International
Journal of Artificial Intelligence and Applications (IJAIA), Vol. 9, No. 2, pp21-35.
[24] Fouad, R. H., Mukattash, A. (2010) “Statistical Process Control Tools: A Practical guide for Jordanian
Industrial Organizations”, Jordan Journal of Mechanical and Industrial Engineering, Vol. 4, No. 6,
pp693-700.
[25] Jawandhiya, P., “Hardware Design for Machine Learning”, International Journal of Artificial
Intelligence and Applications (IJAIA), Vol. 9, No. 1, pp63-84.
Corresponding Author
Alessandro Massaro: Research & Development Chief of Dyrecta Lab s.r.l.