1) The document discusses using artificial intelligence and data analytics techniques to develop digital models from ship engine data. The data is clustered into three groups representing different engine modes.
2) Each cluster is represented as a linear model or piecewise linear approximation of vessel and system behavior. Advanced analytics are used to detect and recover data anomalies.
3) Predictive analytics use the digital models to forecast vessel navigation and system conditions. Visualization of the models supports decision making around key performance indicators.
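To make the summary above concrete, here is a minimal sketch of the clustering-plus-linear-models idea: group engine samples into operating modes, then fit one linear "digital model" per mode. All data, mode centers, and variable names below are invented for illustration; this is not the paper's implementation.

```python
def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def cluster_modes(samples, centers, iters=25):
    """Tiny 1-D k-means on the x-coordinate (e.g. shaft rpm) of each sample."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for s in samples:
            groups[nearest(centers, s[0])].append(s)
        centers = [sum(x for x, _ in g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return groups

def fit_line(points):
    """Least-squares line y = a*x + b: one mode's piecewise-linear model."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Three synthetic "engine modes": low, medium, high rpm, each roughly linear.
samples = ([(x, 2.0 * x + 1) for x in range(10, 20)] +
           [(x, 0.5 * x + 30) for x in range(50, 60)] +
           [(x, 1.5 * x - 40) for x in range(90, 100)])
modes = cluster_modes(samples, centers=[0.0, 50.0, 100.0])
models = [fit_line(m) for m in modes]
```

Each entry of `models` is the slope and intercept of one cluster's linear approximation, i.e. one "digital model" in the terminology above.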
Advanced utility data management and analytics for improved situational awar... | Power System Operation
Introduction
Data analytics techniques for operation support
Applications of Data Analytics Techniques in Power Systems
Data Integration and Modeling
Data Quality and Validation
Summary and Conclusions
Multi-Dimensional Features Reduction of Consistency Subset Evaluator on Unsup... | CSCJournals
This paper presents the application of multi-dimensional feature reduction using the Consistency Subset Evaluator (CSE) and Principal Component Analysis (PCA), together with an Unsupervised Expectation Maximization (UEM) classifier, to an imaging surveillance system. Research in image processing has recently raised much interest in the security surveillance community, where weapon detection is one of the greatest challenges. To address this, the UEM classifier is applied to the task of detecting dangerous weapons, while CSE and PCA are used to assess the usefulness of each feature and to reduce the multi-dimensional features to simplified features with no underlying hidden structure. The paper takes advantage of these simplified features and the classifier to categorize image objects, with the aim of detecting dangerous weapons effectively. To validate the effectiveness of the UEM classifier, several other unsupervised classifiers - Farthest First, Density-based Clustering, and k-Means - are used to compare the overall accuracy of the system, complemented by the feature reduction of CSE and PCA. The results clearly indicate that UEM improves classification accuracy using the features extracted by the multi-dimensional feature reduction of CSE. It is also shown that PCA speeds up computation through reduced dimensionality, at the cost of a slight decrease in accuracy.
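The PCA dimensionality-reduction step named above can be sketched in a few lines. The 2-D-to-1-D case below is purely illustrative (the paper works with many more feature dimensions), and the closed-form eigenvector angle applies only to 2x2 covariance matrices:

```python
import math

def pca_1d(points):
    """Project 2-D feature vectors onto their first principal component,
    reducing dimensionality while keeping most of the variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Leading eigenvector angle of the covariance matrix [[cxx, cxy], [cxy, cyy]].
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [(x - mx) * ux + (y - my) * uy for x, y in points]

# Features lying on a line are compressed to one dimension with no loss.
scores = pca_1d([(x, 2 * x) for x in range(5)])
```

For perfectly collinear features the projected scores are evenly spaced, showing that the single retained component carries all of the variance.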
This document summarizes various techniques for data collection in densely populated wireless sensor networks that improve energy efficiency. It discusses plain data collection, in-network aggregation, query-based collection, multipath collection, feedback-based collection, and optimal and suboptimal aggregation techniques. The key goal of these techniques is to reduce the number of transmissions needed to collect sensor data in order to prolong the network lifetime by minimizing energy consumption during data transfer.
Parametric comparison based on split criterion on classification algorithm | IAEME Publication
This document presents a comparison of different attribute selection criteria for classification algorithms in stream data mining. It analyzes two common criteria - information gain and Gini index - and evaluates their impact on classification accuracy using different datasets. The results show that information gain generally achieves higher accuracy than Gini index, especially for larger data sizes. The document aims to improve the performance of stream data classification algorithms by optimizing the split criterion selection approach.
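The two split criteria being compared can be stated in a few lines of Python; the toy labels below are illustrative, not from the paper's datasets:

```python
import math

def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Shannon entropy in bits, the basis of information gain."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def split_gain(parent, parts, impurity):
    """Impurity decrease of a candidate split: information gain when
    impurity=entropy, Gini gain when impurity=gini."""
    n = len(parent)
    return impurity(parent) - sum(len(p) / n * impurity(p) for p in parts)

parent = ['yes'] * 5 + ['no'] * 5
perfect = [['yes'] * 5, ['no'] * 5]  # a split that separates the classes
```

A perfect split of a balanced binary set yields an information gain of 1.0 bit and a Gini gain of 0.5, which is the kind of quantity such comparisons rank candidate splits by.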
Delay in transporting packets or data from one point to another has become a significant problem in communication networks. This work addresses it by characterizing a data network to determine its throughput performance; modeling a dynamic routing algorithm that provides paths which change dynamically in response to network traffic and congestion, thereby increasing network performance because data travel along less congested paths; simulating the intelligent routing algorithm using AntNet, which has properties such as learning, reasoning, and decision making with respect to packet transmission, with MATLAB SIMULINK as the tool; and comparing the model's performance to an existing routing algorithm. Aneke Israel Chinagolum | Chineke Amaechi Hyacenth | Udeh Chukwuma Callistus. W, "Intelligent Routing Algorithm Using Antnet", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-1, December 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18990.pdf
http://www.ijtsrd.com/computer-science/artificial-intelligence/18990/intelligent-routing-algorithm-using-antnet/aneke-israel-chinagolum
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE | IJCSEA Journal
Traditional medical analysis is based on static data: the medical data are analyzed only after collection of the data sets is complete, which falls far short of actual demand. Large amounts of medical data are generated in real time, so real-time analysis can yield more value. This paper introduces the design of Sentinel, a real-time analysis system based on a clustering algorithm. Sentinel performs clustering analysis of real-time data and issues early alerts.
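A toy version of that real-time idea, assuming nothing about Sentinel's actual design: keep running cluster centers over a stream of readings, and raise an alert when a reading lands far from every known center. The threshold and all names are illustrative.

```python
def make_monitor(threshold):
    """Toy online monitor: maintain running cluster centers over a data
    stream; a reading far from every known center raises an early alert."""
    centers = []  # each entry: [running mean, sample count]

    def observe(x):
        for c in centers:
            if abs(x - c[0]) <= threshold:
                c[1] += 1
                c[0] += (x - c[0]) / c[1]  # incremental mean update
                return False               # reading fits a known cluster
        centers.append([x, 1])             # unseen region of the data space
        return True                        # -> early alert
    return observe

observe = make_monitor(threshold=5.0)
```

The first reading in any new region of the data space triggers an alert; subsequent nearby readings refine that cluster's center instead.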
Reverse Engineering Approach for System Condition Monitoring under Big Data a... | Lokukaluge Prasad Perera
This document proposes a reverse engineering approach using advanced data analytics to monitor ship engine condition from big data. It involves clustering raw sensor data into different engine operational modes. Each cluster represents a linear "digital model" of the system. Descriptive analytics identify data anomalies, which diagnostic analytics then recover or remove. Predictive analytics forecast behavior by connecting digital models. Visual analytics visualize relationships between parameters. Together this creates knowledge for decision analytics using key performance indicators. The approach aims to help industrial digitization by allowing data structures to self-learn, self-clean, self-compress/expand, self-visualize, and develop intelligent models through reverse engineering raw data into system components.
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S... | Lokukaluge Prasad Perera
A novel mathematical framework to support the industrial digitization of shipping is presented in this study. The framework supports a data flow path from the Industrial IoT (i.e. Big Data) to Predictive Analytics, in which digital models with advanced data analytics are introduced. The digital models are derived from ship performance and navigation data sets, and a combination of such models leads to the proposed Predictive Analytics. Since the respective data sets are used to derive the Predictive Analytics, this mathematical framework is also categorized as a reverse engineering approach. Furthermore, a data anomaly detection and recovery procedure associated with the same framework, which improves the respective data quality, is also described in this study.
Role of Big Data Analytics in Power System Application Ravi v angadi asst. pr... | PresidencyUniversity
This document discusses the role of big data analytics in power systems. It begins by introducing big data and its growth across various sectors. It then discusses sources of big data in power systems, including measurements from smart meters, phasor measurement units, and weather data. The document outlines characteristics of big data in power systems including volume, velocity, variety, and veracity. It identifies important applications of big data analytics for power systems like performance analysis, load management, and forecasting. Finally, it presents machine learning and analytics techniques that can be used for big data in power systems and provides a flow diagram of big data analytics applications in power systems.
This document discusses various data mining functionalities including classification, clustering, association rule mining, and numeric prediction. It provides examples of each functionality using sample datasets. Classification techniques discussed include decision trees, rules, neural networks, naive Bayes, and support vector machines. Clustering is described as an unsupervised technique to group similar instances. Association rule mining is used to find frequent patterns and correlations in transactional data. Numeric prediction extends classification to predict numeric rather than categorical targets.
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy... | Luigi Vanfretti
The document discusses model-simulation-and-measurement-based systems engineering of power system synchrophasor systems. It outlines the speaker's background and research interests in modeling and simulation technologies for cyber-physical power systems. The talk motivates the need for these technologies to enable applications like wide-area control systems using synchronized phasor measurements. It also discusses challenges in developing smart grids as complex cyber-physical systems and the roles that modeling and simulation can play in addressing these challenges.
This document discusses big data mining and the Internet of Things. It first presents challenges with big data mining including modeling big data characteristics, identifying key challenges, and issues with statistical analysis of IoT data. It then describes an architecture called IOT-StatisticDB that provides a generalized schema for storing sensor data from IoT devices and a distributed system for parallel computing and statistical analysis of IoT big data. The system includes query operators for data retrieval and statistical analysis of IoT data in areas like transportation networks.
1) The document discusses how physical industries are becoming more data-driven as physical assets are increasingly instrumented and interconnected, generating large amounts of data.
2) It argues that both data-driven analytical approaches and traditional modeling approaches are needed to gain insights from data, and provides examples of hybrid approaches that integrate the two.
3) Successfully applying insights from data requires not just building analytical models, but also integrating findings into business processes - an area most organizations currently struggle with.
Survey on classification algorithms for data mining (comparison and evaluation) | Alexander Decker
This document provides an overview and comparison of three classification algorithms: K-Nearest Neighbors (KNN), Decision Trees, and Bayesian Networks. It discusses each algorithm, including how KNN classifies data based on its k nearest neighbors. Decision Trees classify data based on a tree structure of decisions, and Bayesian Networks classify data based on probabilities of relationships between variables. The document conducts an analysis of these three algorithms to determine which has the best performance and lowest time complexity for classification tasks based on evaluating a mock dataset over 24 months.
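For concreteness, the KNN rule described in that summary can be sketched as follows, with 1-D features and a plain majority vote; the training data are invented, not from the survey's mock dataset:

```python
def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points.
    train is a list of (feature, label) pairs with 1-D features."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

train = [(1, 'a'), (2, 'a'), (3, 'a'), (10, 'b'), (11, 'b'), (12, 'b')]
```

A query point near the cluster of 'a' samples is labelled 'a', and one near the 'b' cluster is labelled 'b'; the choice of k trades noise robustness against locality, which is one axis such surveys compare.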
Analyst’s Nightmare or Laundering Massive Spreadsheets | PyData
By Feyzi Bagirov
PyData New York City 2017
Poor data quality frequently invalidates data analysis when performed on Excel data that underwent transformations, imputations, and manual manipulations. In this talk we will use Pandas to walk through Excel data analysis and illustrate several common pitfalls that make this analysis invalid.
The healthcare industry is considered one of the largest industries in the world and, like the medical industry, holds an enormous amount of health-related and medical data. These data help to discover useful trends and patterns that can be used in diagnosis and decision making. Clustering techniques such as K-means, D-Stream, COBWEB, and EM have been used for healthcare purposes such as heart disease diagnosis and cancer detection. This paper focuses on the use of the K-means and D-Stream algorithms in healthcare. The algorithms were used to determine whether a person is fit or unfit, with the fitness decision based on his/her historical and current data. Both clustering algorithms were analyzed by applying them to patients' current and historical biomedical databases; the analysis depends on attributes such as peripheral blood oxygenation, diastolic arterial blood pressure, systolic arterial blood pressure, heart rate, heredity, obesity, and cigarette smoking. The analysis found that the density-based clustering algorithm, i.e. the D-Stream algorithm, gives more accurate results than K-means when used for cluster formation on historical biomedical data, and that D-Stream overcomes drawbacks of the K-means algorithm.
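The grid-density idea behind D-Stream can be sketched very roughly: bucket readings into cells, keep only dense cells, and merge adjacent dense cells into clusters, so sparse outliers never form a cluster. The cell size, density threshold, and heart-rate-like readings below are illustrative assumptions, not the paper's setup.

```python
def dstream_like(values, cell=10, min_count=3):
    """Very rough 1-D grid-density sketch: map integer readings to grid
    cells, keep cells with at least min_count readings, and merge runs of
    adjacent dense cells into clusters. Sparse cells (outliers) are dropped."""
    counts = {}
    for v in values:
        counts[v // cell] = counts.get(v // cell, 0) + 1
    dense = sorted(k for k, n in counts.items() if n >= min_count)
    clusters, run = [], []
    for k in dense:
        if run and k == run[-1] + 1:
            run.append(k)          # extend the current run of dense cells
        else:
            if run:
                clusters.append(run)
            run = [k]              # start a new cluster
    if run:
        clusters.append(run)
    return clusters

# e.g. resting and exercise heart-rate readings, plus one sensor glitch.
clusters = dstream_like([60, 61, 62, 65, 120, 122, 125, 300])
```

The two dense regions survive as clusters while the lone 300 reading is discarded, which is the kind of outlier robustness the paper credits density-based clustering with over K-means.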
Wanted!: Open M&S Standards and Technologies for the Smart Grid - Introducing... | Luigi Vanfretti
Title:
Wanted! - Open M&S Standards and Technologies for the Smart Grid
Subtitle:
Introducing the Open Source iTesla Power Systems Modelica Library and the RaPId Toolbox for Model Identification and Validation
Abstract:
Modeling and Simulation (M&S) technologies have a broad set of applications in power systems, from infrastructure planning, through real-time testing of components, to training operators to use decision support systems. However, power system M&S technologies face great challenges in designing, testing, operating and controlling cyber-physical and sustainable electrical energy systems and components, a.k.a. “Smart Grids”.
The speaker claims that open M&S standards can have a large role to play in the development of Smart Grids. This claim will be justified with three examples.
The first example describes the experience gained during the EU FP7 iTesla project, where the iTesla Power Systems Modelica Library (iPSL) was designed using the Modelica language. The Modelica language, being standardized and equation-based, has proven valuable to the project for model exchange and even for simulation of actual power networks.
Within the iTesla project, the KTH SmarTS Lab research group has also been applying the FMI standard for model exchange in order to develop a software prototype called RaPId. The RaPId Toolbox aims to provide a “virtual laboratory” for solving parameter identification and model validation problems for any kind of model represented as an FMU, but specifically for power systems.
The third example comes from a collaboration with Xogeny. It will be shown how the FMI can be exploited to decouple the model from the simulator tool and thus use the model in unforeseen ways. This makes it possible to develop customized, stand-alone analysis tools using web technologies, giving analysts more time for “analysis”. The approach has enormous potential for typical analysis applications, and even more for education.
Certain Investigation on Dynamic Clustering in Dynamic Datamining | ijdmtaiir
Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a newer research area concerned with datasets that have dynamic aspects: the clusters must be updated whenever new data records are added to the dataset, which may change the clustering over time. When there are continuous updates and a huge amount of dynamic data, rescanning the database is not possible in static data mining, but it is possible in dynamic data mining, which applies when the derived information is available for analysis and the environment is dynamic, i.e. many updates occur. Since this has now been established, most researchers are moving on to the open problems, and this research concentrates on the problem of mining dynamic databases. This paper surveys existing work related to dynamic clustering and incremental data clustering.
Hybrid Model using Unsupervised Filtering Based on Ant Colony Optimization an... | IRJET Journal
This document proposes a hybrid model for medical data mining that uses unsupervised filtering followed by ant colony optimization and multiclass support vector machines. It first discusses data mining and describes ant colony optimization, random forests, and ant colony decision trees. It then explains the proposed hybrid model, which applies unsupervised filtering techniques to raw medical data before using ant colony optimization to build a decision tree. Finally, it briefly introduces multiclass support vector machines as the final component of the hybrid model. The overall goal is to extract useful information and patterns from medical data using this combined approach.
Concept Drift Identification using Classifier Ensemble Approach | IJECEIAES
Abstract: In internetworked systems, huge amounts of data are scattered, generated, and processed over the network, and data mining techniques are used to discover unknown patterns in the underlying data. A traditional classification model classifies data based on past labelled data. In many current applications, however, data grows in size with fluctuating patterns, so new features may arrive in the data. This is present in many applications, such as sensor networks, banking and telecommunication systems, the financial domain, and electricity usage and prices driven by demand and supply. Such changes in data distribution reduce classification accuracy: some patterns may be discovered as frequent while others tend to disappear and are wrongly classified. Traditional classification techniques may not be suitable for mining such data, since the distribution generating the items can change over time, so data from the past may become irrelevant or even false for the current prediction. To handle such varying patterns of data, a concept drift mining approach is used to improve the accuracy of classification techniques. In this paper we propose an ensemble approach for improving classifier accuracy. The ensemble classifier is applied to three different data sets; we investigate different features for each chunk of data, which are then given to the ensemble classifier, and we observe that the proposed approach improves classifier accuracy across the different chunks of data.
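The general chunk-trained ensemble pattern that such approaches build on can be sketched as follows. This is not the paper's method: the nearest-centroid members, 1-D features, and synthetic chunks are all assumptions made for brevity.

```python
def centroid_classifier(chunk):
    """Train one ensemble member on a single data chunk: predict the class
    whose (1-D) feature centroid is nearest to the query."""
    sums = {}
    for x, y in chunk:
        s, n = sums.get(y, (0.0, 0))
        sums[y] = (s + x, n + 1)
    cents = {y: s / n for y, (s, n) in sums.items()}
    return lambda x: min(cents, key=lambda y: abs(x - cents[y]))

def ensemble_predict(members, x):
    """Majority vote across members trained on different chunks of the stream."""
    votes = [m(x) for m in members]
    return max(set(votes), key=votes.count)

# Two chunks from a synthetic stream; the class regions stay stable here.
chunks = [[(0.0, 'a'), (1.0, 'a'), (10.0, 'b'), (11.0, 'b')],
          [(0.5, 'a'), (1.5, 'a'), (10.5, 'b'), (11.5, 'b')]]
members = [centroid_classifier(c) for c in chunks]
```

Under drift, newer chunks produce members that reflect the current distribution, so retraining or reweighting members per chunk is what lets the ensemble track the changing patterns the abstract describes.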
Clustering for Stream and Parallelism (DATA ANALYTICS) | DheerajPachauri
The document summarizes information about a group project involving data stream clustering. It lists the group members and then discusses key concepts related to data stream clustering like requirements for algorithms, common algorithm types and steps, prototypes and windows. It also touches on outliers and applications of clustering.
This document discusses crowd density estimation using baseline filtering. It begins with an abstract describing the challenges of detecting and tracking objects in crowded scenes due to occlusions. It then reviews related works on component-based people detection, Bayesian tracking using shape models, and neural network-based people counting. The implementation section describes extracting foreground from background, computing crowd density as a function of foreground pixels, and estimating head counts to determine the total number of people. Screenshots show results of segmentation, preprocessing, tracking, and counting frames. It concludes that the proposed method estimates crowd density using movement, size, and height features with particle filtering and clustering.
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis | IRJET Journal
This document discusses fault detection and prediction of failures in rotating equipment using vibration analysis. It begins by introducing vibration analysis as a method to monitor machines and detect faults in rotating components that may cause failures. It then discusses how motor vibration is measured and analyzed using techniques like spectrum analysis to identify faults like unbalance, bearing issues, or broken rotor bars. The document proposes decomposing vibration signals using intrinsic mode functions and calculating the Gabor representation's frequency marginal to identify fault types using classifiers like support vector machines or random forests. It provides context on data mining techniques relevant to this type of fault prediction problem.
The document provides an overview of UiT's autonomous ship program, which aims to develop ship intelligence and autonomous navigation capabilities using machine learning and deep neural networks. It discusses 1) using sensors and deep learning to capture navigator expertise, 2) developing a ship intelligence framework with key components like DNNs and safety systems, and 3) conducting experiments in bridges and at sea to advance situation awareness for autonomous ships.
Similar to Data Driven Industrial Digitalization through Reverse Engineering of Systems
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...Lokukaluge Prasad Perera
A novel mathematical framework to support industrial digitization of shipping is presented in this study. The framework supports a data flow path, i.e. from Industrial IoT (i.e. with Big Data) to Predictive Analytics, where digital models with advanced data analytics are introduced. The digital models are derived from ship performance and navigation data sets and a combination of such models facilitates towards the proposed Predictive Analytics. Since the respective data sets are used to derive the Predictive Analytics, this mathematical framework is also categorized as a reverse engineering approach. Furthermore, a data anomaly detection and recover procedure that is associated with the same framework to improve the respective data quality are also described in this study.
Role of Big Data Analytics in Power System Application Ravi v angadi asst. pr...PresidencyUniversity
This document discusses the role of big data analytics in power systems. It begins by introducing big data and its growth across various sectors. It then discusses sources of big data in power systems, including measurements from smart meters, phasor measurement units, and weather data. The document outlines characteristics of big data in power systems including volume, velocity, variety, and veracity. It identifies important applications of big data analytics for power systems like performance analysis, load management, and forecasting. Finally, it presents machine learning and analytics techniques that can be used for big data in power systems and provides a flow diagram of big data analytics applications in power systems.
This document discusses various data mining functionalities including classification, clustering, association rule mining, and numeric prediction. It provides examples of each functionality using sample datasets. Classification techniques discussed include decision trees, rules, neural networks, naive Bayes, and support vector machines. Clustering is described as an unsupervised technique to group similar instances. Association rule mining is used to find frequent patterns and correlations in transactional data. Numeric prediction extends classification to predict numeric rather than categorical targets.
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy...Luigi Vanfretti
The document discusses model-simulation-and-measurement-based systems engineering of power system synchrophasor systems. It outlines the speaker's background and research interests in modeling and simulation technologies for cyber-physical power systems. The talk motivates the need for these technologies to enable applications like wide-area control systems using synchronized phasor measurements. It also discusses challenges in developing smart grids as complex cyber-physical systems and the roles that modeling and simulation can play in addressing these challenges.
This document discusses big data mining and the Internet of Things. It first presents challenges with big data mining including modeling big data characteristics, identifying key challenges, and issues with statistical analysis of IoT data. It then describes an architecture called IOT-StatisticDB that provides a generalized schema for storing sensor data from IoT devices and a distributed system for parallel computing and statistical analysis of IoT big data. The system includes query operators for data retrieval and statistical analysis of IoT data in areas like transportation networks.
1) The document discusses how physical industries are becoming more data-driven as physical assets are increasingly instrumented and interconnected, generating large amounts of data.
2) It argues that both data-driven analytical approaches and traditional modeling approaches are needed to gain insights from data, and provides examples of hybrid approaches that integrate the two.
3) Successfully applying insights from data requires not just building analytical models, but also integrating findings into business processes - an area most organizations currently struggle with.
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
This document provides an overview and comparison of three classification algorithms: K-Nearest Neighbors (KNN), Decision Trees, and Bayesian Networks. It discusses each algorithm, including how KNN classifies data based on its k nearest neighbors. Decision Trees classify data based on a tree structure of decisions, and Bayesian Networks classify data based on probabilities of relationships between variables. The document conducts an analysis of these three algorithms to determine which has the best performance and lowest time complexity for classification tasks based on evaluating a mock dataset over 24 months.
Analyst’s Nightmare or Laundering Massive SpreadsheetsPyData
By Feyzi Bagirov
PyData New York City 2017
Poor data quality frequently invalidates data analysis when performed on Excel data that underwent transformations, imputations, and manual manipulations. In this talk we will use Pandas to walk through Excel data analysis and illustrate several common pitfalls that make this analysis invalid.
The healthcare industry is considered one of the largest industries in the world and, together with the medical industry, holds the largest amount of health- and medical-related data. This data helps to discover useful trends and patterns that can be used in diagnosis and decision making. Clustering techniques like K-means, D-stream, COBWEB, and EM have been used for healthcare purposes such as heart disease diagnosis and cancer detection. This paper focuses on the use of the K-means and D-stream algorithms in healthcare. These algorithms were used to determine whether a person is fit or unfit, with the fitness decision based on his/her historical and current data. Both clustering algorithms were analyzed by applying them to patients' current and historical biomedical databases; the analysis depends on attributes like peripheral blood oxygenation, diastolic arterial blood pressure, systolic arterial blood pressure, heart rate, heredity, obesity, and cigarette smoking. The analysis found that the density-based clustering algorithm, i.e. the D-stream algorithm, gives more accurate results than K-means when used for cluster formation on historical biomedical data, and overcomes several drawbacks of the K-means algorithm.
Wanted!: Open M&S Standards and Technologies for the Smart Grid - Introducing...Luigi Vanfretti
Title:
Wanted! - Open M&S Standards and Technologies for the Smart Grid
Subtitle:
Introducing the Open Source iTesla Power Systems Modelica Library and the RaPId Toolbox for Model Identification and Validation
Abstract:
Modeling and Simulation (M&S) technologies have a broad set of applications in power systems, from infrastructure planning, through real-time testing of components, and even for training operators to use decision support systems. However, power system M&S technologies face a great challenge to meet when designing, testing, operating and controlling cyber-physical and sustainable electrical energy systems and components, a.k.a “Smart Grids”.
The speaker claims that open M&S standards can have a large role to play in the development of Smart Grids. This claim will be justified with three examples.
The first example describes the experience gained during the EU FP7 iTesla project where the iTesla Power Systems Modelica Library (iPSL) was designed using the Modelica language. The Modelica language, being standardized and equation-based, has proven valuable for the project for model exchange, and even simulation of actual power networks.
Within the iTesla project, the KTH SmarTS Lab research group has been also applying the FMI standard for model exchange in order to develop a software prototype called RaPId. The RaPId Toolbox aims to provide a “virtual laboratory” to solve parameter identification and model validation problems for any kind of model represented in an FMU, but specifically, for power systems.
The third example comes from a collaboration with Xogeny. It will be shown how it is possible to exploit the FMI to decouple the model from the simulator tool, and thus exploit the model in unforeseen ways. This shows that it is possible to develop customized and stand-alone analysis tools using web technologies, giving analysts more time for "analysis". This approach has an enormous potential for typical analysis applications, but even more for education.
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a new research area concerned with datasets that have dynamic aspects: it requires updating the clusters whenever new data records are added, which may change the clustering over time. When there are continuous updates and a huge amount of dynamic data, rescanning the database is not feasible in static data mining, but it is possible in the dynamic data mining process. Dynamic data mining occurs when the derived information is available for analysis and the environment is dynamic, i.e. many updates occur. Since this has now been established by most researchers, current research concentrates on solving the problems of mining dynamic databases. This paper surveys existing work related to dynamic clustering and incremental data clustering.
Hybrid Model using Unsupervised Filtering Based on Ant Colony Optimization an...IRJET Journal
This document proposes a hybrid model for medical data mining that uses unsupervised filtering followed by ant colony optimization and multiclass support vector machines. It first discusses data mining and describes ant colony optimization, random forests, and ant colony decision trees. It then explains the proposed hybrid model, which applies unsupervised filtering techniques to raw medical data before using ant colony optimization to build a decision tree. Finally, it briefly introduces multiclass support vector machines as the final component of the hybrid model. The overall goal is to extract useful information and patterns from medical data using this combined approach.
Concept Drift Identification using Classifier Ensemble Approach IJECEIAES
Abstract: In internetworking systems, a huge amount of data is scattered, generated, and processed over the network. Data mining techniques are used to discover unknown patterns from the underlying data. A traditional classification model classifies data based on past labelled data. However, in many current applications data is increasing in size with fluctuating patterns, so new features may arrive in the data. This occurs in many applications such as sensor networks, banking and telecommunication systems, the financial domain, and electricity usage and prices driven by demand and supply. Such changes in data distribution reduce classification accuracy: some patterns may be discovered as frequent while others tend to disappear and be wrongly classified. To mine such data, traditional classification techniques may not be suitable, since the distribution generating the items can change over time, so data from the past may become irrelevant or even false for the current prediction. To handle such varying patterns, a concept drift mining approach is used to improve the accuracy of classification techniques. In this paper we propose an ensemble approach for improving classifier accuracy. The ensemble classifier is applied on three different datasets; we investigate different features for each chunk of data, which are then given to the ensemble classifier, and observe that the proposed approach improves classifier accuracy across the different chunks.
Clustering for Stream and Parallelism (DATA ANALYTICS)DheerajPachauri
The document summarizes information about a group project involving data stream clustering. It lists the group members and then discusses key concepts related to data stream clustering like requirements for algorithms, common algorithm types and steps, prototypes and windows. It also touches on outliers and applications of clustering.
This document discusses crowd density estimation using baseline filtering. It begins with an abstract describing the challenges of detecting and tracking objects in crowded scenes due to occlusions. It then reviews related works on component-based people detection, Bayesian tracking using shape models, and neural network-based people counting. The implementation section describes extracting foreground from background, computing crowd density as a function of foreground pixels, and estimating head counts to determine the total number of people. Screenshots show results of segmentation, preprocessing, tracking, and counting frames. It concludes that the proposed method estimates crowd density using movement, size, and height features with particle filtering and clustering.
IRJET- Fault Detection and Prediction of Failure using Vibration AnalysisIRJET Journal
This document discusses fault detection and prediction of failures in rotating equipment using vibration analysis. It begins by introducing vibration analysis as a method to monitor machines and detect faults in rotating components that may cause failures. It then discusses how motor vibration is measured and analyzed using techniques like spectrum analysis to identify faults like unbalance, bearing issues, or broken rotor bars. The document proposes decomposing vibration signals using intrinsic mode functions and calculating the Gabor representation's frequency marginal to identify fault types using classifiers like support vector machines or random forests. It provides context on data mining techniques relevant to this type of fault prediction problem.
Similar to Data Driven Industrial Digitalization through Reverse Engineering of Systems (20)
The document provides an overview of UiT's autonomous ship program, which aims to develop ship intelligence and autonomous navigation capabilities using machine learning and deep neural networks. It discusses 1) using sensors and deep learning to capture navigator expertise, 2) developing a ship intelligence framework with key components like DNNs and safety systems, and 3) conducting experiments in bridges and at sea to advance situation awareness for autonomous ships.
This document discusses ship collision avoidance for autonomous ships. It proposes a Ship Intelligence Framework (SIF) that uses deep neural networks trained on real-world ship navigation data to mimic human navigator behavior. The SIF has two phases: a training phase where networks are trained to estimate collision risk and determine avoidance decisions, and an execution phase where networks generate avoidance actions. It also discusses how collision risk is estimated and how avoidance should comply with Collision Regulations (COLREGs). The framework aims to develop autonomous ship technology while ensuring regulatory compliance.
This document discusses a proposed framework for an intelligent collision avoidance system for autonomous ships. It begins with introductions to transportation systems, ship maneuvering challenges, and the potential for artificial intelligence (AI) solutions. It then presents the Ship Intelligence Framework (SIF), which uses deep neural networks trained on real-world shipping data to mimic human navigator behavior. The framework estimates collision risk and determines avoidance decisions, which are then executed through ship control systems. The goal is for AI to overcome challenges of ship navigation by cloning human decision-making.
This document discusses the UiT Autonomous Ship Program and its research on technologies to support autonomous maritime transportation systems. It proposes a ship intelligence framework (SIF) that uses deep neural networks (DNNs) trained on large datasets to mimic human ship navigator behavior. The goals are to overcome issues with ship controllability and replace human navigators. A decision support system would provide an adequate safety buffer to help DNNs handle unexpected situations. The framework is conceptualized based on factors behind successful self-driving cars, and aims to train DNNs using real-world ship navigation data to achieve accurate autonomous control.
AUTONOMOUS SHIP NAVIGATION UNDER DEEP LEARNING AND THE CHALLENGES IN COLREGSLokukaluge Prasad Perera
This document discusses challenges and a proposed framework for autonomous ship navigation using deep learning. It outlines several key points:
1) Future autonomous ships will be agent-based systems with distributed intelligence and decision-making abilities to navigate autonomously. Deep learning shows promise in capturing helmsman behavior for ship intelligence.
2) Additional decision support is needed for collision avoidance and situation awareness. A framework is proposed using various maritime technologies to achieve autonomy.
3) Evaluating autonomous ship behavior and compliance with regulations poses challenges, such as regulatory failures and limitations in controlling underactuated vessels. Testable systems are proposed to evaluate ship encounter situations under different conditions.
Digitalization of Sea going Vessels under High Dimensional Data Driven Models...Lokukaluge Prasad Perera
Digital models are being developed to handle large datasets collected from Internet of Things sensors on ships. These digital models can help overcome challenges in areas like model uncertainty, erroneous data, and high computational needs. The models are created using machine learning algorithms to identify clusters in ship performance and navigation parameters. They represent the relationships between factors like engine output, propeller characteristics, and vessel trim. Digital models have advantages like being self-learning, self-cleaning, and able to visualize vessel operations. They may help identify sensor faults, reduce data dimensions, and support efficient data handling frameworks.
Intelligent Decision Making Framework for Ship Collision Avoidance based on C...Lokukaluge Prasad Perera
The document summarizes research on developing an intelligent decision-making framework for autonomous ship collision avoidance. It presents a framework with modules for vessel traffic monitoring, collision detection, parallel fuzzy-logic based decision making, and sequential Bayesian action formulation. Computational simulations and experiments with a scaled autonomous ship model demonstrate its ability to detect collision risks and generate avoidance maneuvers in accordance with international regulations. The framework shows potential to reduce human errors causing maritime accidents by providing intelligent guidance for autonomous navigation and collision avoidance.
The document discusses handling big data in ship performance and navigation monitoring. It presents a data handling framework that includes developing digital models from data clusters, using principal component analysis to analyze the clusters, and extracting information to reduce parameters while preserving information. The framework allows for data projection, sensor fault detection, integrity verification from other sources, and data visualization to support decision making. The talk outlines developing these techniques to better handle large scale data from ships.
Various industrial challenges in full scale data handling situations in shipping are considered in this study. These large scale data handling approaches are often categorized as "Big Data" challenges; therefore various solutions to overcome such situations are identified. The proposed approach consists of a marine engine centered data flow path with various data handling layers to address the same challenges. These layers are categorized as: sensor fault detection, data classification, data compression, data transmission and receiver, data expansion, integrity verification, and data regression. The functionalities of each data handling layer with respect to ship performance and navigation information of a selected vessel are discussed and additional challenges that are encountered during this process are also summarized. Hence, these results can be used to develop data analytics that are related to energy efficiency and system reliability applications of shipping.
Data Driven Industrial Digitalization through Reverse Engineering of Systems
1. Data Driven Industrial Digitalization
through Reverse Engineering of Systems
Lokukaluge Prasad Perera
September 12th, 2019
1st Workshop on Digitalization and Blockchain – Managerial and Organizational Implications
University of South-Eastern Norway, Campus Vestfold.
2. Outline
• Industrial Digitalization.
• Data Analytics.
• From Newton's Laws to Deep Neural Networks: Autoencoders.
• Shipping Industrial Example.
• Conclusions.
3. Industrial Digitalization in Shipping
" Poor data quality costs…
the US economy around US$3.1 trillion per year"*
"1 in 3 Business Leaders don't trust…
the information they use to make decisions "*
"27% of the respondents in a survey…
were unsure of how much of their data sets are inaccurate"*
*IBM, The four V's of Big Data, URL:http://www.ibmbigdatahub.com/infographic/four-vs-big-data, 2017.
4. Why Data Analytics?
• Conventional Models
– Various Conventional Models in the shipping industry for various applications.
– Some challenges in handling Big Data: erroneous data conditions, system-model
uncertainty, estimation algorithm failures, data visualization challenges, and high
computational power requirements.
• Machine Learning & Statistical Analysis
– Machine Learning (ML) & Statistical Analysis will play an important role in
analyzing such data sets.
– Statistical Analysis will guide ML Algorithms.
– Towards Data Driven Models: Digital Models.
– It is Geometrical…
• Domain Knowledge
– Ship Dynamics & Hydrodynamics.
– Automation & Navigation Systems.
5. Data Analytics Types
– Digital model consists of data driven relationships.
– Descriptive analytics identifies various data anomalies.
– Diagnostic analytics recovers/removes such data anomalies.
– Predictive analytics forecasts vessel and ship system behavior.
– Visual analytics visualizes the same information.
– The information creates Advanced Knowledge and that will lead to Industrial Intelligence.
– Both advanced knowledge and industrial intelligence support Decision Analytics.
– Decision analytics may consist of appropriate Key Performance Indicators (i.e. KPIs).
6. Newton's Laws of Motion
– "A body at rest will remain at rest, and a body
in motion will remain in motion unless it is
acted upon by an external force."
– "The force acting on an object is equal to the
mass of that object times its acceleration."
F = ma
where F is force, m is mass, and a is acceleration.
– "For every action, there is an equal and
opposite reaction."
Source: https://www.livescience.com
7. Newton's Laws of Motion
F = ma
F is force,
m is mass
a is acceleration.
16. Singular Values & Vectors
– The structure of each data cluster is denoted by several vectors: singular vectors
(i.e. associated with the respective singular values)
– Singular values and vectors represent the building blocks of electrical and
mechanical systems
– System behavior, i.e. system information, can be captured within these singular values and vectors.
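As a minimal illustration of these building blocks, the numpy sketch below (with hypothetical shaft-speed and engine-power values, not the vessel's actual data) applies a singular value decomposition to a centered data cluster; nearly all of the cluster's variation is carried by the first singular vector, which is where the "system information" lives.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical engine-data cluster: shaft speed (rpm) vs. engine power (kW),
# strongly correlated along a single operating line.
rpm = rng.normal(90, 5, 200)
power = 60 * rpm + rng.normal(0, 50, 200)
X = np.column_stack([rpm, power])

Xc = X - X.mean(axis=0)            # center the cluster on its mean vector
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The first singular value dominates: most of the cluster's system
# information lives along the first singular vector Vt[0].
print(s[0] / s.sum())
```

The ratio printed is close to 1, i.e. one singular vector is enough to describe this cluster's local behavior.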
17. Digital Models
– Three-dimensional vector space with the right-hand coordinate system.
– Three data clusters, i.e. system states, with the respective mean vectors.
– Each data cluster consists of local operational information of the respective
system.
– The structure of each data cluster is denoted by several vectors: singular vectors
(i.e. associated with the respective singular values).
– Each singular vector consists of local operational information of the respective
systems.
– Each cluster is a linear model, i.e. Piecewise linearization: the best approximation
of a nonlinear function as a piecewise linear function.
– The system can jump from one state to another state in a high dimensional data
space.
– Some data clusters may relate to data anomalies or system abnormal events.
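The piecewise-linearization idea can be sketched in a few lines of numpy. The cubic fuel-versus-rpm curve below is a hypothetical stand-in for the nonlinear system (a common propeller-law assumption); each "cluster" (here simply an rpm range) gets its own local linear model, and the system jumps between these linear states.

```python
import numpy as np

rpm = np.linspace(30, 110, 300)
fuel = (rpm / 40.0) ** 3           # hypothetical nonlinear relationship

# Three "engine mode" clusters, here split by rpm range for simplicity
bounds = [(30, 60), (60, 90), (90, 110)]
models = []
for lo, hi in bounds:
    m = (rpm >= lo) & (rpm <= hi)
    slope, intercept = np.polyfit(rpm[m], fuel[m], 1)  # local linear model
    models.append((lo, hi, slope, intercept))

def predict(x):
    # The system "jumps" between local linear states as rpm changes
    for lo, hi, a, b in models:
        if lo <= x <= hi:
            return a * x + b
    raise ValueError("outside modeled operating range")

print(abs(predict(75.0) - (75.0 / 40.0) ** 3))  # small local error
```

Each linear model is only valid inside its own cluster, which is exactly the piecewise approximation described above.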
18. Model Complexity
– From Components to System of Systems
– Various Model Levels
– High dimensional Digital Models
– From Big Data to Low Level Models
19. Ship Performance & Navigation Data Set
Parameter: Min. to Max.
1. Avg. draft (m): 0 to 15
2. STW (knots), speed through water: 3 to 20
3. Engine power (kW): 1000 to 8000
4. Shaft speed (rpm): 20 to 120
5. Engine fuel cons. (tons/day): 1 to 40
6. SOG (knots), speed over ground: 0 to 20
7. Trim (m): -2 to 6
8. Rel. wind speed (m/s): 0 to 25
9. Rel. wind direction (deg): 2 to 360
10. Aux. engine fuel cons. (tons/day): 0 to 8
20. Ship Engine Data
in Histograms
– The vessel is a bulk carrier with ship
length: 225 (m) and beam: 32.29 (m)
– Three parameters: engine speed,
power and fuel consumption
– Engine data are clustered around
three Gaussian type distributions
– Three engine modes of this vessel
– Ship performance and navigation
data sets are often clustered in a
high dimensional space
– Those clusters relate to vessel
navigation and ship system
operational conditions
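A rough numpy sketch of this clustering claim: synthetic shaft-speed samples drawn from three hypothetical engine modes (the modes and their parameters are illustrative, not the vessel's measured values) produce three separated peaks in a histogram, matching the three Gaussian-type distributions described above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical shaft-speed samples drawn from three engine modes
# (slow steaming, transit, full speed), each roughly Gaussian.
speed = np.concatenate([
    rng.normal(45, 2, 1000),
    rng.normal(75, 2, 1000),
    rng.normal(98, 2, 1000),
])

counts, edges = np.histogram(speed, bins=25, range=(20, 120))
# A bin is a mode candidate if it is a strict local maximum above a noise
# floor; three such peaks correspond to the three engine modes.
floor = 0.2 * counts.max()
peaks = [i for i in range(1, len(counts) - 1)
         if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]
         and counts[i] > floor]
print(len(peaks))
```

In practice the deck fits Gaussian-type distributions rather than counting histogram peaks, but the separation of the modes is the same phenomenon.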
21. Ship Engine Data
– Two parameters: engine speed and
power.
– Combined kernel density estimation
(multivariate KDE) with the
respective univariate KDEs.
– Engine data are clustered around
three Gaussian type distributions.
– Three engine modes of this vessel.
– Ship performance and navigation
data sets are often clustered in a
high dimensional space.
– That introduces discreteness (i.e.
digital-ness) into the proposed
models.
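The kernel density estimation step can be sketched directly in numpy (a univariate Gaussian KDE over hypothetical shaft-speed samples from two of the modes; a multivariate KDE follows the same pattern with a kernel over vectors):

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Univariate Gaussian kernel density estimate evaluated on a grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(2)
# Hypothetical shaft-speed samples from two of the three engine modes
speed = np.concatenate([rng.normal(75, 2, 500), rng.normal(100, 2, 500)])
grid = np.linspace(60, 115, 400)
density = gaussian_kde_1d(speed, grid, bandwidth=1.5)

dx = grid[1] - grid[0]
print(density.sum() * dx)   # integrates to ~1 over this range
```

The resulting density is bimodal, i.e. the KDE recovers the separate engine modes without assuming how many there are in advance.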
22. Digital Models
– Each cluster is a linear model, i.e. Piecewise linearization: the best
approximation of a nonlinear function as a piecewise linear function.
– The vessel & ship system can jump from one state to another state in a high
dimensional data space.
– Some data clusters may relate to data anomalies or system abnormal events.
23. Data Anomaly Detection and Recovery Procedure
– Digital models interact with the descriptive and diagnostic analytics to improve the data quality.
– Data anomaly filter 1: missing data points and preliminary data anomalies (i.e. Min-Max value violations) are detected.
– Data anomaly filter 2: additional data anomalies (i.e. the outliers of the digital models) are detected.
– Data anomalies are sent to separate groups, where they are compared against known and unknown sensor and DAQ faults and system abnormalities.
– Data sets from anomaly groups 1 and 2 are transferred through the data recovery filter and the digital models.
– A considerable amount of data anomalies can be recovered by this step.
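A minimal sketch of the two filters (Python, with parameter ranges borrowed from the slide 19 table and a synthetic rpm/power cluster standing in for one digital-model state): filter 1 checks for missing and out-of-range values, while filter 2 flags points far from a cluster in its own covariance metric.

```python
import numpy as np

# Illustrative Min-Max ranges following the slide 19 parameter table
RANGES = {"stw": (3, 20), "power": (1000, 8000), "rpm": (20, 120)}

def filter1(record):
    """Filter 1: missing values and out-of-range readings are anomalies."""
    return [k for k, (lo, hi) in RANGES.items()
            if record.get(k) is None or not (lo <= record[k] <= hi)]

def filter2(point, cluster):
    """Filter 2: outlier if far from the cluster in its covariance metric."""
    mean = cluster.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(cluster, rowvar=False))
    d2 = (point - mean) @ cov_inv @ (point - mean)
    return d2 > 9.0                    # roughly a 3-sigma threshold

rng = np.random.default_rng(3)
# Synthetic rpm/power cluster standing in for one digital-model state
cluster = rng.multivariate_normal([90, 5500], [[25, 500], [500, 40000]], 500)

print(filter1({"stw": 25, "power": 5000, "rpm": None}))   # ['stw', 'rpm']
print(filter2(np.array([91.0, 5600.0]), cluster))         # in-cluster point
print(filter2(np.array([40.0, 7900.0]), cluster))         # flagged outlier
```

Records flagged by either filter would then be routed to the anomaly groups and the recovery step described above.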
24. Data Anomalies in Newton's Laws of Motion
F = ma
F is force,
m is mass
a is acceleration.
Z1,2 are singular vectors
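In the F = ma picture, a recovery step can be sketched as projection onto the dominant singular vector (numpy, synthetic data with an assumed constant mass): a corrupted reading keeps only its component along the learned operating line.

```python
import numpy as np

m = 5.0                                # assumed constant mass
a = np.linspace(0.0, 10.0, 100)
F = m * a                              # clean data lies on the F = ma line
X = np.column_stack([a, F])
mean = X.mean(axis=0)

_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
z1 = Vt[0]                             # dominant singular vector Z1

bad = np.array([6.0, 80.0])            # anomalous force reading
# Recovery: keep only the component of the reading along the operating line
proj = mean + ((bad - mean) @ z1) * z1
print(proj)                            # recovered point satisfies F = m a
```

The projection discards whatever part of the reading is inconsistent with the learned linear relationship, which is the essence of recovering anomalies with the digital model.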
25. Data Anomaly Detection and
Recovery Procedure: Filter 2
– Digital models interact with the descriptive and diagnostic analytics to improve the data quality.
– Data anomaly filter 1: missing data points and preliminary data anomalies (i.e. Min-Max value violations) are detected.
– Data anomaly filter 2: additional data anomalies (i.e. the outliers of the digital models) are detected.
– Data anomalies are sent to separate groups, where they are compared against known and unknown sensor and DAQ faults and system abnormalities.
– Data sets from anomaly groups 1 and 2 are transferred through the data recovery filter and the digital models.
– A considerable amount of data anomalies can be recovered by this step.
26. Data Anomaly Detection and
Recovery Procedure: Filter 2
– Digital models interact with the descriptive and diagnostic analytics to improve the data quality
– Data anomaly filter 1: missing data points and preliminary data anomalies (i.e. Min-Max values)
detected
– Data anomaly filter 2: Additional data anomalies (i.e. the outliers of digital models) detected
– Data anomalies send to separate groups where the data anomalies against known and unknown
sensor and DAQ faults and system abnormalities compared
– Data sets from anomaly group 1 and 2 transfer through the data recovery filter and digital models
– A considerable amount of data anomalies can be recovered by this step
27. Data Anomaly Detection and
Recovery Procedure: Data Recovery
– Digital models interact with the descriptive and diagnostic analytics to improve the data quality
– Data anomaly filter 1: missing data points and preliminary data anomalies (i.e. Min-Max values)
detected
– Data anomaly filter 2: Additional data anomalies (i.e. the outliers of digital models) detected
– Data anomalies send to separate groups where the data anomalies against known and unknown
sensor and DAQ faults and system abnormalities compared
– Data sets from anomaly group 1 and 2 transfer through the data recovery filter and digital
models
– A considerable amount of data anomalies can be recovered by this step
32. Visual Analytics
– Digital models should be visualized to extract relevant parameter relationships in a high-dimensional space.
– Covariance values of the data sets are represented by singular vectors.
– Each vector presents correlation information among the respective parameters.
– The top singular vector is presented in the outer circle.
– The bottom singular vector is presented in the inner circle.
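For a symmetric covariance matrix the singular vectors coincide with its eigenvectors, so for two parameters the top singular vector can be computed in closed form. The (speed, power) samples below are hypothetical, and the sketch assumes the off-diagonal covariance is nonzero:

```python
import math

def covariance_2d(xs, ys):
    """Sample covariance matrix entries (sxx, sxy, syy) of two parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def top_singular_vector(sxx, sxy, syy):
    """Leading eigenvector of the symmetric 2x2 covariance matrix
    [[sxx, sxy], [sxy, syy]]; for a covariance matrix this is also
    its top singular vector. Assumes sxy != 0."""
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)   # largest eigenvalue
    vx, vy = sxy, lam - sxx                       # unnormalised eigenvector
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Hypothetical, strongly correlated engine speed / power samples.
speed = [60.0, 80.0, 100.0, 120.0]
power = [6.0, 8.0, 10.0, 12.0]
v = top_singular_vector(*covariance_2d(speed, power))
# Both components share the same sign: speed and power rise together,
# which is the correlation information the visualized vector conveys.
```

With more than two parameters the same information would come from an SVD of the full covariance matrix; the top and bottom singular vectors are then the ones drawn on the outer and inner circles.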
35. Predictive Analytics
– The outputs of the predictive analytics are the predicted vessel and ship-system behavior.
– This information creates advance knowledge and facilitates progress towards industrial intelligence.
38. Conclusions
– A novel mathematical framework to support the industrial digitization of shipping is presented: i.e. from the Industrial IoT to predictive analytics.
– Data analytics can…
• self-learn (i.e. the data structure can be learned from the data itself),
• self-clean (i.e. data anomalies can be detected, isolated and recovered by considering the outliers of the data structure),
• self-compress and expand (i.e. the respective parameters in the data sets can be reduced and expanded by considering the same data structure),
• self-visualize (i.e. the respective data structures can be used for both vessel and ship-system performance observations).
– That introduces intelligent analytics to any industry and also provides important solutions to the big-data challenges of industrial digitalization.