Anonymization: the use of one or more techniques designed to make it impossible, or at least more difficult, to identify a particular individual from stored data relating to them.
Document privacy is very important in an era of globalized communication technology. When a document changes hands or is transmitted, there is a risk that data will be damaged or that the document will be altered by an unrelated party, so document security is necessary; if the confidentiality and authenticity of a document are compromised by irresponsible parties, its information is no longer useful. For this reason, a text encryption application was created using the SEAL, Blowfish and IDEA algorithms; combining the three yields stronger security for the company. IDEA is a symmetric cipher with high security that does not rest on the secrecy of the algorithm but on the secrecy of the key used. To achieve stronger and more secure data protection, the SEAL algorithm is combined with the IDEA algorithm for encryption and decryption to maintain data confidentiality, while the Blowfish algorithm protects user login data, so that only the person concerned can read the information. The files the researchers secure are Emboss files containing credit card data and customer data with Microsoft Office extensions (.doc, .docx, .xls, .xlsx) and plain-text extensions. Testing of the SEAL, Blowfish and IDEA algorithms showed that encoding and decoding time is directly proportional to the size of the file being processed: the smaller the file, the faster the encode and decode process; the larger the file, the longer it takes.
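None of SEAL, Blowfish or IDEA ships with the Python standard library, so the sketch below uses a toy hash-based keystream cipher purely to illustrate the two properties the abstract relies on: a symmetric cipher whose security rests on key secrecy, and per-byte work that makes encode and decode time proportional to input size. It is not any of the paper's algorithms.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream by chained hashing (toy construction)."""
    out = bytearray()
    block = key
    while len(out) < length:
        block = hashlib.sha256(block).digest()
        out.extend(block)
    return bytes(out[:length])

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream; the same call inverts itself.
    Work is linear in len(data), matching the time-vs-size finding above."""
    ks = keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

plaintext = b"confidential customer record"
key = b"secret-key"
ciphertext = xor_cipher(plaintext, key)
recovered = xor_cipher(ciphertext, key)   # decryption recovers the original
```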
IRJET- Student Placement Prediction using Machine Learning (IRJET Journal)
This document describes a study that uses machine learning algorithms to predict whether students will be placed in jobs after graduating. Specifically, it uses Naive Bayes and K-Nearest Neighbors classifiers to analyze historical student data and predict placements. The algorithms consider parameters like academic results, skills, and previous placement data to make predictions. This system aims to help institutions increase placement percentages by identifying students' strengths and areas for improvement. It is intended to benefit both students in preparing for careers and placement cells in targeting support. Accurately predicting placements could boost a school's reputation by demonstrating career outcomes.
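The paper's dataset is not available, so the following is a minimal pure-Python sketch of the k-nearest-neighbors vote the study uses; the (CGPA, aptitude score) features and the records themselves are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (CGPA, aptitude score) -> placement outcome
history = [
    ((8.5, 80), "placed"), ((9.0, 85), "placed"), ((7.8, 75), "placed"),
    ((5.5, 40), "not placed"), ((6.0, 50), "not placed"), ((5.0, 45), "not placed"),
]
print(knn_predict(history, (8.2, 78)))  # prints "placed"
```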
Data mining refers to extracting hidden predictive information from large data sets. Recently, a number of private institutions have come into existence, and they put considerable effort into attracting admissions. In this paper, data mining techniques are used to analyze the mindset of students after matriculation. WEKA (Waikato Environment for Knowledge Analysis), one of the best-known data mining tools, is used to carry out the analysis.
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ... (ijaia)
Along with the spread of online education, the importance of actively supporting students involved in online learning processes has grown. The application of artificial intelligence in education allows instructors to analyze data extracted from university servers, identify patterns of student behavior and develop interventions for struggling students. This study used student data stored in a Moodle server and predicted student success in a course based on four learning activities: communication via emails, collaborative content creation with a wiki, content interaction measured by files viewed, and self-evaluation through online quizzes. A model based on a Multi-Layer Perceptron neural network was then trained to predict student performance in a blended learning course environment. The model predicted student performance with a correct classification rate (CCR) of 98.3%.
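The correct classification rate reported above is simply the fraction of students whose outcome the model got right; a short sketch with invented labels sized so that one miss in sixty reproduces a 98.3% rate:

```python
def correct_classification_rate(y_true, y_pred):
    """CCR: fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred)
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Invented example: 60 students, one misclassified
truth = ["pass"] * 58 + ["fail"] * 2
preds = ["pass"] * 58 + ["fail"] + ["pass"]
print(round(correct_classification_rate(truth, preds), 3))  # prints 0.983
```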
Data mining, or knowledge discovery in databases (KDD), is the extraction of unknown (hidden) and useful knowledge from data. It is widely used in areas such as retail, sales, e-commerce, remote sensing and bioinformatics. With their tremendous recent growth, student performance has become one of the most complex puzzles for universities and colleges. In this paper, the authors deploy data mining techniques such as classification, association rules and chi-square for knowledge discovery. The study uses a data set containing the results of approximately 180 MCA (postgraduate) students from 3 colleges. It found that data mining functionalities such as chi-square, association rules and lift can be applied in education to discover areas of improvement.
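A chi-square test of the kind the authors apply checks whether two categorical attributes are independent; a minimal sketch for a 2x2 contingency table with hypothetical counts:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], e.g. college vs. pass/fail counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, observed in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = two colleges, columns = pass/fail
print(round(chi_square_2x2([[30, 10], [20, 20]]), 3))  # prints 5.333
```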
IJCER (www.ijceronline.com) International Journal of computational Engineerin... (ijceronline)
1) The document summarizes a research project that uses data mining classification techniques to analyze a trajectory dataset in order to predict a user's mode of transportation.
2) Several classification algorithms (decision tree, naive Bayes, Bayesian network, neural network, support vector machines) were evaluated using metrics like accuracy, recall, precision, and kappa. The results showed that decision trees and Bayesian networks performed best.
3) Future work proposed applying density-based clustering to identify dense regions and build prediction models for public vs. personal transportation use in those areas based on historical data.
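The evaluation metrics named in point 2 can all be computed from a confusion matrix; a pure-Python sketch with invented transport-mode labels (the dataset and algorithms from the paper are not reproduced):

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and Cohen's kappa from paired label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Agreement expected by chance, needed for kappa:
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, precision, recall, kappa

# Invented labels: 10 trips, two misclassifications
truth = ["bus"] * 6 + ["car"] * 4
preds = ["bus"] * 5 + ["car"] + ["car"] * 3 + ["bus"]
acc, prec, rec, kappa = binary_metrics(truth, preds, positive="bus")
```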
Marketplace affiliates potential analysis using cosine similarity and vision-... (journalBEEI)
One success factor for an online affiliate is the quality of its content sources, so affiliate marketplaces need an objective assessment to retrieve the content data used to choose the right product in the appropriate product filter. Usually, this selection is not made with a sound, measurable system, so the choice of product content is subjective and based only on what happens to be seen. Analysis with a sound, measurable system instead produces objective product content and can benefit users, because the selection is based on factual data. The purpose of this research is to analyze the potential of the affiliate marketplace by combining cosine similarity with vision-based page segmentation, a new approach to finding the best content matching the required criteria. The work produces a set of product recommendations suitable for publication, which are then used for comparison against the required criteria. At the limited evaluation stage, the proposed model obtained satisfactory results: all 5 queries tested returned the expected output.
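Cosine similarity, the first half of the proposed combination, can be sketched directly; the term-frequency vectors below are hypothetical, and the paper's vision-based page segmentation step is not reproduced.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors:
    1.0 means identical orientation, 0.0 means no shared terms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical term counts for a query and two product descriptions
query  = [2, 1, 0, 1]
prod_a = [2, 1, 0, 1]   # same terms -> similarity 1.0
prod_b = [0, 0, 3, 0]   # disjoint terms -> similarity 0.0
```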
Identification of important features and data mining classification technique... (IJECEIAES)
Employee absenteeism at work costs organizations billions a year. Predicting employee absenteeism and the reasons behind it helps organizations reduce expenses and increase productivity. Data mining turns the vast volume of human resources data into information that can support decision-making and prediction. Although feature selection is a critical step in data mining for improving the efficiency of the final prediction, it is not yet known which feature selection method is best. This paper therefore compares the performance of three well-known feature selection methods in absenteeism prediction: relief-based, correlation-based and information-gain feature selection. It also aims to find the combination of feature selection method and data mining technique that best improves absenteeism prediction accuracy. Seven classification techniques were used as prediction models, and cross-validation was applied to make the assessment more realistic and reliable. The dataset was built at a courier company in Brazil from records of absenteeism at work. In the experiments, correlation-based feature selection surpassed the other methods on the performance measurements, and the bagging classifier was the best-performing data mining technique when features were selected with correlation-based feature selection, reaching an accuracy rate of 92%.
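Information-gain feature selection, one of the three compared methods, scores a feature by how much knowing it reduces the entropy of the labels; a minimal sketch on an invented absenteeism attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    split = {}
    for v, y in zip(feature_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(sub) / n * entropy(sub) for sub in split.values())
    return entropy(labels) - remainder

# Invented records: commute distance band vs. absent yes/no
distance = ["far", "far", "near", "near", "far", "near"]
absent   = ["yes", "yes", "no",  "no",  "yes", "no"]
print(information_gain(distance, absent))  # perfectly predictive split -> 1.0
```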
Biometric Identification and Authentication Providence using Fingerprint for ... (IJECEIAES)
The rise in recent security incidents makes securing data one of the main challenges of cloud computing. To address this, this paper presents mobile biometric authentication in cloud computing, integrating mobile devices with the cloud. Since mobile cloud computing (MCC) is popular among mobile users, biometric authentication is used to enhance security. The paper examines how MCC handles this security issue with a fingerprint biometric authentication model: a secret code is generated from the fingerprint's entropy value, which lets a person request access to data on a desktop computer. When the person requests access from the authorized user via Bluetooth on a mobile device, the authorized user grants access through the fingerprint secret code. Finally, the fingerprint is verified against the database on the desktop computer; if it matches, the requesting person can access the computer.
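The paper's exact entropy-based code generation is not specified here, so the following is only a hypothetical sketch of deriving a short secret code from a fingerprint-template digest and verifying it; the template bytes and the 6-digit format are assumptions, not the paper's scheme.

```python
import hashlib
import hmac

def secret_code(template: bytes, length: int = 6) -> str:
    """Derive a short numeric code from a fingerprint template digest (toy scheme)."""
    digest = hashlib.sha256(template).digest()
    number = int.from_bytes(digest[:8], "big") % 10 ** length
    return f"{number:0{length}d}"

def verify(template: bytes, presented_code: str) -> bool:
    """Constant-time comparison of the presented code with the derived one."""
    return hmac.compare_digest(secret_code(template), presented_code)

enrolled = b"minutiae-feature-bytes"   # stands in for a real fingerprint template
code = secret_code(enrolled)
```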
This document is a major project report submitted by four students (K. Bhargavi, Ch. Prakya Sri, Shalini Raina, and Rohan Reddy) for their bachelor's degree in computer science and engineering. It discusses the development of a data lineage project under the guidance of their professor Dr. J. Ujwala Rekha. The report includes declarations by the candidates and certificates from the supervisor and head of the department. It also provides an acknowledgements section, abstract, table of contents, and begins a literature review chapter on data lineage.
IRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior (IRJET Journal)
This document discusses mining and analyzing social media data using big data techniques to predict user behavior. It proposes using tools like Hadoop and HDFS to capture trends in areas like drug abuse from large amounts of Twitter data. A framework is presented that involves gathering Twitter data using APIs, applying big data mining techniques, and using the results for more sophisticated analysis applications to help address issues like monitoring public health. Challenges around managing large social media datasets are also discussed.
A Generic Model for Student Data Analytic Web Service (SDAWS) (Editor IJCATR)
Any university management system accumulates a wealth of data, and analytics can be applied to it to gather useful information that aids academic decision making. This paper is a novel attempt to demonstrate the significance of a data analytic web service in the education domain. It can easily be integrated with the University Management System or any other university application. Analytics as a web service offers many benefits over traditional analysis methods: the service can be hosted on a web server and accessed over the internet or on the campus's private cloud, and data from courses across different departments can be uploaded and analyzed easily. In this paper we design a web service framework for educational data mining that provides analysis as a service.
IRJET - Encoded Polymorphic Aspect of Clustering (IRJET Journal)
This document discusses using machine learning techniques for clustering multi-view data. It focuses on an unsupervised learning technique called clustering, which groups similar objects together into clusters while separating dissimilar objects into different clusters. Compared to single-view clustering, multi-view clustering can access more characteristics and structural information hidden in the data by exploiting richer properties to improve clustering performance. It also discusses encoding datasets into binary format for storage, clustering the encoded data, and retrieving desired data through decoding based on user queries. The goal is to efficiently handle large datasets using scalable machine learning algorithms.
This document presents an overview of data science by Doaa Mohey Eldin. It introduces data science and its main methods, then discusses how data science is used across different industries to solve problems and meet user needs. Examples are given of applications of data science at companies like IBM, Google, Facebook, Netflix and more. The conclusion emphasizes that data science can interpret data and behavior, with essential applications including internet search, recommendations, recognition and gaming.
This document presents an overview of key concepts in data science including data science, data analysis, data analytics, business intelligence, and big data. It discusses the commonalities and differences between these areas as well as data scientist job roles. The document is presented by Doaa Mohey Eldin and includes an agenda covering definitions, processes, applications, and advantages/disadvantages of each concept with the goal of explaining their relationships and distinctions.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION (ijaia)
Process Mining (PM) emerged from business process management but has recently been applied to educational data and has been found to facilitate the understanding of the educational process. Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on the techniques of model discovery, conformance checking and extension of existing process models. We present a systematic review of the recent and current status of research in the EPM domain, focusing on application domains, techniques, tools and models, to highlight the use of EPM in comprehending and improving educational processes.
Text pre-processing of multilingual for sentiment analysis based on social ne... (IJECEIAES)
Sentiment analysis (SA) is an enduring research area, especially in text analysis, and text pre-processing is essential to performing SA accurately. This paper presents a text processing model for SA using natural language processing techniques on Twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extraction, and then categorizing the data according to SA techniques. Keeping the focus on Twitter, the data is extracted in a domain-specific manner. The data cleaning phase handles noisy data, missing data, punctuation, tags and emoticons; in pre-processing, tokenization is performed, followed by stop-word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. After applying text pre-processing, the accuracy of the classification techniques improved and dimensionality was reduced. The proposed corpus can be used in market analysis, customer behaviour, polling analysis, and brand monitoring, and the text pre-processing pipeline can serve as a baseline for predictive analysis, machine learning and deep learning algorithms, extensible according to the problem definition.
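The cleaning and pre-processing phases described (removing URLs, tags, punctuation and emoticons, then tokenizing and dropping stop words) can be sketched with the standard library; the stop-word list here is a small illustrative subset, not a standard one.

```python
import re

STOPWORDS = {"the", "is", "a", "an", "and", "to", "of", "in", "for"}

def preprocess(tweet: str) -> list:
    """Clean a tweet: strip URLs, hashtags and mentions, lowercase,
    remove punctuation/emoticons, tokenize, then drop stop words."""
    tweet = re.sub(r"https?://\S+|[@#]\w+", " ", tweet)   # URLs, tags, mentions
    tweet = re.sub(r"[^a-z\s]", " ", tweet.lower())       # punctuation, emoticons
    return [t for t in tweet.split() if t not in STOPWORDS]

print(preprocess("The new phone is AMAZING!!! :) #tech @brand http://x.co/a"))
# prints ['new', 'phone', 'amazing']
```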
A Survey on the Classification Techniques In Educational Data Mining (Editor IJCATR)
Due to increasing interest in data mining and educational systems, educational data mining is an emerging research topic. Educational data mining extracts hidden knowledge from large repositories of data using dedicated techniques and tools; it develops new methods to discover knowledge from educational databases, which is then used for decision making in educational systems. Various data mining techniques, such as classification and clustering, can be applied to bring out hidden knowledge from educational data.
In this paper, we focus on educational data mining and classification techniques. We analyze attributes for predicting students' behavior and academic performance using the WEKA open-source data mining tool and classification methods such as decision trees, the C4.5 algorithm and the ID3 algorithm.
This document presents a presentation on developing a secure database system using homomorphic encryption schemes. The presentation was given by 4 students from IIT, Jahangirnagar University. It introduces cloud computing and the need for confidentiality and integrity of data stored in the cloud. It then discusses private information retrieval techniques and how to execute SQL queries over encrypted data. The presentation describes a homomorphic encryption scheme and shows how to implement basic SQL operations like SELECT, UPDATE, and DELETE. It analyzes the performance and computational overhead of processing encrypted data and identifies optimization of efficiency as future work.
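The presentation's specific homomorphic scheme is not given here; textbook RSA, which is multiplicatively homomorphic, is the classic minimal illustration of computing on encrypted values without decrypting them (toy parameters, no padding, not secure for real use).

```python
# Textbook RSA is multiplicatively homomorphic: E(a) * E(b) mod n decrypts to a * b.
# Toy parameters only; real systems need large primes and proper padding.
p, q, e = 61, 53, 17
n = p * q                    # 3233
phi = (p - 1) * (q - 1)      # 3120
d = pow(e, -1, phi)          # private exponent (modular inverse of e)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 9
product_cipher = (encrypt(a) * encrypt(b)) % n   # multiply under encryption
print(decrypt(product_cipher))  # prints 63, computed without decrypting a or b
```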
IRJET - Automated System for Frequently Occurred Exam Questions (IRJET Journal)
This document summarizes an automated system for generating exam question banks. It discusses the limitations of the traditional manual method for creating question banks, such as being time-consuming and prone to errors. The proposed system aims to automatically generate question banks from existing question papers in a database. It uses information retrieval and natural language processing techniques like stemming, stopword removal, and text ranking to analyze question papers according to the syllabus. This allows question banks to be generated quickly and reliably. The system is intended to save lecturers time while improving the quality and flexibility of question banks. It also discusses the workflow and advantages of the proposed automated system over the existing manual process.
This document is a presentation on data science given by Doaa Mohey Eldin. It defines data science as an interdisciplinary field that extracts knowledge from structured or unstructured data using scientific methods, algorithms, and processes. It discusses why data science is useful for effective problem interpretation, decision making, and predictive systems. Examples of applying data science include healthcare recommendations, predicting incarceration rates, and automating digital ads. The document also outlines techniques like linear regression and neural networks, challenges in privacy and domain expertise, and trends like artificial intelligence and the internet of things.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE (ijistjournal)
This document discusses clustering dichotomous health care data using the K-means algorithm after transforming the data using Wiener transformation. It begins with an introduction to dichotomous data and the challenges of clustering medical data. It then describes the K-means clustering algorithm and various distance measures used for binary data clustering. The document proposes using Wiener transformation to first transform binary data to real values before applying K-means clustering. It evaluates the results on a lens dataset using inter-cluster and intra-cluster distances, finding the transformed data yields better clusters than the original binary data according to these metrics.
This document presents research on classifying data using a new enhanced decision tree algorithm called NEDTA. It first provides background on data mining and decision tree classification techniques. It then discusses existing decision tree algorithms ID3, J48 and NBTree and applies them to a banking dataset to evaluate performance. The objectives are stated as applying the algorithms, evaluating results, comparing performance based on accuracy, time and error rate, and developing an enhanced method. The document outlines the implementation and provides results of applying the existing algorithms in Weka. It compares the accuracy and performance of ID3, J48 and NBTree and finds the new NEDTA algorithm produces better results.
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDY (IAEME Publication)
In the present digital era, massive amounts of data are continuously generated at exceptional and increasing scales. This data has become an important and indispensable part of every economy, industry, organization, business and individual. Handling these large datasets is a major challenge due to the heterogeneity of their formats, and efficient data processing techniques are needed to handle the heterogeneous data and meet the computational requirements of processing such a huge volume. The objective of this paper is to review, describe and reflect on heterogeneous data and the complexity of processing it, as well as the use of machine learning algorithms, which play a major role in data analytics.
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo... (IRJET Journal)
This document proposes a methodology to automatically assign topics to unlabeled datasets using topic modeling techniques. It applies latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) with term frequency-inverse document frequency (TF-IDF) weighting to product reviews to generate topics. Word similarities are used to cluster words for each topic. Sentiment analysis and word clouds are also used to gain insights. The methodology successfully converts unlabeled to labeled data and provides automatic topic labeling to facilitate further research and opportunity discovery.
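TF-IDF weighting, one ingredient of the proposed pipeline, can be sketched in a few lines; the reviews below are invented, and the LDA/NMF topic-modeling steps are not reproduced.

```python
import math
from collections import Counter

def tfidf(docs):
    """Term frequency-inverse document frequency for tokenized documents.
    A term frequent in one document but rare across the corpus scores high."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Invented product reviews, already tokenized
reviews = [["battery", "great", "battery"],
           ["screen", "great"],
           ["battery", "poor"]]
w = tfidf(reviews)
```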
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US... (IJDKP)
The document summarizes a proposed methodology that integrates associative classification and neural networks for improved classification accuracy. It begins by introducing association rule mining and associative classification. It then describes using chi-squared analysis and the Gini index for attribute selection and rule pruning to generate a reduced set of rules. These rules are used to train a backpropagation neural network classifier. The methodology is tested on datasets from a public repository, demonstrating improved accuracy over traditional associative classification alone. Future work to integrate optical neural networks is also proposed.
The document discusses using k-means clustering on a life insurance customer dataset to predict customer preferences. It first provides background on k-means clustering and its application in data mining. It then describes applying k-means to a dataset of 14,180 customer records with 10 attributes from an Albanian insurance company. This identified 5 clusters characterizing different customer segments based on attributes like gender, age, and preferred insurance product type and amount. The results help the insurance company better understand customer preferences to improve performance.
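The insurance dataset and its 10 attributes are not available, so the sketch below runs a plain k-means assign/update loop on a single made-up "age" attribute; the quantile-based initialization is an assumption chosen for determinism, not taken from the paper.

```python
def kmeans(points, k, iters=20):
    """Plain k-means on 1-D values (e.g. customer ages): returns final centroids.
    Centroids are initialized at evenly spaced quantiles for determinism."""
    pts = sorted(points)
    centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:                    # assign each point to nearest centroid
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # update each centroid to its cluster mean (keep it if cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

ages = [21, 22, 23, 45, 46, 47, 70, 71, 72]
print(kmeans(ages, k=3))  # prints [22.0, 46.0, 71.0]
```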
Processing the data generated by everyday transactions, which can amount to thousands of records per day, requires software that lets users search for the data they need; data mining is a solution to this problem. To that end, many large vendors have built data processing software. Because software from the big vendors is expensive to obtain, communities such as universities have made things easier for users who simply want to learn or deepen their knowledge of data mining by creating open source software, while many commercial vendors market their own products. WEKA and Salford Systems are both data mining software packages, each with its advantages and disadvantages. This study compares them using several attributes, so that users can select which software is more suitable for their daily activities.
Secure Data Retrieval for Decentralized Disruption-Tolerant Military Networks (ravi sharma)
The document presents a secure data retrieval method for decentralized disruption-tolerant military networks using ciphertext-policy attribute-based encryption (CP-ABE). It proposes a system architecture with multiple independent key authorities, storage nodes, senders, and soldier users. The algorithm allows senders to encrypt data based on an access policy, and only authorized soldiers holding matching attributes can decrypt the ciphertext. The method provides security advantages like data confidentiality, collusion resistance, and backward/forward secrecy. It discusses applications in military, space agencies, and other disruption-prone networks by enabling reliable access to trusted information through external storage nodes.
IRJET- Performance for Student Higher Education using Decision Tree to Predic... (IRJET Journal)
This document discusses using decision trees to predict career decisions for 12th grade students in India. It first provides background on the challenges in the Indian education system and how data mining can help improve decision making. It then reviews previous studies applying various data mining techniques like decision trees and random forests to predict student performance. The paper proposes using a decision tree approach on student data to distinguish slow and fast learners and help students make better career choices based on their interests and skills. The decision tree approach achieved 80% accuracy in predicting student career decisions, helping students choose appropriate paths.
Biometric Identification and Authentication Providence using Fingerprint for ... (IJECEIAES)
The rise in recent cloud computing security incidents makes securing data a pressing challenge. To address it, this paper presents mobile biometric authentication in cloud computing, integrating mobile devices with the cloud. Since mobile cloud computing (MCC) is popular among mobile users, biometric authentication is used to enhance security. The paper examines how MCC addresses the security issue with a fingerprint biometric authentication model. From the fingerprint biometric, a secret code is generated using an entropy value, which enables a person to request access to data on a desktop computer. When the person requests access from the authorized user over Bluetooth on a mobile device, the authorized user grants access via the fingerprint secret code. Finally, the fingerprint is verified against the database on the desktop computer; if it matches, the requesting person can access the computer.
This document is a major project report submitted by four students (K. Bhargavi, Ch. Prakya Sri, Shalini Raina, and Rohan Reddy) for their bachelor's degree in computer science and engineering. It discusses the development of a data lineage project under the guidance of their professor Dr. J. Ujwala Rekha. The report includes declarations by the candidates and certificates from the supervisor and head of the department. It also provides an acknowledgements section, abstract, table of contents, and begins a literature review chapter on data lineage.
IRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior (IRJET Journal)
This document discusses mining and analyzing social media data using big data techniques to predict user behavior. It proposes using tools like Hadoop and HDFS to capture trends in areas like drug abuse from large amounts of Twitter data. A framework is presented that involves gathering Twitter data using APIs, applying big data mining techniques, and using the results for more sophisticated analysis applications to help address issues like monitoring public health. Challenges around managing large social media datasets are also discussed.
A Generic Model for Student Data Analytic Web Service (SDAWS) (Editor IJCATR)
Any university management system accumulates a cartload of data, and analytics can be applied to it to gather useful information that aids academic decision making. This paper is a novel attempt to demonstrate the significance of a data analytic web service in the education domain. The service can be integrated easily with the University Management System or any other university application. Analytics as a web service offers many benefits over traditional analysis methods: the service can be hosted on a web server and accessed over the internet or on the campus's private cloud, and data from courses across different departments can be uploaded and analyzed easily. In this paper we design a web service framework for educational data mining that provides analysis as a service.
IRJET - Encoded Polymorphic Aspect of Clustering (IRJET Journal)
This document discusses using machine learning techniques for clustering multi-view data. It focuses on an unsupervised learning technique called clustering, which groups similar objects together into clusters while separating dissimilar objects into different clusters. Compared to single-view clustering, multi-view clustering can access more characteristics and structural information hidden in the data by exploiting richer properties to improve clustering performance. It also discusses encoding datasets into binary format for storage, clustering the encoded data, and retrieving desired data through decoding based on user queries. The goal is to efficiently handle large datasets using scalable machine learning algorithms.
This document presents an overview of data science by Doaa Mohey Eldin. It introduces data science and its main methods, then discusses how data science is used across different industries to solve problems and meet user needs. Examples are given of applications of data science at companies like IBM, Google, Facebook, Netflix and more. The conclusion emphasizes that data science can interpret data and behavior, with essential applications including internet search, recommendations, recognition and gaming.
This document presents an overview of key concepts in data science including data science, data analysis, data analytics, business intelligence, and big data. It discusses the commonalities and differences between these areas as well as data scientist job roles. The document is presented by Doaa Mohey Eldin and includes an agenda covering definitions, processes, applications, and advantages/disadvantages of each concept with the goal of explaining their relationships and distinctions.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION (ijaia)
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Text pre-processing of multilingual for sentiment analysis based on social ne... (IJECEIAES)
Sentiment analysis (SA) is an enduring research area, especially in text analysis, and text pre-processing is an important aspect of performing SA accurately. This paper presents a text processing model for SA using natural language processing techniques on Twitter data. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extraction from text, and then categorizing the data according to SA techniques. Keeping the focus on Twitter, the data is extracted in a domain-specific manner. The data cleaning phase handles noisy data, missing data, punctuation, tags and emoticons. For pre-processing, tokenization is performed, followed by stop word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. After applying text pre-processing, the accuracy of the classification techniques improved and dimensionality was reduced. The proposed corpus can be utilized in market analysis, customer behaviour, polling analysis, and brand monitoring, and the text pre-processing process can serve as the baseline for predictive analysis, machine learning and deep learning algorithms, extended according to the problem definition.
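The cleaning, tokenization and stop-word removal phases described above can be illustrated with a rough sketch. This is not the paper's actual pipeline; the stop-word list, the regular expressions and the sample tweet are invented for demonstration.

```python
# Illustrative tweet pre-processing: cleaning, tokenization, stop-word removal.
import re

STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "this"}  # tiny demo list

def preprocess(tweet):
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"[@#]\w+", " ", text)        # strip mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # strip punctuation, emoticons, digits
    tokens = text.split()                       # tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stop-word removal (SWR)

tokens = preprocess("This phone is GREAT!!! :) #happy @shop http://t.co/xyz")
```

The surviving tokens (here `phone` and `great`) are what a feature-extraction step would then turn into a vector for classification.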
A Survey on the Classification Techniques In Educational Data Mining (Editor IJCATR)
Due to increasing interest in data mining and educational systems, educational data mining is an emerging topic for the research community. Educational data mining means extracting hidden knowledge from large data repositories using dedicated techniques and tools; it develops new methods to discover knowledge from educational databases for decision making in educational systems. Data mining techniques such as classification and clustering can be applied to bring out hidden knowledge from educational data.
In this paper, we focus on educational data mining and classification techniques. We analyze attributes for predicting students' behavior and academic performance using the WEKA open-source data mining tool and classification methods such as decision trees, including the C4.5 and ID3 algorithms.
This document presents a presentation on developing a secure database system using homomorphic encryption schemes. The presentation was given by 4 students from IIT, Jahangirnagar University. It introduces cloud computing and the need for confidentiality and integrity of data stored in the cloud. It then discusses private information retrieval techniques and how to execute SQL queries over encrypted data. The presentation describes a homomorphic encryption scheme and shows how to implement basic SQL operations like SELECT, UPDATE, and DELETE. It analyzes the performance and computational overhead of processing encrypted data and identifies optimization of efficiency as future work.
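The core idea above, evaluating queries over encrypted values, can be sketched with textbook Paillier encryption, which is additively homomorphic. This is an illustrative toy, not the presenters' scheme: the primes are far too small to be secure, and the "salary" column is invented.

```python
# Toy Paillier cryptosystem demonstrating a homomorphic SELECT SUM(...)
# over an encrypted column. Insecure demo parameters only.
import math
import random

p, q = 47, 59                 # demo primes; far too small for real security
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption constant

def encrypt(m, rng=random.Random(1)):
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:       # r must be a unit mod n
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic SUM: multiply ciphertexts, decrypt only the aggregate.
salaries = [30, 20, 25]
ciphertexts = [encrypt(s) for s in salaries]
enc_sum = math.prod(ciphertexts) % n2
```

Multiplying Paillier ciphertexts adds their plaintexts, so the server can compute the encrypted aggregate without ever seeing an individual salary.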
IRJET - Automated System for Frequently Occurred Exam Questions (IRJET Journal)
This document summarizes an automated system for generating exam question banks. It discusses the limitations of the traditional manual method for creating question banks, such as being time-consuming and prone to errors. The proposed system aims to automatically generate question banks from existing question papers in a database. It uses information retrieval and natural language processing techniques like stemming, stopword removal, and text ranking to analyze question papers according to the syllabus. This allows question banks to be generated quickly and reliably. The system is intended to save lecturers time while improving the quality and flexibility of question banks. It also discusses the workflow and advantages of the proposed automated system over the existing manual process.
This document is a presentation on data science given by Doaa Mohey Eldin. It defines data science as an interdisciplinary field that extracts knowledge from structured or unstructured data using scientific methods, algorithms, and processes. It discusses why data science is useful for effective problem interpretation, decision making, and predictive systems. Examples of applying data science include healthcare recommendations, predicting incarceration rates, and automating digital ads. The document also outlines techniques like linear regression and neural networks, challenges in privacy and domain expertise, and trends like artificial intelligence and the internet of things.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CARE (ijistjournal)
This document discusses clustering dichotomous health care data using the K-means algorithm after transforming the data using Wiener transformation. It begins with an introduction to dichotomous data and the challenges of clustering medical data. It then describes the K-means clustering algorithm and various distance measures used for binary data clustering. The document proposes using Wiener transformation to first transform binary data to real values before applying K-means clustering. It evaluates the results on a lens dataset using inter-cluster and intra-cluster distances, finding the transformed data yields better clusters than the original binary data according to these metrics.
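Two of the binary distance measures alluded to above can be sketched quickly. The symptom vectors are invented for illustration, and the paper's Wiener transformation step is deliberately not reproduced here.

```python
# Distance measures commonly used for dichotomous (binary) records.
def jaccard_distance(a, b):
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 1.0 if either == 0 else 1 - both / either

def simple_matching_distance(a, b):
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1 - matches / len(a)

# Two hypothetical symptom vectors (1 = symptom present).
u = [1, 0, 1, 1, 0]
v = [1, 1, 1, 0, 0]
d_j = jaccard_distance(u, v)        # ignores shared absences (0,0 pairs)
d_m = simple_matching_distance(u, v)  # counts shared absences as agreement
```

The two measures disagree precisely because of the shared-absence positions, which is one reason binary clustering results are sensitive to the distance chosen.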
This document presents research on classifying data using a new enhanced decision tree algorithm called NEDTA. It first provides background on data mining and decision tree classification techniques. It then discusses existing decision tree algorithms ID3, J48 and NBTree and applies them to a banking dataset to evaluate performance. The objectives are stated as applying the algorithms, evaluating results, comparing performance based on accuracy, time and error rate, and developing an enhanced method. The document outlines the implementation and provides results of applying the existing algorithms in Weka. It compares the accuracy and performance of ID3, J48 and NBTree and finds the new NEDTA algorithm produces better results.
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDY (IAEME Publication)
In the present digital era, massive amounts of data are continuously generated at exceptional and increasing scales. This data has become an important and indispensable part of every economy, industry, organization, business and individual. Handling these large datasets is one of the major challenges because of the heterogeneity of their formats, so efficient data processing techniques are needed both to handle heterogeneous data and to meet the computational requirements of processing such volumes. The objective of this paper is to review, describe and reflect on heterogeneous data and its processing complexity, and on the use of machine learning algorithms, which play a major role in data analytics.
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo... (IRJET Journal)
This document proposes a methodology to automatically assign topics to unlabeled datasets using topic modeling techniques. It applies latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) with term frequency-inverse document frequency (TF-IDF) weighting to product reviews to generate topics. Word similarities are used to cluster words for each topic. Sentiment analysis and word clouds are also used to gain insights. The methodology successfully converts unlabeled to labeled data and provides automatic topic labeling to facilitate further research and opportunity discovery.
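The TF-IDF weighting that feeds LDA/NMF in the pipeline above can be computed by hand in a few lines. The three "reviews" are invented, and a real pipeline would typically use a library vectorizer rather than this sketch.

```python
# Pure-Python TF-IDF over a tiny invented review corpus.
import math
from collections import Counter

docs = [
    "battery life is great great",
    "screen is dim battery drains",
    "great screen sharp colours",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
# Document frequency: in how many documents does each word appear?
df = Counter(w for doc in tokenized for w in set(doc))

def tfidf(doc):
    tf = Counter(doc)
    return {w: (tf[w] / len(doc)) * math.log(N / df[w]) for w in tf}

weights = [tfidf(doc) for doc in tokenized]
```

In the first review, "great" appears twice and so outweighs "battery"; it is these weights, rather than raw counts, that the topic models then factorize.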
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US... (IJDKP)
Integrating big data with an agile cloud platform can significantly affect how businesses achieve their objectives. Many companies are moving to the cloud, but the trust issue seems to slow that move. This paper investigated the factors that affect Service Satisfaction and lead to Trust. Since the sample was not normally distributed, the researchers used the PLS-SEM tool to analyse the relationships among the variables: Data Security, Data Privacy, Cloud Benefits, Reputation, Service Level Agreement (SLA), Risk Management, Service Satisfaction and Trust. The variables were linked based on qualitative research supported by theories, and the linkages were validated through quantitative data analysis, which found that Data Security, Cloud Benefits, Reputation and SLA influence Service Satisfaction, and that Service Satisfaction influences Trust.
Misusability Measure Based Sanitization of Big Data for Privacy Preserving Ma... (IJECEIAES)
Leakage and misuse of sensitive data is a challenging problem for enterprises, and it has become more serious with the advent of cloud and big data, as more data is outsourced to public clouds and published for wider visibility. Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDDM) are therefore crucial in the contemporary era: PPDP and PPDM protect privacy at the data and process levels respectively. With big data, privacy protection is indispensable because data is stored and processed in a semi-trusted environment. In this paper we propose a comprehensive methodology for effective sanitization of data based on a misusability measure, preserving privacy to prevent data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy preserving MapReduce programming, and propose an algorithm known as the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability before choosing and applying appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving MapReduce programming.
This document outlines the fundamentals of a data science course, including its objectives, outcomes, and syllabus. The course aims to introduce students to common data science tools and teach programming for data analytics. It covers topics like data analysis with Excel, NumPy, Pandas, and Matplotlib. The syllabus includes 6 units covering data science basics, the data science process, tools for analysis and visualization, and content beyond the core topics like R and Power BI. Online resources are also provided for additional learning.
IRJET- Intelligent Laboratory Management System based on Internet of Thin... (IRJET Journal)
This document describes a proposed intelligent laboratory management system based on Internet of Things (IoT) and machine learning. The system uses an STM32 microcontroller, RFID reader, and Raspberry Pi to automate student attendance tracking and course information display. It also analyzes student performance data using machine learning algorithms like XGBoost to predict academic performance and help educators evaluate student progress and identify areas for improvement. The system aims to standardize and optimize laboratory management with an intelligent, automated approach.
IRJET- Impact of Manual VS Automatic Transfer Switching on Reliability of Powe... (IRJET Journal)
The document describes a proposed e-learning system that uses cryptography and data mining techniques to provide security and personalized recommendations. Elliptic curve cryptography is used to authenticate users and encrypt data for security. A decision tree algorithm classifies learner information and course content to recommend additional courses tailored to each learner's interests and behavior. The system aims to address security and privacy issues in e-learning while enhancing the learning experience through targeted content filtering and recommendations.
This document outlines the details of an introductory data science course, including its mission, vision, core values, and schedule. It introduces data science and related fields such as data mining and analytics. It discusses the data science process and common job roles. Finally, it provides an overview of data science skills in high demand and lists several resources for data, tools, and learning.
Cluster Based Access Privilege Management Scheme for Databases (Editor IJMTER)
Knowledge discovery is carried out using data mining techniques; association rule mining, classification and clustering operations all fall under data mining. Clustering groups records by relevancy, with distance or similarity measures used to estimate the relationships between transactions. Census data and medical data are referred to as microdata. Data publishing schemes provide private data for analysis, and privacy preservation is used to protect private data values; anonymity is a central concern of the privacy preservation process.
Data values are made available to authorized users through access control models. A Privacy Protection Mechanism (PPM) uses suppression and generalization of relational data to anonymize it and satisfy privacy needs. An accuracy-constrained privacy-preserving access control framework manages access control in relational databases: the access control policies define the selection predicates available to roles, while the privacy requirement is to satisfy k-anonymity or l-diversity, with an imprecision bound constraint assigned to each selection predicate. k-anonymous Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access Control (RBAC) allows permissions on objects to be defined based on roles in an organization. The Top-Down Selection Mondrian (TDSM) algorithm performs query workload-based anonymization and is constructed using greedy heuristics and a kd-tree model. In the Top-Down Heuristic 1 algorithm (TDH1), query cuts are selected with minimum bounds; in TDH2, the query bounds are updated as partitions are added to the output; and TDH3 uses the cost of reduced precision in the query results. A repartitioning algorithm is used to reduce the total imprecision for the queries.
The privacy preserved access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell level
access control is provided with differential privacy method. Dynamic role management model is integrated with the
access control policy mechanism for query predicates.
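The k-anonymity requirement that the framework above must satisfy has a simple operational meaning: every combination of quasi-identifier values must be shared by at least k records. A minimal check is sketched below; the table, quasi-identifier columns and k value are hypothetical.

```python
# Minimal k-anonymity check over a toy generalized table.
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    # Every quasi-identifier combination must occur at least k times.
    groups = Counter(tuple(row[c] for c in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())

# Generalized census-style records: (age range, zip prefix, disease).
records = [
    {"age": "20-30", "zip": "476**", "disease": "flu"},
    {"age": "20-30", "zip": "476**", "disease": "cold"},
    {"age": "30-40", "zip": "479**", "disease": "flu"},
    {"age": "30-40", "zip": "479**", "disease": "ulcer"},
]
ok = is_k_anonymous(records, ["age", "zip"], k=2)
```

Including the sensitive `disease` column among the quasi-identifiers makes every row unique and the check fails, which is why generalization and suppression are applied to the quasi-identifiers first.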
This document discusses input for data visualization including data types, tasks, data quality, sources, and shapes. It describes different types of data like numerical, categorical, ordinal, and interval data. Common visualization tasks of comparison, trends, distribution, relationships, and geospatial are covered. Data quality factors like accuracy, consistency, completeness and timeliness are also defined. The document concludes by explaining common data shapes like tabular, hierarchical, network, and geospatial data.
This document is a resume for Dr. Parkavi A., summarized in three sentences:
Dr. Parkavi A. is seeking a challenging career that utilizes her engineering knowledge and delivers expected commitments to society, with teaching experience at various institutions and expertise in areas such as data analytics, programming languages, databases, and operating systems. She holds a Ph.D in computer science and engineering and has published papers in various international conferences and book chapters on topics related to education technology and data mining. She has also led academic projects focused on areas such as compiler design, networking applications, and maps of cities.
Educational Data Mining & Students Performance Prediction using SVM Techniques (IRJET Journal)
This document discusses using educational data mining and support vector machine (SVM) techniques to predict student performance. Its abstract states that educational data mining focuses on analyzing educational data to improve learning and institutional effectiveness. The document provides background on educational data mining and compares various techniques and algorithms using the Weka tool, analyzing their accuracy in predicting student performance; key techniques discussed include SVM alongside other data mining algorithms. It also reviews related work and discusses implementing various data mining methods to conduct predictive analytics on student performance data.
The document provides an introduction to big data, including:
1) It defines big data and discusses its key characteristics of volume, velocity, and variety.
2) It describes sources of big data like sensors, social media, and purchase transactions.
3) It discusses big data analytics including descriptive, predictive, and prescriptive analytics and the stages of capture, organize, analyze, and act.
Data mining refers to extracting hidden predictive information from a huge data set. Recently, a number of private institutions have come into existence and put effort into securing fruitful admissions. In this paper, data mining techniques are used to analyze the mindset of students after matriculation. One of the best data mining tools, WEKA (Waikato Environment for Knowledge Analysis), is used to formulate the analysis process.
Analytics of Performance and Data Quality for Mobile Edge Cloud Applications (Hong-Linh Truong)
The document discusses performance and data quality analytics for mobile edge cloud applications. It presents MECCA, a mobile edge cloud application for providing cornering recommendations to cars. MECCA has a complex architecture using microservices and third party services. Analyzing MECCA's performance and data quality across different edge and cloud deployments is challenging due to dependencies between application parameters, streaming processing, and third party services. Future work aims to develop toolsets and datasets to better evaluate performance and data quality metrics for mobile edge cloud applications.
Educational Data Mining to Analyze Students Performance – Concept Plan (IRJET Journal)
This document discusses using data mining techniques to analyze student performance data from educational institutions. It proposes using clustering and classification algorithms like K-means and Naive Bayesian on data collected from sources like learning management systems and surveys. The goals are to classify students into performance levels, identify factors affecting performance, and make recommendations to help students improve. Clustering could group students and classification could predict performance based on attributes. Analyzing the data may provide insights to enhance guidance and outcomes. The paper presents this as a conceptual plan to apply data mining in education.
6 Weeks Data Science Summer Training in Noida in 2022 (Raj Sharma)
The 6 Weeks Data Science Summer Training in Noida in 2022 covers the complete data life cycle with practical experience. Using different data science methodologies, our company provides feasible and practical solutions to student queries. Within this short duration, APTRON gives aspirants in-depth knowledge. Be a data scientist soon.
http://aptronnoida.in/summer/data-science-6-weeks-project-training-noida.html
Similar to Enhanced Privacy Preserving Access Control in Incremental Data using microaggregation
International Conference on NLP, Artificial Intelligence, Machine Learning an... (gerogepatton)
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Enhanced Privacy Preserving Access Control in Incremental Data using microaggregation
1. Indira College of Engineering Management, Pune
Enhanced Privacy Preserving Access Control in
Incremental Data using microaggregation
ME II COMPUTER
DS-II
Presented by: Mr. Ravi Sharma
Guide: Prof. Manisha Bharati
2. CONTENT
• Introduction
• Motivation
• Problem Statement
• Literature survey
• System Architecture
• Mathematical Model
• Algorithms
• Result Analysis
• Conclusion
• Future Scope
• References
12 June 2018 Indira College of Engineering Management, Pune 2
3. What is the need for privacy preservation?
• Government agencies and other organizations publish medical data and census data for scientific and research purposes.
• Privacy protection prevents the misuse of sensitive and confidential information of data owners.
5. Attribute types
• Identifier.
• Quasi-identifier or key attributes.
• Confidential attributes.
• Non-confidential attributes.
6. Microaggregation
• Order the records of the initial microdata by an attribute.
• Form groups of consecutive values.
• Replace each value by its group average.
• Example below: microaggregation of the attribute Income with minimum group size 3.
• The total sum of all Income values remains the same.
Rec. ID Age marital status Income
2 44 single 30,967
4 44 separated 30,967
10 45 single 30,967
1 44 married 47,500
6 45 married 47,500
7 25 separated 47,500
3 55 divorced 73,000
5 55 married 73,000
8 35 single 73,000
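The three steps above can be sketched as follows (a minimal Python sketch, not part of the original slides; the attribute name, group size, and the rule of letting the last group absorb leftover records are illustrative choices):

```python
def microaggregate(records, attr, k):
    """Order records by one attribute, form groups of at least k
    consecutive records, and replace each value by its group average."""
    ordered = sorted(records, key=lambda r: r[attr])
    groups, i = [], 0
    while i < len(ordered):
        # the last group absorbs the remainder so every group keeps >= k records
        if len(ordered) - i < 2 * k:
            groups.append(ordered[i:])
            break
        groups.append(ordered[i:i + k])
        i += k
    anonymized = []
    for g in groups:
        mean = sum(r[attr] for r in g) / len(g)
        anonymized.extend({**r, attr: mean} for r in g)
    return anonymized
```

Because every value is replaced by its group mean, the total of the aggregated attribute is preserved, matching the observation on the slide.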
7. Sensitive Table
QI1 QI2 S1
ID Age Zip Salary
1 5 15 20K
2 15 28 30K
3 28 45 50K
4 25 60 10K
5 38 74 20K
6 32 89 50K
8. Data Anonymization
• The use of one or more techniques designed to make it impossible or
at least more difficult to identify a particular individual from stored
data related to them.
9. Non-anonymized database consisting of the Income records
Name Age Gender Country Marital status Salary
Sam 29 Female United-States Divorced 20K
Robert 24 Female Germany Never-married 50K
Jack 28 Female Mexico married 20K
sunny 27 Male United-States married 50K
Jackson 24 Female Germany Married-civ-spouse 10K
Suresh 23 Male United-States married 20K
Albert 19 Male Thailand married 30K
Shone 29 Male Philippines Married-civ-spouse 50K
Johnson 17 Male Portugal Divorced 10K
John 19 Male Canada married 20K
10. Applying Generalization & Suppression
Name Age Gender Country Marital status Salary
* 20 < Age ≤ 30 Female United-States * 20K
* 20 < Age ≤ 30 Female Germany * 50K
* 20 < Age ≤ 30 Female United-States * 20K
* 20 < Age ≤ 30 Male United-States * 50K
* 20 < Age ≤ 30 Female Germany * 10K
* 20 < Age ≤ 30 Male United-States * 20K
* Age ≤ 20 Male Thailand * 30K
* 20 < Age ≤ 30 Male Philippines * 50K
* Age ≤ 20 Male Portugal * 10K
* Age ≤ 20 Male Canada * 20K
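A minimal sketch of how a row of the table above could be produced (Python, not from the slides; the age bin edges and the choice of which columns to suppress mirror the example table and are otherwise arbitrary):

```python
def generalize_age(age):
    """Generalize an exact age into the ranges used on the slide."""
    if age <= 20:
        return "Age <= 20"
    if age <= 30:
        return "20 < Age <= 30"
    return "Age > 30"

def anonymize_row(row):
    """Suppress the Name and Marital status columns (replace with '*')
    and generalize the Age quasi-identifier, as in the slide's table."""
    out = dict(row)
    out["Name"] = "*"
    out["Marital status"] = "*"
    out["Age"] = generalize_age(row["Age"])
    return out
```

For example, applying `anonymize_row` to Sam's record yields the first anonymized row: `* 20 < Age <= 30 Female United-States * 20K`.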
11. Two types of attacks:
(i) Homogeneity attack
(ii) Background knowledge attack
12. t-Closeness
• t-closeness effectively protects against disclosure of the sensitive attributes.
• The distribution of sensitive attributes within each quasi-identifier group should be "close" to their distribution in the entire original database.
13. • EMD(P, Q) measures the cost of transforming one distribution P into another distribution Q by moving probability mass. EMD is computed as the minimum transportation cost from the bins of P to the bins of Q, so it depends on how much mass is moved and how far it is moved.
• If the numerical attribute takes values {v1, v2, ..., vm}, where vi < vj if i < j, then the ordered distance is dist(vi, vj) = |i − j| / (m − 1). If P and Q are distributions over {v1, v2, ..., vm} that assign probability pi and qi to vi respectively, then the EMD for the ordered distance can be computed as:
• EMD(P, Q) = (1/(m − 1)) · Σ_{i=1..m−1} |Σ_{j=1..i} (p_j − q_j)|
14. t-Closeness
• Earth Mover's Distance (EMD)
• Working on the attribute probabilities, let ri = pi − qi (i = 1, 2, ..., m). The EMD between P and Q can then be calculated as:
EMD(P, Q) = (1/(m − 1)) · (|r1| + |r1 + r2| + ... + |r1 + r2 + ... + r(m−1)|)
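The formula above reduces to a running (prefix) sum of the differences ri. A small sketch, assuming both distributions are given over the same ordered domain:

```python
def emd_ordered(p, q):
    """Earth Mover's Distance for the ordered distance on {v1, ..., vm}:
    EMD(P, Q) = (1/(m-1)) * sum_i |r1 + ... + ri|, with ri = pi - qi.
    The final prefix sum is 0 because both distributions sum to 1,
    so letting the loop run through i = m does not change the result."""
    m = len(p)
    total = prefix = 0.0
    for pi, qi in zip(p, q):
        prefix += pi - qi
        total += abs(prefix)
    return total / (m - 1)
```

For example, with P = (0.5, 0.5, 0) and Q = (0, 0.5, 0.5), half the mass moves across the full domain (normalized distance 1), so EMD(P, Q) = 0.5.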
15. Access Control Mechanism
• A user has access to an object based on the assigned role.
• Access Control is a set of controls to restrict access to certain
resources.
16. Motivation
[Diagram] Security – why? Resource protection, threats, and information sharing.
17. Problem Statement
• A microdata set is a dataset that carries sensitive information about individual respondents, such as persons or enterprises. The goal is to prevent access to sensitive data within a provided data set in which one subset is public and the remaining subset is private or protected in nature.
18. Objectives
• To identify the strong points of microaggregation for achieving k-anonymous t-closeness.
• To improve the utility of the anonymized data set.
• To provide additional masking freedom and improve data utility.
• To increase data granularity.
• To reduce the impact of outliers.
19. Literature Survey

1. "t-Closeness through Microaggregation: Strict Privacy with Enhanced Utility Preservation," IEEE Transactions on Knowledge and Data Engineering, November 2015.
Aim: Microaggregation has several advantages over generalization/recoding for k-anonymity, mostly related to data utility preservation, increasing data granularity, and avoiding discretization of numerical data.
Disadvantage: Privacy and security issues remain for the anonymized data.

2. Jianneng Cao, Panagiotis Karras, Panos Kalnis, Kian-Lee Tan, "SABRE: A Sensitive Attribute Bucketization and Redistribution Framework for t-Closeness," May 2009.
Aim: Address the need for microdata privacy and cover the gap with SABRE.
Advantage: SABRE provides the best known resolution of the trade-off between privacy, information quality, and computational efficiency with a t-closeness guarantee in mind.
Disadvantage: A greater number of buckets leads to equivalence classes with more records and thus to more information loss.
20. Literature Survey

3. Josep Domingo-Ferrer, "Hybrid Microdata Using Microaggregation," 10 April 2010.
Aim: The method combines microaggregation with any synthetic data generator.
Advantage: Produces hybrid microdata sets that can be released with low disclosure risk and acceptable data utility, preserving means, variances, covariances, and third- and fourth-order central moments — a feature not offered by earlier hybrid methods.

4. Josep Domingo-Ferrer, Jordi Soria-Comas, "From t-Closeness to Differential Privacy and Vice Versa in Data Anonymization," 11 November 2014.
Aim: Data set anonymization relating k-anonymity, t-closeness, and ε-differential privacy.
Disadvantage: Prior bucketization of the values of the confidential attribute is required.

5. Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian, "t-Closeness: Privacy Beyond k-Anonymity and ℓ-Diversity," IEEE 2007.
Aim: k-anonymity protects against identity disclosure, but does not provide sufficient protection against attribute disclosure.
Advantage: t-closeness requires the distribution of a sensitive attribute in any equivalence class to be close to the distribution of the attribute in the overall table.
22. Flow of Architecture
[Diagram] Components: sensitive table; preprocessing; k-anonymity with privacy requirement, imprecision bound, and query predicate; partitioning (Top-Down Heuristic 3 algorithm); access control; anonymized table and anonymized table with bound (output); database, data administrator, user, and server.
24. Modules
• Dataset Extraction
• Preprocessing
• Cluster Formation
• Anonymization
• Partitioning Using Heuristics Mechanism
• Incremental Database Anonymization and Partitioning
25. Mathematical Model
S: System
A system is defined as a set such that S = {I, P, O}, where:
U: set of users (UR: set of registered users, UN: set of unregistered users)
I: set of inputs
O: set of outputs
P: set of processes
• INPUT SET DETAILS:
26. 1. PHASE 1: REGISTRATION
Ir = {username: ir1, Address: ir2, Pincode: ir3, Country: ir4, Gender: ir5}
27. 2. PHASE 2: Data Processing
Ie = {userinfo: iv1, searchFeatures: iv2, Featurelist: iv3}
3. PHASE 3: Result
Id = {userinfo: iv1, Feature-Key: iv2}
PROCESS SET DETAILS:
1. PHASE 1: REGISTRATION
P1 = {User Registration: p11}
2. PHASE 2: Data Processing
P2 = {feature selection: p21, Data extraction: p22}
28. 3. PHASE 3: Result
P3 = {view mining result: p31, User Verification: p32}
OUTPUT SET DETAILS:
1. PHASE 1: REGISTRATION
O1 = {userid: o11, Password: o12}
2. PHASE 2: Data Processing
O2 = {FeatureData: o21}
3. PHASE 3: Result
Success Conditions:
29. Algorithm
Algorithm 1: t-closeness through microaggregation and merging of microaggregated groups of records.
• Data: X: original data set
• k: minimum cluster size
• t: t-closeness level
• Result: set of clusters satisfying k-anonymity and t-closeness
• X' = microaggregation(X, k)
• while EMD(X', X) > t do
•   C = cluster in X' with the greatest EMD to X
•   C' = cluster in X' closest to C in terms of QIs
•   X' = merge C and C' in X'
• end while
• return X'
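A runnable sketch of Algorithm 1 on a toy data set (not the project's implementation: records are simplified to (QI, sensitive) pairs, EMD uses the ordered distance from slide 13, and "closest in terms of QIs" is taken as the nearest cluster mean — all assumptions for illustration):

```python
from collections import Counter

def emd_ordered(p, q):
    # ordered-distance EMD (slide 13): (1/(m-1)) * sum_i |sum_{j<=i} (p_j - q_j)|
    total = prefix = 0.0
    for pi, qi in zip(p, q):
        prefix += pi - qi
        total += abs(prefix)
    return total / (len(p) - 1)

def distribution(values, domain):
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]

def t_close_clusters(records, k, t):
    """Microaggregate into clusters of >= k records, then repeatedly merge
    the cluster with the greatest EMD to the full data set into its
    QI-nearest neighbour until every cluster satisfies t-closeness."""
    domain = sorted({s for _, s in records})
    global_dist = distribution([s for _, s in records], domain)

    ordered = sorted(records)                      # order by the QI attribute
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())        # keep every cluster >= k

    def cluster_emd(cl):
        return emd_ordered(distribution([s for _, s in cl], domain), global_dist)

    def qi_mean(cl):
        return sum(q for q, _ in cl) / len(cl)

    while len(clusters) > 1:
        worst = max(clusters, key=cluster_emd)
        if cluster_emd(worst) <= t:                # EMD(X', X) <= t: done
            break
        clusters.remove(worst)
        nearest = min(clusters, key=lambda cl: abs(qi_mean(cl) - qi_mean(worst)))
        nearest.extend(worst)                      # merge C and C' in X'
    return clusters
```

Merging only ever grows clusters, so k-anonymity is preserved while the loop drives every cluster's sensitive-value distribution within EMD t of the global one.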
30. Algorithm
• Top-Down Heuristic Algorithm:
• Input: T, K, Q and BQi, where T is the set of tuples, K the cluster size, Q the set of queries, and BQi the imprecision bound for query i
• Output: P, the output partitions
• STEPS:
1. Initialize candidate partitions CP ← T
2. For each partition in CP do the following:
a) Find the queries that overlap that partition.
b) Select the queries with the least imprecision bound IB, with IB > 0.
c) Select the query with the smallest bound.
d) Create a query cut.
e) If the cut does not produce a skewed partition, a feasible cut is found: add it to CP; otherwise reject the cut.
3. Return P.
31. Software Requirements
• Operating System : Windows 7
• Coding Language : Java
• Front-End : NetBeans 8.0.2
• Database : MySQL 5.0
• Framework : JDK 1.8
32. Advantages of System
• Easy sharing of privacy-sensitive data for analysis: business-to-business, entity-to-entity, and government-to-government.
• The anonymity technique can be used with an access control mechanism to ensure both security and privacy of the sensitive information.
33. Application
• Electronic commerce results in the automated collection of large amounts of consumer data. These data, gathered by many companies, are shared with subsidiaries and partners.
• Health care is a very sensitive sector with strict regulations, requiring strict control of protected health information used in medical research.
36. Conclusion
• The access control mechanism allows only authorized query predicates on sensitive data.
• The proposed work uses microaggregation as a method to attain k-anonymous t-closeness.
• It maintains data in a secure manner.
37. Future Scope
• The clustering parameters can be chosen dynamically through the k-means algorithm.
38. References
[1] R. Brand, J. Domingo-Ferrer, and J. M. Mateo-Sanz, "Reference data sets to test and compare SDC methods for protection of numerical microdata," European Project IST-2000-25069 CASC [Online]. Available: http://neon.vb.cbs.nl/casc/CASCtestsets.htm, 2002.
[2] J. Cao, P. Karras, P. Kalnis, and K.-L. Tan, "SABRE: A sensitive attribute bucketization and redistribution framework for t-closeness," VLDB J., vol. 20, no. 1, pp. 59–81, 2011.
[3] D. Defays and P. Nanopoulos, "Panels of enterprises and confidentiality: The small aggregates method," in Proc. Symp. Design Anal. Longitudinal Surveys, 1992, pp. 195–204.
[4] J. Domingo-Ferrer and V. Torra, "A quantitative comparison of disclosure control methods for microdata," in Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, L. Zayatz, P. Doyle, J. Theeuwes, and J. Lane, Eds. Amsterdam, The Netherlands: North Holland, 2001, pp. 111–134.
[5] J. Domingo-Ferrer and J. M. Mateo-Sanz, "Practical data-oriented microaggregation for statistical disclosure control," IEEE Trans. Knowl. Data Eng., vol. 14, no. 1, pp. 189–201, Jan./Feb. 2002.
39. References
[6] J. Domingo-Ferrer and Ú. González-Nicolás, "Hybrid microdata using microaggregation," Inf. Sci., vol. 180, no. 15, pp. 2834–2844, 2010.
[7] J. Domingo-Ferrer, D. Sánchez, and G. Rufian-Torrell, "Anonymization of nominal data based on semantic marginality," Inf. Sci., vol. 242, pp. 35–48, 2013.
[8] J. Domingo-Ferrer and J. Soria-Comas, "From t-closeness to differential privacy and vice versa in data anonymization," Knowl.-Based Syst., vol. 74, pp. 151–158, 2015.
[9] J. Domingo-Ferrer and V. Torra, "Ordinal, continuous and heterogeneous k-anonymity through microaggregation," Data Mining Knowl. Discovery, vol. 11, no. 2, pp. 195–212, 2005.
[10] C. Dwork, "Differential privacy," in Proc. 33rd Int. Colloquium Automata, Languages Programming, 2006, pp. 1–12.