Processing the data generated by everyday transactions, which amount to thousands of records per day, requires software that enables users to search for the data they need, and data mining is a solution to this problem. To that end, many large companies began building software that can perform such data processing. Because data mining software from big industry is costly to obtain, communities such as universities eventually created open source alternatives for users who simply want to learn data mining or deepen their knowledge of it, while many commercial vendors market their own products. WEKA and Salford Systems are both data mining software packages, each with its own advantages and disadvantages. This study compares them on several attributes so that users can select the software that is more suitable for their daily activities.
A REVIEW ON CLASSIFICATION OF DATA IMBALANCE USING BIG DATA (IJMIT Journal)
Classification is a data mining function that assigns items in a collection to target categories in order to provide more accurate predictions and analysis. Classification by supervised learning aims to identify the class to which a new data point belongs. With the advancement of technology and the increase in real-time data generated from sources such as the Internet, IoT, and social media, processing has become more demanding and challenging. One such processing challenge is data imbalance. In an imbalanced dataset, majority classes dominate minority classes, causing machine learning classifiers to be biased towards the majority classes; most classification algorithms consequently predict all test data as belonging to the majority classes. In this paper, the authors analyse data imbalance models using big data and classification algorithms.
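To make the bias concrete, here is a minimal Python sketch of one common remedy, random oversampling of the minority class; the class names and the 95:5 split are hypothetical, and the review above covers many other big-data-scale techniques.

```python
import random
from collections import Counter

def random_oversample(records, labels, seed=42):
    """Duplicate minority-class records until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(records, labels) if y == cls]
        for _ in range(target - n):
            out_records.append(rng.choice(pool))  # sample minority rows with replacement
            out_labels.append(cls)
    return out_records, out_labels

# Hypothetical 95:5 imbalance: a classifier that always predicts "normal"
# would score 95% accuracy while never detecting "fraud".
records = [[i] for i in range(100)]
labels = ["normal"] * 95 + ["fraud"] * 5
_, balanced = random_oversample(records, labels)
print(Counter(balanced))  # Counter({'normal': 95, 'fraud': 95})
```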
A new hybrid algorithm for business intelligence recommender system (IJNSA Journal)
Business Intelligence is a set of methods, processes, and technologies that transform raw data into meaningful and useful information. A recommender system is a business intelligence system used to deliver knowledge to the active user for better decision making. Recommender systems apply data mining techniques to the problem of making personalized recommendations for information. The growth in the amount of information and the number of users in recent years poses challenges for recommender systems. Collaborative, content-based, demographic, and knowledge-based are the four main types of recommender systems. In this paper, a new hybrid algorithm is proposed for a recommender system that combines knowledge-based filtering, user profiles, and a most-frequent-item mining technique to obtain intelligence.
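The abstract does not spell out the algorithm, so the following is a purely hypothetical Python sketch of how its three named ingredients could fit together: a knowledge-based filter driven by the user's profile, followed by ranking via most-frequent-item mining. The profile fields (budget, categories) are invented for illustration.

```python
from collections import Counter

def hybrid_recommend(user_profile, catalog, transactions, top_n=3):
    """Knowledge-based step: keep items compatible with the user's profile;
    frequent-item step: rank the survivors by purchase frequency."""
    # Hypothetical knowledge-based rule: respect the user's budget and categories.
    candidates = [item for item, meta in catalog.items()
                  if meta["price"] <= user_profile["budget"]
                  and meta["category"] in user_profile["categories"]]
    # Most-frequent-item mining over past transactions.
    freq = Counter(item for basket in transactions for item in basket)
    return sorted(candidates, key=lambda i: freq[i], reverse=True)[:top_n]

catalog = {"laptop": {"price": 900, "category": "electronics"},
           "mouse": {"price": 20, "category": "electronics"},
           "novel": {"price": 12, "category": "books"}}
transactions = [["mouse", "laptop"], ["mouse"], ["novel", "mouse"]]
profile = {"budget": 100, "categories": {"electronics", "books"}}
print(hybrid_recommend(profile, catalog, transactions))  # ['mouse', 'novel']
```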
Recommendation system using bloom filter in mapreduce (IJDKP)
Many customers use the Web to discover product details in the form of online reviews provided by other customers and specialists. Recommender systems offer an important response to the information overload problem by presenting users with more practical and personalized information services. Collaborative filtering (CF) methods are a vital component of recommender systems because they generate high-quality recommendations by leveraging the preferences of communities of similar users; collaborative filtering assumes that people with similar tastes choose the same items. Conventional collaborative filtering suffers from the sparse data problem and a lack of scalability, so a new recommender system is required that deals with sparse data and produces high-quality recommendations in a large-scale mobile environment. MapReduce is a programming model widely used for large-scale data analysis. The recommendation mechanism described here for mobile commerce is user-based collaborative filtering implemented in MapReduce, which reduces the scalability problem of conventional CF systems. One essential operation in such analysis is the join, but MapReduce is not very efficient at executing joins because it always scans all records in the datasets even when only a small fraction is relevant to the join. This problem can be mitigated with the bloomjoin algorithm: Bloom filters are constructed and used to filter out redundant intermediate records. The proposed Bloom-filter-based algorithm reduces the number of intermediate results and improves join performance.
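A minimal sketch of the bloomjoin idea, under simplified assumptions: a Bloom filter built over the join keys of the smaller dataset is consulted on the map side to drop records that cannot possibly join. The hash scheme and filter size here are illustrative, not the paper's.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-1 digests of the key.
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Build the filter from the smaller relation's join keys...
ratings = [("u1", 5), ("u3", 4)]
bf = BloomFilter()
for user, _ in ratings:
    bf.add(user)

# ...then, in the map phase, drop records whose key cannot match.
purchases = [("u1", "phone"), ("u2", "book"), ("u3", "laptop"), ("u4", "pen")]
survivors = [rec for rec in purchases if bf.might_contain(rec[0])]
print(survivors)  # false positives aside: [('u1', 'phone'), ('u3', 'laptop')]
```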
A statistical data fusion technique in virtual data integration environment (IJDKP)
Data fusion in a virtual data integration environment starts after duplicated records from the different integrated data sources have been detected and clustered. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record representing the real-world object. In this paper, a statistical technique for data fusion is introduced, based on probabilistic scores derived from both the data sources and the clustered duplicates.
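The paper's exact scoring is not given here, so the sketch below assumes a simple stand-in: for each attribute, pick the value with the highest sum of source-reliability weights across the cluster. The reliability numbers and record layout are invented.

```python
from collections import defaultdict

def fuse_cluster(duplicates, source_reliability):
    """Pick, per attribute, the value with the highest combined score
    (sum of the reliability of each source that reports it)."""
    fused = {}
    attrs = {a for rec in duplicates for a in rec["values"]}
    for attr in attrs:
        scores = defaultdict(float)
        for rec in duplicates:
            val = rec["values"].get(attr)
            if val is not None:
                scores[val] += source_reliability[rec["source"]]
        fused[attr] = max(scores, key=scores.get)
    return fused

cluster = [
    {"source": "crm", "values": {"city": "Cairo", "zip": "11511"}},
    {"source": "web", "values": {"city": "Cairo", "zip": "11411"}},
    {"source": "legacy", "values": {"city": "Kairo", "zip": "11511"}},
]
reliability = {"crm": 0.9, "web": 0.6, "legacy": 0.4}
print(fuse_cluster(cluster, reliability))  # {'city': 'Cairo', 'zip': '11511'}
```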
A Survey of Agent Based Pre-Processing and Knowledge Retrieval (IOSR Journals)
Abstract: Information retrieval is a major task in the present scenario, as the quantity of data is increasing at tremendous speed. Managing and mining knowledge for different users according to their interests is the goal of every organization, whether it works with grid computing, business intelligence, distributed databases, or any other field. To achieve this goal of extracting quality information from large databases, software agents have proved to be a strong pillar. Over the decades, researchers have applied the concept of multi-agents to the data mining process by focusing on its various steps, among which data pre-processing is the most sensitive and crucial, since the quality of the retrieved knowledge depends entirely on the quality of the raw data. Many methods and tools are available to pre-process data in an automated fashion using intelligent (self-learning) mobile agents, in distributed as well as centralized databases, but various quality factors still need attention to improve the quality of the retrieved knowledge. This article reviews the integration of the two emerging fields of software agents and knowledge retrieval, with a focus on the data pre-processing step.
Keywords: Data Mining, Multi Agents, Mobile Agents, Preprocessing, Software Agents
Performance Analysis of Selected Classifiers in User Profiling (ijdmtaiir)
User profiles can serve as indicators of personal preferences which can be effectively used when providing personalized services. Building user profiles that capture accurate information about individuals has been a daunting task, and several attempts have been made by researchers to extract information from different data sources to build user profiles in different application domains. Towards this end, in this paper we employ different classification algorithms to create accurate user profiles based on information gathered from demographic data. The aim of this work is to analyze the performance of five of the most effective classification methods, namely Bayesian Network (BN), Naive Bayes (NB), Naive Bayes Updateable (NBU), J48, and Decision Table (DT). Our simulation results show that, in general, J48 has the highest classification accuracy with the lowest error rate. On the other hand, the Naive Bayes and Naive Bayes Updateable classifiers require the least time to build the classification model.
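For readers who want to reproduce this kind of comparison outside WEKA, here is a hedged scikit-learn sketch on synthetic data. GaussianNB and DecisionTreeClassifier are only rough analogues of the paper's NB and J48 (BN, NBU, and DT have no direct scikit-learn counterparts), and the generated dataset merely stands in for the demographic data.

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for demographic data; real user profiles would replace this.
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)

models = {
    "Naive Bayes (NB analogue)": GaussianNB(),
    "C4.5-style tree (J48 analogue)": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    start = time.perf_counter()
    acc = cross_val_score(model, X, y, cv=10).mean()  # 10-fold CV accuracy
    elapsed = time.perf_counter() - start
    print(f"{name}: accuracy={acc:.3f}, time={elapsed:.2f}s")
```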
Distributed Digital Artifacts on the Semantic Web (IJCATR)
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment, producing high-quality, verifiable, and immutable web resources that prevent the rising man-in-the-middle attack. The greatest challenge of a centralized system is that it gives users no possibility to check whether data have been modified, and communication is limited to a single server. The solution is a distributed digital artifact system, in which resources are distributed among different domains to enable inter-domain communication. Owing to emerging developments on the web, attacks have increased rapidly, among which the man-in-the-middle attack (MIMA) is a serious issue that threatens user security. This work tries to prevent MIMA to an extent by providing self-referencing trusty URIs even in a distributed environment. Any manipulation of the data is efficiently identified, and further access to that data is blocked by informing the user that the uniform location has changed. The system uses self-reference so that each resource contains its trusty URI, a lineage algorithm for generating the seed, and the SHA-512 hash algorithm to ensure security. It is implemented on the semantic web, an extension of the World Wide Web, using RDF (Resource Description Framework) to identify resources. The framework was thus developed to overcome existing challenges by distributing digital artifacts on the semantic web, enabling secure communication between different domains across the network and thereby preventing MIMA.
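A minimal sketch of the hash-in-URI principle using SHA-512, under a simplified encoding of my own; the real trusty-URI scheme, and the paper's lineage-based seed generation, define more structure than this.

```python
import base64
import hashlib

def make_trusty_uri(base_uri, content_bytes):
    """Append a SHA-512-based, URL-safe token of the content to the URI so
    that any later modification of the content invalidates the reference."""
    digest = hashlib.sha512(content_bytes).digest()
    token = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return f"{base_uri}#{token}"

def verify(trusty_uri, content_bytes):
    base, _, _ = trusty_uri.rpartition("#")
    return make_trusty_uri(base, content_bytes) == trusty_uri

rdf = b"<http://example.org/a> <http://example.org/p> \"42\" ."
uri = make_trusty_uri("http://example.org/artifact/1", rdf)
print(verify(uri, rdf))                 # True
print(verify(uri, rdf + b" tampered"))  # False: manipulation detected
```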
Characterizing and Processing of Big Data Using Data Mining Techniques (IJTET Journal)
Abstract: Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. It concerns large-volume, complex, and growing data sets from multiple, autonomous sources. Big data is now rapidly expanding not only in science and engineering but in all domains, including the physical and biological sciences. The main objective of this paper is to characterize the features of big data. The HACE theorem, which characterizes the features of the big data revolution, is used here together with a proposed big data processing model viewed from the data mining perspective. The model involves the aggregation of mining and analysis, information sources, user-interest modeling, and privacy and security. Exploring large volumes of data and extracting useful information and knowledge from them is the most fundamental challenge of big data, so these problems and the data revolution should be analyzed.
APPLICATION WISE ANNOTATIONS ON INTELLIGENT DATABASE TECHNIQUES (Journal For Research)
Databases are systems used for storing data. With information increasing at a rapid pace, extracting relevant information becomes time-consuming with traditional databases; hence the need for intelligent, or smart, databases arises. An intelligent database is a system in which automation is applied to a conventional database in order to enhance its functionality and efficiency. Intelligent databases help us make decisions and can respond or act in situations by learning. They have applications in wide-ranging areas such as healthcare, automobiles, education, information security, and business. This paper presents various intelligent database techniques, which are further used to implement applications in this domain.
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer (IJERA)
An educational institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his own definition of toughness and easiness, and there is no absolute scale for measuring knowledge, but examination scores indicate a student's performance. In this case study, data mining knowledge is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data: it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students: an exam was given to computer science majors using MOODLE (an LMS), the generated data were analysed using RapidMiner (data mining software), and clustering was then performed on the data. This method helps to identify the students who need special advising or counselling from the teacher, enabling a high quality of education.
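A small sketch of the clustering step in Python, assuming hypothetical exam scores; in the case study itself the clustering is performed in RapidMiner rather than in code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical exam scores (quiz %, final %) exported from MOODLE.
scores = np.array([[35, 40], [42, 38], [55, 60], [58, 65],
                   [80, 85], [88, 90], [30, 33], [85, 82]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
print(km.labels_)           # cluster id per student
print(km.cluster_centers_)  # roughly low / medium / high performing groups
# Students falling in the lowest-center cluster would be flagged for advising.
```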
Identification of important features and data mining classification technique... (IJECEIAES)
Employee absenteeism at work costs organizations billions a year. Predicting employees' absenteeism and the reasons behind their absence helps organizations reduce expenses and increase productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although feature selection is a critical step in data mining for enhancing the efficiency of the final prediction, it is not yet known which feature selection method is better. Therefore, this paper compares the performance of three well-known feature selection methods in absenteeism prediction: relief-based feature selection, correlation-based feature selection, and information-gain feature selection. In addition, this paper aims to find the combination of feature selection method and data mining technique that best enhances absenteeism prediction accuracy. Seven classification techniques were used as prediction models, and cross-validation was used to assess them so as to obtain more realistic and reliable results. The dataset was built at a courier company in Brazil from records of absenteeism at work. In the experimental results, correlation-based feature selection surpasses the other methods on the performance measurements, and the bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection, with an accuracy rate of 92%.
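A hedged sketch of such a comparison pipeline on synthetic data: information gain is approximated here by mutual information, the correlation-based method is reduced to ranking features by absolute correlation with the label (real CFS searches feature subsets), and relief is omitted since scikit-learn does not ship it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Brazilian absenteeism records used in the paper.
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           random_state=0)

selected = {
    # Information-gain style selection via mutual information.
    "info_gain": SelectKBest(mutual_info_classif, k=8).fit_transform(X, y),
    # Correlation-based stand-in: keep the 8 features most correlated with y.
    "correlation": X[:, np.argsort(-np.abs(np.corrcoef(X.T, y)[-1, :-1]))[:8]],
}
for name, X_sel in selected.items():
    acc = cross_val_score(BaggingClassifier(random_state=0), X_sel, y, cv=10).mean()
    print(f"{name}: 10-fold accuracy = {acc:.3f}")
```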
Framework to Avoid Similarity Attack in Big Streaming Data (IJECEIAES)
Methods for privacy preservation exist in a variety of fields, such as social media, the stock market, sentiment analysis, and electronic health applications. Electronic health data arrives as a dynamic stream in large quantities, and such large-volume stream data is processed using a delay-free anonymization framework. Scalable privacy-preserving techniques are required to meet the needs of processing large dynamic stream data. In this paper, a privacy-preserving technique that can avoid the similarity attack on big streaming data is proposed for a distributed environment. It can process the data in parallel to reduce the anonymization delay. A replacement technique is used to avoid the similarity attack, and a late-validation technique is used to reduce information loss. Applications of this method include medical diagnosis, e-health applications, and third-party health data processing.
Rapid changes in technology have led to an increased variety of data sources, and these varied sources generate data in large volumes and at extremely high speed. Accommodating and using this data in decision-making systems is a big challenge. To make the fullest use of the valuable data generated by different systems, the set of target users of analysis systems needs to grow; in general, the knowledge discovery process with the available tools requires considerable expertise in both the domain and the technology. The project ITDA (Integrated Tool for Data Analysis) aims to provide a complete platform for multidimensional data analysis to enhance the decision-making process in every domain. The project provides all the techniques required to perform multidimensional data analysis and avoids the overheads incurred by the traditional cube architecture followed by most analytics systems. Modelling the available data in multidimensional form is the basic and crucial step for multidimensional analysis. This work describes the multidimensional modelling aspect and its implementation in the ITDA project.
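As a small illustration of on-demand multidimensional aggregation without a precomputed cube, here is a pandas sketch over hypothetical sales facts; ITDA's actual implementation is not described in the abstract, so this only shows the general idea.

```python
import pandas as pd

# Hypothetical sales facts in multidimensional (star-schema-like) form:
# dimensions = region, product, quarter; measure = revenue.
facts = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US", "EU", "US"],
    "product": ["A",  "B",  "A",  "B",  "A",  "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "revenue": [100,  80,   120,  90,   110,  130],
})

# A pivot aggregates facts along chosen dimensions on demand, instead of
# materializing every cube cell in advance.
cube_slice = facts.pivot_table(index="region", columns="quarter",
                               values="revenue", aggfunc="sum")
print(cube_slice)
```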
Enhancement techniques for data warehouse staging area (IJDKP)
Poor performance can turn a successful data warehousing project into a failure. Consequently, several attempts have been made by various researchers to deal with the problem of scheduling the Extract-Transform-Load (ETL) process. In this paper we present several approaches to enhancing the data warehousing extract, transform, and load stages. We focus on enhancing the performance of the extract and transform phases by proposing two algorithms that reduce the time needed in each phase by exploiting the hidden semantic information in the data; using this semantic information, a large volume of useless data can be pruned at an early design stage. We also address the problem of scheduling the execution of ETL activities, with the goal of minimizing ETL execution time. We investigate this area by choosing three scheduling techniques for ETL and experimentally show their behavior in terms of execution time in the sales domain, to understand the impact of implementing each of them and to choose the one leading to the maximum performance enhancement.
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R... (IJMTER)
This technique is used for efficient data mining in SRMS (Student Records Management System) through a vertical approach with association rules in distributed databases. The current leading technique is that of Kantarcioglu and Clifton [1]. In this system I deal with two challenges: one is computing the union of private subsets held by each of the interacting users, and the other is testing whether an element held by one user is included in a subset held by another. The existing system uses techniques such as the Apriori algorithm for data mining, and the Fast Distributed Mining (FDM) algorithm of Cheung et al. [2], which is an unsecured distributed version of Apriori. The proposed system offers enhanced privacy and data mining through encryption techniques and association rules with the FP-Growth algorithm in a private cloud (the system contains different files of subjects organized by branch). With these techniques, the expected effect is a system that is simpler and more efficient in terms of communication and combinational cost. They should also improve parameters such as execution time, code length, speed of finding data, and extraction of hidden predictive information from large databases; the efficiency of the proposed system should increase by 20%.
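Since the proposal builds on Apriori-family mining, here is a compact, self-contained Apriori sketch in Python over hypothetical student-record baskets; the paper's own pipeline adds encryption and FP-Growth on top of this basic idea.

```python
from itertools import combinations

def apriori(transactions, min_support=2):
    """Level-wise frequent itemset mining: a (k+1)-itemset can only be
    frequent if all of its k-subsets are (the Apriori property)."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_support}
    k = 1
    while level:
        for s in level:
            frequent[s] = sum(s <= t for t in transactions)
        # Candidate generation: unions of frequent k-itemsets of size k+1.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        level = {c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k))
                 and sum(c <= t for t in transactions) >= min_support}
        k += 1
    return frequent

# Hypothetical student-records baskets (e.g. subjects taken together).
baskets = [frozenset(t) for t in
           [("db", "ai"), ("db", "os"), ("db", "ai", "os"), ("ai", "os")]]
for itemset, support in sorted(apriori(baskets).items(), key=lambda kv: -kv[1]):
    print(set(itemset), support)
```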
A Comprehensive Study of Big Data Environment and its Challenges (ijceronline)
Big data is a data analysis methodology enabled by recent advances in technology and architecture. Big data refers to massive volumes of both structured and unstructured data, so large that they are difficult to process with traditional database and software techniques. This paper provides insight into big data and discusses its nature, including a definition with features such as Volume, Velocity, and Variety. It also surveys the sources of big data generation, the tools available for processing large volumes of varied data, the applications of big data, and the challenges involved in handling it.
Due to the arrival of new technologies, devices, and means of communication, the amount of data produced by mankind is growing rapidly every year, giving rise to the era of big data. The term big data comes with new challenges for inputting, processing, and outputting data. This paper focuses on the limitations of the traditional approach to managing data and on the components that are useful in handling big data. One approach used in processing big data is the Hadoop framework; the paper presents the major components of the framework and the working process within it.
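The canonical illustration of the MapReduce model at the heart of Hadoop is word counting; the sketch below simulates the map, shuffle, and reduce phases in a single process, whereas Hadoop runs them distributed over HDFS across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Map phase: each input line becomes (word, 1) pairs.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

# Shuffle phase: group intermediate pairs by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the values for each key.
def reducer(key, values):
    return key, sum(values)

lines = ["Hadoop handles big data", "big data needs Hadoop"]
pairs = chain.from_iterable(mapper(l) for l in lines)
print(dict(reducer(k, v) for k, v in shuffle(pairs).items()))
# {'hadoop': 2, 'handles': 1, 'big': 2, 'data': 2, 'needs': 1}
```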
In recent years the scope of data mining has evolved into an active area of research because of the previously unknown and interesting knowledge obtainable from very large database collections. Data mining is applied in a variety of applications across multiple domains, such as business and IT. In data mining, the major problem that receives great attention from the community is the classification of data: data should be classified in such a way that the results can be easily verified and easily interpreted by humans. In this paper we study various data mining techniques in order to find combinations for an enhanced hybrid technique that involves multiple techniques and so increases the usability of the application. We study the CHARM algorithm, the CM-SPAM algorithm, the Apriori algorithm, the MOPNAR algorithm, and Top-K rules.
Software Bug Detection Algorithm using Data mining Techniques (AM Publications)
The main aim of software development is to produce high-quality software, and high-quality software is developed using an enormous amount of software engineering data. This data can be used to gain an empirically based understanding of software development, and meaningful information can be extracted from it using various data mining techniques. As data mining for secure software engineering improves software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine the sequences, graphs, and text in such data. Software engineering data includes code bases, execution traces, historical code changes, mailing lists, and bug databases; they contain a wealth of information about a project's status, progress, and evolution. Using well-established data mining techniques, practitioners and researchers can explore the potential of this valuable data in order to better manage their projects and produce higher-quality software systems that are delivered on time and within budget.
Service Level Comparison for Online Shopping using Data Mining (IIRindia)
Data mining is the analysis step of the knowledge discovery in databases (KDD) process: the goal is to extract knowledge and patterns from large data sets, not merely to extract the data itself. Big-data computing is a critical challenge for the ICT industry, as engineers and researchers deal with petabyte data sets in the cloud computing paradigm, so the demand for a service stack to distribute, manage, and process massive data sets has risen drastically. We investigate the problem of a single source node broadcasting a big chunk of data to a set of nodes so as to minimize the maximum completion time; these nodes may be located in the same datacenter or across geo-distributed data centers. The big-data broadcasting problem is modeled as a LockStep Broadcast Tree (LSBT) problem. The main idea of LSBT is to define a basic unit of upload bandwidth r, so that a node with capacity c broadcasts data to a set of ⌊c/r⌋ children at rate r; note that r is a parameter to be optimized as part of the LSBT problem. The broadcast data is further divided into m chunks, which can then be broadcast down the LSBT in a pipelined manner. In a homogeneous network environment in which each node has the same upload capacity c, the optimal uplink rate r of the LSBT is either c/2 or c/3, whichever gives the smaller maximum completion time. For heterogeneous environments, an O(n log² n) algorithm is presented that selects an optimal uplink rate r and constructs an optimal LSBT. The numerical results show better performance, with lower computational complexity and low maximum completion time. The methodology includes building and broadcasting various web applications, followed by the gateway application and batch processing over the TSV data, after which web crawling for resources and the MapReduce process take place; finally, products are picked from the recommendations and purchased.
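A rough sketch of the trade-off behind the c/2-versus-c/3 result, under a simplified pipelined-broadcast timing model of my own construction (fan-out ⌊c/r⌋, uniform chunk times): raising r speeds up each chunk transfer but shrinks the fan-out and deepens the tree.

```python
import math

def lsbt_completion_time(n, c, r, data_size, m):
    """Rough model: fan-out k = floor(c/r); each chunk takes (data_size/m)/r
    to forward; pipelining means the last chunk reaches the deepest level
    after about (depth + m - 1) chunk-times. Illustrative only."""
    k = int(c // r)
    depth = math.ceil(math.log(n * (k - 1) + 1, k)) if k > 1 else n
    chunk_time = (data_size / m) / r
    return (depth + m - 1) * chunk_time

n, c, data, m = 1000, 12.0, 8e9, 100   # invented node count, capacity, size
for r in (c / 2, c / 3, c / 4):
    t = lsbt_completion_time(n, c, r, data, m)
    print(f"r = c/{c / r:.0f}: completion time roughly {t:.0f}s")
```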
Different Classification Technique for Data mining in Insurance Industry usin... (IOSRjournaljce)
This paper addresses the issues and techniques involved when property/casualty actuaries apply data mining methods. Data mining means the effective discovery of unknown patterns in a large database. It is an interactive knowledge discovery procedure that includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the knowledge discovery method and introduces some important data mining methods for application to insurance, concluding with cluster discovery approaches.
Analysing Transportation Data with Open Source Big Data Analytic Tools (ijeei-iaes)
Big data analytics allows a vast amount of structured and unstructured data to be processed effectively so that correlations, hidden patterns, and other useful information can be mined from the data. Several open source big data analytic tools are now available that can perform tasks such as dimensionality reduction, feature extraction, transformation, and optimization. One interesting area where such tools can provide effective solutions is transportation: big data analytics can be used to efficiently manage transport infrastructure assets such as roads, airports, bus stations, and ports. In this paper an overview of two open source big data analytic tools is first provided, followed by a simple demonstration of these tools on a transport dataset.
Survey of the Euro Currency Fluctuation by Using Data Mining (ijcsit)
Data mining, or knowledge discovery in databases (KDD), is a new field in information technology that emerged from progress in the creation and maintenance of large databases, combining statistical and artificial intelligence methods with database management. Data mining is used to recognize hidden patterns and provide relevant information for decision making on complex problems where conventional methods are inefficient or too slow. Data mining can be used as a powerful tool to predict future trends and behaviors, and this prediction allows proactive, knowledge-driven decisions in business. Since the automated prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools, it can answer business questions that are traditionally time-consuming to resolve; this great advantage makes it attractive to government, industry, and commerce. In this paper we use this tool to investigate Euro currency fluctuation. For this investigation we used three different algorithms, K*, IBk, and MLP, and extracted Euro currency volatility using the same criteria for all of them. The dataset has 21,084 records and was collected from daily price fluctuations of the Euro in the period from 10/2006 to 04/2010.
Decision Making Framework in e-Business Cloud Environment Using Software Metr... (ijitjournal)
Cloud computing technology is among the most important in the IT industry, enabling companies to offer access to their systems and application services on a pay-per-use basis. As a result, several enterprises including Facebook, Microsoft, Google, and Amazon have started offering such services to their clients. Software quality is a key factor in market competition, so this paper presents a hybrid framework based on the goal/question/metric paradigm to evaluate the quality and effectiveness of previous software goods at the project, product, and organization levels in a cloud computing environment. Our approach supports decision making at the project, product, and organization levels using neural networks and three angular metrics: project metrics, product metrics, and organization metrics.
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ... (ijdpsjournal)
The paper proposes a solution for designing and developing seamless automation and integration of machine learning capabilities for big data, with the following requirements: 1) the ability to seamlessly handle and scale very large amounts of unstructured and structured data from diversified and heterogeneous sources; 2) the ability to systematically determine the steps and procedures needed for analyzing big data datasets based on data characteristics, domain expert inputs, and a data pre-processing component; 3) the ability to automatically select the most appropriate libraries and tools to compute and accelerate the machine learning computations; and 4) the ability to perform big data analytics with high learning performance but with minimal human intervention and supervision. The focus is to provide a seamless, automated, and integrated solution that can effectively analyze big data with high-frequency and high-dimensional features, from different types of data characteristics and different application problem domains, with high accuracy, robustness, and scalability. This paper highlights the research methodologies and activities that we propose to be conducted by big data researchers and practitioners in order to develop and support seamless automation and integration of machine learning capabilities for big data analytics.
The security and speed of data transmission are very important in data communications. One approach is to use appropriate cryptographic and compression algorithms, in this case the Data Encryption Standard (DES) and Lempel-Ziv-Welch (LZW) algorithms combined, to keep the data safe while also achieving good compression, so that the transmission process runs properly, securely, and quickly.
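A hedged sketch of the compress-then-encrypt pipeline, assuming the pycryptodome package for DES; the LZW coder is a minimal textbook version, and the 8-byte key is illustrative only (compression must come first, since ciphertext is pseudorandom and compresses poorly).

```python
from Crypto.Cipher import DES          # pycryptodome; an assumed dependency
from Crypto.Util.Padding import pad

def lzw_compress(data: bytes) -> list:
    """Classic LZW: grow a dictionary of byte sequences, emit integer codes."""
    table = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = len(table)   # register the new sequence
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

def compress_then_encrypt(data: bytes, key: bytes) -> bytes:
    codes = lzw_compress(data)
    packed = b"".join(c.to_bytes(3, "big") for c in codes)  # 3 bytes per code
    cipher = DES.new(key, DES.MODE_CBC)                     # 8-byte DES key
    return cipher.iv + cipher.encrypt(pad(packed, DES.block_size))

message = b"data data data data " * 20
blob = compress_then_encrypt(message, b"8bytekey")
print(len(message), "->", len(blob), "bytes after LZW + DES")
```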
Electric power quality concerns changes in the shape of the voltage, current, or frequency that can cause failures in equipment, whether utility equipment or consumer property. Household equipment includes many nonlinear loads, one of which is the mixer; with a nonlinear load, the current and voltage waveforms are not sinusoidal. The use of household appliances such as mixers therefore causes harmonics problems that can damage electrical system equipment. This study analyzes the percentage of harmonics in a mixer and reduces the harmonics to within the standard. Measurements before the use of a passive LC filter give a total current harmonic distortion (THDi) of 61.48%, while after the passive LC filter is applied the THDi falls to 23.75%. The 3rd-order harmonic current of the mixer (IHDi) is 0.4185 A, which does not meet the standard; after the passive LC filter is applied it falls to 0.088 A, in accordance with the desired standard, and the power factor improves from 0.75 to 0.98.
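For reference, total current harmonic distortion is THDi = sqrt(sum of I_h^2 for h >= 2) / I_1. The sketch below evaluates it on invented harmonic currents, not the paper's measurements.

```python
import math

def thd(fundamental, harmonics):
    """Total harmonic distortion: RMS of the harmonic components
    relative to the fundamental, THD = sqrt(sum I_h^2) / I_1."""
    return math.sqrt(sum(i * i for i in harmonics)) / fundamental

# Illustrative numbers only; the paper reports THDi = 61.48% before filtering.
i1 = 1.0                                # fundamental current (A), assumed
harmonic_currents = [0.55, 0.25, 0.10]  # 3rd, 5th, 7th order (A), assumed
print(f"THDi = {thd(i1, harmonic_currents) * 100:.2f}%")  # about 61.24%
```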
This paper examines the long-term simultaneous response between dividend policy and corporate value. The main problem studied is that dividend policy responds very slowly on the way to the final goal of corporate value. The data were analyzed using Vector Autoregression (VAR). The discussion concludes that the simultaneous response between dividend policy and corporate value differs across the short, medium, and long term: the strongest response to dividend changes comes from free cash flow, whereas the highest response of corporate value comes from the market-to-book value.
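A minimal sketch of a VAR workflow with statsmodels (an assumed dependency) on synthetic series standing in for the paper's variables; impulse-response functions are what reveal the period-by-period responses the abstract describes.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Synthetic quarterly series standing in for the paper's variables.
rng = np.random.default_rng(0)
n = 80
data = pd.DataFrame({
    "dividend": rng.normal(0, 1, n).cumsum(),
    "free_cash_flow": rng.normal(0, 1, n).cumsum(),
    "market_book": rng.normal(0, 1, n).cumsum(),
}).diff().dropna()              # difference to get roughly stationary series

results = VAR(data).fit(maxlags=4, ic="aic")  # lag order chosen by AIC
irf = results.irf(12)           # responses to shocks over 12 periods
print(irf.irfs.shape)           # (periods+1, variables, shocks)
print(results.summary())
```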
WhatsApp is a social media application that is currently widely used across various circles due to its ease of use and reasonably good security, and security while communicating is very important, as it is with WhatsApp. Over the network WhatsApp is very secure, but messages in local storage are not safe enough, because they are not properly secured with a dedicated algorithm; indeed, with the WhatsApp Database Viewer software the messages can be read. To improve the security of messages in WhatsApp's local storage, this work proposes a security enhancement using the Modular Multiplication Block Cipher algorithm, so that WhatsApp messages are better protected and not easily read by unauthorized parties.
Consumers find it increasingly easy to access information resources and quickly interact with whatever they will spend on. The ease of use of technology has made consumer attitudes increasingly intelligent and has encouraged the rise of digital transactions, as technology makes it easy for consumers to transact on an e-commerce shopping channel. Future e-commerce trends will lead to User Generated Content related to user behavior in Indonesia, which tends to compare shopping channels. The purpose of this study was to examine the direct and indirect effects of Perceived Ease of Use on Behavioral Intention to transact, in which Perceived Usefulness is used as an intervening variable. The study used the descriptive exploratory method with causal-predictive analysis, and the research sample was determined by purposive sampling, with an enumerator team assisting in the distribution of questionnaires. The study found that the direct effect of Perceived Ease of Use on Behavioral Intention to transact is smaller than the indirect effect mediated by the Perceived Usefulness variable.
Performance is a process of assessing an algorithm; speed and security are the performance goals used to determine which algorithm is better. For determining the optimum route, two algorithms can be compared: the Genetic and Prim algorithms are two very popular algorithms for finding the optimum route on a graph. Prim can minimize the circuit to avoid a connected loop and determines the best route based on the active vertex; this algorithm is especially useful when applied to a minimum spanning tree case. Genetics works with probabilistic properties: it cannot determine which single route has the maximum value, but it can determine the overall optimum route given appropriate parameters. Each algorithm can be used for the shortest path, minimum spanning tree, or traveling salesman problem. The Prim algorithm is superior to Genetics in speed. The strength of the Genetic algorithm lies in the number of generations and the population generated, as well as the selection, crossover, and mutation processes that support the result; its disadvantage is spending too much time to reach the desired result. Overall, the Prim algorithm has better performance than the Genetic algorithm, especially for a large number of vertices.
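For concreteness, a minimal Python sketch of Prim's algorithm as compared above; the adjacency-list format and the example graph are assumptions for illustration.

```python
import heapq

def prim_mst(graph: dict, start) -> list:
    """Grow an MST from `start`; graph maps vertex -> [(weight, neighbor), ...]."""
    visited = {start}
    edges = list(graph[start])       # candidate edges leaving the tree
    heapq.heapify(edges)
    mst = []
    while edges and len(visited) < len(graph):
        weight, v = heapq.heappop(edges)
        if v in visited:
            continue                 # this edge would close a loop; skip it
        visited.add(v)
        mst.append((weight, v))
        for edge in graph[v]:
            if edge[1] not in visited:
                heapq.heappush(edges, edge)
    return mst

graph = {"A": [(2, "B"), (3, "C")],
         "B": [(2, "A"), (1, "C")],
         "C": [(3, "A"), (1, "B")]}
print(prim_mst(graph, "A"))          # [(2, 'B'), (1, 'C')]
```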
The implementation of a Decision Support System for various purposes now makes it easier for policy makers to obtain the best alternative from a variety of predefined criteria. One of the methods used in implementing a Decision Support System is VIKOR (Vise Kriterijumska Optimizacija I Kompromisno Resenje). In this research, the VIKOR method obtained the best results through a computationally efficient and easily understood process, and it is expected that the results of this study will help various parties develop solution models.
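A minimal numpy sketch of the standard VIKOR ranking steps referred to above; the decision matrix and weights are invented for illustration, and all criteria are treated as benefit criteria.

```python
import numpy as np

def vikor(F, w, v=0.5):
    """Rank alternatives (rows of F) over benefit criteria with weights w."""
    f_best, f_worst = F.max(axis=0), F.min(axis=0)
    norm = (f_best - F) / (f_best - f_worst)      # 0 = best, 1 = worst per criterion
    S = (w * norm).sum(axis=1)                    # group utility
    R = (w * norm).max(axis=1)                    # individual regret
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return Q                                      # lower Q = better compromise

F = np.array([[7.0, 3.0, 9.0],    # alternative 1
              [8.0, 5.0, 6.0],    # alternative 2
              [6.0, 8.0, 7.0]])   # alternative 3
w = np.array([0.4, 0.35, 0.25])
print(np.argsort(vikor(F, w)))    # alternative indices from best to worst
```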
Edge detection is one of the most frequent processes in digital image processing for various purposes, one of which is detecting road damage based on crack paths, which can be checked using the Canny algorithm. This paper proposes a mobile application to detect cracks in the road, with a customized threshold function to produce useful and accurate edge detection. The experimental results show that using the threshold function in the Canny algorithm detects road damage better.
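The paper's customized threshold function is not given; the sketch below only shows where the Canny thresholds are exposed in OpenCV, with illustrative values.

```python
import cv2

# Load the road image in grayscale and blur it to suppress texture noise
# before edge detection.
img = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# The two thresholds drive Canny's hysteresis step: pixels above the high
# threshold are kept as edges, pixels between the two survive only if
# connected to a strong edge. Tuning this pair is the "threshold" knob.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)
cv2.imwrite("road_edges.jpg", edges)
```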
The security and confidentiality of information are important factors in communication, and cryptography can be a powerful way of securing information. IDEA (International Data Encryption Algorithm) and WAKE (Word Auto Key Encryption) are modern symmetric cryptographic algorithms whose encryption and decryption functions are much faster than those of asymmetric cryptographic algorithms. By combining IDEA and WAKE, it is possible to produce highly secret ciphertext, and it is hoped that a cryptanalyst would need a very long time to decrypt the information without knowing the encryption key.
Employees are the backbone of corporate activities, and giving bonuses, job titles, and allowances to employees to motivate their work is very necessary. A company has many salesmen, and finding the best salesman cannot be done manually; this calls for the implementation of a decision support system applying the TOPSIS method. With the TOPSIS method in place, it is expected that top management's requirements can be fulfilled.
English is a language that everyone must know in the current digital era, where almost all information is in English, and it is studied from kindergarten to college. Elementary schools now also teach it, and to help introduce English this work presents a prototype application for the recognition of common words in English. The application can be updated dynamically, so that new English words and sentences can be introduced to students as information changes.
The selection of the best employees is a process of evaluating how well employee performance meets the standards set by the company, and it is usually done by top management such as a General Manager or Director. In general, the selection of the best employees is still performed manually with many criteria and alternatives, which usually makes it difficult for top management to make decisions and turns periodic selection of the best employees into a long and complicated process. Therefore, it is necessary to build a decision support system that can help the decision maker determine the best choice based on standard criteria, faster and more objectively. In this research, the computational decision-making method used is the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). The criteria used in the selection of the best employees are job responsibilities, work discipline, work quality, and behaviour. The final global priority values of the best-employee candidates are used by top management as the decision-making tool for selecting the best employee.
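A minimal numpy sketch of the TOPSIS steps, using the four criteria named above as columns; the scores and weights are invented for illustration.

```python
import numpy as np

def topsis(X, w):
    """Rank alternatives (rows) on benefit criteria: higher raw score = better."""
    N = X / np.sqrt((X ** 2).sum(axis=0))        # vector-normalize each criterion
    V = N * w                                    # weighted normalized matrix
    ideal, anti = V.max(axis=0), V.min(axis=0)   # ideal and negative-ideal solutions
    d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - anti) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)               # relative closeness: higher = better

# Columns: responsibilities, discipline, quality, behaviour (scores 1-10).
X = np.array([[8, 7, 9, 7],
              [9, 6, 7, 8],
              [7, 9, 8, 6]], dtype=float)
w = np.array([0.3, 0.25, 0.25, 0.2])
scores = topsis(X, w)
print(scores.argmax())  # index of the best employee
```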
The Rabin-Karp algorithm is a search algorithm that looks for a substring pattern in a text using hashing, and it is beneficial for matching many patterns at once. One practical application of the Rabin-Karp algorithm is plagiarism detection. The algorithm was invented by Michael O. Rabin and Richard M. Karp and performs string search by means of a hash function; hash values computed from two documents are compared to determine their level of similarity. Rabin-Karp is not very good for single-pattern text search but is well suited to multiple-pattern search. The Levenshtein algorithm can be used to replace the hash comparison in the Rabin-Karp approach: whereas plain Rabin-Karp only counts the number of hashes that have the same value in both documents, calculating the Levenshtein distance between the documents yields better accuracy.
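A minimal sketch of the rolling-hash search at the core of Rabin-Karp; the base and modulus are conventional choices, not values from the paper.

```python
def rabin_karp(text: str, pattern: str, base=256, mod=1_000_003) -> list[int]:
    """Return the start indices where `pattern` occurs in `text`."""
    m, n = len(pattern), len(text)
    if m > n:
        return []
    high = pow(base, m - 1, mod)          # weight of the outgoing character
    p_hash = t_hash = 0
    for i in range(m):                    # initial hashes of pattern and first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Verify on hash match to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:                     # roll the window one character right
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits

print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```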
Cybercrime is a digital crime committed to reap profits through the Internet as a medium. Any criminal activity that occurs in the digital world or through the internet network is referred to as internet crime; cybercrime also refers to criminal activity on computers and computer networks. This activity can be carried out in a particular location or even across countries. These crimes include credit card forgery, confidence fraud, dissemination of personal information, pornography, and so on. In earlier times there was no strong law to combat cybercrime; since the introduction of electronic information and transaction laws, legal jurisdiction over computer crime has been applied. Computer networks are not installed only in one particular local area but span a worldwide network, which is what allows cybercrime to occur freely between countries. This issue requires universal jurisdiction: a country has the authority to combat crimes that threaten the international community, and this jurisdiction is applied without regard to where the crime was committed or the citizenship of the offender. It was created in the absence of an international judicial body dedicated to trying individual crimes. Cybercrime cannot be totally eradicated, but implementing international jurisdiction at least reduces the number of cybercrimes in the world.
Market competition is fierce, so a company must be smart in managing its finances. In promoting its selling points, marketing is the most important step to consider, and routine promotional activity is one marketing technique for increasing consumer appeal for marketed products. One important item on the promotion agenda is the selection of the most appropriate promotional media. The problem that often occurs in selecting a promotional medium is the subjectivity of decision making; marketing activities also carry a budget that must be spent, and limited funds are one of the constraints on improving market strategy. So far, the selection of promotional media has been performed by the company manually, using standard rules already in place, which has many shortcomings regarding effectiveness, efficiency of time, and limited funds. The Markov Chain is very helpful to the company in analyzing its development over a period: the method can predict future market share, so that the company can optimize promotion costs at the right time. Implementing this method produces market-share percentages, so businesses can determine and choose the more appropriate way to improve the company's market strategy. The assessment is done by looking at consumer criteria for a particular product; these criteria capture consumer interest in a product so that consumer behavior can be analyzed.
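A minimal sketch of the market-share projection step: given a matrix of brand-switching probabilities (values invented for illustration), repeated multiplication by the transition matrix predicts the share in future periods.

```python
import numpy as np

# P[i, j]: probability a customer of brand i switches to brand j next period.
P = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.75, 0.15],
              [0.05, 0.20, 0.75]])

share = np.array([0.50, 0.30, 0.20])   # current market share per brand

for period in range(1, 6):             # project five periods ahead
    share = share @ P                  # Markov step: s(t+1) = s(t) P
    print(period, np.round(share, 3))
```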
The transition from copper-cable technology to fiber optics has strongly driven technological development, allowing data to be transmitted quickly and accurately. This change of cable can be seen everywhere, but fiber-optic cable is expensive: if it is not installed optimally, it will cost enormously, and the excess cost could instead be used to support performance rather than pay for surplus cable that should be minimized. Determining how much cable to use at installation time is difficult to do manually. Prim's algorithm can optimize this by calculating the minimum spanning tree over the branches used for fiber-optic cable installation; it makes all the listed points interconnected while shortening the route to each destination. Using this method helps save the cost of fiber-optic construction.
An image is a medium for conveying information; the information contained therein may be a particular event, experience, or moment. Not infrequently, many images have similarities, but the level of similarity is not easily detected by the human eye. Eigenface is one technique to calculate the resemblance of an object: it computes from the intensity of the colors in the two images being compared. The stages used are normalization, eigenface, training, and testing. Eigenface is used to calculate pixel proximity between images; this calculation yields the feature values used for comparison, and the image with the smallest feature distance is the one closest to the original image. Applying this method is very helpful for analysts who predict the likeness of digital images, and it can also be used in steganography, digital forensics, face recognition, and so forth.
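A minimal numpy sketch of the eigenface pipeline, assuming flattened grayscale images: PCA via SVD, projection into face space, then nearest-neighbor matching. The random data here stands in for real face images.

```python
import numpy as np

def train_eigenfaces(images: np.ndarray, k: int = 10):
    """images: (n_samples, height*width) flattened grayscale images."""
    mean = images.mean(axis=0)
    A = images - mean                       # center the data
    # SVD gives the principal components without forming a covariance matrix.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    eigenfaces = Vt[:k]                     # top-k eigenfaces
    weights = A @ eigenfaces.T              # projections of the training images
    return mean, eigenfaces, weights

def match(query: np.ndarray, mean, eigenfaces, weights) -> int:
    """Return the index of the most similar training image."""
    q = (query - mean) @ eigenfaces.T       # project the query into face space
    dists = np.linalg.norm(weights - q, axis=1)
    return int(dists.argmin())              # smallest distance = closest image

rng = np.random.default_rng(0)
faces = rng.random((20, 64 * 64))           # stand-in for 20 flattened face images
mean, ef, w = train_eigenfaces(faces, k=8)
print(match(faces[3], mean, ef, w))          # 3: the query matches itself
```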
Compression is an activity performed to reduce data to a smaller size than before. Compression was created because of inadequate storage capacity, and data compression is also needed to speed up data transmission between computer networks. Compression trades off speed against density: dense compression takes longer than compression that prioritizes speed. Elias Delta is one of the lossless compression techniques that can compress characters. The encoding is built from the frequency of each character in the document to be compressed, and it works by cutting down the usual seven or eight bits per character: the most common characters receive the fewest bits, while the rarest characters receive the longest codes. A character set is formed to eliminate duplicate characters when counting occurrences of each character and to store the compression table. The method achieves a good size ratio between before and after compression, and the speed of its compression and decompression processes is outstanding.
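A minimal sketch of the Elias Delta codeword itself, plus the frequency-ranking step the abstract describes (rank 1, the most common character, gets the shortest code); the ranking scheme is inferred from the abstract's wording.

```python
def elias_delta(n: int) -> str:
    """Elias Delta codeword for integer n >= 1, as a bit string."""
    length = n.bit_length()               # number of bits in n
    # Elias gamma prefix for `length`: zeros, then `length` in binary.
    gamma = "0" * (length.bit_length() - 1) + bin(length)[2:]
    return gamma + bin(n)[2:][1:]         # append n without its leading 1 bit

def compress(text: str) -> str:
    # Rank characters by frequency: rank 1 (shortest code) = most frequent.
    by_freq = sorted(set(text), key=text.count, reverse=True)
    rank = {c: i + 1 for i, c in enumerate(by_freq)}
    return "".join(elias_delta(rank[c]) for c in text)

print(elias_delta(1))    # '1'
print(elias_delta(5))    # '01101'
bits = compress("AAABBC")
print(bits, len(bits), "bits vs", len("AAABBC") * 8)  # 15 bits vs 48
```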
Technological developments in computer networks increasingly demand security in the systems that are built, and security also requires flexibility, efficiency, and effectiveness. Exchanging information over an internet connection is now commonplace, but it can invite data theft or cybercrime that causes losses for both parties, and the rate of data theft is even higher on wireless networks, since a wireless signal has no boundary and can be intercepted. Filtering is used to restrict incoming access from the internet in order to keep out intruders and people who want to steal data; left unanticipated, this can be fatal. IP and MAC filtering is a way to protect a wireless network from being used and misused by just anyone, and the technique is very useful for securing data on a computer that joins a public network. By registering IP and MAC addresses on a router, information is kept from being misused or stolen: only the few computers whose IP and MAC addresses are listed can connect to the wireless hotspot.
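A minimal sketch of the pairing idea behind IP and MAC filtering: the router holds an allowlist of (IP, MAC) pairs and admits a client only when both match. The addresses are invented examples.

```python
# Allowlist of (IP, MAC) pairs registered on the router.
ALLOWED = {
    ("192.168.1.10", "aa:bb:cc:dd:ee:01"),
    ("192.168.1.11", "aa:bb:cc:dd:ee:02"),
}

def admit(ip: str, mac: str) -> bool:
    """Admit a client only if its IP *and* MAC match a registered pair,
    so spoofing just one of the two is not enough."""
    return (ip, mac.lower()) in ALLOWED

print(admit("192.168.1.10", "AA:BB:CC:DD:EE:01"))  # True
print(admit("192.168.1.10", "aa:bb:cc:dd:ee:02"))  # False: pair mismatch
```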
Catfish is a type of freshwater fish with a good taste, but its cultivation faces many obstacles. Because they live in dirty water, these fish are susceptible to disease, and many symptoms arise during cultivation, from skin diseases to physical ones. Catfish farmers often do not know how to diagnose the diseases in their livestock. Diagnosis serves to separate healthy catfish from sick ones so that the sale value of the fish stays high: diseased catfish are sold more cheaply as feed for other animals, while healthy fish are sold to the market or exported to other countries. Diagnosis can be performed with an expert system; the certainty factor algorithm is a good way to determine the percentage likelihood of a possible fish disease, and it is very helpful for farmers seeking to improve catfish farming.
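A minimal sketch of the certainty-factor combination rule such an expert system typically applies when several symptoms support the same disease (the classic rule for positive evidence; the symptom values here are invented):

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two positive certainty factors (0..1) for one hypothesis."""
    return cf1 + cf2 * (1 - cf1)

# Expert-assigned CFs for symptoms observed on a fish (hypothetical values).
symptom_cfs = [0.6, 0.4, 0.3]   # e.g. white patches, fin rot, lethargy
cf = 0.0
for s in symptom_cfs:
    cf = combine_cf(cf, s)
print(round(cf * 100, 1), "% certainty of the disease")  # 83.2 %
```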
SSRG International Journal of Mobile Computing & Application (SSRG-IJMCA) – Volume 3, Issue 4, July–August 2016. ISSN: 2393-9141. www.internationaljournalssrg.org
Comparison Between WEKA and Salford System in Data Mining Software

Wirda Fitriani¹, Andysah Putera Utama Siahaan²
¹,²Faculty of Computer Science, Universitas Pembangunan Panca Budi
Jl. Jend. Gatot Subroto Km. 4,5 Sei Sikambing, 20122, Medan, Sumatera Utara, Indonesia
Abstract–Processing of the data generated from
transactions that occur every day which resulted in
nearly thousands of data per day requires software
capable of enabling users to conduct a search of the
necessary data. Data mining becomes a solution for
the problem. To that end, many large industries began
creating software that can perform data processing.
Due to the high cost to obtain data mining software
that comes from the big industry, then eventually some
communities such as universities eventually provide
convenience for users who want just to learn or to
deepen the data mining to create software based on
open source. Meanwhile, many commercial vendors
market their products respectively. WEKA and Salford
System are both of data mining software. They have
the advantages and the disadvantages. This study is to
compare them by using several attributes. The users
can select which software is more suitable for their
daily activities.
Keywords – Data Mining, Decision Tree, Software
I. INTRODUCTION
Data is one of an organization's most valuable assets. The input data is a set of sequences, called data sequences [5]. To obtain useful information, the data must produce reliable, good-quality information in real time; excellent information provides useful results for its user. It is undeniable that in the current era of globalization, the transactions that occur each day produce very large amounts of data [1, 8], and such data must be processed for various interests. If the data were processed manually, it would take a long time to finish: the system must handle thousands of records every day, and a proper machine is required to process the thousands of records stored in the database so that the information received is fast, accurate, reliable, and available when needed [8]. Because the information collected every day, in huge numbers, is the result of transactions that keep occurring, data mining is needed to analyze these large data sets, along with tools that can analyze and process thousands of records per day. With such tools available, data processing jobs become easier to carry out.
Currently, various software is available for data mining, both open source and commercial. Each application certainly has advantages and disadvantages; paid applications get better support from the provider, but that does not mean the quality of open source software is worse than the commercial one. Moreover, open source software supports the learning process at no charge at all. The drawback is that when a bug is discovered during use, users cannot directly obtain a fix and have to wait for the open source community to repair it. In this paper, the research tests the contribution of each software package. In some cases, people select specific software based on certain criteria; this study involves tests of the decision tree to compare which one is better for the overall process.
II. THEORIES
A. Data Mining
Data mining is a method that uses statistical techniques, mathematics, artificial intelligence, and machine learning to extract and identify useful information and knowledge assembled from a variety of large databases [2]. Data mining is also often referred to as Knowledge Discovery from Data, or KDD [1]. Data mining searches for relationships: a connection between two or more items in one dimension, for example, within the product dimension, the linkage between the purchase of one product and other products. Relationships can arise between two or more attributes and two or more objects [9].
B. C4.5 Algorithm
The decision tree is a powerful and famous classification and prediction method. The decision tree method changes a huge set of facts into a decision tree that represents rules, which can also be expressed in a database language such as Structured Query Language to find records in a particular category. The decision tree is also useful for exploring data and finding hidden relationships between potential input variables and a target variable. A decision tree is a structure that divides large datasets into smaller sets of records by applying a series of decision rules; with each division in the series, the members of the resulting sets become more similar to one another [1]. Several algorithms can be used to build a decision tree, such as ID3, CART, and C4.5. The C4.5 algorithm is a development of the ID3 algorithm [4][7].
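As context for how ID3 and C4.5 choose their splits (standard definitions, not spelled out in the paper), each candidate attribute A is scored by its information gain over the entropy of the class distribution:

```latex
\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i, \qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```

C4.5 further normalizes the gain by the split information, yielding the gain ratio, which reduces the bias toward many-valued attributes.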
C. CART (Classification and Regression Trees)
CART is a decision tree method or algorithm. It is a nonparametric statistical method for describing the relationship between a response (dependent) variable and one or more predictor variables [4]. The CART method was first proposed by Leo Breiman in 1984. The decision tree that CART produces is a binary tree, in which each node is required to have two branches. CART recursively divides the records in the training data into subsets that share the same value of the target attribute (class). The CART algorithm builds a decision tree by selecting the most optimal branch for each node, which works by counting all possibilities for each variable [5].
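For reference, CART's usual splitting criterion for classification is the Gini index (a standard definition, not given in the paper): a binary split is chosen to minimize the weighted impurity of the two child nodes.

```latex
\mathrm{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^2, \qquad
\mathrm{Gini}_{\mathrm{split}}(S) = \frac{|S_L|}{|S|}\,\mathrm{Gini}(S_L) + \frac{|S_R|}{|S|}\,\mathrm{Gini}(S_R)
```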
D. Salford System
Salford Systems was founded in 1983 and specializes in providing new-generation data mining and modeling software together with consulting services. Its offerings, both software and consulting, range across market segmentation research, direct sales, fraud detection, credit scoring, risk management, biomedical research, and manufacturing quality control. Industrial users of Salford Systems' products and consulting services include telecommunications, transportation, banking, financial services, insurance, healthcare, manufacturing, retail and catalog sales, and education. Salford Systems software has been installed in more than 3,500 locations worldwide, including 300 major universities; major customers include AT&T Universal Card Services, Pfizer Pharmaceuticals, General Motors, and Sears, Roebuck and Co.
E. WEKA
Waikato Environment for Knowledge Analysis (WEKA) is a popular machine learning software package written in Java, developed at the University of Waikato in New Zealand [3]. WEKA is free software available under the GNU General Public License. WEKA provides classification using the J48 decision tree algorithm [6]; the classification techniques and algorithms used in WEKA are called classifiers.
III. PROPOSED WORK
This paper uses patient registration data from the Emergency Room (IRD) at Pirngadi Hospital, Medan, taken from patient registrations on May 16, 2016. From these data, it is known that the patients admitted and registered at the IRD were there not only due to illness; some were recorded as traffic accident patients. Data mining software is suitable for this situation, as it produces complete reports and decisions [10].
Fig. 1 The screenshot of the patients list
Figure 1 shows the patient list captured by the computer. The data is raw data prior to decision making, and it is used in both WEKA and the Salford System.
IV. TESTING AND IMPLEMENTATION
A. Several Attributes
This research proposes a comparison between WEKA and the Salford System. The test covers several attributes such as installation, price, configuration, interface, and operating system. Table 1 compares both software packages across these attributes.
TABLE 1: THE SEVERAL COMPARISONS

Attribute          WEKA          Salford
Installation       Easy          Easy
Price              Open Source   Commercial
Configuration      Easy          Easy
Interface          Complicated   Simple
Operating System   Windows       Windows
Operation          Hard          Easy
B. Evaluation of Salford System
The test on the data in Figure 1 uses Salford Predictive Modeler version 8. The test cases compare the attributes against the last-condition attribute, which determines whether patients who received treatment at the IRD are allowed to go straight home or are hospitalized. In Figure 2 there are ten attributes to be compared, each connected to the others. The algorithm provided by the software computes over these attributes and produces the decision tree.
Fig. 2 The attributes
After configuring the data in the model by specifying the attributes to be compared against the target, the next steps are to set the Class Value under Categorical and finally press the Start button to compute the results and view the tree arising from this configuration. Figure 3 shows the configuration window and the attributes; the checkboxes must be selected to determine which attributes are used.
Fig. 3 The attributes configuration
The Continue button starts the calculation. Using the CART decision tree, it produces a tree with two nodes. The decision tree is drawn with colored nodes, together with a chart representing each node's value and a legend for each node.
Fig. 4 The decision tree
C. Evaluation of WEKA
With the same attributes and the same test sample data as in Figure 1, the data is tested using WEKA version 3.8.0. The 29 cases are used to compare the attributes against the same last-condition comparison attribute. Figure 5 shows the attribute configuration using the J48 algorithm; since the application is open source, the script can be modified to improve the result.
Fig. 5 The WEKA configuration attributes
After configuring the data attributes with the last condition for comparison using the J48 decision tree, the resulting output appears along with the source code. Once the configuration is done, WEKA shows the visualization of a tree consisting of four leaves, as shown in Figure 6.
Fig. 6 The Tree Visualizer
V. CONCLUSION
In the global network, data is a valuable asset to maintain, and much software offers facilities to keep data structured. From the comparison of the two software packages, it is concluded that configuring data in WEKA is easier than in the Salford Predictive Modeler. In the Salford Predictive Modeler, the name attribute covering the 29 cases becomes the root, whereas in WEKA the KET_DTL_SUBINST attribute becomes the root, and the tree structure looks clearer than in the Salford System. The Salford Predictive Modeler, however, processes the cases in more detail, as shown above. From this comparison, the more complex the problem, the better suited Salford Systems is; for small and simple cases, WEKA is the more suitable choice.
REFERENCES
[1] M. J. Berry and G. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: John Wiley & Sons, Inc., 1997.
[2] A. Kumar, O. Singh, V. Rishiwal, R. K. Dwivedi, and R. Kumar, "Association Rule Mining on Web Logs for Extracting Interesting Patterns through Weka Tool," International Journal of Advanced Technology in Engineering and Science, vol. 3, no. 1, pp. 134-140, 2015.
[3] D. T. Larose, Data Mining Methods and Models. Canada: John Wiley & Sons, Inc., 2006.
[4] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining. Canada: John Wiley & Sons, 2014.
[5] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, pp. 10-18, 2009.
[6] T. Krishna and D. Vasumathi, "A Study of Mining Software Engineering Data and Software Testing," Journal of Emerging Trends in Computing and Information Sciences, vol. 2, no. 11, 2011.
[7] T. Silwattananusarn and K. Tuamsuk, "Data Mining and Its Applications for Knowledge Management: A Literature Review from 2007 to 2012," International Journal of Data Mining & Knowledge Management Process, vol. 2, no. 5, 2012.
[8] S. Rajagopal, "Customer Data Clustering Using Data Mining Technique," International Journal of Database Management Systems, vol. 3, no. 4, pp. 1-11, 2011.
[9] D. Tomar and S. Agarwal, "A Survey on Data Mining Approaches for Healthcare," International Journal of Bio-Science and Bio-Technology, vol. 5, no. 5, pp. 241-266, 2013.
[10] A. P. U. Siahaan, "Various Patterns of Data Mining Techniques," 2011. [Online]. Available: http://www.academia.edu/download/46339017/Various_Patterns_of_Data_Mining_Techniques.doc. [Accessed: 10 July 2016].
AUTHORS PROFILE

Wirda Fitriani was born in Medan, Indonesia, in 1979. She received the S.Kom. degree in computer science from Universitas Pembangunan Panca Budi, Medan, Indonesia, in 2005. She joined the Department of Engineering, Universitas Pembangunan Panca Budi, as a Lecturer in 2008, and in 2014 she entered the post-graduate program at AMIKOM, Yogyakarta, where she is now studying for her master's degree. She has presented papers at several conferences and is still working as a System Support at PT. Buana Varia Komputama, Medan.

Andysah Putera Utama Siahaan was born in Medan, Indonesia, in 1980. He received the S.Kom. degree in computer science from Universitas Pembangunan Panca Budi, Medan, Indonesia, in 2010, and the M.Kom. degree, also in computer science, from the University of Sumatera Utara, Medan, Indonesia, in 2012. In 2010 he joined the Department of Engineering, Universitas Pembangunan Panca Budi, as a Lecturer, and in 2012 he became a researcher. He is applying for his Ph.D. degree in 2016. He has written several international journal papers and is now active in writing papers and joining conferences.