This document summarizes a study analyzing sales data at PT. Panca Putra Solusindo using the Apriori and FP-Growth algorithms. The study aimed to determine the most frequently purchased electronic products to help develop marketing strategies. Transaction data from 2016 was analyzed with both algorithms in the Weka software. The results showed the top-selling items, with PCs and HP notebooks having the highest support, at over 46% and 30% respectively. The algorithms helped identify best-selling products to optimize promotions.
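The support measure that both algorithms rank itemsets by can be sketched in a few lines. This is a minimal sketch; the transactions and item names below are illustrative, not the study's data:

```python
# Hypothetical transactions; item names are illustrative, not from the study.
transactions = [
    {"pc", "notebook", "printer"},
    {"pc", "notebook"},
    {"pc", "monitor"},
    {"notebook", "printer"},
    {"pc"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"pc"}, transactions))              # 4 of 5 transactions
print(support({"pc", "notebook"}, transactions))  # 2 of 5 transactions
```

Apriori and FP-Growth differ only in how they enumerate candidate itemsets; the support definition they threshold against is the same.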
IRJET- Stock Market Prediction using Financial News Articles - IRJET Journal
This document presents a system for predicting stock prices using financial news articles and machine learning. It first extracts data from trusted news sources and cleans it using natural language processing. Sentiment analysis determines if the news is positive or negative. This polarity is input to a machine learning model, which uses an algorithm like linear regression to predict if a stock's price will increase in the next 10 minutes. The system was tested on a dataset divided into 70% for training and 30% for testing. Accuracy, specificity and precision metrics showed the system could successfully predict stock price movements based on analyzing financial news sentiment.
This document summarizes an article that proposes a new algorithm for efficiently mining both positive and negative association rules from transactional databases. The algorithm first constructs a frequent pattern tree (FP-tree) to store the transaction information. It then uses an FP-growth approach to iteratively find frequent patterns and generate the positive and negative association rules without candidate generation. The algorithm aims to overcome limitations of previous methods and efficiently find all valid comparative association rules.
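For a negative rule A → ¬B, support and confidence can be derived from the positive counts without any extra scan. A small sketch under the standard definitions; the baskets are made up for illustration:

```python
def rule_stats(antecedent, consequent, transactions, negative=False):
    """Support and confidence for A -> B, or A -> not-B when negative=True."""
    n = len(transactions)
    sup_a = sum(1 for t in transactions if antecedent <= t) / n
    sup_ab = sum(1 for t in transactions if antecedent <= t and consequent <= t) / n
    # sup(A and not-B) = sup(A) - sup(A and B)
    sup_rule = sup_a - sup_ab if negative else sup_ab
    conf = sup_rule / sup_a if sup_a else 0.0
    return sup_rule, conf

baskets = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
print(rule_stats({"a"}, {"b"}, baskets))                 # positive rule a -> b
print(rule_stats({"a"}, {"b"}, baskets, negative=True))  # negative rule a -> not-b
```

The FP-tree avoids rescanning the database for these counts, but the arithmetic relating positive to negative rules is exactly this.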
The disruptometer: an artificial intelligence algorithm for market insights - journalBEEI
Social media data mining is rapidly developing into a mainstream tool for marketing insights in today's world, due to the abundance of data and often freely accessible information. In this paper, we propose a framework for market research purposes called the Disruptometer. The algorithm uses keywords to provide different types of market insights from data crawling. The preliminary algorithm mines information from Twitter and outputs two parameters, Product-to-Market Fit and Disruption Quotient, which are obtained from a brand's customer value proposition, problem space, and incumbent space. The algorithm has been tested with a venture capitalist portfolio company and a market research firm, showing highly correlated results. Out of four brand use cases, three obtained results identical to the analysts' studies.
E-commerce online review for detecting influencing factors users perception - journalBEEI
To date, online shopping through e-commerce services has become a trend. The emergence of e-commerce truly helps people shop more effectively and efficiently. However, some problems are still encountered in e-commerce, especially from the user perspective. This research aims to explore user review data, particularly the factors that influence user perception of e-commerce applications, to classify those reviews, and to identify potential solutions to the problems found in e-commerce applications. Data is collected using web scraping techniques and classified using a suitable machine learning method, namely a support vector machine (SVM). Text association and fishbone analysis are then performed on the classified user review data. The results of this study show that user satisfaction problems can be captured. Furthermore, various services can be generated as potential solutions to customers' experienced problems or users' perception problems. A detailed discussion of these findings is available in this article.
Lexalytics Text Analytics Workshop: Perfect Text Analytics - Lexalytics
This document summarizes and promotes the text analytics capabilities of Perfect Text Analytics. It discusses how Perfect is fast, usable, consistent, provides new knowledge, is inclusive of all text, and is trainable. Customer use cases are presented in reputation management, politics, market intelligence, hospitality, financial services, pharma, and opinion mining. The document outlines planned enhancements over the next year, including sarcasm detection, foreign language support, and more customizable tools. Overall, it argues that text analytics can provide valuable insights across many industries when combined with business logic.
This document summarizes several research papers on improving recommendation systems using item-based collaborative filtering approaches. It discusses challenges with traditional collaborative filtering like data sparsity and scalability. It then summarizes various item-based recommendation algorithms that analyze item relationships to indirectly compute recommendations. These include item-item similarity techniques like cosine similarity and adjusting for average ratings. The document also reviews literature on combining item categories and interestingness measures to improve accuracy. Overall, it analyzes different item-based collaborative filtering techniques to address challenges and provide high quality, personalized recommendations at scale.
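The adjusted cosine similarity mentioned above (cosine similarity after subtracting each user's average rating, to correct for generous vs. harsh raters) can be sketched as:

```python
import math

def adjusted_cosine(ratings, i, j):
    """Item-item similarity over co-rating users, mean-centering each user.
    `ratings` maps user -> {item: rating}."""
    num = da = db = 0.0
    for user, r in ratings.items():
        if i in r and j in r:
            mean = sum(r.values()) / len(r)
            a, b = r[i] - mean, r[j] - mean
            num += a * b
            da += a * a
            db += b * b
    return num / math.sqrt(da * db) if da and db else 0.0

# Two illustrative users who both like i1 and dislike i2.
ratings = {"u1": {"i1": 5, "i2": 1}, "u2": {"i1": 4, "i2": 2}}
print(adjusted_cosine(ratings, "i1", "i2"))  # opposed after mean-centering
```

Item-item similarities like this can be precomputed offline, which is what makes the item-based approach scale better than user-based filtering.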
This document summarizes strategies for improving the effectiveness of marketing and sales discussed in various research papers. It first introduces common data mining algorithms like Apriori, FP-growth, and Eclat that are used to find frequently purchased item sets by analyzing customer transaction data. This information can then be used to better target products to customers. The document then summarizes 11 research papers that examine techniques like sentiment analysis of reviews to predict sales, using price indexes to understand market impacts, and designing marketing management systems. The goal of these strategies is to enhance customer relationships and make more informed decisions about product placement, pricing, and promotions.
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ... - IRJET Journal
This document proposes using Twitter sentiment analysis and an LSTM neural network to predict election results. It involves collecting tweets related to political parties and candidates in India, cleaning the data, training an LSTM classifier on labeled tweets, and using the trained model to classify tweets as positive, negative or neutral sentiment and compare sentiment levels for each candidate. The goal is to analyze public sentiment expressed on Twitter and how it correlates with actual election outcomes.
Recommendation based on Clustering and Association Rules - IJARIIE JOURNAL
This document proposes a hybrid recommendation framework that uses both clustering and association rule mining. It first clusters users using the k-means algorithm to group similar users together. It then applies the Eclat algorithm to generate frequent itemsets and association rules from the user-item data. These association rules are used to make recommendations to users. The framework aims to address issues like sparsity and cold start problems. An experimental study compares the performance of the Eclat and Apriori algorithms on recommendation quality and execution time. The proposed approach integrates clustering and association rule mining to provide more accurate recommendations.
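Eclat's distinguishing feature is its vertical layout: support of an itemset is the size of the intersection of its items' transaction-id sets (tidsets). A minimal sketch, showing only one level of pairwise extension where full Eclat recurses:

```python
def eclat_supports(transactions, min_count):
    """Vertical (tidset) mining: support of an itemset is the size of the
    intersection of its items' transaction-id sets."""
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    frequent = {frozenset([i]): s for i, s in tids.items() if len(s) >= min_count}
    # One level of pairwise extension, for illustration; full Eclat recurses.
    items = sorted(tids)
    for ai in range(len(items)):
        for bi in range(ai + 1, len(items)):
            a, b = items[ai], items[bi]
            inter = tids[a] & tids[b]
            if len(inter) >= min_count:
                frequent[frozenset([a, b])] = inter
    return {tuple(sorted(k)): len(v) for k, v in frequent.items()}

baskets = [{"x", "y"}, {"x", "y"}, {"x", "z"}]
print(eclat_supports(baskets, min_count=2))
```

Intersecting sets replaces the repeated database scans Apriori needs, which is the usual source of the execution-time difference the experimental study compares.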
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation goes through different areas of text analytics and provides some real-world examples that help make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
Opinion mining on newspaper headlines using SVM and NLP - IJECEIAES
Opinion mining, also known as sentiment analysis, is a technique that uses natural language processing (NLP) to classify the sentiment expressed in text. Various NLP tools are available for processing text data. Much research has been done on opinion mining for online blogs, Twitter, Facebook, etc. This paper proposes a new opinion mining technique applying a support vector machine (SVM) and NLP tools to newspaper headlines. Relative words are generated using Stanford CoreNLP and passed to the SVM via a count vectorizer. Comparing three models using a confusion matrix, the results indicate that TF-IDF with a linear SVM provides better accuracy for a smaller dataset, while for a larger dataset the SGD and linear SVM models outperform the others.
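The TF-IDF weighting that performed well here can be computed from scratch. A minimal sketch using the plain tf × log(N/df) formulation; real toolkits such as scikit-learn differ in smoothing and normalization:

```python
import math

def tfidf(docs):
    """Term frequency * inverse document frequency, one weight dict per doc."""
    n = len(docs)
    df = {}  # document frequency of each word
    for d in docs:
        for w in set(d.split()):
            df[w] = df.get(w, 0) + 1
    out = []
    for d in docs:
        words = d.split()
        weights = {}
        for w in set(words):
            tf = words.count(w) / len(words)
            idf = math.log(n / df[w])
            weights[w] = tf * idf
        out.append(weights)
    return out

weights = tfidf(["market rises", "market falls"])
print(weights[0])  # "market" appears everywhere, so its idf is zero
```

Words that appear in every headline get zero weight, which is precisely why TF-IDF tends to beat raw counts on small, topically narrow corpora.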
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff... - IRJET Journal
This document summarizes research on sentiment polarity analysis of Twitter data from different events. It discusses how Twitter data can be used for opinion mining and sentiment analysis. Several papers that used techniques like naive Bayes classifier, support vector machines, and dual sentiment analysis on Twitter data are summarized. The document also provides an overview of the key steps involved in a Twitter sentiment analysis system, including data collection, preprocessing, feature extraction, training a classification model, and evaluating accuracy. The goal of analyzing sentiments on Twitter is to understand public opinions on different topics and events.
This document summarizes research on improving the Apriori algorithm for mining association rules from transactional databases. It first provides background on association rule mining and describes the basic Apriori algorithm. The Apriori algorithm finds frequent itemsets by multiple passes over the database but has limitations of increased search space and computational costs as the database size increases. The document then reviews research on variations of the Apriori algorithm that aim to reduce the number of database scans, shrink the candidate sets, and facilitate support counting to improve performance.
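The candidate-set shrinking these variations target rests on the Apriori property: every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be dropped before the expensive support-counting scan. A minimal pruning sketch:

```python
from itertools import combinations

def prune_candidates(frequent_k, candidates):
    """Keep only candidates whose every (k-1)-subset is already frequent."""
    kept = []
    for c in candidates:
        if all(frozenset(s) in frequent_k for s in combinations(c, len(c) - 1)):
            kept.append(c)
    return kept

frequent_2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
candidates_3 = [frozenset(["a", "b", "c"]), frozenset(["b", "c", "d"])]
print(prune_candidates(frequent_2, candidates_3))  # only {a, b, c} survives
```

Here {b, c, d} is pruned because its subset {c, d} was never frequent, so it never needs a database scan at all.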
An Introduction to Text Analytics: 2013 Workshop presentation - Seth Grimes
This document provides an introduction to text analytics. It discusses perspectives on text analytics from different roles like IT support, researchers, and solution providers. It explains how text analytics can boost business results by analyzing unstructured text data from sources like emails, social media, surveys etc. It discusses how text analytics transforms information retrieval to information access by extracting semantics, entities, topics and relationships from text. It also provides definitions and explanations of key concepts in text analytics like entities, features, metadata, natural language processing, information extraction, categorization, classification and evaluation metrics.
Automation Tool Development to Improve Machine Results using Data Analysis - IRJET Journal
This document discusses the development of an automation tool using Python to improve the accuracy of machine-generated results from analyzing merchant datasets. The tool uses concepts like the confusion matrix to evaluate the precision, recall, and accuracy of the machine's predictions. It analyzes the merchant descriptions and identifiers using patterns and partial matches to determine correct and incorrect predictions. When run on sample merchant data, the tool was able to improve the precision, recall, and accuracy scores compared to the machine alone. The tool demonstrates how data analysis techniques in Python can help optimize machine learning models.
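Deriving precision, recall, and accuracy from a confusion matrix, as the tool does, can be sketched like this; the label lists are illustrative, not the merchant data:

```python
def confusion_scores(actual, predicted):
    """Build confusion counts from parallel 0/1 label lists, then derive scores."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(actual)
    return precision, recall, accuracy

# 1 = correct merchant match; the values are illustrative.
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_scores(actual, predicted))
```

Tracking all three scores together, as the tool does, guards against "improvements" that trade one metric away for another.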
Prediction of Default Customer in Banking Sector using Artificial Neural Network - rahulmonikasharma
The aim of this article is to present a prediction and risk-accuracy analysis of default customers in the banking sector. A neural network is a learning model inspired by biological neurons; it is used to estimate and predict outcomes that can depend on a large number of inputs. The bank customer dataset from the UCI repository is used, with data analysis methods applied to extract an informative subset from a large volume of data. This dataset is fed to the neural network as training and testing data. During training, the dataset is iterated over until the desired output is reached; the trained model is then cross-checked against the test data. This paper focuses on predicting default customers using a deep neural network (DNN) algorithm.
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm - IJECEIAES
Social media provides convenience in communicating and enables two-way communication that allows companies to interact with their customers. Companies can use information obtained from social media to analyze how communities respond to their services or products. The biggest challenge in processing information from social media like Twitter is the unstructured sentences, which can lead to incorrect text processing. However, this information is very important for a company's survival. In this research, we proposed a method, WPKE, to extract keywords from tweets in the Indonesian language. We compared it with RAKE, a language-independent algorithm commonly used for keyword extraction. Finally, we developed a clustering method to group the topics of complaints, using a dataset obtained from Twitter with the "komplain" hashtag. Our method obtains an accuracy of 72.92%, while RAKE obtains only 35.42%.
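RAKE's core scoring, which the paper uses as its baseline, rates words by degree/frequency within stopword-delimited candidate phrases. A minimal sketch; the stopword list and sample text are illustrative:

```python
import re

STOPWORDS = {"the", "is", "and", "of", "a", "to", "in"}  # tiny illustrative list

def rake_keywords(text):
    """RAKE-style scoring: split on stopwords into candidate phrases,
    score each word by degree/frequency, sum word scores per phrase."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    freq, degree = {}, {}
    for p in phrases:
        for w in p:
            freq[w] = freq.get(w, 0) + 1
            degree[w] = degree.get(w, 0) + len(p)  # word co-occurrence degree
    return sorted(
        ((" ".join(p), sum(degree[w] / freq[w] for w in p)) for p in phrases),
        key=lambda x: -x[1],
    )

print(rake_keywords("slow delivery and slow response of the courier"))
```

Because the split points are just stopwords and punctuation, RAKE is language independent once a stopword list exists, which is exactly the property the paper's Indonesian-specific WPKE method trades away for accuracy.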
This document summarizes a research paper that proposes using a genetic algorithm to generate high-quality association rules for measuring data quality. The genetic algorithm evaluates rules based on four metrics: confidence, completeness, comprehensibility, and interestingness. It aims to discover high-level prediction rules that perform better than traditional greedy rule induction algorithms at handling attribute interactions. The genetic algorithm represents rules as chromosomes and uses the four metrics as an objective fitness function to evaluate the quality of each rule.
The document proposes a method to recommend users on Q&A sites who are most likely to correctly answer questions. It involves:
1) Classifying questions into tags using logistic regression and SVM models trained on historical data.
2) Calculating a weighted score for each user based on past answer performance for each tag.
3) Recommending top users for tags identified in step 1 as most likely to answer new questions correctly.
Experimental results showed this approach worked better for common tags with more training data, while rare tags remained inaccurate to classify. Future work is needed to improve recommendations and user experience.
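The per-tag weighted score in step 2 might look like the following sketch. The acceptance-rate-times-volume weighting here is a hypothetical choice for illustration, not necessarily the paper's formula:

```python
def recommend_users(history, tag, top_n=2):
    """history maps user -> tag -> (accepted_answers, total_answers).
    Score weighs acceptance rate by answer volume (a hypothetical weighting)."""
    scores = {}
    for user, tags in history.items():
        if tag in tags:
            accepted, total = tags[tag]
            # sqrt damping so prolific answerers don't dominate purely on volume
            scores[user] = (accepted / total) * total ** 0.5
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

history = {
    "alice": {"python": (8, 10)},
    "bob":   {"python": (3, 4), "sql": (5, 5)},
    "carol": {"sql": (2, 10)},
}
print(recommend_users(history, "python"))  # alice outranks bob
```

Whatever the exact weighting, the key point matches the paper's finding: scores for rare tags rest on few observations, so their rankings stay noisy.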
IRJET- Fake News Detection using Logistic Regression - IRJET Journal
1) The document discusses a study that uses logistic regression to classify news articles as real or fake. It outlines the methodology which includes data preprocessing, feature extraction using bag-of-words and TF-IDF, and using a logistic regression classifier to predict fake news.
2) The model achieved an accuracy of approximately 72% at classifying news as real or fake when using TF-IDF features and logistic regression.
3) The study aims to address the growing issue of fake news proliferation online by developing a computational method for identifying unreliable news sources.
Twitter’s Sentiment Analysis on GSM Services using Multinomial Naïve Bayes - TELKOMNIKA JOURNAL
Telecommunication users are growing rapidly each year. As people keep demanding a better service level for Short Message Service (SMS), telephone, or data use, service providers compete to attract customers, and customer feedback on platforms such as Twitter is one of their sources of information. The Multinomial Naïve Bayes Tree, adapted from the Multinomial Naïve Bayes and Decision Tree methods, is a data mining technique used to classify raw data or customer feedback. The Multinomial Naïve Bayes method specifically addresses term frequency in the text of a sentence or document. The documents used in this study are comments of Twitter users on GSM telecommunications providers in Indonesia. This research employed the Multinomial Naïve Bayes Tree classification technique to categorize customers' sentiment towards telecommunication providers in Indonesia. The sentiment analysis included only the positive, negative, and neutral classes. The research generated a Decision Tree rooted in the feature "aktif", where the probability of the feature "aktif" came from the positive class in the Multinomial Naïve Bayes method. The evaluation showed that the highest classification accuracy using the Multinomial Naïve Bayes Tree (MNBTree) method was 16.26% using 145 features, while Multinomial Naïve Bayes (MNB) yielded the highest accuracy of 73.15% using the full dataset of 1665 features. The expected benefit of this research is that Indonesian telecommunications providers can evaluate their performance and services to reach customer satisfaction across various needs.
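Multinomial Naïve Bayes, the base method adapted here, scores each class by its prior times per-term likelihoods with Laplace smoothing. A minimal sketch; the tiny Indonesian-flavoured training set is illustrative, not the study's data:

```python
import math

def train_mnb(docs):
    """docs: list of (tokens, label). Returns a predict(tokens) classifier using
    log-priors and Laplace-smoothed log-likelihoods."""
    labels, counts, vocab = {}, {}, set()
    for tokens, label in docs:
        labels[label] = labels.get(label, 0) + 1
        c = counts.setdefault(label, {})
        for t in tokens:
            c[t] = c.get(t, 0) + 1
            vocab.add(t)
    n = len(docs)

    def predict(tokens):
        best, best_lp = None, float("-inf")
        for label, freq in labels.items():
            lp = math.log(freq / n)  # class prior
            total = sum(counts[label].values())
            for t in tokens:
                # add-one (Laplace) smoothing over the vocabulary
                lp += math.log((counts[label].get(t, 0) + 1) / (total + len(vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

    return predict

predict = train_mnb([
    (["sinyal", "bagus"], "positive"),
    (["sinyal", "lambat"], "negative"),
    (["paket", "lambat"], "negative"),
])
print(predict(["sinyal", "lambat"]))
```

The MNBTree variant in the paper then uses such class probabilities of individual features, like "aktif", to choose decision-tree split points.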
Text analytics is used to extract structured data from unstructured text sources like social media posts, reviews, emails and call center notes. It involves acquiring and preparing text data, then processing and analyzing it using algorithms like decision trees, naive Bayes, support vector machines and k-nearest neighbors to extract terms, entities, concepts and sentiment. The results are then visualized to support data-driven decision making for applications like measuring customer opinions and providing search capabilities. Popular tools for text analytics include RapidMiner, KNIME, SPSS and R.
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As... - IOSR Journals
This document discusses an optimal approach to derive disjunctive positive and negative association rules from association rule mining using a genetic algorithm. It aims to address some shortfalls of conventional algorithms like Apriori by supporting disjunctive rules, using multiple minimum support thresholds, and effectively identifying negative rules. The proposed approach uses a modified FP-Growth algorithm and genetic algorithm to generate conjunctive and disjunctive positive and negative rules in an optimized manner by reducing candidate generation time and capturing useful rare item relationships.
This document outlines the aim, objectives, scope, and structure of a dissertation on using genetic programming to optimize and combine K nearest neighbor classifiers for intrusion detection. The aim is to use genetic programming with the KDD Cup 1999 dataset to develop a numeric classifier that shows improved performance over individual KNN classifiers. The objectives are to determine if a GP-based numeric classifier outperforms individual KNN classifiers, if GP combination techniques produce higher performance than KNN component classifiers, and if heterogeneous KNN classifier combination performs better than homogeneous combination. The document describes the methodology that will be used, including developing an optimal KNN classifier using fitness evaluation in the first phase and combining optimal KNN classifiers based on ROC curves in the second phase.
A Trinity Construction for Web Extraction Using Efficient Algorithm - IOSR Journals
This document describes a proposed system for web extraction using a trinity construction and efficient algorithms. It begins with an abstract discussing how trinity characteristics can be used to automatically extract content from websites in a sequential tree structure. It then discusses the existing system which uses trinity tree and prefix/suffix sorting but has limitations. The proposed system introduces fuzzy logic for multi-perspective crawling across multiple websites. A genetic algorithm is used to load extracted content into the trinity structure and remove unwanted data. Finally, an ant colony optimization algorithm is used to obtain an effective structure and suggest optimized solutions.
In recent years data mining has evolved into an active area of research because of the previously unknown and interesting knowledge that can be drawn from very large database collections. Data mining is applied in a variety of applications across multiple domains, such as business, IT, and many other sectors. In data mining, a major problem that receives great attention from the community is the classification of data. The classification should be such that the results can be easily verified and easily interpreted by humans. In this paper we study various data mining techniques in order to find combinations for an enhanced hybrid technique, one involving multiple techniques to improve the usability of the application. We study the CHARM algorithm, the CM-SPAM algorithm, the Apriori algorithm, the MOPNAR algorithm, and Top-K Rules.
IRJET- Improving the Performance of Smart Heterogeneous Big Data - IRJET Journal
This document discusses improving the performance of smart heterogeneous big data. It begins by defining key concepts like big data, data mining, and the challenges of analyzing large, complex datasets. It then describes two common association rule mining algorithms - Apriori and FP-Growth - that are used to extract patterns from big data. The document proposes using principal component analysis as a feature selection method to improve the performance of these algorithms. It finds that this proposed approach reduces execution time compared to the original algorithms when processing big data.
This document compares and analyzes the Apriori and FP-Growth algorithms for finding strategic area patterns for campus introduction case studies at STKIP ADZKIA Padang. It analyzes data from the 2015/2016 student cohort with minimum support of 0.05% and minimum confidence of 0.7% to obtain 19 association rules and the 2 highest rules that can be used as new knowledge and a valuable reference. The document first provides background on the problem, literature review on data mining and association rule algorithms, research methodology using Tanagra 1.4.50 and RapidMiner 7.0.001, data analysis using both algorithms, and conclusions comparing the results.
Association Rules Analysis using FP Growth Algorithm to Make Product Recommen... - ijtsrd
Companies usually have historical data on sales transactions from month to month, but unfortunately it is only used for weekly and monthly reports. If this is allowed to continue, data growth will result in data richness but poor information. At the same time, companies often still use manual methods in their product marketing strategies, with no reference beyond estimates. One of them is the X Fashion Store, which sells various local fashions; X Fashion Store has not used its data to develop a marketing strategy. This study conducted an association rules analysis to develop a sales strategy. The sales transaction data used is from December 2020, with a minimum support value of 25 and a minimum confidence value of 80, processed using the RapidMiner application. The FP-Growth algorithm can produce association rules as a reference for product promotion and as decision support for providing product recommendations to consumers, based on the predetermined minimum support and confidence values. The association rule with the highest lift ratio is 10.51. Ni Putu Priyastini and Dessy Safitri, "Association Rules Analysis using FP-Growth Algorithm to Make Product Recommendations for Customer," published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 5, Issue 2, February 2021. URL: https://www.ijtsrd.com/papers/ijtsrd38459.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/38459/association-rules-analysis-using-fpgrowth-algorithm-to-make-product-recommendations-for-customer/ni-putu-priyastini-dessy-safitri
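The lift ratio reported for the top rule is confidence divided by the consequent's support, equivalently sup(A∪B)/(sup(A)·sup(B)). A minimal sketch with illustrative support values, not the store's real data:

```python
def lift(sup_ab, sup_a, sup_b):
    """lift(A -> B) = confidence(A -> B) / support(B).
    Values above 1 mean A and B co-occur more often than chance."""
    confidence = sup_ab / sup_a
    return confidence / sup_b

# Illustrative supports (fractions of transactions).
print(lift(sup_ab=0.20, sup_a=0.25, sup_b=0.40))  # confidence 0.8 over support 0.4
```

A lift of 10.51, as found for the store's top rule, means the item pair appears together more than ten times as often as independence would predict, which is why that rule is the strongest promotion candidate.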
Recommendation based on Clustering and Association RulesIJARIIE JOURNAL
This document proposes a hybrid recommendation framework that uses both clustering and association rule mining. It first clusters users using the k-means algorithm to group similar users together. It then applies the Eclat algorithm to generate frequent itemsets and association rules from the user-item data. These association rules are used to make recommendations to users. The framework aims to address issues like sparsity and cold start problems. An experimental study compares the performance of the Eclat and Apriori algorithms on recommendation quality and execution time. The proposed approach integrates clustering and association rule mining to provide more accurate recommendations.
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation will go through different areas of text analytics as well as provide some real work examples that help to make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
Opinion mining on newspaper headlines using SVM and NLPIJECEIAES
Opinion Mining also known as Sentiment Analysis, is a technique or procedure which uses Natural Language processing (NLP) to classify the outcome from text. There are various NLP tools available which are used for processing text data. Multiple research have been done in opinion mining for online blogs, Twitter, Facebook etc. This paper proposes a new opinion mining technique using Support Vector Machine (SVM) and NLP tools on newspaper headlines. Relative words are generated using Stanford CoreNLP, which is passed to SVM using count vectorizer. On comparing three models using confusion matrix, results indicate that Tf-idf and Linear SVM provides better accuracy for smaller dataset. While for larger dataset, SGD and linear SVM model outperform other models.
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET Journal
This document summarizes research on sentiment polarity analysis of Twitter data from different events. It discusses how Twitter data can be used for opinion mining and sentiment analysis. Several papers that used techniques like naive Bayes classifier, support vector machines, and dual sentiment analysis on Twitter data are summarized. The document also provides an overview of the key steps involved in a Twitter sentiment analysis system, including data collection, preprocessing, feature extraction, training a classification model, and evaluating accuracy. The goal of analyzing sentiments on Twitter is to understand public opinions on different topics and events.
This document summarizes research on improving the Apriori algorithm for mining association rules from transactional databases. It first provides background on association rule mining and describes the basic Apriori algorithm. The Apriori algorithm finds frequent itemsets by multiple passes over the database but has limitations of increased search space and computational costs as the database size increases. The document then reviews research on variations of the Apriori algorithm that aim to reduce the number of database scans, shrink the candidate sets, and facilitate support counting to improve performance.
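The level-wise search described here can be condensed into a short sketch (toy data, single machine; the reviewed variations add scan-reduction and candidate-shrinking tricks on top of this basic shape):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: level-wise search with candidate pruning.
    An itemset can only be frequent if all its subsets are frequent."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) >= min_support}
    all_freq = set(freq)
    k = 2
    while freq:
        # Join step: merge frequent (k-1)-itemsets into k-candidates
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq
                             for s in combinations(c, k - 1))}
        # Count support with one database scan per level
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) >= min_support}
        all_freq |= freq
        k += 1
    return all_freq
```

The one-scan-per-level counting loop is exactly the cost the surveyed improvements attack.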
An Introduction to Text Analytics: 2013 Workshop presentationSeth Grimes
This document provides an introduction to text analytics. It discusses perspectives on text analytics from different roles like IT support, researchers, and solution providers. It explains how text analytics can boost business results by analyzing unstructured text data from sources like emails, social media, surveys etc. It discusses how text analytics transforms information retrieval to information access by extracting semantics, entities, topics and relationships from text. It also provides definitions and explanations of key concepts in text analytics like entities, features, metadata, natural language processing, information extraction, categorization, classification and evaluation metrics.
Automation Tool Development to Improve Machine Results using Data AnalysisIRJET Journal
This document discusses the development of an automation tool using Python to improve the accuracy of machine-generated results from analyzing merchant datasets. The tool uses concepts like the confusion matrix to evaluate the precision, recall, and accuracy of the machine's predictions. It analyzes the merchant descriptions and identifiers using patterns and partial matches to determine correct and incorrect predictions. When run on sample merchant data, the tool was able to improve the precision, recall, and accuracy scores compared to the machine alone. The tool demonstrates how data analysis techniques in Python can help optimize machine learning models.
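The tool's internals are not given in the summary, but the confusion-matrix metrics it evaluates are standard. A minimal sketch of those formulas:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from the four confusion-matrix
    cells: true/false positives (tp, fp) and false/true negatives (fn, tn)."""
    precision = tp / (tp + fp)          # of predicted positives, how many correct
    recall    = tp / (tp + fn)          # of actual positives, how many found
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```

For example, 8 true positives with 2 false positives and 2 false negatives gives precision and recall of 0.8 each.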
Prediction of Default Customer in Banking Sector using Artificial Neural Networkrahulmonikasharma
The aim of this article is to present a prediction and risk accuracy analysis of default customers in the banking sector. A neural network is a learning model inspired by biological neurons, used to estimate and predict outcomes that can depend on a large number of inputs. The bank customer dataset from the UCI repository is analyzed to extract an informative subset from a large volume of data. This dataset is split into training and testing data for the neural network. During training, the data set is iterated over until the desired output is reached, and the trained model is then cross-checked against the test data. This paper focuses on predicting default customers using a deep neural network (DNN) algorithm.
Complaint Analysis in Indonesian Language Using WPKE and RAKE Algorithm IJECEIAES
Social media provides convenience in communicating and enables two-way communication that allows companies to interact with their customers. Companies can use information obtained from social media to analyze how communities respond to their services or products. The biggest challenge in processing information from social media like Twitter is the unstructured sentences, which can lead to incorrect text processing; yet this information is very important for a company's survival. In this research, we propose a method, WPKE, to extract keywords from tweets in the Indonesian language. We compare it with RAKE, a language-independent algorithm commonly used for keyword extraction. Finally, we develop a clustering method to group the topics of complaints, using a data set obtained from Twitter via the "komplain" hashtag. Our method achieves an accuracy of 72.92%, while RAKE obtains only 35.42%.
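The WPKE method itself is not described in the summary, but the RAKE baseline it is compared against is well known: split text on stopwords and punctuation into candidate phrases, score each word as degree/frequency, and score a phrase as the sum of its word scores. A toy sketch (tiny hypothetical stopword list, English example text):

```python
import re

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}

def rake(text):
    """Toy RAKE: candidate phrases are runs of non-stopwords; a word
    scores degree/frequency and a phrase scores the sum of its words."""
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # degree(w) = total length of phrases containing w; freq(w) = occurrences
    freq, degree = {}, {}
    for p in phrases:
        for w in p:
            freq[w] = freq.get(w, 0) + 1
            degree[w] = degree.get(w, 0) + len(p)
    return sorted(
        ((" ".join(p), sum(degree[w] / freq[w] for w in p)) for p in phrases),
        key=lambda x: -x[1])
```

Words that co-occur inside longer phrases get higher degree, so multi-word keywords outrank isolated common words.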
This document summarizes a research paper that proposes using a genetic algorithm to generate high-quality association rules for measuring data quality. The genetic algorithm evaluates rules based on four metrics: confidence, completeness, comprehensibility, and interestingness. It aims to discover high-level prediction rules that perform better than traditional greedy rule induction algorithms at handling attribute interactions. The genetic algorithm represents rules as chromosomes and uses the four metrics as an objective fitness function to evaluate the quality of each rule.
The document proposes a method to recommend users on Q&A sites who are most likely to correctly answer questions. It involves:
1) Classifying questions into tags using logistic regression and SVM models trained on historical data.
2) Calculating a weighted score for each user based on past answer performance for each tag.
3) Recommending top users for tags identified in step 1 as most likely to answer new questions correctly. Experimental results showed this approach worked better for common tags with more training data, while rare tags remained inaccurate to classify. Future work is needed to improve recommendations and user experience.
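Step 2 above can be illustrated with a small sketch. The exact weighting used in the paper is not given, so the formula here (accuracy damped by answer volume, so one lucky answer does not outrank a long track record) and the data shape are hypothetical:

```python
def top_users(history, tag, k=2):
    """Rank users for a tag by a weighted score: past accuracy on that
    tag, damped by how many answers they gave (more evidence, more trust).
    `history` maps user -> {tag: (correct, total)}.  Hypothetical formula."""
    scores = {}
    for user, tags in history.items():
        if tag in tags:
            correct, total = tags[tag]
            scores[user] = (correct / total) * (total / (total + 1))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A user with 9/10 correct answers then outranks a user with a single correct answer, matching the paper's observation that tags with more history are easier to recommend for.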
IRJET- Fake News Detection using Logistic RegressionIRJET Journal
1) The document discusses a study that uses logistic regression to classify news articles as real or fake. It outlines the methodology which includes data preprocessing, feature extraction using bag-of-words and TF-IDF, and using a logistic regression classifier to predict fake news.
2) The model achieved an accuracy of approximately 72% at classifying news as real or fake when using TF-IDF features and logistic regression.
3) The study aims to address the growing issue of fake news proliferation online by developing a computational method for identifying unreliable news sources.
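The TF-IDF featurization in step 1 can be sketched in a few lines. This is a minimal stdlib version for illustration (the study likely used a library vectorizer, and normalization details vary):

```python
import math

def tfidf(docs):
    """Toy TF-IDF: weight terms that are frequent in a document but
    rare across the corpus.  `docs` is a list of token lists."""
    n = len(docs)
    df = {}                                   # document frequency per term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        vec = {}
        for term in doc:
            tf = doc.count(term) / len(doc)   # term frequency in this doc
            idf = math.log(n / df[term]) + 1  # rarer terms weigh more
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors
```

A term appearing in every document gets only the baseline weight, while a term unique to one document is boosted, which is what lets the classifier latch onto distinctive vocabulary.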
Twitter’s Sentiment Analysis on Gsm Services using Multinomial Naïve BayesTELKOMNIKA JOURNAL
Telecommunication users are rapidly growing each year. As people keep demanding a better service level for Short Message Service (SMS), telephone or data use, service providers compete to attract customers, and customer feedback on platforms such as Twitter is a source of information. Multinomial Naïve Bayes Tree, adapted from the Multinomial Naïve Bayes and Decision Tree methods, is a data mining technique used to classify raw data or feedback from customers. The Multinomial Naïve Bayes method specifically addresses term frequency in the text of a sentence or document. The documents used in this study are comments of Twitter users on GSM telecommunications providers in Indonesia. This research employed the Multinomial Naïve Bayes Tree classification technique to categorize customers' sentiment towards telecommunication providers in Indonesia. Sentiment analysis included only the positive, negative and neutral classes. The research generated a Decision Tree rooted in the feature "aktif", whose probability came from the positive class in the Multinomial Naive Bayes method. The evaluation showed that the highest classification accuracy of the Multinomial Naïve Bayes Tree (MNBTree) method was 16.26% using 145 features, while Multinomial Naïve Bayes (MNB) yielded the highest accuracy of 73.15% using the full dataset of 1665 features. The expected benefit of this research is that Indonesian telecommunications providers can evaluate their performance and services to reach customer satisfaction.
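The Multinomial Naïve Bayes component can be sketched from first principles. This toy version with made-up English tokens shows the frequency-based class model the abstract describes (the paper's tree hybrid and Indonesian preprocessing are not reproduced):

```python
import math
from collections import Counter

def train_mnb(docs, labels, alpha=1.0):
    """Toy Multinomial Naive Bayes with Laplace smoothing: per-class
    word frequencies become log-probabilities."""
    classes = set(labels)
    vocab = {w for d in docs for w in d}
    counts = {c: Counter() for c in classes}
    priors = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    model = {}
    for c in classes:
        total = sum(counts[c].values())
        model[c] = (
            math.log(priors[c] / len(labels)),   # log prior
            {w: math.log((counts[c][w] + alpha) /
                         (total + alpha * len(vocab)))
             for w in vocab},                    # smoothed log likelihoods
        )
    return model

def predict(model, doc):
    """Pick the class maximizing log prior + summed word log-likelihoods."""
    return max(model, key=lambda c: model[c][0] +
               sum(model[c][1].get(w, 0) for w in doc))
```

Smoothing (`alpha`) keeps a word unseen in one class from zeroing out that class entirely.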
Text analytics is used to extract structured data from unstructured text sources like social media posts, reviews, emails and call center notes. It involves acquiring and preparing text data, processing and analyzing it using algorithms like decision trees, naive bayes, support vector machines and k-nearest neighbors to extract terms, entities, concepts and sentiment. The results are then visualized to support data-driven decision making for applications like measuring customer opinions and providing search capabilities. Popular tools for text analytics include RapidMiner, KNIME, SPSS and R.
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...IOSR Journals
This document discusses an optimal approach to derive disjunctive positive and negative association rules from association rule mining using a genetic algorithm. It aims to address some shortfalls of conventional algorithms like Apriori by supporting disjunctive rules, using multiple minimum support thresholds, and effectively identifying negative rules. The proposed approach uses a modified FP-Growth algorithm and genetic algorithm to generate conjunctive and disjunctive positive and negative rules in an optimized manner by reducing candidate generation time and capturing useful rare item relationships.
This document outlines the aim, objectives, scope, and structure of a dissertation on using genetic programming to optimize and combine K nearest neighbor classifiers for intrusion detection. The aim is to use genetic programming with the KDD Cup 1999 dataset to develop a numeric classifier that shows improved performance over individual KNN classifiers. The objectives are to determine if a GP-based numeric classifier outperforms individual KNN classifiers, if GP combination techniques produce higher performance than KNN component classifiers, and if heterogeneous KNN classifier combination performs better than homogeneous combination. The document describes the methodology that will be used, including developing an optimal KNN classifier using fitness evaluation in the first phase and combining optimal KNN classifiers based on ROC curves in the second phase.
A Trinity Construction for Web Extraction Using Efficient AlgorithmIOSR Journals
This document describes a proposed system for web extraction using a trinity construction and efficient algorithms. It begins with an abstract discussing how trinity characteristics can be used to automatically extract content from websites in a sequential tree structure. It then discusses the existing system which uses trinity tree and prefix/suffix sorting but has limitations. The proposed system introduces fuzzy logic for multi-perspective crawling across multiple websites. A genetic algorithm is used to load extracted content into the trinity structure and remove unwanted data. Finally, an ant colony optimization algorithm is used to obtain an effective structure and suggest optimized solutions.
In recent years the scope of data mining has evolved into an active area of research because it can uncover previously unknown and interesting knowledge from very large database collections. Data mining is applied in a variety of domains, such as business, IT and many other sectors. A major problem that receives great attention in the community is the classification of data: the classification should be such that the results can be easily verified and interpreted by humans. In this paper we study various data mining techniques in order to find combinations for an enhanced hybrid technique that involves multiple methods and improves the usability of the application. We study the CHARM algorithm, the CM-SPAM algorithm, the Apriori algorithm, the MOPNAR algorithm, and Top-K Rules.
IRJET- Improving the Performance of Smart Heterogeneous Big DataIRJET Journal
This document discusses improving the performance of smart heterogeneous big data. It begins by defining key concepts like big data, data mining, and the challenges of analyzing large, complex datasets. It then describes two common association rule mining algorithms - Apriori and FP-Growth - that are used to extract patterns from big data. The document proposes using principal component analysis as a feature selection method to improve the performance of these algorithms. It finds that this proposed approach reduces execution time compared to the original algorithms when processing big data.
This document compares and analyzes the Apriori and FP-Growth algorithms for finding strategic area patterns for campus introduction case studies at STKIP ADZKIA Padang. It analyzes data from the 2015/2016 student cohort with minimum support of 0.05% and minimum confidence of 0.7% to obtain 19 association rules and the 2 highest rules that can be used as new knowledge and a valuable reference. The document first provides background on the problem, literature review on data mining and association rule algorithms, research methodology using Tanagra 1.4.50 and RapidMiner 7.0.001, data analysis using both algorithms, and conclusions comparing the results.
Association Rules Analysis using FP Growth Algorithm to Make Product Recommen...ijtsrd
Companies usually have historical data on sales transactions from month to month, but unfortunately it is only used for weekly and monthly reports. If this is allowed to continue, data growth will result in data richness but information poverty. At the same time, companies often still use manual methods in their product marketing strategies, with no reference and based only on estimates. One of them is the X Fashion Store, which sells various local fashions and has not used its data to develop a marketing strategy. This study conducted an association rules analysis to develop a sales strategy. The sales transaction data used is from December 2020, with a minimum support value of 25 and a minimum confidence value of 80, processed using the RapidMiner application. The FP-Growth algorithm can produce association rules as a reference for product promotion and decision support in providing product recommendations to consumers, based on the predetermined minimum support and confidence values. The association rule with the highest lift ratio is 10.51. Ni Putu Priyastini Dessy Safitri, "Association Rules Analysis using FP-Growth Algorithm to Make Product Recommendations for Customer", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-5, Issue-2, February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38459.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/38459/association-rules-analysis-using-fpgrowth-algorithm-to-make-product-recommendations-for-customer/ni-putu-priyastini-dessy-safitri
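The support, confidence and lift measures that rank these rules are easy to state directly. A minimal sketch with hypothetical fashion items (the store's real data is not shown):

```python
def rule_stats(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.
    Lift > 1 means the consequent is bought more often than chance
    when the antecedent is present."""
    n = len(transactions)
    a = sum(antecedent <= t for t in transactions)            # tx with antecedent
    both = sum((antecedent | consequent) <= t for t in transactions)
    c = sum(consequent <= t for t in transactions)            # tx with consequent
    support = both / n
    confidence = both / a
    lift = confidence / (c / n)
    return support, confidence, lift
```

A lift ratio of 10.51, as reported above, would mean the consequent item is bought over ten times more often alongside the antecedent than its base rate predicts.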
This document describes a Multilevel Relationship Algorithm (MRA) for improving association rule mining. MRA works in three stages: 1) It uses an Apriori algorithm to find level 1 associations between items within individual shops. 2) It uses the level 1 associations to find frequent itemsets across shops. 3) It uses Bayesian probability to determine dependencies between items across shops and generate learning rules. The algorithm aims to discover relationships between sales data from different shops to gain insights for business decisions.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Data mining is an important aspect of any business, and most management-level decisions are based on it. One such aspect is the association between different sale products, i.e., the actual support of one product with respect to another. This concept is called association mining, under which we define the process of estimating the sale of one product relative to another. We propose an association rule approach based on the concept of hardware support. In this approach we first maintain the database and compare it with a systolic array; then a pruning process is performed to filter the database and remove rarely used items. Finally, the data is indexed using a hashing technique and the decision is made in terms of support count. Krishan Rohilla | Shabnam Kumari | Reema, "Data Mining based on Hashing Technique", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-1, Issue-4, June 2017, URL: http://www.ijtsrd.com/papers/ijtsrd82.pdf http://www.ijtsrd.com/computer-science/data-miining/82/data-mining-based-on-hashing-technique/krishan-rohilla
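The paper's exact hashing scheme is not given; the classic PCY-style idea it resembles can be sketched as follows: hash every pair into a bucket during the first scan, and later rule out any pair whose bucket count is below the support threshold. The filter can give false positives (bucket collisions) but never false negatives:

```python
from itertools import combinations

def pcy_filter(transactions, num_buckets, min_support):
    """Toy PCY-style hash filter: hash every pair into a bucket; only
    pairs landing in buckets that reach min_support can be frequent."""
    buckets = [0] * num_buckets
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % num_buckets] += 1
    def is_candidate(pair):
        # A truly frequent pair always lands in a full-enough bucket.
        return buckets[hash(tuple(sorted(pair))) % num_buckets] >= min_support
    return is_candidate
```

Buckets needing only a counter each, this trades a little memory for pruning the pair-candidate space before the expensive counting pass.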
IRJET- A Survey on FP (Growth) Tree using Association Rule MiningIRJET Journal
This document summarizes a survey on the FP-Growth algorithm for association rule mining. It discusses how FP-Growth is an effective algorithm that examines the transaction database only twice to generate frequent itemsets, unlike other algorithms. The document also reviews other association rule mining techniques like Apriori and discusses how FP-Growth avoids generating numerous conditional FP-trees.
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...ijsrd.com
Data mining can be defined as the process of uncovering hidden patterns in random data that are potentially useful. The discovery of interesting association relationships among large amounts of business transactions is currently vital for making appropriate business decisions. Association rule analysis is the task of discovering association rules that occur frequently in a given transaction data set; its task is to find certain relationships among a set of data (itemsets) in the database. It has two measurements: support and confidence. The confidence value measures a rule's strength, while the support value corresponds to its statistical significance. There are currently a variety of algorithms to discover association rules. Some of these algorithms depend on the use of minimum support to weed out uninteresting rules; others look for highly correlated items, that is, rules with high confidence. Traditional association rule mining techniques employ predefined support and confidence values. However, specifying the minimum support value of the mined rules in advance often leads to either too many or too few rules, which negatively impacts the performance of the overall system. This work proposes a way to efficiently mine association rules over dynamic databases using the Dynamic Matrix Apriori technique and Multiple Support Apriori (MSApriori), and proposes a modification of the Matrix Apriori algorithm to accommodate this approach. Experiments on a large set of databases have been conducted to validate the proposed framework. The achieved results show a remarkable improvement in the overall performance of the system in terms of run time, the number of generated rules, and the number of frequent items used.
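The multiple-minimum-support idea can be shown in one function. In MSApriori-style mining, each item carries its own minimum item support (MIS), and an itemset is frequent when its support meets the *lowest* MIS among its members, so rare but important items are not weeded out by a single global threshold. A minimal sketch with hypothetical items:

```python
def is_frequent(itemset, transactions, mis):
    """MSApriori-style check: itemset support must meet the lowest
    minimum item support (MIS) among its member items."""
    support = sum(set(itemset) <= t for t in transactions) / len(transactions)
    return support >= min(mis[i] for i in itemset)
```

With a single global threshold of 0.5, a rare-item pair with support 0.25 would be discarded; giving the rare item its own MIS of 0.2 keeps it.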
IRJET- Online Sequential Behaviour Analysis using Apriori AlgorithmIRJET Journal
This document proposes using the Apriori algorithm to perform online sequential behavior analysis for recommendation systems. It summarizes that the Apriori algorithm can be used to find frequently purchased item sets in customer transaction data and those frequent items can then be recommended. It outlines how the Apriori algorithm works in an iterative manner, generating itemsets of increasing size and calculating their support and confidence. The proposed system would analyze customer browsing and purchase histories to generate product recommendations using this approach.
Effieient Algorithms to Find Frequent Itemset using Data MiningIRJET Journal
The document proposes a new differentially private frequent itemset mining (FIM) algorithm called PFP-growth that is based on the FP-growth algorithm. It aims to achieve high data utility, privacy, and time efficiency. PFP-growth has two phases: a preprocessing phase and a mining phase. The preprocessing phase transforms the database using a novel "smart splitting" technique to improve privacy and utility with minimal information loss. The mining phase uses a runtime estimation method to offset information lost during splitting and dynamically reduce noise added for privacy guarantees. Experimental results on real datasets show PFP-growth can efficiently find frequent itemsets while maintaining good privacy and utility.
Comparison Between High Utility Frequent Item sets Mining Techniquesijsrd.com
Data mining can be defined as an activity that extracts new, nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting statistical correlations between the items that appear most frequently in transaction databases. Also termed frequent itemset mining, these techniques were based on the rationale that itemsets which appear more frequently must be of more importance to the user from a business perspective. In this paper we throw light upon an emerging area called utility mining, which considers not only the frequency of itemsets but also the utility associated with them. The term utility refers to the importance or usefulness of an itemset's appearance in transactions, quantified in terms such as profit, sales, or other user preferences. This paper presents a novel, efficient algorithm, FUFM (Fast Utility-Frequent Mining), which finds all utility-frequent itemsets within the given utility and support thresholds. It is faster and simpler than the original 2P-UF (2-Phase Utility-Frequent) algorithm, as it is based on efficient methods for frequent itemset mining. The experimental evaluation on artificial datasets shows that, in contrast with 2P-UF, our algorithm can also be applied to mine large databases.
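The utility-frequent criterion combines both thresholds in one check: an itemset qualifies only if it is frequent enough *and* profitable enough. A toy sketch with hypothetical items, quantities and unit profits (not the FUFM algorithm itself, just the measure it optimizes):

```python
def is_utility_frequent(itemset, transactions, profit, min_util, min_sup):
    """Toy utility-frequent check.  Each transaction maps item -> quantity;
    utility sums quantity x unit profit over transactions containing
    the whole itemset."""
    containing = [t for t in transactions if set(itemset) <= set(t)]
    support = len(containing) / len(transactions)
    utility = sum(t[i] * profit[i] for t in containing for i in itemset)
    return support >= min_sup and utility >= min_util
```

A cheap item bought constantly and an expensive item bought rarely can both fail: the first on utility, the second on support.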
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...IJDKP
This document describes web-based data mining tools for performing association rule mining and feedback analysis on educational data. It presents two tools - one developed using ASP.NET for association rule mining to analyze elective course combinations, and one developed using PHP to collect and analyze student feedback on faculty performance and institutional infrastructure. The tools are intended to help educational institutions improve decision making, teaching effectiveness, and student outcomes by analyzing patterns in student feedback and course selection data. Sample outputs from applying the tools to student data demonstrate their ability to discover useful associations and evaluate performance.
Machine learning based recommender system for e-commerceIAESIJAI
Nowadays, e-commerce is becoming an essential part of business for many reasons, including the simplicity, availability, richness and diversity of products and services, flexibility of payment methods and the convenience of shopping remotely without losing time. These benefits have greatly optimized the lives of users, especially with the technological development of mobile devices and the availability of the Internet anytime and anywhere. Because of their direct impact on the revenue of e-commerce companies, recommender systems are considered a must in this field. Recommender systems detect items that match the customer's needs based on the customer's previous actions and make them appear in an interesting way. Such a customized experience helps to increase customer engagement and purchase rates as the suggested items are tailored to the customer's interests. Therefore, perfecting recommendation systems that allow for more personalized and accurate item recommendations is a major challenge in the e-marketing world. In our study, we succeeded in developing an algorithm to suggest personal recommendations to customers using association rules via the Frequent-Pattern-Growth algorithm. Our technique generated good results with a high average probability of purchasing the next product suggested by the recommendation system.
This document describes a decision support system (DSS) that uses the Apriori algorithm, genetic algorithm, and fuzzy logic to analyze medical data and make accurate diagnostic decisions. The DSS first uses Apriori to extract association rules from pre-processed medical data. It then applies a genetic algorithm to optimize the results and determine optimal attribute values. Finally, it employs fuzzy logic for decision-making based on the optimized attribute values. The authors tested their DSS on diabetes data and found the results to be interesting. Their proposed system aims to help medical professionals make quicker and more accurate diagnostic decisions.
Sales analysis using product rating in data mining techniqueseSAT Journals
Abstract
In this paper a new product rating approach is proposed for mathematically and graphically analyzing sales of the same type of product from different manufacturers, together with the most frequent combinations of items purchased. In the product sales market there is no specific rating for products of the same type, nor for combinations in product purchasing patterns. With this approach we retrieve the best combinations of products with a mathematical rating, and from this rating and pattern we can build a graphical representation to compare products of the same type with one another. Data mining provides abstract knowledge for analyzing business functionality with retail product data. Since the purpose of a product is to fulfil a customer need, and different companies make products of the same type, mathematical analysis can identify the best one in terms of customer satisfaction, product efficiency, and popularity.
Keywords: Data Mining, Sales Report, Product rating, Threshold value.
Review on: Techniques for Predicting Frequent Itemsvivatechijri
Electronic commerce (e-commerce) is the trading, or facilitation of trading, of products or services using computer networks such as the Internet. It falls under the umbrella of data mining, which takes large amounts of data and extracts knowledge from them. The paper uses information about the techniques and methods used in shopping carts to predict the products a customer wants to buy or will buy, and shows relevant products according to their cost. The paper also summarizes the descriptive methods with examples. For predicting frequent itemset patterns, many prediction algorithms, rule mining techniques and other methods have already been designed for the retail market. This paper examines the literature on several techniques for mining frequent itemsets. The survey covers various tree formations, such as the Partial tree and IT tree, and algorithms with their advantages and limitations.
This document presents a hybrid algorithm that combines Apriori Growth and FP-Split Tree algorithms for web usage mining. The algorithm has two phases: 1) It constructs an FP-Split Tree from web logs in a single pass, reducing complexity compared to FP-Tree which requires two passes. 2) It mines frequent patterns from the FP-Split Tree using an Apriori Growth approach instead of FP-Growth to avoid repeatedly recreating trees. The algorithm was tested on university website logs and showed better performance than traditional FP-Tree and Apriori methods, as it was faster at extracting frequent patterns for different support counts.
An assignment letter appointing Dr. Dwiza Riana as the person in charge and Achmad Rifai as the chair of the community service committee of STMIK Nusa Mandiri Jakarta, tasked with designing a company profile for the Karang Taruna Ragunan organization using Blogspot, held on 2-4 November 2018 at the STMIK Nusa Mandiri Jakarta campus.
This document compares the Topsis and Saw methods for selecting tourism destinations in Indonesia. It analyzes 11 potential tourism destinations based on 5 criteria: location, cost, transportation, distance, and visitation time. The Saw method ranks Toba Lake as the top choice, while Topsis ranks Raja Ampat in Papua as number one. Both methods are part of multi-attribute decision making and require assigning weights to criteria for calculations. The study aims to help increase tourism by determining criteria for decision making and weighting alternatives.
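The SAW half of that comparison is simple enough to sketch: normalize each criterion (against the maximum for benefit criteria, against the minimum for cost criteria) and rank alternatives by their weighted sum. The destinations, criteria and weights below are made up for illustration, not the study's actual data:

```python
def saw_rank(alternatives, weights, benefit):
    """Toy Simple Additive Weighting (SAW): normalize each criterion,
    then rank alternatives by the weighted sum of normalized scores.
    `benefit[c]` is True for benefit criteria, False for cost criteria."""
    criteria = list(weights)
    cols = {c: [alternatives[a][c] for a in alternatives] for c in criteria}
    scores = {}
    for name, row in alternatives.items():
        s = 0.0
        for c in criteria:
            if benefit[c]:
                s += weights[c] * row[c] / max(cols[c])   # bigger is better
            else:
                s += weights[c] * min(cols[c]) / row[c]   # smaller is better
        scores[name] = s
    return sorted(scores, key=scores.get, reverse=True)
```

TOPSIS instead ranks by distance to ideal and anti-ideal points, which is why the two methods can disagree on the top destination, as the study found.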
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Riset dan E-Jurnal Manajemen Informatika Komputer
Volume 3, Number 2, April 2019
e-ISSN: 2541-1330
p-ISSN: 2541-1332
Analysis of Sales by Using Apriori and FP-Growth at PT. Panca Putra Solusindo

Sita Anggraeni (STMIK Nusa Mandiri), Marlina Ana Iha (STMIK Nusa Mandiri), Wati Erawati (Universitas Bina Sarana Informatika), Sayyid Khairunnas (Universitas Bina Sarana Informatika)
Jakarta, Indonesia
sita.sia@nusamandiri.ac.id, marlina.anaiha@yahoo.co.id, wati.wti@bsi.ac.id, sayyid.skh@bsi.ac.id
Abstract— PT. Panca Putra Solusindo is a company that sells electronic equipment and records a large number of sales transactions. Among these data, the company has not yet had a platform that provides management with information about which products consumers prefer most. Data mining provides information patterns that answer this need and help make marketing more effective. In this research, the association function of the Apriori algorithm is used to find rules that satisfy minimum support and minimum confidence, and FP-Growth is used to find the frequent itemsets; together they reveal the best rules that occur in the transactions. Both algorithms are tested using the Weka application, version 3.8.
Keywords: Data Mining, Association Rule, Apriori, Frequent Pattern Growth (FP-Growth), Weka
I. INTRODUCTION
Increasing business competition requires companies to find strategies that can enhance product sales and marketing, and one of these is to use sales data. With sales activity every day, the data keep growing. These data do not merely function as an archive for the company; they can be processed into useful information for increasing product sales and promotions.
Electronic products are in great demand today, because electronic equipment helps people carry out all kinds of activities. This study addresses the problems at PT. Panca Putra Solusindo: the company does not yet identify its best-selling and interrelated products, so promotions of products bundled into an interconnected package are not maximized; in addition, many products remain in inventory while the sales process has not been optimal.
One method that can be applied is data mining, which provides ways and techniques for meeting broad information needs and produces information that can be used as material for decision making. The implementation here performs one of the association functions of data mining using the Apriori algorithm, together with an association-analysis alternative, the FP-Growth algorithm. The Apriori algorithm, which aims to find frequent itemsets, is run on a data set; Apriori analysis is defined as the process of finding all rules that meet the minimum requirements for support and confidence. FP-Growth itself is an alternative algorithm for determining the itemsets that appear most frequently (frequent itemsets) in a data set.
Based on this background, the problem statements are as follows:
1. How can the Apriori and FP-Growth algorithms be applied to find out the best-selling electronic products?
2. Can the Apriori and FP-Growth algorithms help develop marketing strategies?
3. How can the Apriori and FP-Growth algorithms be implemented on sales data with the Weka application?
The purpose of this research is to determine the extent to which the Apriori and FP-Growth algorithms can help develop marketing strategies through the implementation of the Weka data mining application.
The benefit of this research is to help the company find out its best-selling electronic products.
II. LITERATURE REVIEW
Data mining is a term used to describe the discovery of knowledge in a database. Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and related knowledge from various large databases (Kusrini & Luthfi, 2010).
Data mining has several main functions, each with a different usage goal:
A. Description, to give a brief description of a large number of data sets and types.
B. Estimation, to guess an unknown value, for example guessing someone's income when other information about that person is known.
C. Prediction, to estimate a future value, for example predicting the stock of goods in the next year.
D. Classification, the process of finding a model or function that explains or distinguishes a concept or class of data, with the aim of estimating the class of an object whose label is unknown.
E. Clustering, grouping and identifying data that share certain characteristics.
F. Association, also called market basket analysis, which identifies products that consumers might purchase together with other products.
The Apriori algorithm is a basic algorithm proposed by Agrawal and Srikant in 1994 to determine frequent itemsets for Boolean association rules. The Apriori algorithm belongs to the association rule class of data mining methods. Rules that state associations between several attributes are often called affinity analysis or market basket analysis. Association analysis, or association rule mining, is a data mining technique for finding rules about combinations of items. One stage of association analysis that has attracted the attention of many researchers seeking efficient algorithms is the analysis of high-frequency patterns. The importance of an association is measured by two benchmarks: support and confidence. Support (the supporting value) is the percentage of the item combination in the database, whereas confidence (the value of certainty) is the strength of the relationship between items in the association rule (Kusrini & Luthfi, 2010).
1. Analysis of High-Frequency Patterns with the Apriori Algorithm
This stage looks for item combinations that meet the minimum support requirement in the database. The support value of an item A is obtained with the following formula:

Support(A) = (number of transactions containing A) / (total transactions)

Meanwhile, the support value of two items is obtained with:

Support(A, B) = P(A ∩ B) = (number of transactions containing A and B) / (total transactions)

A frequent itemset is an itemset whose frequency of occurrence is at least the specified minimum value (ø). For instance, with ø = 2, all itemsets that occur two or more times are called frequent. The set of frequent k-itemsets is denoted by Fk.
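The two support formulas above can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy transaction list, not the study's actual data or code:

```python
# Hypothetical toy transaction list (illustration only, not the
# actual PT. Panca Putra Solusindo data).
transactions = [
    {"PC HP", "Notebook HP"},
    {"PC HP", "Memory"},
    {"Notebook HP", "Memory"},
    {"PC HP", "Notebook HP", "Memory"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"PC HP"}, transactions))                 # 3/4 = 0.75
print(support({"PC HP", "Notebook HP"}, transactions))  # 2/4 = 0.5
```

The same function handles both the one-item and the two-item case, since Support(A, B) is simply the support of the itemset {A, B}.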
2. Formation of Association Rules
After all the high-frequency patterns are found, association rules that meet the minimum confidence requirement are sought by calculating the confidence of the associative rule A → B, obtained with the following formula:

Confidence(A → B) = (number of transactions containing A and B) / (number of transactions containing A)

To select the association rules, they are sorted by Support × Confidence, and the n rules with the greatest values are taken.
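The confidence formula and the Support × Confidence ranking can be sketched as follows. This is a hedged illustration on hypothetical toy data; the transaction list and candidate rules are assumptions, not taken from the paper:

```python
# Hypothetical toy transactions (not the paper's actual data).
transactions = [
    {"PC HP", "Notebook HP"},
    {"PC HP", "Memory"},
    {"Notebook HP", "Memory"},
    {"PC HP", "Notebook HP", "Memory"},
    {"PC HP"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # confidence(A -> B) = support(A and B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

# Rank candidate rules by Support x Confidence, best first,
# as the selection step above describes.
rules = [({"PC HP"}, {"Notebook HP"}), ({"Memory"}, {"PC HP"})]
ranked = sorted(
    rules,
    key=lambda r: support(r[0] | r[1]) * confidence(r[0], r[1]),
    reverse=True,
)
```

On this toy data, Memory → PC HP ranks first: both rules have support 0.4, but Memory → PC HP has the higher confidence (2/3 versus 1/2).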
The Apriori algorithm is divided into several stages called iterations or passes:
1. Forming itemset candidates. The candidate k-itemsets are formed from combinations of the (k-1)-itemsets obtained in the previous iteration. One pruning step of the Apriori algorithm removes every candidate k-itemset that has a subset of k-1 items not included in the high-frequency patterns of length k-1.
2. Calculating the support of each candidate k-itemset. The support of each candidate is obtained by scanning the database and counting the transactions that contain all the items in the candidate k-itemset. This is also a characteristic of the Apriori algorithm: the entire database must be scanned as many times as the length of the longest k-itemset.
3. Setting the high-frequency patterns. The high-frequency patterns containing k items (k-itemsets) are determined from the candidate k-itemsets whose support is greater than the minimum support.
4. If no new high-frequency pattern is obtained, the whole process stops.
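The four passes above can be sketched as a minimal Apriori implementation. This is an illustrative sketch, not the code used in the study:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch following the four passes above:
    candidate generation, support counting, frequent-set selection,
    and stopping when no new frequent pattern appears."""
    n = len(transactions)
    # Frequent 1-itemsets.
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if sum(1 for t in transactions if i in t) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # 1. Candidate k-itemsets from unions of frequent (k-1)-itemsets,
        #    pruned: every (k-1)-subset must itself be frequent.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))]
        # 2-3. Count support by scanning the database; keep those >= minsup.
        current = [c for c in candidates
                   if sum(1 for t in transactions if c <= t) / n >= min_support]
        frequent.extend(current)
        k += 1  # 4. The loop ends when no new frequent pattern is found.
    return frequent
```

For example, on the transactions {a,b}, {a,c}, {a,b,c}, {b,c} with a minimum support of 0.5, the sketch returns the three single items and the three pairs, but not {a,b,c}, whose support is only 0.25.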
The FP-Growth algorithm is a development of the Apriori algorithm. The Frequent Pattern Growth (FP-Growth) algorithm is an alternative algorithm that can be used to determine the frequent itemsets in a data set [4].
The FP-Growth algorithm uses the concept of tree construction, commonly called the FP-Tree, in the frequent itemset search, instead of generating candidates as the Apriori algorithm does. By using this concept, the FP-Growth algorithm is faster than the Apriori algorithm.
The FP-Growth algorithm passes through the following stages to provide maximum results:
1. Generating the conditional pattern base.
2. Generating the conditional FP-Tree.
3. Searching for the frequent itemsets.
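The FP-Tree construction behind these stages can be sketched in Python as follows. This is a simplified construction-only sketch under stated assumptions; the recursive mining of conditional pattern bases and conditional FP-Trees is omitted for brevity:

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Build an FP-Tree: count items globally, drop infrequent ones,
    then insert each transaction with its items ordered by descending
    global frequency so that common prefixes share tree paths."""
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> its nodes; later used to
                                # extract conditional pattern bases
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for i in items:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header
```

Following a node's `parent` links upward from each entry in the header table yields that item's conditional pattern base, which is the starting point of stage 1 above.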
Association rule mining is a process in data mining to determine all associative rules that meet the minimum requirements for support (minsup) and confidence (minconf) in a database. Both conditions are used to select interesting association rules against the prescribed limits, namely minsup and minconf. The minimum support value of an item can be obtained with formula (1) below:

Support(A) = (number of transactions containing A) / (total transactions)    (1)

Support(A ∩ B) = (number of transactions containing A and B) / (total transactions)

Meanwhile, the minimum confidence can be determined with formula (2) below:

Confidence(A → B) = P(B | A) = (number of transactions containing A and B) / (number of transactions containing A)    (2)
III. PROPOSED METHOD
In composing this study, transaction data, in
which sales data, is taken from the sale of PT. Panca
Putra Solusindo Jakarta in January 2016 to December
2016, namely the association of data mining using the
Apriori Algorithm and FP-Growth methods. The
authors took a sample of 116 data from the total
population of data on sales transaction data of PT
Panca Putra Solusindo Jakarta, which amounted to
130 data. The sample to be taken is up to 116 top data
contained in the primary data and contains 14 items.
Before analyzing the Association of data mining
(Moh SholikAbu, 2018), we need the Apriori
algorithm flowchart as follows :
Figure of Flowchart of Apriori Algorithms
Then the steps for determining data mining associations using the FP-Growth algorithm are as follows:
Figure of the FP-Growth Algorithm Block Diagram
Furthermore, the authors prepare the data as a tabulation in Excel, using the terms "Y" and "N" to indicate whether an item of goods was bought (Sensuse, 2012); the data are then converted to CSV format. The tabulation data are as follows.
Table of Transaction Data Tabulation
The next step is to load the encoded data into the Weka data mining application (Hagugian, 2017), using the Associate panel with the Apriori and FP-Growth methods.
IV. RESULT AND DISCUSSION
The important phase of the Apriori data mining analysis is determining the highest-frequency patterns by calculating the support value of each single-item set, which yields the following table:
Table of Support per Item

Item              Count  Support (%)
PC HP             53     46%
Notebook HP       35     30.2%
PC Lenovo         32     27.6%
Notebook Lenovo   29     25%
Memory            27     23.3%
Other Electronic  21     18.1%
USB Flashdisk     17     15%
PC Dell           16     14%
Printer HP        16     14%
Monitor           7      6.03%
Keyboard          6      5.2%
LCD/CRT Lenovo    5      4.31%
Notebook Dell     3      2.6%
Scanner HP        1      0.9%
From the table above, the minimum support is set at 9%; the itemsets that meet this minimum support are then determined as follows:
Table of Itemsets That Meet the Minimum Support

Item              Count  Support (%)
PC HP             53     46%
Notebook HP       35     30.2%
PC Lenovo         32     27.6%
Notebook Lenovo   29     25%
Memory            27     23.3%
Other Electronic  21     18.1%
USB Flashdisk     17     15%
PC Dell           16     14%
Printer HP        16     14%
In the next phase, the two-item frequency patterns are formed by pairing all item types that meet the minimum support; the pairs whose combined support reaches the 9% minimum are as follows:
Table of Two-Itemsets That Fulfill the Minimum Support

Itemset                        Count  Support (%)
PC HP x Notebook HP            13     11.20%
Notebook HP x Notebook Lenovo  12     10.34%
Notebook HP x Memory           11     9.5%
PC HP x PC Lenovo              10     9%
PC HP x Other Electronic       10     9%
PC HP x PC Dell                10     9%
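As a sanity check, the support percentages reported in the tables above can be recomputed from the raw counts over the 116-transaction sample. The counts below are copied from the tables; note that 53/116 is 45.7%, which the table rounds to 46%:

```python
TOTAL = 116  # sample size stated in the proposed method section

# Counts copied from the support tables above.
counts = {
    "PC HP": 53,
    "Notebook HP": 35,
    "PC HP x Notebook HP": 13,
}

for item, n in counts.items():
    print(f"{item}: {100 * n / TOTAL:.1f}%")
# PC HP: 45.7%
# Notebook HP: 30.2%
# PC HP x Notebook HP: 11.2%
```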
After the minimum support values are established, the author determines the confidence values, in this case tested using the Weka application. The writers open the Weka Explorer, load the CSV data, and use the Apriori algorithm to form the ten best association rules with a minimum confidence of 0.4, as shown in the following figure:
Figure of Minimum Confidence in Apriori Using the Weka Application
From the results above, we obtain the following rules of the Apriori method:
Figure of the Best Apriori Rules in the Weka Application
The most frequently sold item is PC HP, with 53 sales, as shown in the following picture:
Figure of the Frequent Items of Apriori in the Weka Application
After being verified by testing in the Weka application, the next phase is data processing based on the FP-Growth algorithm; in this case the data are converted back into a single format, with the condition "true" or "t", as shown below:
Table of the FP-Growth Tabulation
After the tabulation shown above, the data are prepared in CSV format; the Weka application is then opened to calculate the confidence values and the rules produced.
In processing the data here, it is necessary to convert the data in the Weka application with the NominalToBinary and NumericToBinary filters in the unsupervised attribute filter category, as shown below:
Figure of the FP-Growth Data Filter Type
Then, once the data type filter is appropriate, the rules are obtained with a minimum confidence of 0.4, as follows:
Figure of the FP-Growth Results in the Weka Application
Overall, both the Apriori association algorithm and its FP-Growth extension produce several rules that occur frequently in the transactions (frequent items). This helps management expand its marketing strategies to increase sales according to the combinations of items in the association method that has been described and tested with the Weka application.
V. CONCLUSION AND SUGGESTION
1. In the Apriori algorithm, the confidence value of an item reaches 92%, and we can clearly know which products sell well in a period, while the FP-Growth algorithm produces several rules with the highest values, namely the frequent itemsets, which can then serve as roots for building the next FP-Tree.
2. The best rules produced by the Apriori and FP-Growth algorithms provide several choices at a predetermined minimum confidence of 0.4, giving the marketing team data for future sales strategy planning to improve performance.
3. The authors are aware of many shortcomings in the data processing and data retrieval, and hope that in the future other researchers will develop different algorithmic methods and produce accurate analyses for the company's needs.
VI. ACKNOWLEDGMENT
We gratefully thank the principal of PT Panca Putra Solusindo for allowing us to conduct the research there. We thank our colleagues from STMIK Nusa Mandiri and Universitas Bina Sarana Informatika who provided insight and expertise that greatly assisted the research.
VII. REFERENCES
Hagugian, P. M. (2017). Pengujian Algoritma Apriori dengan Aplikasi Weka dalam Pembentukan Association Rule. Jurnal Mantik Penusa, 98-103.
Kennedi Tampubolon, H. S. (2013). Implementasi Data Mining Algoritma Apriori pada Sistem Persediaan Alat-alat Kesehatan. Majalah Ilmiah INTI, 93-106.
Khoiriah, R. Y. (2015). Implementasi Data Mining dengan Metode Algoritma Apriori dalam Menentukan Pola Pembelian Obat. Citec Journal, 102-113.
Kusrini, & Luthfi, E. T. (2010). Algoritma Data Mining. Yogyakarta: Andi.
Moh Sholik, A. S. (2018). Implementasi Algoritma Apriori untuk Mencari Asosiasi Barang yang Dijual di E-Commerce Ordermas. Techno.Com, 158-170.
Nugroho Wandi, R. A. (2012). Pengembangan Sistem Rekomendasi Penelusuran Buku dengan Penggalian Association Rule menggunakan Algoritma Apriori (Studi Kasus Badan Perpustakaan dan Kearsipan Provinsi Jawa Timur). Jurnal Teknik ITS, 445-449.
Purnamasari, D. (2013). Get Easy Using Weka. Jakarta: Dapur Buku.
Samuel, D. (2008). Penerapan Struktur FP-Tree dan Algoritma FP-Growth dalam Optimasi Penentuan Frequent Itemset. Bandung: Institut Teknologi Bandung.
Sensuse, G. G. (2012). Penerapan Metode Data Mining Market Basket Analysis terhadap Data Penjualan Produk Buku dengan menggunakan Algoritma Apriori dan Frequent Pattern Growth (FP-Growth), Studi Kasus Percetakan PT Gramedia. Telematika MKOM, 118-132.
Suryadi, S. S. (2010). Menggali Pengetahuan dan Bongkahan Data. Yogyakarta: Andi Publisher.
Triyanto, W. A. (2014). Association Rule Mining untuk Penentuan Rekomendasi Promosi Produk. Simetris, 121-126.
Wiyana, S. K. (2018). Analisis Algoritma FP-Growth untuk Rekomendasi Produk pada Data Retail Penjualan Produk Kosmetik (Studi Kasus: MT Shop Kelapa Gading). Seminar Nasional Teknologi Informasi, 61-69.