Re-mining Item Associations: Methodology and a Case Study in Apparel Retailing
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-item-associations-methodology-and-a-case-study-in-apparel-retailing/
Association mining is the conventional data mining technique for analyzing market basket data, and it reveals the positive and negative associations between items. Although pricing and time information are an integral part of transaction data, they have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain-related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to existing techniques.
Re-mining Positive and Negative Association Mining Results
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-positive-and-negative-association-mining-results/
Positive and negative association mining are well-known and extensively studied data mining techniques for analyzing market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining is performed by operating on the transaction data. Despite being an integral part of the transaction data, pricing and time information has not been incorporated into market basket analysis so far, and additional attributes have been handled using quantitative association mining. In this paper, a new approach is proposed to incorporate price, time and domain-related attributes into data mining by re-mining the association mining results. The underlying factors behind positive and negative relationships, as indicated by the association rules, are characterized and described through the second data mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data coming from the apparel retailing industry, where price markdown is an essential tool for promoting sales and generating increased revenue.
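To make the pairwise measures behind positive and negative association mining concrete, the following is a minimal Python sketch; the baskets, item names and support threshold are illustrative, not taken from the paper. A lift above 1 indicates a positive association, a lift below 1 a negative one.

```python
from itertools import combinations

def pair_associations(transactions, min_support=0.2):
    """Support, confidence and lift for every frequent item pair."""
    n = len(transactions)
    item_count = {}
    for t in transactions:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
    rules = []
    for a, b in combinations(sorted(item_count), 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        if support < min_support:
            continue
        confidence = both / item_count[a]        # P(b | a)
        lift = confidence / (item_count[b] / n)  # >1 positive, <1 negative association
        rules.append((a, b, support, confidence, lift))
    return rules

baskets = [{"shirt", "tie"}, {"shirt", "tie", "belt"}, {"shirt", "belt"}, {"tie"}]
for a, b, s, c, l in pair_associations(baskets):
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Re-mining would then attach attributes such as price and time to each discovered pair and mine those enriched records in a second stage.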
A NOVEL APPROACH TO MINE FREQUENT PATTERNS FROM LARGE VOLUME OF DATASET USING...
In this paper, an MDL-based reduction of frequent patterns is presented. The ideal outcome of any pattern mining process is to explore the data for new insights, which also requires eliminating the non-interesting patterns that describe noise. The major problem in frequent pattern mining is identifying the interesting patterns. Instead of performing association rule mining on all the frequent item sets, it is feasible to select a subset of frequent item sets and perform the mining task on that subset. Selecting a small set of frequent item sets from a large number of interesting ones is a difficult task. In our approach, an MDL-based algorithm is used for reducing the number of frequent item sets to be used for association rule mining.
A FUZZY AHP APPROACH FOR SUPPLIER SELECTION PROBLEM: A CASE STUDY...
Supplier selection is one of the most important functions of a purchasing department, since by choosing the best supplier, companies can save material costs and increase their competitive advantage. However, this decision becomes complicated in the case of multiple suppliers, multiple conflicting criteria, and imprecise parameters. In addition, the uncertainty and vagueness of the experts' opinions is a prominent characteristic of the problem. Therefore, Fuzzy AHP, an extensively used multi-criteria decision making tool, can be utilized as an approach to the supplier selection problem. This paper presents the application of Fuzzy AHP in a gear motor company for determining the best supplier with respect to selected criteria. The contribution of this study is not only the application of the Fuzzy AHP methodology to the supplier selection problem, but also a comprehensive literature review of multi-criteria decision making problems. In addition, by stating the steps of Fuzzy AHP clearly and numerically, this study can serve as a guide for implementing the methodology in other multiple-criteria decision making problems.
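For orientation, the crisp core of AHP can be sketched in a few lines. The following shows only the classical (non-fuzzy) priority-vector computation via the geometric-mean method, with a hypothetical supplier comparison matrix; the paper's fuzzy extension replaces the crisp judgments with fuzzy numbers.

```python
import math

def ahp_weights(M):
    """Priority vector of a pairwise comparison matrix (geometric-mean method)."""
    n = len(M)
    geo = [math.prod(row) ** (1 / n) for row in M]
    total = sum(geo)
    return [g / total for g in geo]

# Three candidate suppliers compared pairwise on one criterion (Saaty's 1-9 scale)
M = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
weights = ahp_weights(M)
print([round(w, 3) for w in weights])  # supplier 1 receives the highest priority
```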
Predicting an Applicant Status Using Principal Component, Discriminant and Lo...
International Journal of Mathematics and Statistics Invention (IJMSI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJMSI publishes research articles and reviews within the whole field of Mathematics and Statistics, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The published papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
This document discusses applying machine learning algorithms to three datasets: a housing dataset to predict prices, a banking dataset to predict customer churn, and a credit card dataset for customer segmentation. For housing prices, linear regression, regression trees and gradient boosted trees are applied and evaluated on test data using R2 and RMSE. For customer churn, logistic regression and random forests are used with sampling to address class imbalance, and evaluated using confusion matrix metrics. For credit card data, k-means clustering with PCA is used to segment customers into groups.
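The regression evaluation metrics mentioned above can be computed directly; a minimal sketch with illustrative numbers, not values from the document's datasets:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: fraction of variance explained by the model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical house prices (in thousands) and model predictions
y_true = [200, 250, 300, 350]
y_pred = [210, 240, 310, 340]
print(f"RMSE={rmse(y_true, y_pred):.1f}  R2={r2(y_true, y_pred):.3f}")
```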
Opinion mining framework using proposed RB-bayes model for text classification
Data mining is a powerful concept with great potential to anticipate future patterns and behavior. It refers to the extraction of hidden information from large data sets using techniques such as statistical analysis, machine learning, clustering, neural networks and genetic algorithms. Naive Bayes suffers from the zero-likelihood problem. This paper proposes the RB-Bayes method, based on Bayes' theorem, to remove the zero-likelihood problem in prediction. We also compare our method with existing methods, namely naive Bayes and SVM. We demonstrate that this technique outperforms some current techniques and, in particular, can analyze data sets more effectively. When the proposed approach is tested on real data sets, the results show improved accuracy in most cases. The RB-Bayes algorithm achieves an accuracy of 83.333%.
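For context, the zero-likelihood problem in naive Bayes is conventionally handled with Laplace (add-one) smoothing. The sketch below shows that standard fix on toy data; it is not the paper's RB-Bayes method, and the documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial naive Bayes: collect vocabulary, per-class word counts, priors."""
    vocab = {w for d in docs for w in d}
    counts = defaultdict(Counter)  # class -> word counts
    prior = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    return vocab, counts, prior

def predict(doc, vocab, counts, prior, alpha=1.0):
    n = sum(prior.values())
    best, best_lp = None, float("-inf")
    for y in prior:
        total = sum(counts[y].values())
        lp = math.log(prior[y] / n)
        for w in doc:
            # Laplace smoothing: an unseen word gets a small non-zero probability
            lp += math.log((counts[y][w] + alpha) / (total + alpha * len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

docs = [["good", "great"], ["bad", "awful"], ["good", "nice"]]
labels = ["pos", "neg", "pos"]
model = train_nb(docs, labels)
print(predict(["good", "unknown"], *model))  # unseen word no longer zeroes the product
```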
Assessing Discriminatory Performance of a Binary Logistic Regression Model
The evaluation of a fitted binary logistic regression model is very important in assessing the appropriateness of the model for specific purposes. The study proposes to assess the discriminatory performance of a binary logistic regression model in correctly classifying between cases and non-cases. The discriminatory performance is measured using two approaches. The first approach uses the fitted binary logistic regression model to correctly predict which subjects are cases and non-cases, with the help of the parameters sensitivity and specificity. The alternative approach is based on the receiver operating characteristic (ROC) curve for the fitted binary logistic regression model, determining the area under the curve (AUC) as a measure of discriminatory performance. The value of sensitivity is observed to be greater than the value of 1-specificity, which signifies suitable discrimination at the mentioned cut point. The area under the curve indicates that there is evidence of reasonable discrimination by the fitted model.
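Both approaches can be sketched compactly; the labels, scores and cut point below are illustrative. The AUC here is the rank-based estimate: the probability that a randomly chosen case outscores a randomly chosen non-case.

```python
def sensitivity_specificity(y_true, y_pred):
    """Fraction of cases caught (sensitivity) and non-cases rejected (specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Rank-based AUC: P(random case scores higher than random non-case)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1 if p > n else 0.5 if p == n else 0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.5]      # fitted model probabilities
pred = [1 if s >= 0.55 else 0 for s in scores]  # classification at cut point 0.55
sens, spec = sensitivity_specificity(y, pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(y, scores):.2f}")
```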
Maximizing Profits with the Improvement in Product Composition - ICIEOM - Mer...
This document summarizes an academic paper that proposes using genetic algorithms and k-means clustering to optimize product composition and maximize profits for a printing company. The researchers first group the company's products into 9 clusters using k-means based on characteristics like contribution margin and production efficiency. They then apply a genetic algorithm to minimize costs by determining the optimal sales volume for each cluster, with the goal of maximizing overall profits while respecting production capacity constraints. The results indicate this approach can significantly increase both revenue and contribution margin for the printing company.
Rule-based expert systems for supporting university students
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/rule-based-expert-systems-for-supporting-university-students/
There are more than 15 million college students in the US alone. Academic advising for courses and scholarships is typically performed by human advisors, bringing an immense managerial workload to faculty members, as well as other staff at universities. This paper reports and discusses the development of two educational expert systems at a private international university. The first expert system is a course advising system which recommends courses to undergraduate students. The second system suggests scholarships to undergraduate students based on their eligibility. While there have been reported systems for course advising, the literature does not seem to contain any references to expert systems for scholarship recommendation and eligibility checking. Therefore, the scholarship recommender that we developed is the first of its kind. Both systems have been implemented and tested using the Oracle Policy Automation (OPA) software.
Visual and analytical mining of transactions data for production planning f...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-and-analytical-mining-of-sales-transaction-data-for-production-planning-and-marketing/
Recent developments in information technology have paved the way for the collection of large amounts of data pertaining to various aspects of an enterprise. The greatest challenge in processing these massive amounts of raw data is the effective management of the data, with the ultimate purpose of deriving necessary and meaningful information from it. This paper illustrates the combination of visual and analytical data mining techniques for the planning of marketing and production activities. The primary phases of the proposed framework consist of filtering, clustering and comparison steps, implemented using interactive pie charts, the K-Means algorithm and parallel coordinate plots, respectively. A prototype decision support system is developed and a sample analysis session is conducted to demonstrate the applicability of the framework.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/depolama-sistemleri/
Warehouses are temporary stocking points used during the distribution of products. They make an important contribution to the operation of supply chains in line with their intended goals and to the effective execution of logistics activities. Warehouses may be located inside or next to production facilities, or they may be established as separate, purpose-built structures. Figure 4.1 presents a general view of a typical warehouse. In this typical warehouse, materials/products are stored on racks, material entry and exit take place over the warehouse docks, and loading/unloading operations are carried out using vehicles called forklifts. The warehouse is managed by a logistics professional holding the title of Warehouse Manager (Depo Yöneticisi or Depo Müdürü).
Financial Benchmarking Of Transportation Companies In The New York Stock Exc...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/financial-benchmarking-of-transportation-companies-in-the-new-york-stock-exchange-nyse-through-data-envelopment-analysis-dea-and-visualization/
In this paper, we present a benchmarking study of industrial transportation companies traded in the New York Stock Exchange (NYSE). There are two distinguishing aspects of our study: First, instead of using operational data for the input and the output items of the developed Data Envelopment Analysis (DEA) model, we use financial data of the companies that are readily available on the Internet. Secondly, we visualize the efficiency scores of the companies in relation to the subsectors and the number of employees. These visualizations enable us to discover interesting insights about the companies within each subsector, and about subsectors in comparison to each other. The visualization approach that we employ can be used in any DEA study that contains subgroups within a group. Thus, our paper also contains a methodological contribution.
Simulation Modeling For Quality And Productivity In Steel Cord Manufacturing
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/simulation-modeling-for-quality-and-productivity-in-steel-cord-manufacturing/
We describe the application of simulation modeling to estimate and improve quality and productivity performance of a steel cord manufacturing system. We describe the typical steel cord manufacturing plant, emphasize its distinguishing characteristics, identify various production settings and discuss applicability of simulation as a management decision support tool. Besides presenting the general structure of the developed simulation model, we focus on wire fractures, which can be an important source of system disruption.
Teaching Warehousing Concepts through Interactive Animations and 3-D Models
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/teaching-warehousing-concepts-through-interactive-animations-and-3-d-models/
Impact of Cross Aisles in a Rectangular Warehouse: A Computational Study
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/impact-of-cross-aisles-in-a-rectangular-warehouse-a-computational-study/
Order picking is typically the most costly operation in a warehouse, and traveling is typically the most time-consuming task within order picking. In this study we focus on the layout design for a rectangular warehouse, a warehouse with parallel storage blocks separated by main aisles. We specifically analyze the impact of adding cross aisles that cut storage blocks perpendicularly, which can reduce travel times during order picking by introducing flexibility in going from one main aisle to the next. We consider two types of cross aisles: those that are equally spaced (Case 1) and those that are unequally spaced (Case 2). For Case 2, we extend an earlier model and present a heuristic algorithm for finding the best distances among cross aisles. We carry out extensive computational experiments for a variety of warehouse designs. Our findings suggest that warehouse planners can obtain great travel time savings by establishing equally spaced cross aisles, but little additional savings from unequally spaced cross aisles. We present a look-up table that provides the best number of equally spaced cross aisles when the number of cross aisles (N) and the length of the warehouse (T) are given. Finally, when the values of N and T are not known, we suggest establishing three cross aisles in a warehouse.
Encapsulating And Representing The Knowledge On The Evolution Of An Engineeri...
1) The document describes a methodology for encapsulating and representing knowledge about the evolution of an engineering system. It involves mathematical formalism to represent design steps, contradictions, goals and TRIZ principles.
2) A graph visualization is proposed to represent the design process, with nodes as possible TRIZ principles and colored nodes indicating principles selected at each step.
3) A database structure is outlined to store information on contradictions, principles, design steps and which principles were selected, along with explanations. This knowledge representation aims to guide design of similar products.
Statistical Scoring Algorithm for Learning and Study Skills
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/statistical-scoring-algorithm-for-learning-and-study-skills/
This study examines the study skills and learning styles of university students using a scoring method. The study investigates whether study skills can be summarized in a single universal score that measures how hard a student works. The sample consists of 418 undergraduate students of an international university. The presented scoring method was adapted from the domain of risk management. The proposed method computes an overall score that represents the study skills, using a linear weighted summation scheme. From among 50 questions regarding learning and study skills, the 30 highest-weighted questions are suggested for use in future studies as a learning and study skills inventory. The proposed scoring method and study yield results and insights that can guide educators in improving their students' study skills. The main point drawn from this study is that students greatly value opportunities for interaction with instructors and peers, cooperative learning, and active engagement in lectures.
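A linear weighted summation score of the kind described can be sketched as follows; the questions, responses and weights below are hypothetical, and the question count is reduced from 50 to 5 for brevity.

```python
def overall_score(answers, weights):
    """Linear weighted summation: each response scaled by its question weight,
    normalized by the total weight so the score stays on the response scale."""
    return sum(a * w for a, w in zip(answers, weights)) / sum(weights)

def top_questions(weights, k):
    """Indices of the k highest-weighted questions (the paper keeps 30 of 50)."""
    return sorted(range(len(weights)), key=weights.__getitem__, reverse=True)[:k]

# Hypothetical 5-question inventory: responses on a 1-5 scale and expert weights
answers = [4, 2, 5, 3, 1]
weights = [0.9, 0.2, 0.7, 0.5, 0.1]
print(round(overall_score(answers, weights), 2))
print(top_questions(weights, k=3))
```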
Visual Mining of Science Citation Data for Benchmarking Scientific and Techno...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-mining-of-science-citation-data-for-benchmarking-scientific-and-technological-competitiveness-of-world-countries/
In this paper we present a study where we visually analyzed science citation data to investigate the competitiveness of world countries in selected categories of science. The dataset that we worked on in our study includes the number of papers published and the number of citations made in the ESI (Essential Science Indicators) database in 2004. The dataset lists these values for practically every country in the world. In analyzing the data, we employ methods and software tools developed and used in the data mining and information visualization fields of computer science. Some of the questions for which we look for answers in this study are the following: (a) Which countries are most competitive in the selected categories of science? (i.e. Engineering, Computer Science, Economics & Business) (b) What type of correlations exist between different categories of science? For example, do countries with many published papers in the field of Engineering also have many papers published in Computer Science or Economics & Business? (c) Which countries produce the most influential papers? This analysis is needed since a country may have many papers published but these papers may be cited very rarely. (d) Can we gain useful and actionable insights by combining science citation data with socioeconomic and geographical data?
Design Requirements For a Tendon Rehabilitation Robot: Results From a Survey ...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/design-requirements-for-a-tendon-rehabilitation-robot-results-from-a-survey-of-engineers-and-health-professionals/
Exoskeleton type finger rehabilitation robots are helpful in assisting the treatment of tendon injuries. A survey has been carried out with engineers and health professionals to further develop an existing finger exoskeleton prototype. The goal of the study is to better understand the relative importance of several design criteria through the analysis of survey results and to improve the finger exoskeleton accordingly. The survey questions with strong correlations are identified and the preferences of the two respondent groups are statistically compared. The results of the statistical analysis are interpreted and insights obtained are used to guide the design process. The answers to the qualitative questions are also discussed together with their design implications. Finally, Quality Function Deployment (QFD) has been employed for visualizing these functional requirements in relation to the customer requirements.
Application of local search methods for solving a quadratic assignment proble...
Ertek, G., Aksu, B., Birbil, S. E., İkikat, M. C., Yıldırmaz, C. (2005). “Application of local search methods for solving a quadratic assignment problem: A case study”, Proceedings of Computers and Industrial Engineering Conference, 2005. Istanbul, Turkey.
Application Of Local Search Methods For Solving A Quadratic Assignment Probl...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/application-of-local-search-methods-for-solving-a-quadratic-assignment-problem-a-case-study/
This paper discusses the design and application of local search methods to a real-life application at a steel cord manufacturing plant. The case study involves a layout problem that can be represented as a Quadratic Assignment Problem (QAP). Due to the nature of the manufacturing process, certain machinery needs to be allocated in close proximity to other machinery. This issue is incorporated into the objective function by assigning high penalty costs to unfavorable allocations. The QAP belongs to one of the most difficult classes of combinatorial optimization problems, and is not solvable to optimality as the number of facilities increases. We implement the well-known local search methods 2-opt, 3-opt and tabu search. We compare the solution performance of the methods to the results obtained from the NEOS server, which provides free access to many optimization solvers on the Internet.
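A 2-opt local search for the QAP, one of the methods the paper implements, can be sketched as follows; the flow and distance matrices are illustrative, and penalty costs for unfavorable allocations would simply be folded into these matrices.

```python
import random

def qap_cost(perm, flow, dist):
    """Total cost when facility i is assigned to location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def two_opt(flow, dist, seed=0):
    """2-opt local search: swap facility pairs until no swap improves the cost."""
    rng = random.Random(seed)
    n = len(flow)
    perm = list(range(n))
    rng.shuffle(perm)
    best = qap_cost(perm, flow, dist)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                perm[i], perm[j] = perm[j], perm[i]
                cost = qap_cost(perm, flow, dist)
                if cost < best:
                    best, improved = cost, True
                else:
                    perm[i], perm[j] = perm[j], perm[i]  # undo non-improving swap
    return perm, best

# Toy 3-facility instance: flow between facilities, distance between locations
flow = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
perm, best = two_opt(flow, dist)
print(perm, best)
```

3-opt extends the neighborhood to three-way exchanges, and tabu search allows non-improving moves while forbidding recent ones.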
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/linking-behavioral-patterns-to-personal-attributes-through-data-re-mining/
A fundamental challenge in behavioral informatics is the development of methodologies and systems that can achieve its goals and tasks, including behavior pattern analysis. This study presents such a methodology, which can be converted into a decision support system, through the appropriate integration of existing tools for association mining and graph visualization. The methodology enables the linking of behavioral patterns to personal attributes, through the re-mining of colored association graphs that represent item associations. The methodology is described and mathematically formalized, and is demonstrated in a case study related to the retail industry.
Optimizing Waste Collection In An Organized Industrial Region: A Case Study
This document summarizes a case study that optimizes industrial waste collection from 17 factories located in an organized industrial region in Turkey. The authors developed a mixed integer programming model to determine the optimal waste container locations and transportation routes to minimize total costs. They applied the model to real data from an industrial zone. The optimal solution selected 3 out of 5 candidate locations and had a minimum monthly cost of 70,338 Turkish Lira. The authors also created a visualization of the optimal supply chain network to provide additional insights into the solution.
The Bullwhip Effect In Supply Chain Reflections After A Decade
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/the-bullwhip-effect-in-supply-chain-reflections-after-a-decade/
A decade has passed since the publication of the two seminal papers by Lee, Padmanabhan and Whang (1997) that describe the “bullwhip effect” in supply chains and characterize its underlying causes. The bullwhip phenomenon is observed in supply chains where the decisions at the subsequent stages of the supply chain are made greedily based on local information, rather than through coordination based on global information on the state of the whole chain. The first consequence of this information distortion is higher variance in purchasing quantities compared to sales quantities at a particular supply chain stage. The second consequence is increasingly higher variance in order quantities and inventory levels in the upstream stages compared to their downstream stages (buyers). In this paper, we survey a decade of literature on the bullwhip effect and present the key insights reported by researchers and practitioners. We also present our reflections and share our vision of possible future directions.
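The variance amplification described above can be reproduced with a toy simulation. The stylized order-up-to policy and all parameters below are illustrative assumptions, not the models surveyed in the paper: the retailer orders what it sold plus a lead-time correction on the observed demand change, and those orders become the demand seen by the upstream stage.

```python
import random
from statistics import pvariance

def simulate(n=2000, lead_time=2, seed=7):
    """Retailer sees iid demand; each period it orders the observed demand
    plus lead_time times the change in demand (a stylized order-up-to
    policy). The resulting order stream feeds the upstream stage."""
    rng = random.Random(seed)
    demand = [rng.gauss(100, 10) for _ in range(n)]
    orders = [demand[0]] + [demand[t] + lead_time * (demand[t] - demand[t - 1])
                            for t in range(1, n)]
    return demand, orders

demand, orders = simulate()
amplification = pvariance(orders) / pvariance(demand)  # > 1: the bullwhip
```

For iid demand this policy gives a theoretical variance ratio of (1 + L)^2 + L^2, so with L = 2 the upstream stage sees roughly thirteen times the demand variance, which is the distortion the paper surveys.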
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/a-taxonomy-of-logistics-innovations/
In this paper we present a taxonomy of supply chain and logistics innovations, which is based on an extensive literature survey. Our primary goal is to provide guidelines for choosing the most appropriate innovations for a company, such that the company can outrun its competitors. We investigate the factors, both internal and external to the company, that determine the applicability and effectiveness of the listed innovations. We support our suggestions with real world cases reported in literature.
Supplier and Buyer Driven Channels in a Two-Stage Supply Chain
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/supplier-and-buyer-driven-channels-in-a-two-stage-supply-chain/
We explore the impact of power structure on price, sensitivity of market price, and profits in a two-stage supply chain with a single product, a supplier and a buyer, and a price sensitive market. We develop and analyze the case where the supplier has dominant bargaining power and the case where the buyer has dominant bargaining power. We consider a pricing scheme for the buyer that involves both a multiplier and a markup, and show that it is optimal for the buyer to set the markup to zero and use only a multiplier. We also show that the market price and its sensitivity are higher when operational costs (namely distribution and inventory) exist. We observe that the sensitivity of the market price increases non-linearly as the wholesale price increases, and derive a lower bound for it. Through experimental analysis, we show that the marginal impact of increasing the shipment cost and the carrying charge (interest rate) on prices and profits is decreasing in both cases. Finally, we show that there exist problem instances where the buyer may prefer the supplier-driven case to the markup-only buyer-driven case, and similarly problem instances where the supplier may prefer the markup-only buyer-driven case to the supplier-driven case.
Re-mining item associations: Methodology and a case study in apparel retailing - Gurdal Ertek
Association mining is the conventional data mining technique for analyzing market basket data, and it reveals the positive and negative associations between items. Although pricing and time information are an integral part of transaction data, they have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.
http://research.sabanciuniv.edu.
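As a minimal illustration of the positive and negative associations that the first mining stage produces, support and lift can be computed directly from baskets. The apparel baskets below are invented examples, not the retail chain's data; each resulting rule is the kind of object the re-mining stage would then enrich with price and time attributes.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def lift(a, b, baskets):
    """Lift > 1 hints at a positive association, lift < 1 at a negative one."""
    return support(a | b, baskets) / (support(a, baskets) * support(b, baskets))

baskets = [{"shirt", "tie"}, {"shirt", "tie", "belt"}, {"shirt", "jeans"},
           {"jeans", "belt"}, {"shirt", "tie"}, {"jeans", "sneakers"}]
positive = lift({"shirt"}, {"tie"}, baskets)   # > 1: often bought together
negative = lift({"tie"}, {"jeans"}, baskets)   # 0.0: never co-occur
```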
Re-mining Positive and Negative Association Mining Results - Gurdal Ertek
Positive and negative association mining are well-known and
extensively studied data mining techniques to analyze market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining is performed by operating on the transaction data. Despite being an integral part of the transaction data, the pricing and time information have not been incorporated into market basket analysis so far, and additional attributes have been handled
using quantitative association mining. In this paper, a new approach is proposed to incorporate price, time and domain related attributes into data mining by re-mining the association mining results. The underlying factors behind positive and negative relationships, as indicated by the association rules, are characterized and described through the second data
mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data coming from the apparel retailing industry, where price markdown is an essential tool for promoting sales and generating increased revenue.
https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-positive-and-negative-association-mining-results/
Rule-based expert systems for supporting university students
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/rule-based-expert-systems-for-supporting-university-students/
There are more than 15 million college students in the US alone. Academic advising for courses and scholarships is typically performed by human advisors, bringing an immense managerial workload to faculty members, as well as other staff at universities. This paper reports and discusses the development of two educational expert systems at a private international university. The first expert system is a course advising system which recommends courses to undergraduate students. The second system suggests scholarships to undergraduate students based on their eligibility. While there have been reported systems for course advising, the literature does not seem to contain any references to expert systems for scholarship recommendation and eligibility checking. Therefore, the scholarship recommender that we developed is the first of its kind. Both systems have been implemented and tested using Oracle Policy Automation (OPA) software.
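A rule-based eligibility check of the kind the scholarship system performs can be sketched as predicates over a student record. The rule names and thresholds below are hypothetical illustrations, not the actual OPA rule base described in the paper.

```python
def eligible(student, rules):
    """Return the scholarships for which the student satisfies every condition."""
    return [name for name, conds in rules.items()
            if all(cond(student) for cond in conds)]

# Hypothetical rules; real criteria live in the system's OPA rule base.
rules = {
    "merit": [lambda s: s["gpa"] >= 3.5, lambda s: s["credits"] >= 30],
    "need_based": [lambda s: s["family_income"] < 40000],
}
student = {"gpa": 3.7, "credits": 45, "family_income": 55000}
matches = eligible(student, rules)  # ['merit']
```

Keeping each rule as an independent predicate is what lets a rule engine explain *why* a student qualifies, which is the main advantage over hard-coded advising logic.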
Visual and analytical mining of transactions data for production planning f...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-and-analytical-mining-of-sales-transaction-data-for-production-planning-and-marketing/
Recent developments in information technology have paved the way for the collection of large amounts of data pertaining to various aspects of an enterprise. The greatest challenge in processing these massive amounts of raw data is the effective management of the data, with the ultimate purpose of deriving necessary and meaningful information from it. This paper illustrates the combination of visual and analytical data mining techniques for the planning of marketing and production activities. The primary phases of the proposed framework consist of filtering, clustering and comparison steps, implemented using interactive pie charts, the K-Means algorithm and parallel coordinate plots, respectively. A prototype decision support system is developed and a sample analysis session is conducted to demonstrate the applicability of the framework.
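The clustering step can be illustrated with a bare-bones K-Means implementation. The (quantity, revenue) pairs below are invented; the prototype system described above operates on real sales transactions.

```python
import random

def kmeans(points, k, iters=50, seed=1):
    """Plain k-means on 2-D points: assign each point to the nearest
    centroid, then move each centroid to its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centroids[c][0]) ** 2
                                      + (p[1] - centroids[c][1]) ** 2)
            clusters[nearest].append(p)
        centroids = [(sum(p[0] for p in cl) / len(cl),
                      sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Invented (quantity sold, revenue) pairs: a slow-moving and a fast-moving group.
sales = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(sales, 2)
```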
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/depolama-sistemleri/
Warehouses are temporary stocking points used during the distribution of products. They make an important contribution to the operation of supply chains toward their intended goals and to the effective execution of logistics activities. Warehouses may be located within or next to production facilities, or they may be established as separate, purpose-built structures. Figure 4.1 presents a general view of a typical warehouse. In this typical warehouse, materials/products are stored on racks, material entry and exit take place over the warehouse docks, and loading/unloading operations are carried out using vehicles called forklifts. The warehouse is managed by a logistics professional holding the title of Warehouse Manager (Depo Yöneticisi or Depo Müdürü).
Financial Benchmarking Of Transportation Companies In The New York Stock Exc...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/financial-benchmarking-of-transportation-companies-in-the-new-york-stock-exchange-nyse-through-data-envelopment-analysis-dea-and-visualization/
In this paper, we present a benchmarking study of industrial transportation companies traded in the New York Stock Exchange (NYSE). There are two distinguishing aspects of our study: First, instead of using operational data for the input and the output items of the developed Data Envelopment Analysis (DEA) model, we use financial data of the companies that are readily available on the Internet. Secondly, we visualize the efficiency scores of the companies in relation to the subsectors and the number of employees. These visualizations enable us to discover interesting insights about the companies within each subsector, and about subsectors in comparison to each other. The visualization approach that we employ can be used in any DEA study that contains subgroups within a group. Thus, our paper also contains a methodological contribution.
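As a hedged sketch of the DEA idea: with a single input and a single output, the CCR efficiency score reduces to a best-ratio normalization. The study's actual model uses several financial input and output items and is solved as a linear program; the figures below are invented.

```python
def ccr_single(data):
    """With one input and one output, each unit's CCR efficiency reduces to
    its output/input ratio divided by the best ratio observed in the group."""
    ratios = {name: out / inp for name, (inp, out) in data.items()}
    best = max(ratios.values())
    return {name: r / best for name, r in ratios.items()}

# Hypothetical figures (input: operating expense, output: revenue, $M).
companies = {"A": (50, 100), "B": (40, 100), "C": (80, 120)}
scores = ccr_single(companies)  # B defines the efficient frontier here
```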
Simulation Modeling For Quality And Productivity In Steel Cord Manufacturing
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/simulation-modeling-for-quality-and-productivity-in-steel-cord-manufacturing/
We describe the application of simulation modeling to estimate and improve quality and productivity performance of a steel cord manufacturing system. We describe the typical steel cord manufacturing plant, emphasize its distinguishing characteristics, identify various production settings and discuss applicability of simulation as a management decision support tool. Besides presenting the general structure of the developed simulation model, we focus on wire fractures, which can be an important source of system disruption.
Teaching Warehousing Concepts through Interactive Animations and 3-D Models
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/teaching-warehousing-concepts-through-interactive-animations-and-3-d-models/
Impact of Cross Aisles in a Rectangular Warehouse: A Computational Study
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/impact-of-cross-aisles-in-a-rectangular-warehouse-a-computational-study/
Order picking is typically the most costly operation in a warehouse, and traveling is typically the most time consuming task within order picking. In this study we focus on the layout design for a rectangular warehouse, a warehouse with parallel storage blocks separated by main aisles. We specifically analyze the impact of adding cross aisles that cut storage blocks perpendicularly, which can reduce travel times during order picking by introducing flexibility in going from one main aisle to the next. We consider two types of cross aisles: those that are equally spaced (Case 1) and those that are unequally spaced (Case 2). For Case 2, we extend an earlier model and present a heuristic algorithm for finding the best distances among cross aisles. We carry out extensive computational experiments for a variety of warehouse designs. Our findings suggest that warehouse planners can obtain great travel time savings by establishing equally spaced cross aisles, but little additional savings from unequally spaced cross aisles. We present a look-up table that provides the best number of equally spaced cross aisles when the number of cross aisles (N) and the length of the warehouse (T) are given. Finally, when the values of N and T are not known, we suggest establishing three cross aisles in a warehouse.
Encapsulating And Representing The Knowledge On The Evolution Of An Engineeri...
1) The document describes a methodology for encapsulating and representing knowledge about the evolution of an engineering system. It involves mathematical formalism to represent design steps, contradictions, goals and TRIZ principles.
2) A graph visualization is proposed to represent the design process, with nodes as possible TRIZ principles and colored nodes indicating principles selected at each step.
3) A database structure is outlined to store information on contradictions, principles, design steps and which principles were selected, along with explanations. This knowledge representation aims to guide design of similar products.
Statistical Scoring Algorithm for Learning and Study Skills
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/statistical-scoring-algorithm-for-learning-and-study-skills/
This study examines the study skills and the learning styles of university students by using a scoring method. The study investigates whether the study skills can be summarized in a single universal score that measures how hard a student works. The sample consists of 418 undergraduate students of an international university. The presented scoring method was adapted from the domain of risk management. The proposed method computes an overall score that represents the study skills, using a linear weighted summation scheme. From among 50 questions regarding learning and study skills, the 30 highest weighted questions are suggested for use in future studies as a learning and study skills inventory. The proposed scoring method and study yield results and insights that can guide educators regarding how they can improve their students’ study skills. The main point drawn from this study is that the students greatly value opportunities for interaction with instructors and peers, cooperative learning and active engagement in lectures.
Visual Mining of Science Citation Data for Benchmarking Scientific and Techno...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-mining-of-science-citation-data-for-benchmarking-scientific-and-technological-competitiveness-of-world-countries/
In this paper we present a study where we visually analyzed science citation data to investigate the competitiveness of world countries in selected categories of science. The dataset that we worked on in our study includes the number of papers published and the number of citations made in the ESI (Essential Science Indicators) database in 2004. The dataset lists these values for practically every country in the world. In analyzing the data, we employ methods and software tools developed and used in the data mining and information visualization fields of computer science. Some of the questions for which we look for answers in this study are the following: (a) Which countries are most competitive in the selected categories of science (i.e., Engineering, Computer Science, Economics & Business)? (b) What type of correlations exist between different categories of science? For example, do countries with many published papers in the field of Engineering also have many papers published in Computer Science or Economics & Business? (c) Which countries produce the most influential papers? This analysis is needed since a country may have many papers published but these papers may be cited very rarely. (d) Can we gain useful and actionable insights by combining science citation data with socioeconomic and geographical data?
Design Requirements For a Tendon Rehabilitation Robot: Results From a Survey ...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/design-requirements-for-a-tendon-rehabilitation-robot-results-from-a-survey-of-engineers-and-health-professionals/
Exoskeleton type finger rehabilitation robots are helpful in assisting the treatment of tendon injuries. A survey has been carried out with engineers and health professionals to further develop an existing finger exoskeleton prototype. The goal of the study is to better understand the relative importance of several design criteria through the analysis of survey results and to improve the finger exoskeleton accordingly. The survey questions with strong correlations are identified and the preferences of the two respondent groups are statistically compared. The results of the statistical analysis are interpreted and insights obtained are used to guide the design process. The answers to the qualitative questions are also discussed together with their design implications. Finally, Quality Function Deployment (QFD) has been employed for visualizing these functional requirements in relation to the customer requirements.
Application of local search methods for solving a quadratic assignment proble...
Ertek, G., Aksu, B., Birbil, S. E., İkikat, M. C., Yıldırmaz, C. (2005). “Application of local search methods for solving a quadratic assignment problem: A case study”, Proceedings of Computers and Industrial Engineering Conference, 2005. Istanbul, Turkey.
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-association-mining-results-through-visualization-data-envelopment-analysis-and-decision-trees/
Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding how re-mining can be carried out on association mining results, are answered in the case study through empirical analysis.
An Ontological Approach for Mining Association Rules from Transactional Dataset
This document discusses using an ontology relational weights measure (ORWM) approach to mine interesting infrequent item sets from transactional datasets. It introduces three algorithms: 1) Infrequent Weighted Item Set Miner, which mines infrequent item sets using an FP-growth-like approach; 2) Minimal Infrequent Weighted Item Set Miner, which avoids extracting non-minimal item sets; and 3) ORWM, which integrates user knowledge, prunes rules to minimize the number generated, and uses a weighted support measure to discover interesting patterns. The ORWM represents items as a directed graph and applies the HITS model to rank items and discover high authority/hub infrequent item sets of interest to users.
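The HITS step of the ORWM approach can be illustrated on a tiny directed item graph. The edges below are invented examples of item-to-item relations; the summary above says only that items form a directed graph ranked by HITS.

```python
def hits(edges, iters=50):
    """HITS on a directed graph: a node's authority sums the hub scores of
    nodes pointing to it; its hub score sums the authority scores of nodes
    it points to. Scores are L2-normalized after each update."""
    nodes = {n for e in edges for n in e}
    auth = dict.fromkeys(nodes, 1.0)
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iters):
        auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5
        auth = {n: a / norm for n, a in auth.items()}
        hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5
        hub = {n: h / norm for n, h in hub.items()}
    return auth, hub

# Invented item relations: items "a" and "b" both point to "c".
edges = [("a", "c"), ("b", "c"), ("c", "d")]
auth, hub = hits(edges)  # "c" emerges as the high-authority item
```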
This document discusses how data mining is used in the retail industry to gain insights about customers from large datasets. It explains that data mining can help retailers identify high-value customers, determine which new products customers may be interested in, and enable better decision making. Specific techniques discussed include market basket analysis to find common purchasing patterns, association rule mining to link frequently bought item combinations, and k-means clustering to organize customers into groups. The goal of these applications is to support customer relationship management and improve business strategies.
IRJET - Mining Frequent Patterns, Associations and Correlations
This document discusses mining frequent patterns, associations, and correlations from data. It begins by defining frequent patterns as patterns that occur often in a dataset. It then discusses market basket analysis and how it is used to find associations between frequently purchased items. The document outlines key concepts for mining patterns including support, confidence, and association rules. It also discusses different types of patterns that can be mined such as closed, maximal and approximate patterns. Finally, it provides an overview of the different methodologies used for pattern mining and applications.
A Novel Preprocessing Algorithm for Frequent Pattern Mining in Multidatasets - Waqas Tariq
In many database applications, information stored in a database has a built-in hierarchy consisting of multiple levels of concepts. In such a database, users may want to find association rules among items only at the same levels. This task is called multiple-level association rule mining. Mining frequent patterns at multiple levels may lead to the discovery of more specific and concrete knowledge from data. The initial step in finding frequent patterns is to preprocess the multidataset to find the large-1 frequent patterns for all levels. In this research paper, we introduce a new algorithm called CCB-tree (Category-Content-Brand tree), developed to mine large-1 frequent patterns for all levels of abstraction. The proposed algorithm uses a tree-based structure: it first constructs the tree in CCB order for the entire database, and second, it searches for frequent patterns in CCB order. This method uses the concept of reduced support, which reduces the time complexity.
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o... - ITIIIndustries
This paper presents a new approach based on genetic algorithms (GAs) to generate maximal frequent itemsets (MFIs) from large datasets. The new algorithm, GeneticMax, is a heuristic that mimics natural selection to find MFIs efficiently. Its search strategy uses a lexicographic tree that avoids level-by-level searching, which reduces the time required to mine the MFIs. Our implementation of the search strategy includes a bitmap representation of the nodes in the lexicographic tree and identifies frequent itemsets (FIs) from superset-subset relationships among nodes. The algorithm uses the principles of GAs to perform global searches, and its time complexity is lower than that of many other algorithms since it follows a non-deterministic approach. We isolate the effect of each step of the algorithm through experimental analysis on real datasets such as Tic-Tac-Toe, Zoo, and a 10000×8 dataset. Our experimental results showed that this approach is efficient and scalable for different sizes of itemsets: it accesses the dataset to calculate support values for fewer nodes when finding the FIs, even when the search space is very large, dramatically reducing the search time. The proposed algorithm shows how an evolutionary method can be used on real datasets to find all the MFIs efficiently.
In this paper we present a taxonomy of supply chain and logistics innovations, based on an extensive literature survey. Our primary goal is to provide guidelines for choosing the most appropriate innovations for a company, such that the company can outrun its competitors. We investigate the factors, both internal and external to the company, that determine the applicability and effectiveness of the listed innovations, and we support our suggestions with real-world cases reported in the literature.
http://research.sabanciuniv.edu.
Data Mining For Supermarket Sale Analysis Using Association Rule - ijtsrd
Data mining is the novel technology of discovering important information from data repositories, and it is widely used in almost all fields. Mining of databases has recently become essential because of the growing amount of data and its wide applicability in retail industries for improving marketing strategies. Analysis of past transaction data can provide very valuable information on customer behavior and business decisions. The amount of data stored grows twice as fast as the speed of the fastest processor available to analyze it. The main purpose of association mining is to find association relationships among the large number of database items; it is used to describe the patterns of customer purchases in the supermarket. This is presented in this paper. Rajeshri Shelke, "Data Mining For Supermarket Sale Analysis Using Association Rule", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1, Issue-4, June 2017. URL: http://www.ijtsrd.com/papers/ijtsrd94.pdf http://www.ijtsrd.com/engineering/computer-engineering/94/data-mining-for-supermarket-sale-analysis-using-association-rule/rajeshri-shelke
The development of data mining is inseparable from recent developments in information technology that enable the accumulation of large amounts of data. For example, a shopping mall records every sales transaction of goods using point-of-sale (POS) systems. The database of these sales can reach a large storage capacity, with more data added each day, especially when the shopping center grows into a nationwide network. The growth of the internet has also played a large part in this accumulation of data. But the rapid growth of data accumulation has created conditions often referred to as "data rich but information poor", because the collected data cannot be used optimally for useful applications. Not infrequently, such data sets are simply left as "data graves". Several techniques are used in data mining, including association, classification, and clustering. In this paper, the author compares the performance of two classification methods: naïve Bayes and the C4.5 algorithm.
Actionable Insights Through Association Mining of Exchange Rates: A Case Study - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/actionable-insights-through-association-mining-of-exchange-rates-a-case-study/
Association mining is the methodology within data mining that researches associations among the elements of a given set, based on how they appear together in multiple subsets of that set. Extensive literature exists on the development of efficient algorithms for association mining computations, and the fundamental motivation for this literature is that association mining reveals actionable insights and enables better policies. This motivation is proven valid for domains such as retailing, healthcare and software engineering, where elements of the analyzed set are physical or virtual items that appear in transactions. However, the literature does not prove this motivation for databases where items are “derived items”, rather than actual items. This study investigates the association patterns in changes of the exchange rates of the US Dollar, Euro and Gold in the Turkish economy, by representing the percentage changes as “derived items” that appear in “derived market baskets”, namely the days on which the observations are made. The study is one of the few in the literature that applies such a mapping and uses association mining in exchange rate analysis, and the first that considers the Turkish case. Actionable insights, along with their policy implications, demonstrate the usability of the developed analysis approach.
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation - cscpconf
Association rule mining has been an area of active research in the field of knowledge discovery. Data mining researchers have improved the quality of association rule mining for business development by incorporating influential factors like value (utility) and quantity of items sold (weight) into the mining of association patterns. In this paper, we propose an efficient approach to find maximal frequent itemsets first. Most of the algorithms in the literature first find the minimal frequent itemsets and then derive the maximal frequent itemsets from them; these methods consume more time to find the maximal frequent itemsets. To overcome this problem, we propose a novel approach to find maximal frequent itemsets directly using the concept of subsets. The proposed method is found to be efficient in finding maximal frequent itemsets.
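The defining property of maximal frequent itemsets is easy to state in code: a frequent itemset is maximal when no proper superset of it is also frequent. The sketch below assumes a toy list of frequent itemsets already filtered by support (hypothetical data, not the paper's algorithm):

```python
# Hypothetical frequent itemsets (minimum support already applied).
frequent = [
    frozenset(s) for s in (
        {"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"d"},
    )
]

def maximal(frequent_itemsets):
    """Keep only itemsets with no proper frequent superset."""
    return [
        s for s in frequent_itemsets
        if not any(s < t for t in frequent_itemsets)  # s < t: proper subset
    ]

print(sorted(tuple(sorted(s)) for s in maximal(frequent)))
# [('a', 'b', 'c'), ('d',)]
```

The maximal sets compactly summarize all seven frequent itemsets: every frequent itemset is a subset of some maximal one.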
This document summarizes an article that proposes a new algorithm for efficiently mining both positive and negative association rules from transactional databases. The algorithm first constructs a frequent pattern tree (FP-tree) to store the transaction information. It then uses an FP-growth approach to iteratively find frequent patterns and generate the positive and negative association rules without candidate generation. The algorithm aims to overcome limitations of previous methods and efficiently find all valid positive and negative association rules.
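The measures of a negative rule A -> ¬B can be derived directly from the positive supports, since supp(A and not B) = supp(A) - supp(A and B). This is a generic sketch of those textbook formulas, not the FP-growth-based algorithm the article proposes; the numbers are made up:

```python
def negative_rule_measures(supp_a, supp_ab):
    """Measures for the negative rule A -> not-B, from positive supports:
       supp(A, not-B) = supp(A) - supp(A, B)
       conf(A -> not-B) = supp(A, not-B) / supp(A)  (= 1 - conf(A -> B))"""
    supp_a_notb = supp_a - supp_ab
    conf = supp_a_notb / supp_a
    return supp_a_notb, conf

# Hypothetical example: supp(A) = 0.4, supp(A and B) = 0.05,
# i.e. buying A strongly suppresses buying B.
s, c = negative_rule_measures(0.4, 0.05)
print(s, c)  # approximately 0.35 and 0.875
```

A high confidence for A -> ¬B with sufficiently frequent A is what flags a negative association between the two items.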
This document discusses a hybrid technique for associative classification. It begins with an introduction to data mining processes like classification and association rule mining. The author then discusses the motivation and objectives of developing a framework to generate classification association rules more efficiently. The proposed methodology involves reviewing existing models, implementing a classification system using association rules in Weka, and comparing the performance to other methods. The facilities required are data mining tools like Weka. Finally, the document provides references that were consulted in the literature survey on associative classification and related techniques.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A literature review of modern association rule mining techniques - ijctet
This document discusses association rule mining techniques for extracting useful patterns from large datasets. It provides background on association rule mining and defines key concepts like support, confidence and frequent itemsets. The document then reviews several classic association rule mining algorithms like AIS, Apriori and FP-Growth. It explains that these algorithms aim to improve quality and efficiency by reducing database scans, generating fewer candidate itemsets and using pruning techniques.
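The level-wise candidate generation and pruning that Apriori performs can be shown in a compact sketch. The transactions and the 0.5 support threshold are toy values chosen for illustration:

```python
from itertools import combinations

# Toy transactions (hypothetical data).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def apriori(transactions, min_support):
    """Classic Apriori: level-wise candidate generation, pruned by the
    downward-closure property (every subset of a frequent set is frequent)."""
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if supp(frozenset([i])) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates having an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = [c for c in candidates if supp(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

freq = apriori(transactions, min_support=0.5)
print(len(freq))  # 4 frequent single items plus 4 frequent pairs
```

The prune step is exactly the "fewer candidate itemsets" idea the review describes: candidates that cannot possibly be frequent are discarded before the database is scanned again.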
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining - IOSR Journals
This document discusses the usage of frequent patterns in data mining, including for association mining, classification, and clustering. It provides background on foundational approaches for mining associations using frequent patterns, such as the Apriori, FP-growth, and ECLAT algorithms. It also discusses how frequent patterns have been used for classification tasks, such as generating discriminative features for training classifiers. Finally, it covers various ways frequent patterns have been applied to clustering problems, such as using frequent itemsets to represent and group documents for clustering. The document provides an overview of the state-of-the-art in applying frequent pattern mining across different data mining applications.
Similar to Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing (20)
Optimizing the electric charge station network of EŞARJ - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/optimizing-the-electric-charge-station-network-of-esarj/
In this study, we adopt the classic capacitated p-median location model for the solution of a network design problem, in the domain of electric charge station network design, for a leading company in Turkey. Our model encompasses the location preferences of the company managers as preference scores incorporated into the objective function. Our model also incorporates the capacity concerns of the managers through constraints on maximum number of districts and maximum population that can be served from a location. The model optimally selects the new station locations and the visualization of model results provides additional insights.
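The core of the p-median model can be illustrated with a tiny brute-force version: choose p sites that minimize total demand-weighted distance to the nearest open site. The distance matrix and demands below are made-up numbers, and the paper's capacity constraints and manager preference scores are omitted from this sketch:

```python
from itertools import combinations

def p_median(dist, demand, p):
    """Brute-force p-median: try every set of p candidate sites and keep
    the one minimizing total demand-weighted distance to the nearest site.
    (Only workable for tiny instances; real models are solved as IPs.)"""
    n = len(dist)
    best_cost, best_sites = float("inf"), None
    for sites in combinations(range(n), p):
        cost = sum(demand[i] * min(dist[i][j] for j in sites) for i in range(n))
        if cost < best_cost:
            best_cost, best_sites = cost, sites
    return best_sites, best_cost

# Toy symmetric distance matrix for 4 districts (hypothetical numbers).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
demand = [10, 5, 8, 6]
print(p_median(dist, demand, 2))  # sites (0, 2) with total cost 28
```

Capacity constraints would additionally restrict how much demand each opened site may absorb, and preference scores would add a term to the objective, as the study describes.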
Competitiveness of Top 100 U.S. Universities: A Benchmark Study Using Data En... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/benchmark-study-using-data-envelopment-analysis/
This study presents a comprehensive benchmarking study of the top 100 U.S. Universities. The methodologies used to come up with insights into the domain are Data Envelopment Analysis (DEA) and information visualization. Various approaches to evaluating academic institutions have appeared in the literature, including a DEA literature dealing with the ranking of universities. Our study contributes to this literature by the extensive incorporation of information visualization and subsequently the discovery of new insights.
Industrial Benchmarking through Information Visualization and Data Envelopmen... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/industrial-benchmarking-through-information-visualization-and-data-envelopment-analysis-a-new-framework/
We present a benchmarking study on the companies in the Turkish food industry based on their financial data. Our aim is to develop a comprehensive benchmarking framework using Data Envelopment Analysis (DEA) and information visualization. Besides DEA, a traditional tool for financial benchmarking based on financial ratios is also incorporated. The consistency/inconsistency between the two methodologies is investigated using information visualization tools. In addition, k-means clustering, a fundamental method from machine learning, is applied to understand the relationship between k-means clustering and DEA.
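The k-means step mentioned above is Lloyd's algorithm: alternate between assigning each point to its nearest center and moving each center to its cluster mean. A minimal sketch on synthetic 2-D data follows (the deterministic initialization and the data are illustrative choices, not the study's setup):

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm on 2-D points (illustrative sketch).
    Naive deterministic init (first k points); real implementations
    use random restarts or k-means++ seeding."""
    centers = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

# Two well-separated synthetic groups of "companies" in a 2-D feature space.
pts = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, clusters = kmeans(pts, 2)
```

Comparing such cluster memberships against DEA efficiency scores is one way to check whether the two methods group the companies consistently.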
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/modelling-the-supply-chain-perception-gaps/
This study applies perception gap analysis research to supply chain integration and develops a generic model, the 3-Level Gaps Model, with the goal of contributing to harmonization and integration in the supply chain. The model suggests that significant perception gaps may exist among supply chain members with regard to the importance of different performance criteria. The concept of the model was conceived through an empirical and inductive approach, combining the research disciplines of supply chain relationships and perception gap analysis. First-hand data was collected through a survey across a key buyer in the motor insurance industry and its eight suppliers. Rigorous statistical analysis tested the research hypotheses, which in turn verified the validity and relevance of the developed 3-Level Gaps Model. The research reveals the significant existence of supply chain perception gaps at all three defined levels, which could be root causes of an underperforming supply chain.
Risk Factors and Identifiers for Alzheimer’s Disease: A Data Mining Analysis - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/risk-factors-and-identifiers-for-alzheimers-disease-a-data-mining-analysis/
The topic of this paper is Alzheimer’s Disease (AD), with the goal being the analysis of risk factors and the identification of tests that can help diagnose AD. While multiple studies exist that analyze the factors that can help diagnose or predict AD, this is the first study that considers only non-image data while using a multitude of techniques from machine learning and data mining. The applied methods include classification tree analysis, cluster analysis, data visualization, and classification analysis. All the analyses, except classification analysis, resulted in insights that eventually led to the construction of a risk table for AD. The study contributes to the literature not only with new insights, but also by demonstrating a framework for the analysis of such data. The insights obtained in this study can be used by individuals and health professionals to assess possible risks and take preventive measures.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/text-mining-with-rapidminer/
The goal of this chapter is to introduce the text mining capabilities of RAPIDMINER through a use case. The use case involves mining reviews for hotels at TripAdvisor.com, a popular web portal. We will be demonstrating basic text mining in RAPIDMINER using the text mining extension. We will present two different RAPIDMINER processes, namely Process01 and Process02, which respectively describe how text mining can be combined with association mining and cluster modeling. While it is possible to construct each of these processes from scratch by inserting the appropriate operators into the process view, we will instead import these two processes readily from existing model files. Throughout the chapter, we will at times deliberately instruct the reader to take erroneous steps that result in undesired outcomes. We believe that this is a very realistic way of learning to use RAPIDMINER, since in practice, the modeling process frequently involves such steps that are later corrected.
Competitive Pattern-Based Strategies under Complexity: The Case of Turkish Ma... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/competitive-pattern-based-strategies-under-complexity-the-case-of-turkish-managers/
This paper aims to augment current Enterprise Architecture (EA) frameworks to become pattern-based. The main motivation behind pattern-based EA is the support for strategic decisions based on the patterns prioritized in a country or industry. Thus, to validate the need for pattern-based EA, it is essential to show how different patterns gain priority under different contexts, such as industries. To this end, this chapter also reveals the value of alternative managerial strategies across different industries and business functions in a specific market, namely Turkey. Value perceptions for alternative managerial strategies were collected via survey, and the values for strategies were analyzed through the rigorous application of statistical techniques. Then, evidence was searched for and obtained from the business literature that supports or refutes the statistically supported hypotheses. The results obtained through statistical analysis are typically confirmed with reports of real-world cases in the business literature. Results suggest that Turkish firms differ significantly in the way they value different managerial strategies. There also exist differences based on industries and business functions. Our study provides guidelines to managers in Turkey, an emerging country, on which strategies are valued most in their industries. This way, managers can have a better understanding of their competitors and business environment, and can develop the appropriate pattern-based EA to cope with complexity and succeed in the market.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/a-tutorial-on-crossdocking/
In crossdocking, the inbound materials coming in trucks to the crossdock facility are directed to outbound doors and are directly loaded into trucks that will perform shipment, or are staged for a very brief time period before loading. Crossdocking has a great potential to bring savings in logistics: for example, most of the logistics success of Wal-Mart, the world’s leading retailer, is attributed to crossdocking. In this paper, the types of crossdocking are identified, the situations and industries where crossdocking is applicable are explained, prerequisites, advantages and drawbacks are listed, and implementation issues are discussed. Finally, a case study that describes the crossdocking applications of a 3rd party logistics firm is presented.
Demonstrating Warehousing Concepts Through Interactive Animations - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/demonstrating-warehousing-concepts-through-interactive-animations/
In this paper, we report development of interactive computer animations to demonstrate warehousing concepts, providing a virtual environment for learning. Almost every company, regardless of its industry, holds inventory of goods in its warehouse(s) to respond to customer demand promptly, to coordinate supply and demand, to realize economies of scale in manufacturing or processing, to add value to its products and to reduce response time. Design, analysis, and improvement of warehouse operations can yield significant savings for a company. Warehousing science can be considered as an important field within the industrial engineering discipline. However, there is very little educational material (including web based media), and only a handful of books available in this field. We believe that the animations that we developed will significantly contribute to the understanding of warehousing concepts, and enable tomorrow’s practitioners to grasp the fundamentals of managing warehouses.
A Framework for Visualizing Association Mining Results - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/a-framework-for-visualizing-association-mining-results/
Association mining is one of the most used data mining techniques due to its interpretable and actionable results. In this study we propose a framework to visualize the association mining results, specifically frequent itemsets and association rules, as graphs. We demonstrate the applicability and usefulness of our approach through a Market Basket Analysis (MBA) case study where we visually explore the data mining results for a supermarket data set. In this case study we derive several interesting insights regarding the relationships among the items and suggest how they can be used as a basis for decision making in retailing.
Application of the Cutting Stock Problem to a Construction Company: A Case Study - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/application-of-the-cutting-stock-problem-to-a-construction-company-a-case-study/
This paper presents an application of the well-known cutting stock problem to a construction firm. The goal of the 1-Dimensional (1D) cutting stock problem is to cut bars of desired lengths in required quantities from longer bars of given length. The company for which we carried out this study encounters the 1D cutting stock problem in cutting steel bars (reinforcement bars) for its construction projects. We have developed several solution approaches to solving the company’s problem: building and solving an integer programming (IP) model in a modeling environment, developing our own software that uses a mixed integer programming (MIP) software library, and testing some of the commercial software packages available on the internet. In this paper, we summarize our experiences with all three approaches. We also present a benchmark of existing commercial software packages, and some critical insights. Finally, we suggest a visual approach for increasing performance in solving the cutting stock problem and demonstrate the applicability of this approach using the company’s data on two construction projects.
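The paper solves the problem with IP/MIP models; as a quick, hedged illustration of what a 1D cutting plan looks like, the classic first-fit-decreasing heuristic below cuts a made-up list of rebar lengths from 12 m stock bars (this is a greedy sketch, not the paper's optimization approach):

```python
def first_fit_decreasing(demands, stock_length):
    """Greedy heuristic for 1-D cutting stock: sort pieces longest-first
    and place each into the first stock bar with enough remaining length.
    Gives a feasible plan quickly, but not necessarily the IP optimum."""
    remaining = []  # leftover length of each opened stock bar
    cuts = []       # pieces assigned to each bar
    for piece in sorted(demands, reverse=True):
        for i, rem in enumerate(remaining):
            if piece <= rem:
                remaining[i] -= piece
                cuts[i].append(piece)
                break
        else:  # no open bar fits: open a new stock bar
            remaining.append(stock_length - piece)
            cuts.append([piece])
    return cuts

# Rebar lengths (m) to cut from 12 m stock bars (hypothetical data).
print(first_fit_decreasing([6, 5, 5, 4, 3, 3, 2], 12))
# [[6, 5], [5, 4, 3], [3, 2]] -> three stock bars used
```

An IP model would instead choose among cutting patterns to minimize the number of stock bars (or total trim loss) exactly, which is what the benchmarked software packages do.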
Benchmarking The Turkish Apparel Retail Industry Through Data Envelopment Ana... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/benchmarking-the-turkish-apparel-retail-industry-through-data-envelopment-analysis-dea-and-data-visualization/
This paper presents a benchmarking study of the Turkish apparel retailing industry. We have applied the Data Envelopment Analysis (DEA) methodology to determine the efficiencies of the companies in the industry. In the DEA model the number of stores, number of corners, total sales area and number of employees were included as inputs and annual sales revenue was included as the output. The efficiency scores obtained through DEA were visualized for gaining insights about the industry and revealing guidelines that can aid in strategic decision making.
An Open Source Java Code For Visualizing Supply Chain Problems - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/an-open-source-java-code-for-visualizing-supply-chain-problems/
In this paper, we describe an open source Java class library for visualizing supply chain problems within a geographical context. The highly competitive markets and recent technological advances make the use of such supply chain network visualizations critical at both strategic and tactical levels. The most important characteristic of our work is its easy integration with any Java application. Our software differs from other commercial and open source supply chain visualization tools in its simple structure, easy adoption and implementation, and high compatibility. The main motivation of our study was to develop a simple, yet effective, library that would not require learning and applying complicated visualization tools and data structures such as Geographical Information Systems (GIS). In this study, we illustrate the use of our visualization tool through maps of Turkey, Europe, North and South America, the United States and the NAFTA region. We believe that the ease of visualization offered by our open source tool will contribute to a multitude of projects in supply chain design, as well as increase productive communication among practitioners, especially those involved in strategic level decision making processes. We foresee that our supply chain visualization tool will fill a gap in this area with its simple but effective structure.
This paper discusses fundamental issues in dairy logistics in a tutorial format. We summarize the findings of more than twenty student groups who carried out independent literature surveys and interviewed professionals in the industry. We point out the critical issues in carrying out dairy products logistics, the logistics strategies employed by dairy producers in the world, and some newly introduced products in the industry, along with the ways in which these new products change logistics operations. The importance of hygiene, cooling, time, humidity, cost, distance, flexibility and meeting the demand is emphasized under the heading of critical issues. Besides these critical issues, there are others, such as short shelf life, quality, emulsion, pasteurization and UHT treatment, which depend on the characteristics of milk and milk products. Logistics strategies in the dairy industry are studied under two headings: those used in the world and those used in Turkey. A benchmarking between Turkey and the world is also included at the end. As the variety of milk and milk products increases day by day, the new ingredients of new products also affect transportation plans; these impacts are discussed as part of our paper. Some descriptive drawings and figures are also included. Throughout this paper, only the production, warehousing and transportation of milk, cheese, yoghurt, and similar dairy products are discussed. Ice cream in particular is set outside the scope, as it differs completely from dairy products such as milk, cheese and yoghurt in terms of production and distribution.
Innovation in Product Form and Function: Customer Perception of Their Value - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/innovation-in-product-form-and-function-customer-perception-of-their-value/
The goal of product design is to obtain the maximum effect with minimum cost in functionality and aesthetic beauty. Consumers are attracted to designs that reflect their use behaviors and psychological responses more than they are to simple visual representations. When product functions and qualities are similar across products, customers make their purchasing decisions based on aesthetic form. Form presents a significant competitive factor that improves the value of a product. Overall, the purpose of this study is to examine the most important product design factors that affect the market share trends of mobile phone companies. The study uses product characteristics for 1,028 mobile phones released between 2003 and 2008 as a case study. Multiple linear regression analysis is used to select highly correlated variables that influence market share, and Mallow's Cp method is used to determine the best-fitting model. The partial regression coefficients are used to evaluate the relative importance of design criteria. Nine mobile phone design features that affect market share were identified, and the block form style was determined to be the most important design factor. Using these approaches, this study demonstrates how investments should be directed in the next mobile phone design process.
Developing Competitive Strategies in Higher Education through Visual Data Mining - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-data-mining-for-developing-competitive-strategies-in-higher-education/
Information visualization is the growing field of computer science that aims at visually mining data for knowledge discovery. In this paper, a data mining framework and a novel information visualization scheme are developed and applied to the domain of higher education. The presented framework consists of three main types of visual data analysis: discovering general insights, carrying out competitive benchmarking, and planning for High School Relationship Management (HSRM). In this paper the framework and the square tiles visualization scheme are described, and an application at a private university in Turkey with the goal of attracting the brightest students is demonstrated.
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing
Demiriz, A., Ertek, G., Kula, U., Atan, T. (2011). “Re-Mining Item Associations: Methodology and a
Case Study in Apparel Retailing”. Decision Support Systems, 52, pp. 284–293.
Note: This is the final draft version of this paper. Please cite this paper (or this final draft) as
above. You can download this final draft from http://research.sabanciuniv.edu.
Re-mining item associations: Methodology and a case study in
apparel retailing
Ayhan Demiriz a , Gürdal Ertek b, Tankut Atan c, Ufuk Kula a
a Sakarya University, Sakarya, Turkey
b Sabancı University, Istanbul, Turkey
c Işık University, Istanbul, Turkey
Abstract
Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.
Keywords: Data Mining, Association Mining, Negative Association,
Apparel Retailing, Inductive Decision Trees, Retail Data
∗ Corresponding author
Email addresses: ademiriz@gmail.com (Ayhan Demiriz), ertekg@sabanciuniv.edu (Gürdal Ertek), tatan@isikun.edu.tr (Tankut Atan), ufukkula@gmail.com (Ufuk Kula)
Preprint submitted to Decision Support Systems November 13, 2012
1. Introduction
Association mining is a data mining technique which generates rules
in the form of X ⇒ Y , where X and Y are two non-overlapping discrete
sets. A rule is considered as significant if it is satisfied by at least a cer-
tain percentage of cases (minimum support) and its confidence is above a
certain threshold (minimum confidence). Conventional association mining
considers “positive” relations in the form of X ⇒ Y . However, negative
associations in the form of X ⇒ ¬Y , where ¬Y represents the negation
(absence) of Y , can also be discovered through association mining.
Recent research has positioned association mining as one of the most
popular tools in retail analytics [7]. Association mining primarily generates
positive association rules that reveal complementary effects, suggesting that
the purchase of an item can generate sales of other items. Yet, association
mining can also be used to reveal substitution effects, where substitution
means that a product is purchased instead of another one. Although posi-
tive associations have traditionally been an integral part of retail analytics,
negative associations have not.
Numerous algorithms have been introduced to find positive and nega-
tive associations, following the pioneering work of Agrawal et al. [3]. Market
basket analysis is considered as a motivation, and is used as a test bed for
these algorithms. Price data are readily available within the market bas-
ket data and one would expect to observe its usage in various applications.
Conceptually quantitative association mining (QAM) [14, 22] can handle
pricing data and other attribute data. However, pricing data have not been
utilized before as a quantitative attribute in quantitative association mining
except in [9, 18]. Korn et al. [18] explore a solution with the help of singular
value decomposition (SVD) and Demiriz et al. [9] extend the results from
SVD to find item associations depending on SVD rule similarities. Quan-
titative association mining is not the only choice for analyzing attribute
data within existing frameworks. Multidimensional association mining [14]
is a methodology that can be adapted to analyze such data. The complexity
of association mining increases with the usage of additional attribute data,
which may include both categorical and quantitative attributes in addition
to the transaction data [4]. Even worse, the attribute data might be denser
compared to transaction data.
The main contribution of this paper is a practical and effective method-
ology that efficiently enables the incorporation of attribute data (e.g. price,
category, sales timeline) in explaining positive and negative item associa-
tions, which respectively indicate the complementarity and substitution ef-
fects. To the best of our knowledge, there exists no methodological research
in the data mining or information science literature that enables such a
multi-faceted analysis to be executed efficiently and is proven on real world
data. The core of the proposed methodology is a new secondary data mining
process to discover new insights regarding positive and negative associations.
As a novel and broadly applicable concept, we introduce and define data
re-mining as the mining of a newly formed data set that is constructed upon
the results of an original data mining process. The newly formed data set will
contain additional attributes joined with the original data mining results.
In the case study presented in this paper, these attributes are related to
price, item, domain and time. The methodology combines pricing as well as
other information with the original association mining results through a new
mining process, generating new rules to characterize, describe and explain
the underlying factors behind positive and negative associations. Re-mining
is fundamentally different from post-mining: post-mining only summarizes
the data mining results, such as visualizing the association mining results
[11]. The re-mining methodology extends and generalizes post-mining.
Our work contributes to the field of data mining in four ways:
1. We introduce a new data mining concept and its associated process,
named as Re-Mining, which enables an elaborate analysis of both pos-
itive and negative associations for discovering the factors and explain-
ing the reasons for such associations.
2. We enable the efficient inclusion of price data into the mining process,
in addition to other attributes of the items and the application domain.
3. We illustrate that the proposed methodology is applicable to real world
data, through a case study in the apparel retailing industry.
4. Different ways of re-mining, namely exploratory, descriptive and pre-
dictive re-mining, are applied to real world data.
The re-mining framework was first introduced in [10]. This paper elab-
orates on the concept, introduces mathematical formalism and presents the
algorithm for the methodology. The work in this paper also extends the ap-
plication of predictive re-mining in addition to exploratory and descriptive
re-mining, and presents a complexity analysis.
The remainder of the paper is organized as follows. In Section 2, an
overview of the basic concepts in related studies is presented through a
concise literature review. In Section 3, Re-Mining is motivated, defined,
and framed. The methodology is put into use with apparel retail data in
Section 4 and its applicability is demonstrated. In Section 5, the limitations
of quantitative association mining (QAM) are illustrated with regards to
the retail data used in this paper. In Section 6, the complexity of re-mining
is compared to QAM, illustrating its significant advantage over QAM.
Finally, Section 7 summarizes the study and discusses future directions.
2. Related Literature
One of the most common applications of association mining is literally
market basket analysis (MBA), which can be used in product recommenda-
tion systems [8]. Following the notation used in [14], let I = {I1 , I2 , ..., Im }
be a set of items considered in MBA. Then each transaction (basket) T
will consist of a set of items where T ⊆ I. Each item in transaction T
will have a corresponding price p, which might change from time to time
i.e. p is not necessarily constant over the time span. Let X and Y be two
non-overlapping sets of items, where X ⊂ I and Y ⊂ I, contained in some
transactions. An association rule is an implication of the following form
X ⇒ Y . Assume D is the set of all transactions, then the support (s) of the
rule is defined as the percentage of the transactions in D that contain both
X and Y i.e. the itemset X ∪ Y . Recall that X ∩ Y = ∅. The confidence (c)
of the rule X ⇒ Y is defined as the percentage of transactions within D con-
taining X, that also contain Y . In other words, confidence is the percentage
of transactions containing Y , given that those transactions already contain
X. Notice that this is equivalent to the definition of conditional probability.
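These definitions can be made concrete with a short sketch. The toy transactions below are hypothetical, not taken from the case-study data; support and confidence are computed directly from the set containment tests in their definitions.

```python
# Hypothetical toy transactions (each a set of item IDs)
D = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Confidence of X => Y: support(X union Y) / support(X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

print(support({"A", "B"}, D))        # s({A,B}) = 3/5 = 0.6
print(confidence({"A"}, {"B"}, D))   # c(A => B) = (3/5) / (4/5) = 0.75
```

The confidence value is exactly the conditional relative frequency of Y given X, matching the conditional-probability reading above.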
Quantitative and multi-dimensional association mining are well-known
techniques [14] that can integrate attribute data into the association min-
ing process, where the associations among these attributes are also found.
However, these techniques introduce significant additional complexity, since
association mining is carried out with the complete set of attributes rather
than just the market basket data. In the case of QAM, quantitative attributes
are transformed into categorical attributes through discretization,
transforming the problem into multi-dimensional association mining with
only categorical attributes. This is an NP-Complete problem as shown by
Angiulli et al. [4], meaning that the running time increases exponentially as
the number of additional attributes increases linearly.
Multi-dimensional association mining works directly towards the genera-
tion of multi-dimensional rules. It relates all the possible categorical values
of all the attributes to each other. Re-mining, on the other hand, expands
single dimensional rules with additional attributes. In re-mining, attribute
values are investigated and computed only for the positively and negatively
associated item pairs, with much lower computational complexity: the task
can be solved in polynomial running time.
2.1. Negative Association Mining
In research and practice, association mining commonly refers to posi-
tive association mining. Since positive association mining has been studied
extensively, only (some of) the approaches for finding negative associations
are reviewed. One innovative approach [21] utilizes the domain knowledge
of item hierarchy (taxonomy), and seeks negative association between items
in a pairwise way. Authors in [21] propose the rule interestingness measure
(RI) based on the difference between expected support and actual support:
RI = (E[s(XY)] − s(XY)) / s(X). A minimum threshold is specified for RI for the candi-
date negative itemsets, besides the minimum support threshold. Depending
on the taxonomy (e.g. Figure 1(a)) and the frequent itemsets, candidate
negative itemsets can be generated. For example, assuming that the itemset
{CG} is frequent in Figure 1(a), the dashed curves represent some of the
candidate negative itemsets.
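The RI computation itself is a one-line formula. In [21] the expected support E[s(XY)] is derived from the item taxonomy; the sketch below substitutes the independence estimate E[s(XY)] ≈ s(X)·s(Y) as a stand-in, so the numbers are illustrative only.

```python
def interestingness(sX, sY, sXY):
    """RI = (E[s(XY)] - s(XY)) / s(X). The expectation is estimated
    here under independence as s(X) * s(Y); this is a simplifying
    assumption, since [21] derives it from the item taxonomy."""
    expected = sX * sY
    return (expected - sXY) / sX

# A pair bought far less often together than independence predicts
# scores high, making it a candidate negative itemset.
ri = interestingness(sX=0.4, sY=0.5, sXY=0.05)
print(ri)  # (0.20 - 0.05) / 0.4 = 0.375
```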
Tan et al. [24] find negative associations through indirect associations.
Figure 1(b) depicts such an indirect association {BC} via item A. In Figure
1(b) itemsets {AB} and {AC} are both assumed to be frequent, whereas
the itemset {BC} is not. The itemset {BC} is said to have an indirect
association via the item A and thus is considered as a candidate negative
association. Item A in this case is called a mediator for the itemset
{BC}. Just like the aforementioned method in [21], indirect association
mining also uses an interestingness measure -dependency in this case- as a
threshold. Indirect mining selects as candidates the frequent itemsets that
have strong dependency with their mediator.
Both methods discussed above are suitable for retail analytics and the
approach in [24] is selected in this study to compute negative associations
due to convenience of implementation.
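The candidate-generation step of indirect association mining can be sketched as follows. This is only the mediator search; the dependence-threshold filtering of Tan et al. [24] is omitted, and the input pairs are hypothetical.

```python
from itertools import combinations

def indirect_candidates(frequent_pairs):
    """Candidate negative pairs via indirect association: {B, C} is a
    candidate if it is not frequent itself, but both {A, B} and {A, C}
    are frequent for some mediator item A. (Candidate generation only;
    the dependence threshold of [24] is not applied here.)"""
    frequent = {frozenset(p) for p in frequent_pairs}
    items = set().union(*frequent)
    candidates = {}
    for mediator in items:
        # items that frequently co-occur with this mediator
        partners = [i for i in items
                    if i != mediator and frozenset({mediator, i}) in frequent]
        for b, c in combinations(sorted(partners), 2):
            if frozenset({b, c}) not in frequent:
                candidates.setdefault(frozenset({b, c}), set()).add(mediator)
    return candidates

# {A,B} and {A,C} are frequent while {B,C} is not, so B and C are
# indirectly associated via mediator A.
print(indirect_candidates([("A", "B"), ("A", "C")]))
```

Note that, as in the case study, one pair may be returned with several mediators, which is why the result maps each candidate pair to a set of mediator items.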
2.2. Quantitative and Multi-Dimensional Association Mining
The traditional method of incorporating quantitative data into associa-
tion mining is to discretize (categorize) the continuous attributes. An early
work by Srikant and Agrawal [22] proposes such an approach where the con-
tinuous attributes are first partitioned and then treated just like categorical
data. For this, consecutive integer values are assigned to each adjacent par-
tition. In case the quantitative attribute has few distinct values, consecutive
integer values can be assigned to these few values to conserve the ordering
of the data. When there is not enough support for a partition, the adjacent
partitions are merged and the mining process is rerun. [22] emphasizes rules
with quantitative attributes only on the left hand side (antecedent) of the
rules. However, since each partition is treated as if it were categorical, it is
also possible to obtain rules with quantitative attributes on the right hand
side (consequent) of the rules.
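The partition-and-merge step of [22] can be sketched as follows: a continuous attribute is cut into equal-width intervals, and adjacent under-supported intervals are merged in a single left-to-right pass. This is a simplified sketch with illustrative bin boundaries; a full implementation would rerun the merge until every partition meets the minimum support, as [22] describes.

```python
def discretize(values, n_bins, min_support):
    """Equal-width binning followed by a single left-to-right pass that
    merges each bin into its predecessor when the predecessor's relative
    frequency is below min_support. Returns [start, end, count] triples."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    bins = [[lo + i * width, lo + (i + 1) * width, c]
            for i, c in enumerate(counts)]
    merged = [bins[0]]
    for start, end, c in bins[1:]:
        if merged[-1][2] / len(values) < min_support:
            merged[-1][1] = end          # extend the previous partition
            merged[-1][2] += c
        else:
            merged.append([start, end, c])
    return merged

# Hypothetical prices: a cluster of cheap items, a mid cluster, one outlier
prices = [5, 6, 7, 8, 20, 21, 22, 40]
print(discretize(prices, n_bins=4, min_support=0.25))
```

The empty third bin gets absorbed into its neighbor, leaving three partitions that can then be treated as ordered categorical values.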
An alternative statistical approach is followed by Aumann and Lindell
[6] for finding association rules with quantitative attributes. The rules found
in [6] can contain statistics (mean, variance and median) of the quantitative
attributes, as in our methodology.
In comparison to the aforementioned approaches, re-mining does not
investigate every combination of attribute values, and is much faster than
QAM. For the sake of completeness, QAM is also carried out and is compared
against re-mining in Section 5.
Korn et al. [18] summarize the expenses made on the items through ratio
rules. An example ratio rule would be “Customers who buy bread:milk:butter
spend 1:2:5 dollars on these items.” This is a potentially useful way of
utilizing the price data for unveiling the hidden relationships among the
items in sales transactions. According to this approach, one can basically
form a price matrix from sales transactions and analyze it via singular value
decomposition (SVD) to find positive and negative associations. Ensuring
the scalability of SVD in finding the ratio rules is a significant research
challenge. Demiriz et al. [9] use ratio rule similarities based on price data
to find both positive and negative item associations.
2.3. Learning Association Rules
Yao et al. [27] propose a framework of a learning classifier to explain the
mined results. However, the described framework considers and interprets
only the positive association rules and requires human intervention for la-
beling the generated rules as interesting or not. The framework in [27] is
the closest work in the literature to the re-mining methodology proposed
here. However re-mining is unique in the sense that it also includes negative
associations and is suitable for automated rule discovery to explain the
originally mined results. Meanwhile, unlike in [27], the proposed approach is
applied to a real world dataset as a proof of its applicability, and a thorough
complexity analysis is carried out.
Finally, based on correlation analysis, Antonie and Zaïane [5] propose an
algorithm to classify associations as positive and negative. However, learning
is only based on correlation data and the scope is limited to labelling the
associations as positive or negative.
3. The Methodology
The proposed re-mining methodology, which transforms the post-mining
step into a new data mining process, is introduced in this section. A math-
ematical formalism is introduced and the methodology is presented in the
form of an algorithm in Figure 2. Re-mining can be considered as an additional
data mining step of the Knowledge Discovery in Databases (KDD) process
[12] and can be conducted in exploratory, descriptive, and predictive manners.
The re-mining process is defined as “combining the results of an original
data mining process with a new additional set of data and then mining the
newly formed data again”.
The methodology consists of the following five steps (Figure 2):
1. Perform association mining.
2. Sort the items in the 2-itemsets.
3. Label the item associations accordingly and append them as new
records.
4. Expand the records with additional attributes for re-mining.
5. Perform exploratory, descriptive, and predictive re-mining.
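The five steps can be sketched end to end as follows. The helper names (`mine_pos`, `mine_neg`, `attributes`, `learn`) are made-up placeholders standing in for the actual association miners and learners, which the methodology deliberately leaves to any off-the-shelf implementation.

```python
def re_mine(transactions, attributes, sort_attr, mine_pos, mine_neg, learn):
    """Sketch of the re-mining algorithm (cf. Figure 2). `mine_pos`,
    `mine_neg` and `learn` are placeholders for any positive/negative
    association miner and any classifier or decision-tree learner."""
    records = []
    # Step 1: perform association mining (positive and negative)
    for pair in mine_pos(transactions):
        records.append((pair, "+"))
    for pair in mine_neg(transactions):
        records.append((pair, "-"))
    expanded = []
    for (i, j), label in records:
        # Step 2: sort the two items by the sort attribute (e.g. price)
        hi, lo = (i, j) if sort_attr(i) >= sort_attr(j) else (j, i)
        # Steps 3-4: label the association, append it as a record, and
        # expand the record with the additional attributes A*
        row = {"itemH": hi, "itemL": lo, "class": label}
        row.update(attributes(hi, lo))
        expanded.append(row)
    # Step 5: re-mine the expanded data E* with a second learner
    return learn(expanded)
```

Here `attributes(hi, lo)` would return the price, time and hierarchy attributes described in Section 4.2, and `learn` could be any of the exploratory, descriptive or predictive analyses of Step 5.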
Conceptually, the re-mining process can be extended to include many more
repeated steps, since each time a new set of attributes can be introduced
and a new data mining technique can be utilized. However, in this paper,
the re-mining process is limited to only one additional data mining step. In
theory, the new mining step may involve any appropriate set of data mining
techniques. In this paper, decision trees, colored scatter plots, and several
classification algorithms have been utilized.
The results of data mining may potentially be full of surprises, since it
does not require any pre-assumptions (hypotheses) about the data. Making
sense of such large body of results and the pressure to find surprising insights
may require incorporating new attributes and subsequently executing a new
data mining step, as implemented in re-mining. The goal of this re-mining
process is to explain and interpret the results of the original data mining
process in a different context, by generating new rules from the consoli-
dated data. The results of re-mining need not necessarily yield an outcome
in parallel with the original outcome. For example, if the original data min-
ing yields frequent itemsets, re-mining does not necessarily yield frequent
itemsets again.
The main contribution of this paper is the introduction of the re-mining
process to discover new insights regarding the positive and negative asso-
ciations and an extensive demonstration of its applicability. However the
usage of re-mining process is not limited to gaining new insights. One can
potentially employ the re-mining process to bring further insights to the
results of any data mining task.
The rationale behind using the re-mining process is to exploit the domain
specific knowledge in a new analysis step. One can understandably argue
that such background knowledge can be integrated into the original data
mining process by introducing new attributes. However, there might be
certain cases in which adding such information would increase the complexity of
the underlying model [4] and diminish the strength of the algorithm. To be
more specific, it might be necessary to find attribute associations when the
item associations are also present, which requires constraint-based mining
[20]. Re-mining may help with grasping the causality effects that exist in
the data as well, since the input of the causality models may be an outcome
of the another data mining process.
4. Case Study
In this section, the applicability of the re-mining process is demonstrated
through a real world case study that involves the store level retail sales
data originating from an apparel retail chain. In the case study, there was
access to complete sales, stock and transshipment data belonging to a single
merchandise group (men’s clothes line) coming from all the stores of the
retail chain (over 200 of them across the country) for the 2007 summer
season. A detailed description of the retail data can be found in [10].
In this section, the application of the re-mining methodology on the
retail dataset is presented. Re-mining reveals the profound effect of pricing
on item associations and frequent itemsets, and provides insights that the
retail company can act upon.
4.1. Conventional Association Mining
As the first step of the re-mining methodology, conventional association
mining has been conducted. Apriori algorithm was run to generate the fre-
quent itemsets with a minimum support count of 100. All the 600 items were
found to be frequent. In re-mining, only the frequent 2-itemsets have been
investigated and 3,930 such pairs have been found. Thus frequent itemsets
were used in the analysis, rather than association rules. For illustration
purposes, the top-5 frequent pairs can be found in Table 1 of [10].
The frequent pairs were then used in finding the negatively related pairs
via indirect association mining. Negative relation is an indicator of product
substitution. Implementing indirect association mining resulted in 5,386
negative itemsets, including the mediator items. These itemsets were re-
duced to a set of 2,433 unique item pairs when the mediators were removed.
This indeed shows that a considerable portion of the item pairs in the dataset
are negatively related via more than one mediator item.
4.2. Re-Mining the Expanded Data
Following the execution of conventional positive and negative association
mining, a new data set E ∗ (see Figure 2) was formed from the item pairs and
their additional attributes A∗ for performing exploratory, descriptive and
predictive re-mining. Supervised classification methods require the input
data in table format, in which one of the attributes is the class label. The
type of association, positive (‘+’) or negative (‘-’), was selected as the class
label in the analysis.
An item pair can generate two distinct rows for the learning set - e.g.
pairs AB and BA, but this representation ultimately yields degenerate rules
out of learning process. One way of representing the pair data is to order
(rank) items in the pair according to a sort criterion, which is referred to
as sort attr in the re-mining algorithm (Figure 2). In the case study, sort
attribute was selected as the price, which marks the items as higher and
lower priced items, respectively.
For computing price-related statistics (means and standard deviations)
a price-matrix was formed out of the transaction data D. The price-matrix
resembles the full-matrix format of the transaction data with the item’s price
replacing the value of 1 in the full-matrix. The price-matrix was normal-
ized by dividing each column by its maximum value, enabling comparable
statistics. The value 0 in the price-matrix means that the item is not sold in
that transaction. The full price-matrix has the dimensions 2,753,260 × 600.
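Under the stated construction, the price-matrix and its column-wise normalization can be sketched in pure Python (toy dimensions; the real matrix is 2,753,260 × 600 and would call for a sparse representation):

```python
# Toy price-matrix: rows = transactions, columns = items;
# 0.0 means the item was not sold in that transaction.
P = [
    [10.0, 0.0, 30.0],
    [ 8.0, 5.0,  0.0],
    [ 0.0, 4.0, 24.0],
]

n_items = len(P[0])
# Normalize each column by its maximum observed price, so that
# markdown levels of differently priced items become comparable.
col_max = [max(row[j] for row in P) or 1.0 for j in range(n_items)]
P_norm = [[row[j] / col_max[j] for j in range(n_items)] for row in P]

# Per-item mean normalized price over transactions where it was sold
avg = []
for j in range(n_items):
    sold = [row[j] for row in P_norm if row[j] > 0]
    avg.append(sum(sold) / len(sold))
print(avg)
```

Statistics such as the AvgPrice and StdDevPrice attributes of the next paragraph are computed from exactly this kind of normalized column.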
Besides price related statistics such as minimum, maximum, average and
standard deviations of item prices (MinPriceH, MaxPriceH, MinPriceL, Max-
PriceL, AvgPriceH H1 L0, . . ., StdDevPriceH H1 L0, . . .), attributes related
with time and product hierarchy were appended, totaling to 38 additional
attributes in Step 4 of the algorithm (Figure 2). Attributes related with
time include first and last weeks where the items were sold (StartWeekH,
EndWeekH, StartWeekL, EndWeekL) and item lifetimes (LifeTimeH, Life-
TimeL). Attributes related with product hierarchy include the categories
and the merchandise subgroups of the items (CategoryH, CategoryL, Merch-
SubGrpH, MerchSubGrpL). Together with the record ID, the two item IDs
and the type of association, re-mining was carried out with a total of 42
attributes with 6,363 records. All the additional attributes have been com-
puted for all sorted item-pairs through executing relational queries.
Once the new dataset E ∗ is available for the re-mining step, any decision
tree algorithm can be run to generate the descriptive rules. Decision tree
methods such as C5.0, CHAID, CART are readily available within data
mining software in interactive and automated modes.
If decision tree analysis is conducted with all the attributes, support
related attributes would appear as the most significant ones, and data would
be perfectly classified. Therefore it is necessary to exclude the item support
attributes from the analysis.
One example of a rule describing the ‘+’ class is Rule 1 in Figure 3,
which can be written as follows:
IF LifeTimeL < 22 AND LifeTimeH ≥ 28 THEN ‘+’.
This rule reveals that if the lifetime of the lower-priced item is less than
22 weeks, and the lifetime of the higher-priced item is greater than or equal
to 28 weeks, then there is a very high probability of this item pair to exhibit
‘+’ association (having the target class ‘+’).
An example of ‘-’ class rule is Rule 2 in Figure 3:
IF LifeTimeL < 22 AND LifeTimeH < 28 AND CorrNormPrice HL <
0.003 THEN ‘-’.
This rule reveals that if the lifetime of the lower-priced item is less than
22 weeks, and the lifetime of the higher-priced item is less than 28 weeks, the
correlation between the prices of these items is less than 0.003 (practically
translating into negative correlation), then there is a very high probability
of this item pair to exhibit ‘-’ association (having the target class ‘-’).
Ceteris paribus, that is, everything else being equal, retail managers
would prefer positive associations between items. This translates into travers-
ing the darker nodes in Figure 3, where nodes are darkened with increasing
probability of positive associations. In the visual mining of the decision
tree, one is interested in the node transitions where a significant change
takes place in the node color. So in Figure 3, a transition from a light-
colored node to a dark node is interesting. By observing Figure 3, several
policies have been developed and are presented below. The policies listed
below are named based on the level of the tree that they are derived. They
are also marked on the tree in Figure 3.
Policy 1: “LifeTimeL ≥ 22”. The lower-priced item should have a life
time greater than or equal to 22 weeks.
Policy 2: “LifeTimeL ≥ 28”. It is even better if the lower-priced item
has a life time greater than or equal to 28 weeks, rather than 22 weeks. This
would increase its probability of having positive associations with higher
priced items from 0.73 to 0.92 (shown with darker node).
Policy 3: “IF LifeTimeL < 22 THEN LifeTimeH ≥ 28”. If it is not
possible to apply Policy 1, one should make sure that higher-priced items
have a life time greater than or equal to 28 weeks. This significantly increases
the probability of having positive associations from 0.50 to 0.86.
Policy 4: “IF LifeTimeL ≥ 22 AND LifeTimeL < 28 THEN CorrNorm-
Price HL ≥ 0.003”. If it is possible to apply Policy 1, but not Policy 2, then
the pricing policy of synchronized price decrease can still increase the prob-
ability of positive associations. Policy 4 suggests that the normalized prices
should have positive correlation, so that positive associations will emerge.
Since the prices are monotonically non-increasing through the season, this
insight suggests that the prices of high-price and low-price items should be
decreased together (so that there will be positive correlation between the
normalized prices).
Policy 5: “IF LifeTimeL < 22 AND LifeTimeH < 28 THEN CorrNorm-
Price HL ≥ 0.026”. If it is not possible to apply Policy 1 or Policy 3, then
one should again apply Policy 4, but marking down the prices of the items
such that a higher correlation (at least 0.026, rather than 0.003) exists be-
tween the prices.
Policy 6: When it is possible to apply Policy 1, but not Policy 2 or
Policy 4, there is still good news. For low-priced items in Category “C006”,
a life time between 22 and 28 weeks yields a fairly high (0.77) probability
of positive association (more than double the probability for items in other
categories that have the same life times). So, applying Policy 1 just for
Category “C006”, making sure that the low-priced item has a life cycle of
at least 22 weeks, guarantees a 0.77 probability of positive association with
at least one other higher priced item.
The policies obtained through visual analysis of the decision tree (Figure 3)
should be handled with caution: the policies suggested above should be
put into computational models for numerically justifying their feasibility.
For example, Policy 1 suggests that the lower-priced item should have a
lifetime of at least 22 weeks. But is such a long lifetime financially justified,
especially when there are many lower-priced items? If the additional costs
that will arise are not high enough to offset the additional generated sales,
then Policy 1 may be justified; but this is not guaranteed a priori, without
a formal fact-based numerical analysis. Thus, the policies should not be
applied in isolation, but by taking their interactions into consideration.
4.3. Exploratory Re-Mining
Exploratory re-mining is also conducted by creating potentially infor-
mative scatter plots. As seen from Figure 4, positive (“TargetClass=+”)
and negative (“TargetClass=-”) associations are well-separated in different
regions of the plot. For example, if StartWeekL is around the fifth week of
the season then the associations are mainly positive. In addition, a body
of positive associations up to ‘StartWeekL=15’ can be seen in Figure 4. It
means that there is a high possibility of having a positive association be-
tween two items when the lower priced item is introduced early in the season.
Since basic items are usually introduced early in the season, higher chance of
positive association between two items can be hypothesized when the lower
priced item is a basic one vs. a fashion one.
Visual analysis of Figure 5 yields new policies. In the figure, the dark blue
points indicate positive associations, and light grey points indicate negative
associations. It is easy to see the very heavy concentration of blue dots in
the region where the initial (maximum) item price is between 14 and 18 TL
(Turkish Liras) (14 < MaxPriceL < 18). The concentration of grey dots in
that region is very sparse, suggesting Policy 7. Also, when scanning the y axis
(MaxPriceL), one can observe that items with an initial price tag greater
than 70 TL almost always have positive associations with lower-priced items
in the price range of 38 to 42 TL, suggesting Policy 6.
Policy 7: Keep the initial price of low-priced items in the range of 14 to
18 TL. This will enable them to generate new sales, thanks to their positive
associations with many higher priced items.
Policy 8: Position the items with price tag between 38 to 42 TL together
with (related) items that have an initial price tag over 70 TL.
It is clear that the policies generated by the visual analysis of the decision
tree and a sample colored scatter plot complement each other. This suggests
that one should carry out a variety of analyses in the re-mining stage.
4.4. Predictive Re-Mining
In predictive re-mining, the aim is to predict (based on their attributes)
whether two items have positive or negative associations. To show the ap-
plicability of re-mining, various predictive methods in SAS Enterprise Miner
are used in this study. Support Vector Machines (SVM) [23], Memory Based
Reasoning (MBR), Neural Networks (NN) [26], Gradient Boosting [13], Au-
toNeural and Decision Tree [28] nodes are utilized in predictive analysis.
The data set is partitioned into a training set (50%), a validation set (20%),
and a test set (30%) before the predictive methods are applied.
The accuracy results are reported in Table 1. The NN model is the clear
winner among the predictive models. Test-set accuracy rates above 0.70 for
four of the models indicate that item associations can be predicted accurately
based on the attribute data. Since both positive and negative associations
can be predicted accurately by utilizing attribute data, the re-mining
approach can be successfully used for predictive purposes.
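The 50%/20%/30% partitioning step can be sketched as follows. This is a generic random split in plain Python; the case study itself uses the partitioning node of SAS Enterprise Miner, and the function below is only an illustrative stand-in:

```python
import random

def partition(records, seed=0):
    """Split records into training (50%), validation (20%), and test (30%)
    sets, as done before fitting the predictive models."""
    rows = list(records)
    random.Random(seed).shuffle(rows)  # deterministic shuffle for the sketch
    n = len(rows)
    n_train = int(0.5 * n)
    n_valid = int(0.2 * n)
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 50 20 30
```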
In addition to accuracy rates, receiver operating characteristic (ROC)
curves (see [28]) are given in Figure 6. Sensitivity and specificity are defined
as the true positive rate and the true negative rate, respectively. By defini-
tion, better classifiers have ROC curves towards the upper left of the plots.
Therefore NN model has the best ROC curve and the decision tree model
has the second best ROC curve among the predictive models. Similarly, the
cumulative lift plots [17] are depicted in Figure 7. The uppermost curve
represents the best classifier in terms of lift: NN is again the best
classifier, and SVM performs as the next best model.
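As an illustration of how a cumulative lift value is obtained, the following sketch computes the lift at a given depth from hypothetical model scores. The scoring data is invented; SAS Enterprise Miner produces these plots directly:

```python
def cumulative_lift(scores, labels, depth_fraction):
    """Cumulative lift at a given depth: ratio of the positive rate among
    the top-scored fraction of cases to the overall positive rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_top = max(1, int(depth_fraction * len(ranked)))
    top_rate = sum(lab for _, lab in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# A classifier that ranks positives (label 1) ahead of negatives is rewarded:
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   0,   1,   0,   0,   0]
print(cumulative_lift(scores, labels, 0.25))  # 2.0: top 25% is all positive
```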
Variable importance is another useful statistic for classification models.
Table 2 reports the importance statistics of the training and validation sets
separately, from the decision tree model. In light of Table 2, one can
conclude that the correlation-based attributes play important roles along
with the price-related attributes. More generally, item attributes have
predictive power in terms of item associations: the attribute data can and
should be used in determining item associations, whereas the plain
transaction data alone might be less informative.
The evidence presented in this section suggests that the use of classifica-
tion models is justified and Policy 9 presents an algorithm for incorporating
classification models into retail markdown planning for the next season:
Policy 9: For every item, compute the values of the important variables
in Table 2. Then feed these into the classification model and predict the
sign of association between every item pair. For items that are predicted to
have positive associations, create clusters based on merchandise group, start
time, and initial price. Present the clusters of items together in marketing
campaigns and in stores.
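A minimal sketch of the clustering step in Policy 9 is given below. The attribute names and bucket widths are illustrative assumptions, and coarse bucketing stands in for a proper clustering algorithm:

```python
from collections import defaultdict

# Hypothetical items predicted to have positive associations, with the
# attributes Policy 9 clusters on: (item, merchandise group, start week,
# initial price in TL). All values are invented.
items = [
    ("A", "knitwear", 2, 16.0),
    ("B", "knitwear", 3, 17.5),
    ("C", "denim", 14, 40.0),
    ("D", "denim", 15, 39.0),
]

# Bucket start week (5-week bins) and initial price (10 TL bins), then
# group items sharing the same (group, start bucket, price bucket) key.
clusters = defaultdict(list)
for item, group, week, price in items:
    key = (group, week // 5, int(price // 10))
    clusters[key].append(item)

print(dict(clusters))
```

Each resulting cluster is a candidate set of items to present together in campaigns and stores.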
5. Comparison with Quantitative Association Mining
As mentioned earlier, as an alternative to re-mining, the additional
attribute data can also be incorporated through QAM. This alternative
analysis has also been conducted to illustrate its limitations in analyzing
retail data. Quantitative association mining has been studied extensively in
the literature, and some of its major applications, such as [22], were
reviewed in Section 2.2. Discretization is one of the most important steps
in QAM. In the case study, an initial analysis of the price data suggested
that there are only a few price levels for each of the products, so a
straightforward discretization, which does not require a complex
transformation, was possible. Rather than utilizing a more complex scheme
for QAM, in the re-mining case study the item and price information were
conjoined into a new entity.
The two main seasons in apparel retailing, winter and summer, have
approximately equal length. As a common business rule, prices are not marked
down too often; usually, at least two weeks pass between subsequent
markdowns. Thus, there exists a small, finite set of price levels within
each season. There might be temporary sales promotions during the season,
but marked-down prices remain the same until the next markdown. Prices are
set at the highest level at the beginning of each season and fall with each
markdown. If the price data of a product is normalized by dividing by the
highest price, the normalized prices will be less than or equal to one. When
two significant digits are used after the decimal point, prices can be
easily discretized. For example, after an initial markdown of 10%, the
normalized price will be 0.90. Markdowns are usually computed on the
original highest price; in other words, if the second markdown is 30%, then
the normalized price is computed as 0.70.
After using this discretization scheme, 3,851 unique product-normalized
price pairs were obtained for the 600 unique products of the original
transaction data. Each product has six price levels on average. The highest
number of price levels for a product is 14 and the lowest is two. This shows
that markdowns were applied to all the products and no product was sold at
its original price throughout the whole season. Technically, a discretized
price can be appended to the corresponding product name to create a new
unique entity for QAM. For example, appending 0.90 to the product name ‘A’
after an underscore creates the new entity ‘A_0.90’. After this data
transformation, one can easily utilize conventional association mining to
conduct a discretized QAM.
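The normalization and entity-naming scheme can be sketched as follows; the function name price_entity is ours, not from the case study:

```python
def price_entity(product, price, max_price):
    """Append the normalized, two-decimal price to the product name to
    form a unique entity for quantitative association mining."""
    normalized = round(price / max_price, 2)
    return f"{product}_{normalized:.2f}"

# After a 10% markdown on an item whose original (highest) price is 20 TL:
print(price_entity("A", 18.0, 20.0))  # 'A_0.90'
# After a second markdown of 30% off the original price:
print(price_entity("A", 14.0, 20.0))  # 'A_0.70'
```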
The top-10 frequent pairs and their support counts can be found in Table 2
of [10]. As can be observed from that table, a large portion of the frequent
purchases occurs at discounted prices. Among the top-10 frequent item pairs,
only the sixth one has full prices for both of the items; the remaining
pairs are purchased at marked-down prices.
The retail company does not allow price differentiation across locations; in
other words, an item has the same price in all of the stores. Quantitative
association mining can identify negative associations between items for
value combinations that actually never occur. For example, even though an
item A is sold only at the normalized price of 0.70 in the time interval
during which B is sold, QAM can still suggest other price combinations of A
and B, such as (A_1.00, B_1.00), as negative item pairs. Thus, many
item-price combinations will have negative associations due to the nature of
the business and the way QAM operates, yielding misleading results.
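The issue can be illustrated with a small sketch that enumerates the item-price combinations QAM would consider against those actually observed; all data values below are invented:

```python
from itertools import product

# Normalized price levels at which each entity was actually sold.
observed = {("A", "0.70"), ("B", "0.70"), ("B", "1.00")}
levels = {"A": ["1.00", "0.70"], "B": ["1.00", "0.70"]}

# QAM considers every item-price combination, including ones that involve
# entities (like A at full price) that never occur in the data:
candidates = [(("A", pa), ("B", pb))
              for pa, pb in product(levels["A"], levels["B"])]
never_sold = [pair for pair in candidates
              if not all(entity in observed for entity in pair)]
print(len(candidates), len(never_sold))  # 4 candidate pairs, 2 impossible
```

The impossible pairs are exactly the ones QAM may flag as spurious negative associations.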
Even though both positive and negative QAM can conceptually be run on any
given dataset, they are not guaranteed to yield useful outcomes. In
contrast, re-mining operates only on the confirmed positive and negative
associations and does not exhibit the discussed problem.
6. Complexity Analysis
In this section, the complexity of the re-mining algorithm is analyzed
in comparison to the conventional QAM. First, the complexity of re-mining
is discussed, based on results from the literature on the Apriori algorithm.
Then, the complexity of QAM is discussed, based on two milestone papers
[4, 25], and results from the computational complexity literature.
Before proceeding with the complexity analysis, it should be remarked
that QAM computes detailed itemsets/rules that contain atomic conditions,
such as “attribute A_1 taking the value u_1j, and attribute A_2 taking the
value u_2j”. An atomic condition is of the form A_i = u_ij for a categorical
attribute A_i, and of the form A_i ∈ [l, u] for a numeric attribute. A
frequent itemset in the context of QAM is a condition C on a set of distinct
attributes A_1, ..., A_N, where N is the total number of attributes. C is of
the form C = C_1 ∧ C_2 ∧ ... ∧ C_N, where C_i is an atomic condition on A_i,
for each i = 1, ..., N [4]. In [25] the term pattern is used instead of
condition. An association rule in the context of QAM is an expression of the
form Antecedent ⇒ Consequent, where both Antecedent and Consequent are
conditions. However, this type of output still requires aggregating the
results to obtain the summary statistics that re-mining provides. In
contrast, re-mining starts with only the interesting rules (frequent
itemsets in the case study) obtained through Apriori, and then enriches
these results with statistics computed for the itemsets through database
queries. Re-mining thus eliminates the computation of specific detailed
rules, computing only the interesting itemsets and their statistics.
6.1. Complexity of Re-mining
The problem of finding the frequent itemsets (and the related association
rules), itemsets that appear in at least s percent of the transactions, is the
most fundamental analysis in association mining. The standard algorithm
for this problem is the Apriori algorithm, first proposed in [3], and then
improved in numerous succeeding studies, including [2, 15].
A new line of research approaches the frequent itemset discovery prob-
lem from a graph-theoretic perspective, achieving even further efficiency in
running times. [19] extends earlier work in [16], posing the problem as a
frequent subgraph discovery problem. The goal is finding all connected sub-
graphs that appear frequently in a large graph database. The authors report
that their algorithm scales linearly with respect to the size of the data set.
Even though new algorithms with efficiency improvements are continuously
being developed, the reference complexity result regarding the Apriori
algorithm (considering its numerous variants) is that it has an upper bound
of O(|C| · |D|) and a lower bound of O(k · log(|I|/k)) on running time [1].
In these expressions, |C| denotes the sum of the sizes of the candidates
considered, |D| denotes the size of the transaction database D, |I| is the
number of items, and k is the size of the largest itemset.
Now the complexity of re-mining will be analyzed following a fixed-schema
complexity perspective [25], where the set of attributes A is fixed (with
size N) and the complexity is derived relative to the number of tuples in
the set of items I. In the presentation of the re-mining methodology, only
the itemsets of size 2 were selected. Thus the total number of frequent
itemset candidates is |I|(|I|-1)/2, which is O(|I|^2). Hence |C| is also
O(|I|^2), and the upper bound for the running time of the Apriori algorithm
is O(|I|^2 · |D|). In the re-mining phase, the data is enriched with
statistics for the two-itemsets. There are N attributes for which statistics
will be computed. For each of these attributes, there will be a query that
filters the transactions containing the itemset of interest, which takes
O(|D|). Then there is the computation of the statistic, which requires a
single pass over the selected transactions for computing the mean, and two
passes for computing the standard deviation. So the upper bound for the
computation of each statistic is O(|D|^2), and this dominates the time for
the filter query. Since there are N attributes, the running time for the
computation of the statistics is O(N · |D|^2). Combined with the running
time of Apriori, the upper bound for the whole re-mining process with k = 2
is O(|I|^2 · |D| + N · |D|^2).
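The per-attribute statistic computation described above can be sketched as follows; the transaction layout (a set of items plus an attribute value per transaction) is an illustrative assumption:

```python
import math

def itemset_stats(transactions, itemset, attr):
    """Compute the mean (one pass) and standard deviation (a second pass)
    of an attribute over the transactions containing the given itemset."""
    # O(|D|) filter query: keep transactions that contain the itemset.
    selected = [t[attr] for t in transactions if itemset <= t["items"]]
    mean = sum(selected) / len(selected)                          # pass 1
    var = sum((x - mean) ** 2 for x in selected) / len(selected)  # pass 2
    return mean, math.sqrt(var)

transactions = [
    {"items": {"A", "B"}, "price": 10.0},
    {"items": {"A", "B", "C"}, "price": 14.0},
    {"items": {"A"}, "price": 9.0},
]
print(itemset_stats(transactions, {"A", "B"}, "price"))  # (12.0, 2.0)
```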
6.2. Complexity of Quantitative Association Mining
The alternative to re-mining is carrying out quantitative association min-
ing (QAM), and aggregating the results to obtain statistics for the itemsets.
The two most comprehensive studies in the literature regarding the com-
plexity of QAM are [4, 25].
The most fundamental result in both papers is that QAM is NP-complete, even
in databases without nulls. The proof of this theorem is through reducing
the CLIQUE problem to the QAM problem. The decision version of the CLIQUE
problem tests whether a given graph contains a k-clique, i.e., a complete
subgraph of size k in which all vertices are pairwise connected.
Now the complexity of QAM is analyzed following a fixed-schema com-
plexity perspective, just as was done for re-mining. The additional effort to
compute the summary statistics from the quantitative association rules will
also be considered.
According to Wijsen and Meersman, the fixed-schema complexity of QAM is
O(|D|^(2(N+1)+1)) = O(|D|^(2N+3)). The number of attributes has been taken
as N + 1, rather than N, to include a new variable for the item label; but
this does not change the complexity significantly. Once all the rules are
generated, the time to filter for two-itemsets and compute the statistics is
the same as in re-mining, namely O(N · |D|^2). So the complexity of the
whole process is O(|D|^(2N+3) + N · |D|^2). Recall that the complexity of
re-mining for k = 2 is O(|I|^2 · |D| + N · |D|^2).
The running time of QAM thus increases exponentially with N, which
suggests that re-mining becomes much more efficient as N increases.
Moreover, the running time of QAM is a high-degree polynomial in |D|,
whereas re-mining is only quadratic in |D|, so re-mining also becomes much
more efficient for larger databases.
7. Conclusion
A novel methodology, namely re-mining, has been introduced in this
study to enrich the original data mining process. A new set of data is added
to the results of the traditional association mining and an additional data
mining step is conducted. The goal is to describe and explore the factors
behind positive and negative associations and to predict the type of
association based on attribute data. It is shown that not only categorical
attributes (e.g., the category of the product), but also quantitative
attributes such as price, the lifetime of the products in weeks, and derived
statistics, can be included in the analysis while avoiding NP-completeness.
The re-mining methodology has been demonstrated through a case study in the
apparel retail industry, and its practicality has been shown for the problem
at hand. Descriptive, exploratory, and predictive re-mining can be
implemented to reveal interesting patterns previously hidden in the data.
The case study revealed interesting outcomes, such as “negative associations
are usually seen between fashion items” and “the price of an item is an
important factor for item associations in apparel retailing.”
As a future study, re-mining can be applied for different merchandise
groups and for successive seasons, testing its robustness w.r.t. sampling
bias and temporal change, respectively. Data coming from other retailers,
with diverse characteristics, can be used for further experimental evidence.
Finally, data coming from different domains can be used to confirm the
applicability of the re-mining process in those domains.
Acknowledgement
This work is financially supported by the Turkish Scientific Research Council
under Grant TUBITAK 107M257.
[1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo.
Fast discovery of association rules. In U. Fayyad et al., editors, Advances
in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, Menlo
Park, CA, 1996.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association
rules. In Proceedings of the 20th VLDB Conference, Santiago, Chile, pages
487–499, 1994.
[3] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining asso-
ciation rules between sets of items in large databases. In Peter Buneman
and Sushil Jajodia, editors, SIGMOD Conference, pages 207–216. ACM
Press, 1993.
[4] Fabrizio Angiulli, Giovambattista Ianni, and Luigi Palopoli. On the
complexity of inducing categorical and quantitative association rules.
Theoretical Computer Science, 314(1-2):217–249, February 2004.
[5] Maria-Luiza Antonie and Osmar R. Zaïane. An associative classifier
based on positive and negative rules. In Gautam Das, Bing Liu, and
Philip S. Yu, editors, DMKD, pages 64–69. ACM, 2004.
[6] Yonatan Aumann and Yehuda Lindell. A statistical theory for quanti-
tative association rules. J. Intell. Inf. Syst., 20(3):255–283, 2003.
[7] Tom Brijs, Gilbert Swinnen, Koen Vanhoof, and Geert Wets. Building
an association rules framework to improve product assortment deci-
sions. Data Min. Knowl. Discov., 8(1):7–23, 2004.
[8] Ayhan Demiriz. Enhancing product recommender systems on sparse bi-
nary data. Journal of Data Mining and Knowledge Discovery, 9(2):147–
170, September 2004.
[9] Ayhan Demiriz, Ahmet Cihan, and Ufuk Kula. Analyzing price data to
determine positive and negative product associations. In Chi Leung,
Minho Lee, and Jonathan Chan, editors, Neural Information Processing,
volume 5863 of LNCS, pages 846–855. Springer, 2009.
[10] Ayhan Demiriz, Gurdal Ertek, Tankut Atan, and Ufuk Kula. Re-mining
positive and negative association mining results. In Petra Perner, ed-
itor, Advances in Data Mining. Applications and Theoretical Aspects,
volume 6171 of LNCS, pages 101–114. Springer, 2010.
[11] Gürdal Ertek and Ayhan Demiriz. A framework for visualizing
association mining results. In Albert Levi, Erkay Savas, Hüsnü Yenigün,
Selim Balcisoy, and Yücel Saygin, editors, ISCIS, volume 4263 of Lecture
Notes in Computer Science, pages 593–602. Springer, 2006.
[12] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data min-
ing to knowledge discovery: An overview. In Advances in Knowledge
Discovery and Data Mining, pages 1–34, Menlo Park, CA, 1996. AAAI
Press.
[13] Jerome H. Friedman. Stochastic gradient boosting. Computational
Statistics & Data Analysis, 38(4):367 – 378, 2002.
[14] Jiawei Han and Micheline Kamber. Data Mining Concepts and Tech-
niques. Morgan Kaufmann, 2nd edition, 2006.
[15] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without
candidate generation. In Weidong Chen, Jeffrey F. Naughton, and
Philip A. Bernstein, editors, SIGMOD Conference, pages 1–12. ACM,
2000.
[16] Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. An apriori-based
algorithm for mining frequent substructures from graph data. In Djamel A.
Zighed, Henryk Jan Komorowski, and Jan M. Zytkow, editors, PKDD, volume
1910 of Lecture Notes in Computer Science, pages 13–23. Springer, 2000.
[17] YongSeog Kim and W. Nick Street. An intelligent system for cus-
tomer targeting: a data mining approach. Decision Support Systems,
37(2):215 – 228, 2004.
[18] Flip Korn, Alexandros Labrinidis, Yannis Kotidis, and Christos Falout-
sos. Quantifiable data mining using ratio rules. VLDB J., 8(3-4):254–
266, 2000.
[19] Michihiro Kuramochi and George Karypis. An efficient algorithm
for discovering frequent subgraphs. IEEE Trans. Knowl. Data Eng.,
16(9):1038–1051, 2004.
[20] Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, and Alex Pang.
Exploratory mining and pruning optimizations of constrained associa-
tions rules. In SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD
international conference on Management of data, pages 13–24, New
York, NY, USA, 1998. ACM.
[21] A. Savasere, E. Omiecinski, and S.B. Navathe. Mining for strong neg-
ative associations in a large database of customer transactions. In
Proceedings of the 14th International Conference on Data Engineering,
pages 494–502, 1998.
[22] Ramakrishnan Srikant and Rakesh Agrawal. Mining quantitative
association rules in large relational tables. In H. V. Jagadish and
Inderpal Singh Mumick, editors, SIGMOD Conference, pages 1–12. ACM Press,
1996.
[23] Aixin Sun, Ee-Peng Lim, and Ying Liu. On strategies for imbalanced
text classification using svm: A comparative study. Decision Support
Systems, 48(1):191 – 201, 2009. Information product markets.
[24] Pang-Ning Tan, Vipin Kumar, and Harumi Kuno. Using sas for min-
ing indirect associations in data. In Western Users of SAS Software
Conference, 2001.
[25] Jef Wijsen and Robert Meersman. On the complexity of mining quanti-
tative association rules. Data Min. Knowl. Discov., 2(3):263–281, 1998.
[26] Bo K. Wong, Thomas A. Bodnovich, and Yakup Selvi. Neural network
applications in business: A review and analysis of the literature (1988-
1995). Decision Support Systems, 19(4):301 – 320, 1997.
[27] Yiyu Yao, Yan Zhao, and R. Brien Maguire. Explanation-oriented as-
sociation mining using a combination of unsupervised and supervised
learning algorithms. In Yang Xiang and Brahim Chaib-draa, editors,
Canadian Conference on AI, volume 2671 of Lecture Notes in Computer
Science, pages 527–531. Springer, 2003.
[28] Huimin Zhao. A multi-objective genetic programming approach to
developing pareto optimal decision trees. Decision Support Systems,
43(3):809 – 826, 2007. Integrated Decision Support.
Table 1: Classification Results

Model Node          Test Set Accuracy   Training Set Accuracy   Validation Set Accuracy
Neural Networks     0.82                0.86                    0.83
Decision Tree       0.79                0.84                    0.85
SVM                 0.76                0.79                    0.78
MBR                 0.70                0.77                    0.72
Gradient Boosting   0.62                0.62                    0.62
AutoNeural          0.59                0.61                    0.60
Figure 1: (a) Taxonomy of Items and Associations [21]; (b) Indirect Association
Table 2: Decision Tree Model: Variable Importance

Variable Name        Number of Rules   Importance   Validation Importance
CorrNormPrice_HL     5                 1.00         1.00
LifeTimeH            3                 0.90         0.82
CategoryL            2                 0.81         0.78
StdDevPriceL_H0_L1   1                 0.48         0.45
CorrWeeklySales_HL   2                 0.46         0.52
LifeTimeL            2                 0.45         0.31
StdDevPriceH_H0_L1   1                 0.39         0.23
CategoryH            2                 0.37         0.24
EndWeekL             2                 0.33         0.32
MerchSubGrpH         1                 0.23         0.18
MaxPriceH            1                 0.21         0.00
AvgPriceL_H1_L0      1                 0.18         0.19
Figure 2: Re-Mining Algorithm

Inputs
I: set of items; i ∈ I
D: set of transactions, containing only the itemset information
min_sup: minimum support required for frequent itemsets
min_items: minimum number of items in the itemsets
max_items: maximum number of items in the itemsets
sort_attr: the attribute used to sort the items within the 2-itemsets, e.g. price of an item in retailing

Definitions
L_k^+: set of frequent k-itemsets that have positive association; l ∈ L_k^+
L_k^-: set of k-itemsets that have negative association; l ∈ L_k^-
L_k^*: set of all k-itemsets from association mining; l ∈ L_k^*
l(j): jth element in itemset l; l ∈ L_k^*
i.attr: value of attribute attr for item i; i ∈ I
E^+: set of records for re-mining that contain positive associations; r ∈ E^+
E^-: set of records for re-mining that contain negative associations; r ∈ E^-
E^*: set of all records for re-mining; r = (l(1), l(2), Γ), r ∈ E^*, Γ ∈ {+, −}
A_n: additional attribute n introduced for re-mining; n = 1 ... N
A^*: set of all attributes introduced for re-mining; A^* = ∪ A_n

Functions
apriori(D, min_items, max_items, min_sup): Apriori algorithm that operates on D and generates frequent itemsets with a minimum of min_items items, a maximum of max_items items, and a minimum support value of min_sup
indirect_assoc(D, min_items, max_items): indirect association mining algorithm that operates on D and generates itemsets that have negative association, with a minimum of min_items items and a maximum of max_items items
f_{A_n}(l): function that computes the value of attribute A_n for a given itemset l, l ∈ L_k^*
swap(a, b): function that swaps the values of a and b

Algorithm: Re-mining
1. Perform association mining.
   L_2^+ = apriori(D, 2, 2, min_sup)
   L_2^- = indirect_assoc(D, 2, 2)
2. Sort the items in the 2-itemsets.
   for all l ∈ L_2^+, L_2^-:
     if l(1).sort_attr < l(2).sort_attr then
       l = (l(1), l(2)) = swap(l(1), l(2))
3. Label the item associations accordingly and append them as new records.
   E^+ = {r : r = (l(1), l(2), +), ∀ l ∈ L_2^+}
   E^- = {r : r = (l(1), l(2), −), ∀ l ∈ L_2^-}
   E^* = E^+ ∪ E^-
4. Expand the records with additional attributes for re-mining.
   for n = 1 ... N:
     for all r ∈ E^*:
       r = (r, f_{A_n}(l))
5. Perform exploratory, descriptive, and predictive re-mining.
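Steps 2-4 of the algorithm can be sketched in Python as follows. The prices and itemsets below are illustrative; price serves as sort_attr, and the price gap stands in for one additional attribute A_n:

```python
# Illustrative item attribute (sort_attr = price) and mined 2-itemsets.
price = {"A": 30.0, "B": 12.0, "C": 45.0}
positive_2itemsets = [("B", "A")]  # L_2^+
negative_2itemsets = [("B", "C")]  # L_2^-

def sort_pair(l):
    """Step 2: order each 2-itemset so the higher-priced item comes first."""
    return l if price[l[0]] >= price[l[1]] else (l[1], l[0])

# Step 3: label the associations and append them as records.
records = [(*sort_pair(l), "+") for l in positive_2itemsets] + \
          [(*sort_pair(l), "-") for l in negative_2itemsets]

# Step 4: expand each record with an additional attribute (here: price gap).
expanded = [(h, l, sign, price[h] - price[l]) for h, l, sign in records]
print(expanded)  # [('A', 'B', '+', 18.0), ('C', 'B', '-', 33.0)]
```

The expanded records are exactly the enriched data set on which step 5 (exploratory, descriptive, and predictive re-mining) operates.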
Figure 3: An Illustrative Decision Tree Model in Re-Mining

Figure 4: Exploratory Re-Mining Example: Analyzing Item Introduction in Season

Figure 5: Exploratory Re-Mining Example: Effect of the Maximum Item Price

Figure 6: ROC Curves