Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/linking-behavioral-patterns-to-personal-attributes-through-data-re-mining/
A fundamental challenge in behavioral informatics is the development of methodologies and systems that can achieve its goals and tasks, including behavior pattern analysis. This study presents such a methodology, which can be converted into a decision support system through the appropriate integration of existing tools for association mining and graph visualization. The methodology enables the linking of behavioral patterns to personal attributes through the re-mining of colored association graphs that represent item associations. The methodology is described and mathematically formalized, and is demonstrated in a case study related to the retail industry.
A Framework for Visualizing Association Mining Results - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/a-framework-for-visualizing-association-mining-results/
Association mining is one of the most used data mining techniques due to its interpretable and actionable results. In this study we propose a framework to visualize association mining results, specifically frequent itemsets and association rules, as graphs. We demonstrate the applicability and usefulness of our approach through a Market Basket Analysis (MBA) case study where we visually explore the data mining results for a supermarket data set. In this case study we derive several interesting insights regarding the relationships among the items and suggest how they can be used as a basis for decision making in retailing.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION - cscpconf
This document presents a study on using similarity measure functions to select engineering materials from a database. Thirteen similarity/distance functions are examined to quantify the similarity between materials based on their properties. A performance index measure is used to evaluate how well a selected material matches the target material. The materials are ranked based on their normalized performance index values. An algorithm is presented that takes a target material, calculates similarities to materials in a database using different functions, computes performance indexes, and selects the material with the minimum normalized performance index as the best match. Experimental results applied the approach to select materials from a database of 5670 materials that best match a target material described by 25 properties.
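The selection procedure summarized above (compute similarities to the target, derive a normalized performance index, pick the best-ranked material) can be sketched in a few lines. This is a minimal illustration with invented material names and a plain Euclidean distance standing in for the thirteen similarity/distance functions examined in the paper; the property values are assumed to be pre-scaled to [0, 1].

```python
import math

def euclidean(a, b):
    """Euclidean distance between two property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_material(target, database, distance=euclidean):
    """Score each candidate against the target, normalize the scores,
    and return the material with the minimum normalized index."""
    scores = {name: distance(target, props) for name, props in database.items()}
    max_score = max(scores.values()) or 1.0
    normalized = {name: s / max_score for name, s in scores.items()}
    return min(normalized, key=normalized.get)

# Toy database: material name -> (density, strength, cost), already scaled.
materials = {
    "alloy_a": (0.2, 0.9, 0.5),
    "alloy_b": (0.8, 0.4, 0.3),
    "polymer_c": (0.1, 0.2, 0.1),
}
print(select_material((0.15, 0.85, 0.45), materials))  # -> alloy_a
```

A real implementation would loop this over several distance functions, as the paper does, and compare their rankings.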
Re-Mining Association Mining Results Through Visualization, Data Envelopment ... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-association-mining-results-through-visualization-data-envelopment-analysis-and-decision-trees/
Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding how re-mining can be carried out on association mining results, are answered in the case study through empirical analysis.
Polyrepresentation in a Quantum-inspired Information Retrieval Framework - Ingo Frommholz
The document discusses the principle of polyrepresentation in information retrieval. It describes polyrepresentation as using multiple evidence or interpretations from different cognitive perspectives to determine relevance. This helps mitigate the "cognitive freefall" problem. There are different types of polyrepresentation, including document, information need, and algorithm polyrepresentation. Document polyrepresentation uses different features of a document, information need polyrepresentation uses different interpretations of a user's need, and algorithm polyrepresentation uses different retrieval algorithms or weighting schemes. The document argues that combining rankings from different polyrepresentative approaches through restricted fusion, where only the overlapping results are considered, can improve retrieval performance over single models.
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining - IOSR Journals
This document discusses the usage of frequent patterns in data mining, including for association mining, classification, and clustering. It provides background on foundational approaches for mining associations using frequent patterns, such as the Apriori, FP-growth, and ECLAT algorithms. It also discusses how frequent patterns have been used for classification tasks, such as generating discriminative features for training classifiers. Finally, it covers various ways frequent patterns have been applied to clustering problems, such as using frequent itemsets to represent and group documents for clustering. The document provides an overview of the state-of-the-art in applying frequent pattern mining across different data mining applications.
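As a concrete reference point for the foundational approaches mentioned above, here is a minimal, unoptimized sketch of Apriori's level-wise support counting and candidate generation (here candidates are formed by joining any two frequent k-itemsets whose union has k+1 items, then pruned by counting); the basket data is invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) mapped to support counts."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent, k_sets = {}, items
    while k_sets:
        # Count the support of each candidate in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Build (k+1)-item candidates from the surviving k-itemsets.
        keys = list(survivors)
        k_sets = {a | b for a, b in combinations(keys, 2)
                  if len(a | b) == len(a) + 1}
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
freq = apriori([frozenset(b) for b in baskets], min_support=2)
print(freq[frozenset({"milk", "bread"})])  # -> 2
```

FP-growth and ECLAT reach the same frequent itemsets without repeated candidate generation, via an FP-tree and vertical tid-lists respectively.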
COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG S... - IAEME Publication
The document presents a comparative study of distributed frequent pattern mining algorithms CDA, FDM, and MR-DARM for mining big sales data. It first preprocesses a large AMUL dairy sales dataset using Hadoop MapReduce to convert it to transactions. It then applies the MR-DARM, CDA, and FDM algorithms to find frequent itemsets and compare their performance. Experimental results on the dairy dataset show that MR-DARM has lower execution times than CDA and FDM, especially when the data is distributed across multiple nodes.
The document proposes using conditional random fields (CRFs) to improve legal document summarization. CRFs are applied to segment legal documents into seven labeled rhetorical components. Feature sets are used to improve CRF performance. A term distribution model and structured domain knowledge are then used to extract key sentences for each rhetorical category. The resulting structured summary is found to be 80% accurate compared to ideal summaries generated by experts.
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Detailed structure applicable to hybrid recommendation technique - IAEME Publication
This document discusses a hybrid recommendation technique that combines content-based filtering, collaborative filtering, and demographic techniques to provide personalized point of interest (POI) recommendations for tourists. The technique uses a three-part process: 1) querying tourists for preferences and trip constraints, 2) using the tourist information to score POIs from each recommendation technique, and 3) clustering the top POIs to find the optimal trip area. The proposed hybrid approach aims to automatically select the most personalized POIs for tourists based on their initial information.
To solve complex decision-making problems, many approaches and systems based on fuzzy theory have been proposed. In 1998, Smarandache introduced the concept of the single-valued neutrosophic set as a complete development of fuzzy theory. In this paper, we study the distance measure between single-valued neutrosophic sets based on the H-max measure of Ngan et al. [8]. The proposed measure is also a distance measure between picture fuzzy sets, which were introduced by Cuong in 2013 [15]. Based on the proposed measure, an Adaptive Neuro Picture Fuzzy Inference System (ANPFIS) is built and applied to decision making for the link states in interconnection networks. In experimental evaluation on real datasets taken from UPV (Universitat Politècnica de València), the performance of the proposed model is better than that of the related fuzzy methods.
Jasmine is a hybrid reasoning tool that discovers causal links in biological data through deterministic machine learning. It generates hypotheses to explain observations, verifies hypotheses by finding data that satisfies them, and falsifies some hypotheses by revealing inconsistencies. Jasmine represents domain knowledge as an ontology and uses abductive, inductive, and deductive reasoning. It predicts targets for objects by finding common causes between similar objects based on their satisfied features.
The document provides a literature review of different clustering techniques. It begins by defining clustering and its applications. It then categorizes and describes several clustering methods including hierarchical (BIRCH, CURE, CHAMELEON), partitioning (k-means, k-medoids), density-based (DBSCAN, OPTICS, DENCLUE), grid-based (CLIQUE, STING, MAFIA), and model-based (RBMN, SOM) methods. For each method, it discusses the algorithm, advantages, disadvantages and time complexity. The document aims to provide an overview of various clustering techniques for classification and comparison.
SEARCH OF INFORMATION BASED CONTENT IN SEMI-STRUCTURED DOCUMENTS USING INTERF... - ijcsitcejournal
This paper proposes a semi-structured information retrieval model based on a new method for calculating similarity. We have developed the CASISS (Calculation of Similarity of Semi-Structured documents) method to quantify how similar two given texts are. This new method identifies elements of semi-structured documents using element descriptors. Each semi-structured document is pre-processed before the extraction of a set of descriptors for each element, which characterize the contents of the elements. The method can be used to increase the accuracy of the information retrieval process by taking into account not only the presence of query terms in the given document but also the topology (position continuity) of these terms.
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-item-associations-methodology-and-a-case-study-in-apparel-retailing/
Association mining is the conventional data mining technique for analyzing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.
This presentation is based on "Statistical Modeling: The Two Cultures" by Leo Breiman. It compares the data modeling culture (statistics) and the algorithmic modeling culture (machine learning).
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document proposes using a b-colouring technique in graph theory to perform clustering analysis on the Pima Indian Diabetes database. B-colouring involves assigning colors (clusters) to graph vertices such that no adjacent vertices share a color, and each color class has a dominating vertex connected to all other color classes. This guarantees separation between clusters. The document applies b-colouring clustering to the Pima dataset and evaluates classification accuracy compared to other methods, achieving favorable results. Experimental analysis demonstrates the b-colouring approach and reviews cluster validity indices for determining the optimal partition number.
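The two ingredients of b-colouring described above, a proper colouring plus the check that each colour class has a dominating vertex, can be sketched as follows. This is only an illustrative fragment on a toy graph; the paper's full clustering algorithm would additionally re-colour classes that lack a b-dominating vertex and would build the graph from a dissimilarity threshold on the data.

```python
def greedy_colouring(adj):
    """Proper colouring: give each vertex the smallest colour not
    already used by one of its coloured neighbours."""
    colour = {}
    for v in adj:
        used = {colour[u] for u in adj[v] if u in colour}
        colour[v] = next(c for c in range(len(adj)) if c not in used)
    return colour

def b_dominating(adj, colour):
    """Return the colours that have a dominating vertex, i.e. a vertex
    adjacent to at least one vertex of every other colour class."""
    all_colours = set(colour.values())
    return {c for v, c in colour.items()
            if {colour[u] for u in adj[v]} >= all_colours - {c}}

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 0-3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1},
       3: {0, 4, 5}, 4: {3, 5}, 5: {3, 4}}
col = greedy_colouring(adj)
print(col, b_dominating(adj, col))
```

On this toy graph every colour class turns out to have a dominating vertex, so the greedy colouring is already a b-colouring.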
Semi-Supervised Discriminant Analysis Based On Data Structure - iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Evaluating the efficiency of rule techniques for file classification - eSAT Journals
Abstract: Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document similarity is one of the important techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions. For example, the extension of a computer file may be pdf, doc, ppt, xls, and so on. There are several rule-classifier algorithms, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms, namely decision table, DTNB and OneR, are used to classify computer files based on their extension. The results produced by these algorithms are analyzed using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
1. The document analyzes kitchenware product sales data using exploratory data analysis (EDA) techniques like histograms and Monte Carlo simulations.
2. EDA was used to identify patterns in the profit data based on variables like sales volume, costs, and prices. This revealed a wide possible profit range, from -£120,000 to £330,000, indicating high business risk.
3. Monte Carlo simulation of 1,000 random data points allowed comparison of mean, median, and mode profit values to assess average returns versus risk levels. The analysis provided insight into likely profit zones under different input conditions.
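The Monte Carlo step described above can be sketched roughly as below. All distributions, ranges, and the fixed-cost figure are invented for illustration and do not reproduce the document's figures; the point is simply drawing random inputs and summarizing the resulting profit distribution.

```python
import random

random.seed(42)

def simulate_profit(n=1000):
    """Draw sales volume, price and unit cost at random and record
    the resulting profit for each of n trials."""
    profits = []
    for _ in range(n):
        volume = random.randint(5_000, 30_000)   # units sold
        price = random.uniform(8.0, 15.0)        # £ per unit
        unit_cost = random.uniform(6.0, 12.0)    # £ per unit
        fixed_cost = 50_000                      # £, assumed overhead
        profits.append(volume * (price - unit_cost) - fixed_cost)
    return profits

p = sorted(simulate_profit())
mean = sum(p) / len(p)
median = p[len(p) // 2]
print(f"range: £{p[0]:,.0f} to £{p[-1]:,.0f}, mean £{mean:,.0f}, median £{median:,.0f}")
```

Comparing the mean against the median and the tails of the sorted sample is what lets the analysis weigh average returns against risk.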
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/depolama-sistemleri/
Warehouses are temporary stocking points used during the distribution of products. They make an important contribution to supply chains operating in line with their intended goals and to the effective execution of logistics activities. Warehouses can be located inside or adjacent to production facilities, or they can be established as separate, purpose-built structures. Figure 4.1 presents a general view of a typical warehouse. In this typical warehouse, materials/products are stored on racks, material entries and exits take place over the warehouse docks, and loading/unloading operations are carried out using vehicles called forklifts. The warehouse is managed by a logistics professional holding the title of Warehouse Manager (Depo Yöneticisi or Depo Müdürü).
Optimizing Waste Collection In An Organized Industrial Region: A Case Study - ertekg
This document summarizes a case study that optimizes industrial waste collection from 17 factories located in an organized industrial region in Turkey. The authors developed a mixed integer programming model to determine the optimal waste container locations and transportation routes to minimize total costs. They applied the model to real data from an industrial zone. The optimal solution selected 3 out of 5 candidate locations and had a minimum monthly cost of 70,338 Turkish Lira. The authors also created a visualization of the optimal supply chain network to provide additional insights into the solution.
The Bullwhip Effect In Supply Chain: Reflections After A Decade - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/the-bullwhip-effect-in-supply-chain-reflections-after-a-decade/
A decade has passed since the publication of the two seminal papers by Lee, Padmanabhan and Whang (1997) that describe the "bullwhip effect" in supply chains and characterize its underlying causes. The bullwhip phenomenon is observed in supply chains where the decisions at the subsequent stages of the supply chain are made greedily based on local information, rather than through coordination based on global information on the state of the whole chain. The first consequence of this information distortion is higher variance in purchasing quantities compared to sales quantities at a particular supply chain stage. The second consequence is increasingly higher variance in order quantities and inventory levels in the upstream stages compared to their downstream stages (buyers). In this paper, we survey a decade of literature on the bullwhip effect and present the key insights reported by researchers and practitioners. We also present our reflections and share our vision of the possible future.
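The variance amplification described above can be reproduced in a toy simulation: each upstream stage forecasts its incoming orders with a moving average and orders enough to restore its inventory position, and the order variance grows stage by stage. The ordering policy and all parameters here are illustrative assumptions, not the models analyzed in the surveyed papers.

```python
import random
import statistics

random.seed(1)

def upstream_orders(demand, stages=3, window=4):
    """Propagate orders upstream: each stage forecasts its incoming
    orders with a moving average and orders enough to bring its
    inventory position back up to the forecast level."""
    series = list(demand)
    history = [series]
    for _ in range(stages):
        orders, inventory = [], 0.0
        for t, d in enumerate(series):
            recent = series[max(0, t - window + 1): t + 1]
            forecast = sum(recent) / len(recent)
            inventory -= d                                # ship what was demanded
            order = max(0.0, d + (forecast - inventory))  # replenish + correct stock
            inventory += order
            orders.append(order)
        series = orders
        history.append(series)
    return history

demand = [random.gauss(100, 10) for _ in range(200)]
history = upstream_orders(demand)
for i, s in enumerate(history):
    print(f"stage {i}: order variance {statistics.variance(s):10.1f}")
```

Each stage reacts to both the latest order and the shift in its forecast, which is exactly the local, uncoordinated decision making the papers identify as a cause of the effect.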
Application Of Local Search Methods For Solving A Quadratic Assignment Probl... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/application-of-local-search-methods-for-solving-a-quadratic-assignment-problem-a-case-study/
This paper discusses the design and application of local search methods to a real-life application at a steel cord manufacturing plant. The case study involves a layout problem that can be represented as a Quadratic Assignment Problem (QAP). Due to the nature of the manufacturing process, certain machines need to be allocated in close proximity to each other. This issue is incorporated into the objective function by assigning high penalty costs to the unfavorable allocations. The QAP belongs to one of the most difficult classes of combinatorial optimization problems, and is not solvable to optimality as the number of facilities increases. We implement the well-known local search methods 2-opt, 3-opt and tabu search. We compare the solution performances of the methods to the results obtained from the NEOS server, which provides free access to many optimization solvers on the internet.
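A minimal version of the 2-opt search mentioned above, applied to a toy QAP instance (flow and distance matrices invented for illustration), might look like this:

```python
from itertools import combinations

def qap_cost(perm, flow, dist):
    """Total cost: flow between facilities i, j weighted by the distance
    between their assigned locations perm[i], perm[j]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def two_opt(perm, flow, dist):
    """Keep swapping the locations of two facilities while the swap
    strictly improves the total cost (first-improvement 2-opt)."""
    best = list(perm)
    best_cost = qap_cost(best, flow, dist)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(best)), 2):
            cand = best[:]
            cand[i], cand[j] = cand[j], cand[i]
            cost = qap_cost(cand, flow, dist)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
    return best, best_cost

# Toy instance: 4 facilities, 4 locations.
flow = [[0, 3, 0, 2],
        [3, 0, 0, 1],
        [0, 0, 0, 4],
        [2, 1, 4, 0]]
dist = [[0, 22, 53, 53],
        [22, 0, 40, 62],
        [53, 40, 0, 55],
        [53, 62, 55, 0]]
solution, cost = two_opt([0, 1, 2, 3], flow, dist)
print(solution, cost)
```

3-opt enlarges the neighbourhood to three-way exchanges, and tabu search allows non-improving moves while forbidding recently visited swaps, so both can escape the local optima where plain 2-opt stops.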
Simulation Modeling For Quality And Productivity In Steel Cord Manufacturing - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/simulation-modeling-for-quality-and-productivity-in-steel-cord-manufacturing/
We describe the application of simulation modeling to estimate and improve quality and productivity performance of a steel cord manufacturing system. We describe the typical steel cord manufacturing plant, emphasize its distinguishing characteristics, identify various production settings and discuss applicability of simulation as a management decision support tool. Besides presenting the general structure of the developed simulation model, we focus on wire fractures, which can be an important source of system disruption.
Rule-based expert systems for supporting university students - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/rule-based-expert-systems-for-supporting-university-students/
There are more than 15 million college students in the US alone. Academic advising for courses and scholarships is typically performed by human advisors, bringing an immense managerial workload to faculty members, as well as other staff at universities. This paper reports and discusses the development of two educational expert systems at a private international university. The first expert system is a course advising system which recommends courses to undergraduate students. The second system suggests scholarships to undergraduate students based on their eligibility. While there have been reported systems for course advising, the literature does not seem to contain any references to expert systems for scholarship recommendation and eligibility checking. Therefore the scholarship recommender that we developed is the first of its kind. Both systems have been implemented and tested using Oracle Policy Automation (OPA) software.
Design Requirements For a Tendon Rehabilitation Robot: Results From a Survey ... - ertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/design-requirements-for-a-tendon-rehabilitation-robot-results-from-a-survey-of-engineers-and-health-professionals/
Exoskeleton type finger rehabilitation robots are helpful in assisting the treatment of tendon injuries. A survey has been carried out with engineers and health professionals to further develop an existing finger exoskeleton prototype. The goal of the study is to better understand the relative importance of several design criteria through the analysis of survey results and to improve the finger exoskeleton accordingly. The survey questions with strong correlations are identified and the preferences of the two respondent groups are statistically compared. The results of the statistical analysis are interpreted and insights obtained are used to guide the design process. The answers to the qualitative questions are also discussed together with their design implications. Finally, Quality Function Deployment (QFD) has been employed for visualizing these functional requirements in relation to the customer requirements.
Detailed structure applicable to hybrid recommendation techniqueIAEME Publication
This document discusses a hybrid recommendation technique that combines content-based filtering, collaborative filtering, and demographic techniques to provide personalized point of interest (POI) recommendations for tourists. The technique uses a three-part process: 1) querying tourists for preferences and trip constraints, 2) using the tourist information to score POIs from each recommendation technique, and 3) clustering the top POIs to find the optimal trip area. The proposed hybrid approach aims to automatically select the most personalized POIs for tourists based on their initial information.
In order to solve the complex decision-making problems, there are many approaches and systems based on the fuzzy theory were proposed. In 1998, Smarandache introduced the concept of single-valued neutrosophic set as a complete development of fuzzy theory. In this paper, we research on the distance measure between single-valued neutrosophic sets based on the H-max measure of Ngan et al. [8]. The proposed measure is also a distance measure between picture fuzzy sets which was introduced by Cuong in 2013 [15]. Based on the proposed measure, an Adaptive Neuro Picture Fuzzy Inference System (ANPFIS) is built and applied to the decision making for the link states in interconnection networks. In experimental evaluation on the real datasets taken from the UPV (Universitat Politècnica de València) university, the performance of the proposed model is better than that of the related fuzzy methods.
Jasmine is a hybrid reasoning tool that discovers causal links in biological data through deterministic machine learning. It generates hypotheses to explain observations, verifies hypotheses by finding data that satisfies them, and falsifies some hypotheses by revealing inconsistencies. Jasmine represents domain knowledge as an ontology and uses abductive, inductive, and deductive reasoning. It predicts targets for objects by finding common causes between similar objects based on their satisfied features.
The document provides a literature review of different clustering techniques. It begins by defining clustering and its applications. It then categorizes and describes several clustering methods including hierarchical (BIRCH, CURE, CHAMELEON), partitioning (k-means, k-medoids), density-based (DBSCAN, OPTICS, DENCLUE), grid-based (CLIQUE, STING, MAFIA), and model-based (RBMN, SOM) methods. For each method, it discusses the algorithm, advantages, disadvantages and time complexity. The document aims to provide an overview of various clustering techniques for classification and comparison.
SEARCH OF INFORMATION BASED CONTENT IN SEMI-STRUCTURED DOCUMENTS USING INTERF...ijcsitcejournal
This paper proposes a semi-structured information retrieval model based on a new method for calculation
of similarity. We have developed CASISS (Calculation of Similarity of Semi-Structured documents)
method to quantify how two given texts are similar. This new method identifies elements of semi-structured
documents using elements descriptors. Each semi-structured document is pre-processed before the
extraction of a set of descriptors for each element, which characterize the contents of elements.It can be
used to increase the accuracy of the information retrieval process by taking into account not only the
presence of query terms in the given document but also the topology (position continuity) of these terms.
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailingertekg
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-item-associations-methodology-and-a-case-study-in-apparel-retailing/
Association mining is the conventional data mining technique for analyz-ing market basket data and it reveals the positive and negative associations between items. While being an integral part of transaction data, pricing and time information have not been integrated into market basket analy-sis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analy-sis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.
This presentation is based on "Statistical Modeling: The Two Cultures" by Leo Breiman. It compares the data modeling culture (statistics) and the algorithmic modeling culture (machine learning).
This document proposes using a b-colouring technique in graph theory to perform clustering analysis on the Pima Indian Diabetes database. B-colouring involves assigning colors (clusters) to graph vertices such that no adjacent vertices share a color, and each color class has a dominating vertex connected to all other color classes. This guarantees separation between clusters. The document applies b-colouring clustering to the Pima dataset and evaluates classification accuracy compared to other methods, achieving favorable results. Experimental analysis demonstrates the b-colouring approach and reviews cluster validity indices for determining the optimal partition number.
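The b-colouring idea can be sketched in two pieces: an ordinary greedy proper coloring, plus a check for which color classes contain a dominating (b-)vertex. The graph below is a toy example, and a full b-colouring algorithm would add recoloring steps that are not shown here:

```python
def greedy_coloring(adj):
    """Proper coloring: no two adjacent vertices share a color."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

def b_vertices(adj, color):
    """A b-vertex (dominating vertex) of its color class has at least one
    neighbor in every other color class."""
    classes = set(color.values())
    return {v for v in adj
            if {color[u] for u in adj[v]} >= classes - {color[v]}}

# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
color = greedy_coloring(adj)
doms = b_vertices(adj, color)
```

When every color class contains at least one b-vertex, the coloring is a b-colouring, which is the separation guarantee the clustering method relies on.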
Semi-Supervised Discriminant Analysis Based On Data Structure
Evaluating the efficiency of rule techniques for file classification
Abstract: Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing, and document similarity. Document similarity is one of the important techniques in text mining, and its first step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, xls, and so on). There are several rule classifier algorithms, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this work, three of them, namely decision table, DTNB and OneR, are used to classify computer files by extension. The results produced by these algorithms are analyzed using two performance factors: classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
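Among the three classifiers, OneR is simple enough to sketch: it builds one rule set per attribute (the majority class for each attribute value) and keeps the attribute whose rules misclassify the fewest training instances. The file records below are hypothetical, not the study's data set:

```python
from collections import Counter

def one_r(rows, target):
    """OneR: for each attribute, predict the majority target class per value;
    keep the attribute with the fewest training errors."""
    attrs = [a for a in rows[0] if a != target]
    best = None
    for a in attrs:
        # Majority target class for each observed value of attribute a.
        rule = {}
        for v in {r[a] for r in rows}:
            labels = [r[target] for r in rows if r[a] == v]
            rule[v] = Counter(labels).most_common(1)[0][0]
        errors = sum(r[target] != rule[r[a]] for r in rows)
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

rows = [
    {"ext": "pdf", "size": "big",   "category": "document"},
    {"ext": "pdf", "size": "small", "category": "document"},
    {"ext": "xls", "size": "big",   "category": "spreadsheet"},
    {"ext": "xls", "size": "small", "category": "spreadsheet"},
    {"ext": "doc", "size": "big",   "category": "document"},
]
attr, rule, errors = one_r(rows, "category")
```

Here the extension attribute classifies the toy data perfectly, so OneR selects it over the size attribute.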
1. The document analyzes kitchenware product sales data using exploratory data analysis (EDA) techniques like histograms and Monte Carlo simulations.
2. EDA was used to identify patterns in the profit data based on variables like sales volume, costs, and prices. This revealed a wide possible profit range, from -£120,000 to £330,000, indicating high business risk.
3. Monte Carlo simulation of 1,000 random data points allowed comparison of mean, median, and mode profit values to assess average returns versus risk levels. The analysis provided insight into likely profit zones under different input conditions.
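The mechanics of such a Monte Carlo profit analysis can be sketched as follows; the input ranges and fixed cost are illustrative assumptions, not the study's actual figures:

```python
import random
from statistics import mean, median

rng = random.Random(42)

def simulate_profit():
    """One Monte Carlo draw: profit = volume * (price - unit cost) - fixed cost.
    All ranges below are made-up inputs for illustration."""
    volume = rng.uniform(5_000, 20_000)
    price = rng.uniform(8.0, 12.0)
    unit_cost = rng.uniform(5.0, 9.0)
    fixed_cost = 40_000
    return volume * (price - unit_cost) - fixed_cost

profits = [simulate_profit() for _ in range(1_000)]
avg, med = mean(profits), median(profits)
spread = (min(profits), max(profits))
```

Comparing `avg`, `med` and `spread` is exactly the mean-versus-risk reading described in the summary: the spread shows the downside exposure while the central measures show the likely return.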
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/depolama-sistemleri/
Warehouses are temporary stocking points used during the distribution of products. They contribute significantly to supply chains operating toward their targeted goals and to the effective execution of logistics activities. Warehouses can be located within or next to production facilities, or they can be established as separate, purpose-built structures. Figure 4.1 presents a general view of a typical warehouse. In this typical warehouse, materials/products are stored on racks, material entry and exit take place over the warehouse docks, and loading/unloading operations are carried out using vehicles called forklifts. The warehouse is managed by a logistics professional holding the title of Warehouse Manager (Depo Müdürü).
Optimizing Waste Collection In An Organized Industrial Region: A Case Study
This document summarizes a case study that optimizes industrial waste collection from 17 factories located in an organized industrial region in Turkey. The authors developed a mixed integer programming model to determine the optimal waste container locations and transportation routes to minimize total costs. They applied the model to real data from an industrial zone. The optimal solution selected 3 out of 5 candidate locations and had a minimum monthly cost of 70,338 Turkish Lira. The authors also created a visualization of the optimal supply chain network to provide additional insights into the solution.
The Bullwhip Effect In Supply Chain: Reflections After A Decade
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/the-bullwhip-effect-in-supply-chain-reflections-after-a-decade/
A decade has passed since the publication of the two seminal papers by Lee, Padmanabhan and Whang (1997) that describe the “bullwhip effect” in supply chains and characterize its underlying causes. The bullwhip phenomenon is observed in supply chains where the decisions at the subsequent stages of the supply chain are made greedily based on local information, rather than through coordination based on global information on the state of the whole chain. The first consequence of this information distortion is higher variance in purchasing quantities compared to sales quantities at a particular supply chain stage. The second consequence is increasingly higher variance in order quantities and inventory levels in the upstream stages compared to their downstream stages (buyers). In this paper, we survey a decade of literature on the bullwhip effect and present the key insights reported by researchers and practitioners. We also present our reflections and share our vision of possible future directions.
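The variance amplification at a single stage can be reproduced with a minimal order-up-to simulation; the demand distribution, lead time `L` and moving-average window `p` below are illustrative choices, not from the surveyed literature:

```python
import random
from statistics import pvariance

rng = random.Random(7)
L, p = 2, 4                      # replenishment lead time, forecast window
demand = [rng.uniform(80, 120) for _ in range(1_000)]

orders = []
prev_target = None
for t in range(p, len(demand)):
    forecast = sum(demand[t - p:t]) / p      # moving-average demand forecast
    target = (L + 1) * forecast              # order-up-to level over lead time
    if prev_target is not None:
        # Order = demand just observed + change in the order-up-to level.
        orders.append(demand[t - 1] + target - prev_target)
    prev_target = target

ratio = pvariance(orders) / pvariance(demand)
# ratio > 1: orders are more variable than demand (the bullwhip effect)
```

Chaining several such stages makes the ratio grow stage by stage, which is the second consequence described above.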
Application Of Local Search Methods For Solving A Quadratic Assignment Probl...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/application-of-local-search-methods-for-solving-a-quadratic-assignment-problem-a-case-study/
This paper discusses the design and application of local search methods to a real-life application at a steel cord manufacturing plant. The case study involves a layout problem that can be represented as a Quadratic Assignment Problem (QAP). Due to the nature of the manufacturing process, certain machines need to be allocated in close proximity to each other. This issue is incorporated into the objective function by assigning high penalty costs to the unfavorable allocations. QAP belongs to one of the most difficult classes of combinatorial optimization problems, and is not solvable to optimality as the number of facilities increases. We implement the well-known local search methods 2-opt, 3-opt and tabu search, and compare the solution performance of the methods to the results obtained from the NEOS server, which provides free access to many optimization solvers on the Internet.
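Of the methods used, 2-opt is the simplest: repeatedly swap two facility assignments while an improving swap exists. A sketch on a toy instance follows (the flow and distance matrices are made up, not the plant data):

```python
def qap_cost(perm, flow, dist):
    """Total cost when facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def two_opt(perm, flow, dist):
    """Apply improving pairwise swaps (2-opt) until a local optimum."""
    perm = list(perm)
    improved = True
    while improved:
        improved = False
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                cand = perm[:]
                cand[i], cand[j] = cand[j], cand[i]
                if qap_cost(cand, flow, dist) < qap_cost(perm, flow, dist):
                    perm, improved = cand, True
    return perm

# Toy symmetric 4-facility instance.
flow = [[0, 3, 0, 2], [3, 0, 0, 1], [0, 0, 0, 4], [2, 1, 4, 0]]
dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
best = two_opt([0, 1, 2, 3], flow, dist)
```

3-opt enlarges the swap neighborhood to three positions, and tabu search adds memory so the search can escape such local optima; the penalty costs mentioned above would simply be added into the `flow` entries of machine pairs that must stay close.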
Simulation Modeling For Quality And Productivity In Steel Cord Manufacturing
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/simulation-modeling-for-quality-and-productivity-in-steel-cord-manufacturing/
We describe the application of simulation modeling to estimate and improve quality and productivity performance of a steel cord manufacturing system. We describe the typical steel cord manufacturing plant, emphasize its distinguishing characteristics, identify various production settings and discuss applicability of simulation as a management decision support tool. Besides presenting the general structure of the developed simulation model, we focus on wire fractures, which can be an important source of system disruption.
Rule-based expert systems for supporting university students
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/rule-based-expert-systems-for-supporting-university-students/
There are more than 15 million college students in the US alone. Academic advising for courses and scholarships is typically performed by human advisors, bringing an immense managerial workload to faculty members, as well as other staff at universities. This paper reports and discusses the development of two educational expert systems at a private international university. The first expert system is a course advising system which recommends courses to undergraduate students. The second system suggests scholarships to undergraduate students based on their eligibility. While there have been reported systems for course advising, the literature does not seem to contain any references to expert systems for scholarship recommendation and eligibility checking; therefore, the scholarship recommender that we developed is the first of its kind. Both systems have been implemented and tested using Oracle Policy Automation (OPA) software.
Design Requirements For a Tendon Rehabilitation Robot: Results From a Survey ...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/design-requirements-for-a-tendon-rehabilitation-robot-results-from-a-survey-of-engineers-and-health-professionals/
Exoskeleton type finger rehabilitation robots are helpful in assisting the treatment of tendon injuries. A survey has been carried out with engineers and health professionals to further develop an existing finger exoskeleton prototype. The goal of the study is to better understand the relative importance of several design criteria through the analysis of survey results and to improve the finger exoskeleton accordingly. The survey questions with strong correlations are identified and the preferences of the two respondent groups are statistically compared. The results of the statistical analysis are interpreted and insights obtained are used to guide the design process. The answers to the qualitative questions are also discussed together with their design implications. Finally, Quality Function Deployment (QFD) has been employed for visualizing these functional requirements in relation to the customer requirements.
Visual Mining of Science Citation Data for Benchmarking Scientific and Techno...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-mining-of-science-citation-data-for-benchmarking-scientific-and-technological-competitiveness-of-world-countries/
In this paper we present a study where we visually analyzed science citation data to investigate the competitiveness of world countries in selected categories of science. The dataset that we worked on in our study includes the number of papers published and the number of citations made in the ESI (Essential Science Indicators) database in 2004. The dataset lists these values for practically every country in the world. In analyzing the data, we employ methods and software tools developed and used in the data mining and information visualization fields of computer science. Some of the questions for which we look for answers in this study are the following: (a) Which countries are most competitive in the selected categories of science? (i.e. Engineering, Computer Science, Economics & Business) (b) What type of correlations exist between different categories of science? For example, do countries with many published papers in the field of Engineering also have many papers published in Computer Science or Economics & Business? (c) Which countries produce the most influential papers? This analysis is needed since a country may have many papers published but these papers may be cited very rarely. (d) Can we gain useful and actionable insights by combining science citation data with socioeconomic and geographical data?
Impact of Cross Aisles in a Rectangular Warehouse: A Computational Study
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/impact-of-cross-aisles-in-a-rectangular-warehouse-a-computational-study/
Order picking is typically the most costly operation in a warehouse, and traveling is typically the most time-consuming task within order picking. In this study we focus on the layout design for a rectangular warehouse, a warehouse with parallel storage blocks separated by main aisles. We specifically analyze the impact of adding cross aisles that cut storage blocks perpendicularly, which can reduce travel times during order picking by introducing flexibility in going from one main aisle to the next. We consider two types of cross aisles: those that are equally spaced (Case 1) and those that are unequally spaced (Case 2), which respectively have equal and unequal distances among them. For Case 2, we extend an earlier model and present a heuristic algorithm for finding the best distances among cross aisles. We carry out extensive computational experiments for a variety of warehouse designs. Our findings suggest that warehouse planners can obtain great travel time savings through establishing equally spaced cross aisles, but little additional savings from unequally spaced cross aisles. We present a look-up table that provides the best number of equally spaced cross aisles when the number of cross aisles (N) and the length of the warehouse (T) are given. Finally, when the values of N and T are not known, we suggest establishing three cross aisles in a warehouse.
Financial Benchmarking Of Transportation Companies In The New York Stock Exc...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/financial-benchmarking-of-transportation-companies-in-the-new-york-stock-exchange-nyse-through-data-envelopment-analysis-dea-and-visualization/
In this paper, we present a benchmarking study of industrial transportation companies traded in the New York Stock Exchange (NYSE). There are two distinguishing aspects of our study: First, instead of using operational data for the input and the output items of the developed Data Envelopment Analysis (DEA) model, we use financial data of the companies that are readily available on the Internet. Secondly, we visualize the efficiency scores of the companies in relation to the subsectors and the number of employees. These visualizations enable us to discover interesting insights about the companies within each subsector, and about subsectors in comparison to each other. The visualization approach that we employ can be used in any DEA study that contains subgroups within a group. Thus, our paper also contains a methodological contribution.
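A general DEA model requires solving a linear program per company, but in the special single-input, single-output case a CCR-type efficiency score reduces to each unit's output/input ratio normalized by the best ratio. The company figures below are hypothetical, not NYSE data:

```python
def dea_single(inputs, outputs):
    """Single-input, single-output CCR: efficiency_k = ratio_k / max ratio,
    where ratio_k = output_k / input_k. Scores lie in (0, 1]."""
    ratios = [o / i for i, o in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical companies: input = operating expenses, output = revenue.
expenses = [100.0, 80.0, 120.0]
revenues = [150.0, 140.0, 150.0]
scores = dea_single(expenses, revenues)
# The frontier company scores 1.0; the others score strictly below 1.0
```

With several financial inputs and outputs, as in the paper, the ratio is replaced by an optimal weighting found by linear programming, but the interpretation of the score is the same.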
Encapsulating And Representing The Knowledge On The Evolution Of An Engineeri...
1) The document describes a methodology for encapsulating and representing knowledge about the evolution of an engineering system. It involves mathematical formalism to represent design steps, contradictions, goals and TRIZ principles.
2) A graph visualization is proposed to represent the design process, with nodes as possible TRIZ principles and colored nodes indicating principles selected at each step.
3) A database structure is outlined to store information on contradictions, principles, design steps and which principles were selected, along with explanations. This knowledge representation aims to guide design of similar products.
Application of local search methods for solving a quadratic assignment proble...
Ertek, G., Aksu, B., Birbil, S. E., İkikat, M. C., Yıldırmaz, C. (2005). “Application of local search methods for solving a quadratic assignment problem: A case study”, Proceedings of Computers and Industrial Engineering Conference, 2005. Istanbul, Turkey.
Statistical Scoring Algorithm for Learning and Study Skills
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/statistical-scoring-algorithm-for-learning-and-study-skills/
This study examines the study skills and the learning styles of university students using a scoring method. The study investigates whether study skills can be summarized in a single universal score that measures how hard a student works. The sample consists of 418 undergraduate students of an international university. The presented scoring method was adapted from the domain of risk management. The proposed method computes an overall score that represents the study skills, using a linear weighted summation scheme. From among 50 questions regarding learning and study skills, the 30 highest-weighted questions are suggested for use in future studies as a learning and study skills inventory. The proposed scoring method and study yield results and insights that can guide educators on how to improve their students’ study skills. The main point drawn from this study is that students greatly value opportunities for interaction with instructors and peers, cooperative learning, and active engagement in lectures.
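The linear weighted summation scheme can be sketched as follows; the question weights, responses and the 1-5 response scale are illustrative assumptions, not the study's actual inventory:

```python
def weighted_score(responses, weights, top=5):
    """Linear weighted summation, scaled to [0, 1]:
    score = sum(w_q * x_q) / (top * sum(w_q)), responses on a 1..top scale."""
    total = sum(weights[q] * responses[q] for q in weights)
    return total / (top * sum(weights.values()))

# Hypothetical question weights and one student's answers (1-5 scale).
weights = {"q1": 3.0, "q2": 2.0, "q3": 1.0}
responses = {"q1": 5, "q2": 4, "q3": 2}
score = weighted_score(responses, weights)
```

Selecting the 30 highest-weighted questions then amounts to keeping the entries of `weights` with the largest values and recomputing the score over that subset.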
Visual and analytical mining of transactions data for production planning f...
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-and-analytical-mining-of-sales-transaction-data-for-production-planning-and-marketing/
Recent developments in information technology paved the way for the collection of large amounts of data pertaining to various aspects of an enterprise. The greatest challenge in processing these massive amounts of raw data is managing the data effectively, with the ultimate purpose of deriving necessary and meaningful information from it. This paper illustrates the combination of visual and analytical data mining techniques for the planning of marketing and production activities. The primary phases of the proposed framework consist of filtering, clustering and comparison steps, implemented using interactive pie charts, the K-Means algorithm and parallel coordinate plots, respectively. A prototype decision support system is developed and a sample analysis session is conducted to demonstrate the applicability of the framework.
Teaching Warehousing Concepts through Interactive Animations and 3-D Models
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/teaching-warehousing-concepts-through-interactive-animations-and-3-d-models/
Supplier and Buyer Driven Channels in a Two-Stage Supply Chain
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/supplier-and-buyer-driven-channels-in-a-two-stage-supply-chain/
We explore the impact of power structure on price, sensitivity of market price, and profits in a two-stage supply chain with a single product, a supplier and a buyer, and a price-sensitive market. We develop and analyze the case where the supplier has dominant bargaining power and the case where the buyer has dominant bargaining power. We consider a pricing scheme for the buyer that involves both a multiplier and a markup, and show that it is optimal for the buyer to set the markup to zero and use only a multiplier. We also show that the market price and its sensitivity are higher when operational costs (namely distribution and inventory) exist. We observe that the sensitivity of the market price increases non-linearly as the wholesale price increases, and derive a lower bound for it. Through experimental analysis, we show that the marginal impacts of increasing shipment cost and carrying charge (interest rate) on prices and profits are decreasing in both cases. Finally, we show that there exist problem instances where the buyer may prefer the supplier-driven case to the markup-only buyer-driven case, and similarly problem instances where the supplier may prefer the markup-only buyer-driven case to the supplier-driven case.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/a-taxonomy-of-logistics-innovations/
In this paper we present a taxonomy of supply chain and logistics innovations, which is based on an extensive literature survey. Our primary goal is to provide guidelines for choosing the most appropriate innovations for a company, such that the company can outrun its competitors. We investigate the factors, both internal and external to the company, that determine the applicability and effectiveness of the listed innovations. We support our suggestions with real world cases reported in literature.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/text-mining-with-rapidminer/
The goal of this chapter is to introduce the text mining capabilities of RAPIDMINER through a use case. The use case involves mining reviews for hotels at TripAdvisor.com, a popular web portal. We will be demonstrating basic text mining in RAPIDMINER using the text mining extension. We will present two different RAPIDMINER processes, namely Process01 and Process02, which respectively describe how text mining can be combined with association mining and cluster modeling. While it is possible to construct each of these processes from scratch by inserting the appropriate operators into the process view, we will instead import these two processes readily from existing model files. Throughout the chapter, we will at times deliberately instruct the reader to take erroneous steps that result in undesired outcomes. We believe that this is a very realistic way of learning to use RAPIDMINER, since in practice, the modeling process frequently involves such steps that are later corrected.
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
A fundamental challenge in behavioral informatics is the development of methodologies and systems that can achieve its goals and tasks, including behavior pattern analysis. This study presents such a methodology, which can be converted into a decision support system through the appropriate integration of existing tools for association mining and graph visualization. The methodology enables the linking of behavioral patterns to personal attributes through the re-mining of colored association graphs that represent item associations. The methodology is described and mathematically formalized, and is demonstrated in a case study related to the retail industry.
http://research.sabanciuniv.edu.
Re-Mining Association Mining Results through Visualization, Data Envelopment ...
Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real-world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses regarding how re-mining can be carried out on association mining results are tested in the case study through empirical analysis.
Software Modules Clustering: An Effective Approach for Reusability
This document summarizes previous work on using clustering techniques for software module classification and reusability. It discusses hierarchical clustering and non-hierarchical clustering methods. Previous studies have used these techniques for software component classification, identifying reusable software modules, course clustering based on industry needs, mobile phone clustering based on attributes, and customer clustering based on electricity load. The document provides background on clustering analysis and its uses in various domains including software testing, pattern recognition, and software restructuring.
A framework for visualizing association mining results
Association mining is one of the most used data mining techniques due to interpretable and actionable results. In this study we propose a framework to visualize the association mining results, specifically frequent itemsets and association rules, as graphs. We demonstrate the applicability and usefulness of our approach through a Market Basket Analysis (MBA) case study where we visually explore the data mining results for a supermarket data set. In this case study we derive several interesting insights regarding the relationships among the items and suggest how they can be used as basis for decision making in retailing.
Re-mining Positive and Negative Association Mining Results
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/re-mining-positive-and-negative-association-mining-results/
Positive and negative association mining are well-known and extensively studied data mining techniques to analyze market basket data. Efficient algorithms exist to find both types of association, separately or simultaneously. Association mining is performed by operating on the transaction data. Despite being an integral part of the transaction data, the pricing and time information has not been incorporated into market basket analysis so far, and additional attributes have been handled using quantitative association mining. In this paper, a new approach is proposed to incorporate price, time and domain related attributes into data mining by re-mining the association mining results. The underlying factors behind positive and negative relationships, as indicated by the association rules, are characterized and described through the second data mining stage, re-mining. The applicability of the methodology is demonstrated by analyzing data coming from the apparel retailing industry, where price markdown is an essential tool for promoting sales and generating increased revenue.
The document proposes an improved clustering algorithm for social network analysis. It combines BSP (Business System Planning) clustering with Principal Component Analysis (PCA) to group social network objects into classes based on their links and attributes. Specifically, it applies PCA before BSP clustering to reduce the dimensionality of the social network data and retain only the most important variables for clustering. This improves the BSP clustering results by focusing on the key information in the social network.
Ontology Based PMSE with Manifold Preference
Developing Competitive Strategies in Higher Education through Visual Data Mining
Information visualization is a growing field of computer science that aims at visually mining data for knowledge discovery. In this paper, a data mining framework and a novel information visualization scheme are developed and applied to the domain of higher education. The presented framework consists of three main types of visual data analysis: discovering general insights, carrying out competitive benchmarking, and planning for High School Relationship Management (HSRM). The framework and the square tiles visualization scheme are described, and an application at a private university in Turkey with the goal of attracting the brightest students is demonstrated.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
In this study a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction, Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA), is performed. These are gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than algorithms like LSI and PCA. Results show that the clustering of features improves the accuracy of document classification.
This document discusses semantic networks and their applications in artificial intelligence. It begins with describing how semantic networks originated in psychology for analyzing relationships between concepts, and were later adopted by AI to analyze textual data. Key applications of semantic networks discussed include idea generation, visual text analysis, and conceptual design. The document also provides historical background on semantic networks and how they are used for knowledge representation.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
Social networks have become one of the most popular platforms that allow users to communicate and share their interests without being at the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn and Twitter generates huge amounts of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, so that users can get the desired content or be linked to the best relation using improved search/link techniques. Introducing semantics to social networks widens the representation of the social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. The model is based on the concept of multi-agent systems, and in it the representation of social links is extended with the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with Tag Rank as a social network ranking algorithm. The improvement in the E-LDA phase is achieved by optimizing the LDA algorithm with optimal parameters, after which a filter is introduced to enhance the final indexing output. In the ranking phase, using Tag Rank on top of the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in indexing and ranking output.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (IJwest)
The document presents a new model for intelligent social networks based on semantic tag ranking. It uses a multi-agent system approach with agents performing indexing and ranking. For indexing, it uses an enhanced Latent Dirichlet Allocation (E-LDA) model that optimizes LDA parameters. Tags above a threshold from E-LDA output are ranked using Tag Rank. Simulation results showed improvements in indexing and ranking over conventional methods. The model introduces semantics to social networks to improve search and link recommendation.
This document summarizes a systematic literature review of 40 empirical studies on learning analytics and educational data mining from 2008-2013. The review aimed to document applied research approaches, identify strengths and weaknesses, and suggest opportunities for future research. Four major directions of LA/EDM empirical research were identified: 1) predicting student performance, 2) understanding student behavior, 3) improving educational systems, and 4) developing analytic methods/tools. The results highlighted the added value of LA/EDM in improving learning and informed decision making, but also identified opportunities to explore new technologies and research questions.
The document discusses mining frequent patterns from object-relational data. It begins with an introduction to data mining and frequent pattern mining. It then reviews related work on data models, including relational, object-oriented, and object-relational data models. Several frequent pattern mining algorithms are described, including Apriori, FP-Growth, ECLAT and RElim. Two approaches for mining object-relational data are proposed: a fundamental approach that treats it similarly to transactional data, and a nested-relations approach to handle nested attributes. The document concludes by outlining parameters for applying frequent pattern mining algorithms to object-relational data.
Influence over the Dimensionality Reduction and Clustering for Air Quality Me... (IJAEMSJORNAL)
The current trend in industry is to analyze large data sets and apply data mining and machine learning techniques to identify patterns. But the challenge with huge data sets is the high dimensionality associated with them. Sometimes in data analytics applications, large amounts of data produce worse performance. Also, most data mining algorithms are implemented column-wise, and too many columns restrict performance and make it slower. Therefore, dimensionality reduction is an important step in data analysis. Dimensionality reduction is a technique that converts high-dimensional data into a much lower dimension, such that maximum variance is explained within the first few dimensions. This paper focuses on multivariate statistical and artificial neural network techniques for data reduction. Each method has a different rationale for preserving the relationships between input parameters during analysis. Principal Component Analysis, a multivariate technique, and the Self-Organising Map, a neural network technique, are presented in this paper. Also, a hierarchical clustering approach has been applied to the reduced data set. A case study of air quality measurement is considered to evaluate the performance of the proposed techniques.
Data processing involves 5 key steps: editing data, coding data, classifying data, tabulating data, and creating data diagrams. It transforms raw collected data into a usable format through these steps of cleaning, organizing, and analyzing the data. First, data is collected from sources and prepared by cleaning errors. It is then inputted and processed using algorithms before being output and interpreted in readable formats. Finally, the processed data is stored for future use and reports.
Data processing involves 5 key steps: 1) editing data to check for errors or omissions, 2) coding data by assigning numerals or symbols to categories, 3) classifying data into groups with common characteristics, 4) tabulating data by organizing it into a table for comparison and analysis, and 5) creating data diagrams or visual representations like graphs. The goal of data processing is to transform raw collected data into a readable and interpretable format that can be analyzed and used within an organization.
Data mining, or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how does one decide what constitutes a good clustering? It can be shown that there is no absolute "best" criterion that would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.
For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding "natural clusters" and describing their unknown properties ("natural" data types), in finding useful and suitable groupings ("useful" data classes), or in finding unusual data objects (outlier detection). Of late, clustering techniques have been applied in areas that involve browsing the gathered data or categorizing the outcomes provided by search engines in reply to user queries. In this paper, we provide a comprehensive survey of document clustering.
Similar to Linking Behavioral Patterns to Personal Attributes through Data Re-Mining (20)
Optimizing the electric charge station network of EŞARJ (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/optimizing-the-electric-charge-station-network-of-esarj/
In this study, we adopt the classic capacitated p-median location model for the solution of a network design problem, in the domain of electric charge station network design, for a leading company in Turkey. Our model encompasses the location preferences of the company managers as preference scores incorporated into the objective function. Our model also incorporates the capacity concerns of the managers through constraints on maximum number of districts and maximum population that can be served from a location. The model optimally selects the new station locations and the visualization of model results provides additional insights.
Competitiveness of Top 100 U.S. Universities: A Benchmark Study Using Data En... (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/benchmark-study-using-data-envelopment-analysis/
This study presents a comprehensive benchmarking study of the top 100 U.S. Universities. The methodologies used to come up with insights into the domain are Data Envelopment Analysis (DEA) and information visualization. Various approaches to evaluating academic institutions have appeared in the literature, including a DEA literature dealing with the ranking of universities. Our study contributes to this literature by the extensive incorporation of information visualization and subsequently the discovery of new insights.
Industrial Benchmarking through Information Visualization and Data Envelopmen... (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/industrial-benchmarking-through-information-visualization-and-data-envelopment-analysis-a-new-framework/
We present a benchmarking study on the companies in the Turkish food industry based on their financial data. Our aim is to develop a comprehensive benchmarking framework using Data Envelopment Analysis (DEA) and information visualization. Besides DEA, a traditional tool for financial benchmarking based on financial ratios is also incorporated. The consistency/inconsistency between the two methodologies is investigated using information visualization tools. In addition, k-means clustering, a fundamental method from machine learning, is applied to understand the relationship between k-means clustering and DEA.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/modelling-the-supply-chain-perception-gaps/
This study applies the research on perception gap analysis to supply chain integration and develops a generic model, the 3-Level Gaps Model, with the goal of contributing to harmonization and integration in the supply chain. The model suggests that significant perception gaps may exist among supply chain members regarding the importance of different performance criteria. The concept of the model was conceived through an empirical and inductive approach, combining the research disciplines of supply chain relationships and perception gap analysis. First-hand data was collected through a survey across a key buyer in the motor insurance industry and its eight suppliers. Rigorous statistical analysis tested the research hypotheses, which in turn verified the validity and relevance of the developed 3-Level Gaps Model. The research reveals the significant existence of supply chain perception gaps at all three levels as defined, which could be the root causes of an underperforming supply chain.
Risk Factors and Identifiers for Alzheimer’s Disease: A Data Mining Analysis (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/risk-factors-and-identifiers-for-alzheimers-disease-a-data-mining-analysis/
The topic of this paper is Alzheimer’s Disease (AD), with the goal being the analysis of risk factors and the identification of tests that can help diagnose AD. While there exist multiple studies that analyze the factors that can help diagnose or predict AD, this is the first study that considers only non-image data while using a multitude of techniques from machine learning and data mining. The applied methods include classification tree analysis, cluster analysis, data visualization, and classification analysis. All the analyses, except classification analysis, resulted in insights that eventually led to the construction of a risk table for AD. The study contributes to the literature not only with new insights, but also by demonstrating a framework for the analysis of such data. The insights obtained in this study can be used by individuals and health professionals to assess possible risks and take preventive measures.
Competitive Pattern-Based Strategies under Complexity: The Case of Turkish Ma... (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/competitive-pattern-based-strategies-under-complexity-the-case-of-turkish-managers/
This paper aims to augment current Enterprise Architecture (EA) frameworks to become pattern-based. The main motivation behind pattern-based EA is the support for strategic decisions based on the patterns prioritized in a country or industry. Thus, to validate the need for pattern-based EA, it is essential to show how different patterns gain priority under different contexts, such as industries. To this end, this chapter also reveals the value of alternative managerial strategies across different industries and business functions in a specific market, namely Turkey. Value perceptions for alternative managerial strategies were collected via survey, and the values for strategies were analyzed through the rigorous application of statistical techniques. Then, evidence that supports or refutes the statistically supported hypotheses was sought and obtained from the business literature. The results obtained through statistical analysis are typically confirmed by reports of real-world cases in the business literature. Results suggest that Turkish firms differ significantly in the way they value different managerial strategies. There also exist differences based on industries and business functions. Our study provides guidelines to managers in Turkey, an emerging country, on which strategies are valued most in their industries. This way, managers can have a better understanding of their competitors and business environment, and can develop the appropriate pattern-based EA to cope with complexity and succeed in the market.
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/a-tutorial-on-crossdocking/
In crossdocking, the inbound materials coming in trucks to the crossdock facility are directed to outbound doors and are directly loaded into trucks that will perform the shipment, or are staged for a very brief period before loading. Crossdocking has great potential to bring savings in logistics: for example, most of the logistics success of Wal-Mart, the world’s leading retailer, is attributed to crossdocking. In this paper, the types of crossdocking are identified, the situations and industries where crossdocking is applicable are explained, prerequisites, advantages and drawbacks are listed, and implementation issues are discussed. Finally, a case study that describes the crossdocking applications of a 3rd party logistics firm is presented.
Demonstrating Warehousing Concepts Through Interactive Animations (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/demonstrating-warehousing-concepts-through-interactive-animations/
In this paper, we report the development of interactive computer animations to demonstrate warehousing concepts, providing a virtual environment for learning. Almost every company, regardless of its industry, holds inventory of goods in its warehouse(s) to respond to customer demand promptly, to coordinate supply and demand, to realize economies of scale in manufacturing or processing, to add value to its products, and to reduce response time. Design, analysis, and improvement of warehouse operations can yield significant savings for a company. Warehousing science can be considered an important field within the industrial engineering discipline. However, there is very little educational material (including web-based media), and only a handful of books are available in this field. We believe that the animations we developed will significantly contribute to the understanding of warehousing concepts, and enable tomorrow’s practitioners to grasp the fundamentals of managing warehouses.
Application of the Cutting Stock Problem to a Construction Company: A Case Study (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/application-of-the-cutting-stock-problem-to-a-construction-company-a-case-study/
This paper presents an application of the well-known cutting stock problem to a construction firm. The goal of the 1-Dimensional (1D) cutting stock problem is to cut bars of desired lengths in required quantities from longer bars of given length. The company for which we carried out this study encounters the 1D cutting stock problem in cutting steel bars (reinforcement bars) for its construction projects. We developed several solution approaches to the company’s problem: building and solving an integer programming (IP) model in a modeling environment, developing our own software that uses a mixed integer programming (MIP) software library, and testing some of the commercial software packages available on the internet. In this paper, we summarize our experiences with all three approaches. We also present a benchmark of existing commercial software packages, and some critical insights. Finally, we suggest a visual approach for increasing performance in solving the cutting stock problem and demonstrate the applicability of this approach using the company’s data on two construction projects.
Benchmarking The Turkish Apparel Retail Industry Through Data Envelopment Ana... (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/benchmarking-the-turkish-apparel-retail-industry-through-data-envelopment-analysis-dea-and-data-visualization/
This paper presents a benchmarking study of the Turkish apparel retailing industry. We have applied the Data Envelopment Analysis (DEA) methodology to determine the efficiencies of the companies in the industry. In the DEA model the number of stores, number of corners, total sales area and number of employees were included as inputs and annual sales revenue was included as the output. The efficiency scores obtained through DEA were visualized for gaining insights about the industry and revealing guidelines that can aid in strategic decision making.
An Open Source Java Code For Visualizing Supply Chain Problems (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/an-open-source-java-code-for-visualizing-supply-chain-problems/
In this paper, we describe an open source Java class library for visualizing supply chain problems within a geographical context. Highly competitive markets and recent technological advances make the use of such supply chain network visualizations critical at both the strategic and tactical levels. The most important characteristic of our work is its easy integration with any Java application. Our software differs from other commercial and open source supply chain visualization tools in its simple structure, easy adoption and implementation, and high compatibility. The main motivation of our study was to develop a simple yet effective library that does not require learning and applying complicated visualization tools and data structures such as Geographical Information Systems (GIS). In this study, we illustrate the use of our visualization tool through maps of Turkey, Europe, North and South America, the United States and NAFTA. We believe that the ease of visualization offered by our open source tool will contribute to a multitude of projects in supply chain design, as well as increase productive communication among practitioners, especially those involved in strategic-level decision making. We foresee that our supply chain visualization tool will fill a gap in this area with its simple but effective structure.
This paper discusses fundamental issues in dairy logistics in a tutorial format. We summarize the findings of more than twenty student groups who carried out independent literature surveys and interviewed professionals in the industry. The paper points out the critical issues in dairy products logistics, the logistics strategies employed by dairy producers in the world, and some newly introduced products in the industry, along with the ways the introduction of these new products changes logistics operations. The importance of hygiene, cooling, time, humidity, cost, distance, flexibility and meeting demand is emphasized under the heading of critical issues. Besides these critical issues, there are others, such as short shelf life, quality, emulsion, pasteurization, and UHT, which depend on the characteristics of milk and milk products. Logistics strategies in the dairy industry are studied under two headings: those used in the world and those used in Turkey. A benchmarking between Turkey and the world is also included at the end. As the variety of milk and milk products increases day by day, the ingredients of new products also affect transportation plans. These impacts are also discussed as part of our paper, and some descriptive drawings and figures are included. Throughout this paper, only the production, warehousing and transportation of milk, cheese, yoghurt, and similar dairy products are discussed. Ice cream in particular is set outside the scope, as it completely differs from dairy products such as milk, cheese and yoghurt in terms of production and distribution.
Innovation in Product Form And Function: Customer Perception Of Their Value (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/innovation-in-product-form-and-function-customer-perception-of-their-value/
The goal of product design is to obtain the maximum effect with minimum cost in functionality and aesthetic beauty. Consumers are attracted to designs that reflect their use behaviors and psychological responses more than to simple visual representations. When product functions and qualities are similar across products, customers make their purchasing decisions based on aesthetic form. Form presents a significant competitive factor that improves the value of a product. Overall, the purpose of this study is to examine the most important product design factors that affect the market share trends of mobile phone companies. The study uses product characteristics for 1,028 mobile phones released between 2003 and 2008 as a case study. Multiple linear regression analysis is used to select highly correlated variables that influence market share, and Mallow's Cp method is used to determine the best-fitting model. The partial regression coefficients are used to evaluate the relative importance of design criteria. Nine mobile phone design features that affect market share were identified, and the block form style was determined to be the most important design factor. Using these approaches, this study demonstrates how investments should be directed in the next mobile phone design process.
Developing Competitive Strategies in Higher Education through Visual Data Mining (ertekg)
Download Link > https://ertekprojects.com/gurdal-ertek-publications/blog/visual-data-mining-for-developing-competitive-strategies-in-higher-education/
Information visualization is the growing field of computer science that aims at visually mining data for knowledge discovery. In this paper, a data mining framework and a novel information visualization scheme are developed and applied to the domain of higher education. The presented framework consists of three main types of visual data analysis: discovering general insights, carrying out competitive benchmarking, and planning for High School Relationship Management (HSRM). In this paper, the framework and the square tiles visualization scheme are described, and an application at a private university in Turkey with the goal of attracting the brightest students is demonstrated.
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
Ertek, G., Demiriz, A., Çakmak, F. (2012) “Linking Behavioral Patterns to Personal Attributes through Data Re-Mining” in Behavior Computing: Modeling, Analysis, Mining and Decision. Eds: Longbing Cao, Philip S. Yu. Springer.
Note: This is the final draft version of this paper. Please cite this paper (or this final draft) as above. You can download this final draft from http://research.sabanciuniv.edu.
Linking Behavioral Patterns to Personal Attributes
through Data Re-Mining

Gürdal Ertek1, Ayhan Demiriz2, and Fatih Cakmak3

1 Sabancı University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956, Istanbul, Turkey. ertekg@sabanciuniv.edu
2 Department of Industrial Engineering, Sakarya University, 54187, Sakarya, Turkey. ademiriz@gmail.com
3 Sabancı University, Faculty of Arts and Social Sciences, Orhanli, Tuzla, 34956, Istanbul, Turkey.
Abstract. A fundamental challenge in behavioral informatics is the development of methodologies and systems that can achieve its goals and tasks, including behavior pattern analysis. This study presents such a methodology, which can be converted into a decision support system through the appropriate integration of existing tools for association mining and graph visualization. The methodology enables the linking of behavioral patterns to personal attributes through the re-mining of colored association graphs that represent item associations. The methodology is described and mathematically formalized, and is demonstrated in a case study from the retail industry.
1 Introduction
This study aims at understanding the behavioral patterns exhibited by people in relation to their personal attributes. The research is conducted in the context of the retail industry, where consumers engage in purchase transactions at retail shops and stores. The traditional data mining technique for identifying the patterns in these transactions is association mining, which enables the discovery of interpretable and actionable results related to item associations. However, a straightforward application of association mining returns only item purchase patterns. An important question, whose answer has been ignored in the literature, is how these patterns are related to consumer attributes, such as demographic attributes and the physical state of the consumer during the purchase. In other words, the link between the behavioral pattern (consumer purchase behavior) and the person (the consumer) is missing. Establishing this link requires a methodology, as well as domain knowledge to enable domain-driven data mining [5].
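As background, the support and confidence computations that association mining builds on can be sketched in a few lines of Python. The transactions and item names below are invented for illustration (they are not the survey items of the case study), and the brute-force enumeration stands in for a scalable miner such as Apriori:

```python
from itertools import combinations

# Toy transactions; item names are hypothetical.
transactions = [
    {"latte", "muffin"},
    {"latte", "muffin", "cookie"},
    {"latte", "cookie"},
    {"muffin", "cookie"},
    {"latte", "muffin"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support=0.4, max_size=3):
    """Brute-force enumeration of all itemsets meeting `min_support`."""
    items = sorted(set().union(*transactions))
    return {
        combo: support(combo, transactions)
        for k in range(1, max_size + 1)
        for combo in combinations(items, k)
        if support(combo, transactions) >= min_support
    }

freq = frequent_itemsets(transactions)
# Confidence of the rule {latte} -> {muffin}: support of the joint
# itemset divided by support of the antecedent.
conf = support({"latte", "muffin"}, transactions) / support({"latte"}, transactions)
print(sorted(freq))
print(round(conf, 2))  # 0.75
```

A real analysis would use a pruning algorithm (Apriori, FP-Growth) instead of enumerating all candidate itemsets; the definitions of support and confidence are the same.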
A graph-based visualization methodology, namely AssocGraphRM, is proposed for presenting association mining results together with summary statistics regarding the associations. The methodology suggests a visual data re-mining process based on the results generated by association mining. In the graph representation, items and itemsets are represented as vertices, set memberships are represented through edges, and attribute statistics are linearly mapped to the colors of vertices. The applicability and usefulness of the methodology are demonstrated through a market basket analysis (MBA) case study where data from a consumer survey is analyzed. The survey contains a multitude of personal attributes, as well as preferences for items at Starbucks coffee stores. Several actionable insights are derived regarding the relationship between the behavioral patterns (item purchases) and the personal attributes, and their policy implications are discussed.
The remainder of the chapter is organized as follows: In Section 2, an overview of
the basic concepts in related studies is presented through a concise literature review. In
Section 3, the AssocGraphRM methodology for visual re-mining on colored association
graphs is described, and framed as an algorithm using mathematical formalism. The
methodology and its applicability are then demonstrated in Section 4, using survey data from the food retail industry. The validity of the methodology is discussed in Section 5.
Finally, in Section 6, the study is summarized and future directions are discussed.
2 Literature
2.1 Behavior Informatics
The field of behavior informatics was introduced by Cao [4]; it suggests the analysis of behavioral patterns and impacts following behavior explicitation, through the extraction of behavior elements masked in transactional data. The main goals and tasks of behavior informatics are listed in [4] as behavior modeling and representation, construction of behavioral data, behavior impact modeling, behavior pattern analysis, and behavior presentation. The main idea in behavioral informatics is to organize the transactional
data into a new form that is constructed in terms of behavior, rather than entity relation-
ships. With the explosion of data that is collected electronically in massive amounts,
the main challenge in behavioral informatics is the development of methodologies and
systems that can achieve its goals and tasks. This study presents such a methodology, which can be converted into a decision support system through the appropriate integration of existing tools for association mining and graph visualization.
2.2 Association Mining
Association mining is an increasingly used data mining and business tool among prac-
titioners and business analysts [10], due to the interpretable and actionable results it
generates. Association mining results can be classified based on several criteria, as outlined in [18]. In this chapter, we focus on frequent itemsets, which are the sets that define single-dimensional, single-level Boolean association rules. Efficient algorithms such as Apriori [1] enable the analysis of very large transactional datasets, frequently sales data, resulting in a large collection of frequent itemsets.
Association mining is typically presented and discussed in the context of one of
its most common applications, namely market basket analysis (MBA), which can be
used in product recommender systems [10]. Let I = {i1 , i2 , ..., im } be a set of items
considered in MBA. Each transaction (basket) t will consist of a set of items where
t ⊆ I. Let D denote the database of all transactions. The support sup( f ) of an itemset
(and also of the rule that contains the items in that itemset) is defined as the percentage
of the transactions in D that contain all the items of the itemset f :
sup( f ) = ( ∑t∈D 1{ f ⊆ t} ) / |D|                                (1)
A frequent itemset is an itemset that has a support value greater than or equal to a given minimum support threshold: sup( f ) ≥ min sup.
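The support definition in Eq. (1) and the frequent-itemset threshold can be sketched with a brute-force enumeration (the baskets and item names below are hypothetical toy data; a real analysis would use Apriori's candidate pruning rather than enumerating all combinations):

```python
from itertools import combinations

# Toy transaction database D; each basket t is a set of items, t ⊆ I.
D = [
    {"latte", "mosaic_cake"},
    {"latte", "mosaic_cake", "water"},
    {"chai", "lemon_cake"},
    {"latte", "chai"},
    {"mosaic_cake", "latte"},
]

def sup(f, D):
    """Support of itemset f: fraction of transactions containing all items of f (Eq. 1)."""
    f = set(f)
    return sum(1 for t in D if f <= t) / len(D)

def frequent_itemsets(D, min_sup, max_items):
    """Enumerate all itemsets up to max_items with sup(f) >= min_sup (brute force)."""
    items = sorted(set().union(*D))
    F = []
    for k in range(1, max_items + 1):
        for f in combinations(items, k):
            if sup(f, D) >= min_sup:
                F.append(frozenset(f))
    return F

F = frequent_itemsets(D, min_sup=0.4, max_items=2)
print(sorted(sorted(f) for f in F))
# [['chai'], ['latte'], ['latte', 'mosaic_cake'], ['mosaic_cake']]
```

Here {latte, mosaic_cake} is frequent (3 of 5 baskets), while lemon_cake and water fall below the 0.4 threshold.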
2.3 Visualizing Frequent Itemsets
Commonly, finding the frequent itemsets and association rules from very large data sets is heavily emphasized, since it is considered the most challenging step in association mining [6, 17, 41, 42]. Results are typically presented in a text (or table) format with a certain degree of querying and sorting functionality. However, the real objective of association mining analysis is to foster the discovery of insights, and there exists considerably less work that focuses on the interpretation of the association mining results [14].
Information visualization is the branch of computer science that investigates how
data and information can be visualized to obtain significant, deep, actionable insights
[9, 19, 23, 27]. Within information visualization, graph visualization can be a significant source of insights, as demonstrated by numerous case studies in a multitude of disciplines [8, 26, 30–33, 35, 36, 38]. There is a broad literature on graph visualization,
including the literature on graph drawing, but the use of graphs for the visualization
of association mining results is not well-formalized in academic literature. Still, data
analysis systems such as MS SQL Server [28] and SAS [34] can generate association
graphs.
This study is an extension of earlier work by Ertek and Demiriz [14], where a graph-
based methodology is proposed to visualize and interpret the results of well-known as-
sociation mining algorithms as directed graphs. According to the methodology in [14],
the items (also referred to as 1-itemsets) and the itemsets are represented as vertices
on an association graph. The vertices that represent the items are shown with no color,
whereas the vertices that represent the itemsets are colored reflecting the cardinality of
the itemsets. The sizes (the areas) of the vertices show the support levels. The directed
edges symbolize which items constitute a given frequent itemset.
The main idea in the methodology is to exploit already existing graph drawing algo-
rithms [37] and software in the information visualization literature [19] for visualizing
association mining results which are generated by already existing algorithms and soft-
ware in the data mining literature [18].
In the current study, the methodology in [14] is extended to incorporate additional
attributes, and mathematical formalism is introduced for describing the methodology.
Color is now used to represent the values of a selected additional attribute, instead of the cardinality of the itemset. The cardinality of the itemset is instead reflected by the line thickness of the vertices. Thus, this extended study enables linking association mining
results with attributes of the person that carried out the transaction. In the context of
behavior informatics, the methodology establishes the critical link between behavioral
patterns and personal attributes.
2.4 Re-Mining
The re-mining methodology was first introduced by Demiriz et al. [11]. Re-mining pro-
cess is defined as “combining the results of an original data mining process with a
new additional set of data and then mining the newly formed data again”. Re-mining
is fundamentally different from post-mining [7, 22, 25, 43, 44]: post-mining only sum-
marizes the data mining results, such as visualizing the association mining results [14,
21]. The re-mining methodology extends and generalizes post-mining. Re-mining can
be considered as an additional data mining step of Knowledge Discovery in Databases
(KDD) process [24] and can be conducted in explanatory/exploratory, descriptive, and
predictive manners.
In another study, Demiriz et al. [12] elaborate on the re-mining concept, introduce
mathematical formalism and present the algorithm for the methodology. [12] also ex-
tends the application of predictive re-mining in addition to exploratory and descriptive
re-mining, and presents a complexity analysis.
Quantitative and multi-dimensional association mining (QAM&MAM) are well-
known techniques within association mining [18] that can integrate additional attribute
data into the association mining process. The associations among the additional attributes, and between the attributes and the itemsets, are computed. However, both techniques introduce significant additional complexity, since association mining is carried out with the complete set of attributes rather than just the market basket data. The techniques work directly towards the generation of multi-dimensional rules: they relate all the possible categorical values of all the attributes to each other, which is NP-hard.
Re-mining, on the other hand, conveniently expands single-dimensional rules with additional attributes. In re-mining, attribute values are investigated and computed only for the associated item pairs, which requires much less computational effort and runs in polynomial time. The running time of QAM&MAM increases exponentially with the number of additional attributes and the number of transactions, which makes re-mining even more preferable in such situations.
Demiriz et al. [11, 12] propose a practical and effective methodology that efficiently
enables the incorporation of attribute data (e.g. price, category, sales timeline) in ex-
plaining positive and negative item associations, which respectively indicate the com-
plementarity and substitution effects.
The work closest to re-mining is by Yao et al. [40], where a framework of a learning
classifier is proposed to explain the mined results. Unlike in [40], the re-mining [11, 12] and visual re-mining (this study) approaches are applied to real-world datasets as a proof of their applicability.
3 Methodology: Re-Mining on Association Graphs
The fundamental idea in re-mining is to exploit the domain specific knowledge in a
new analysis step. Thus, re-mining is a recipe for domain-driven data mining [5]. The
AssocGraphRM methodology proposed and described in this chapter is a special type of
re-mining: Re-mining is conducted through visually mining colored association graphs.
By introducing mathematical formalism the methodology is presented in the form of an
algorithm.
In AssocGraphRM, following the execution of conventional association mining and
the generation of the frequent itemsets, a new database R is formed from the fre-
quent itemsets F and additional attributes A, and then exploratory visual analysis is
performed. Visual re-mining consists of the following main steps:
Step 1: Carry out association mining
Step 2: Compute the statistics for the frequent itemsets
Step 3a: Represent the frequent itemsets as a directed graph
Step 3b: Map the computed statistics linearly to colors
Step 3c: Color the graph with respect to this coloring scheme
Step 4: Visually explore the colored graphs and discover actionable insights
The graph is constructed by following the design specifications below:
a. Each item(set) is represented as a vertex.
b. Area of each vertex is proportional to the support of the corresponding item(set).
c. For itemsets with more than single item, edges are drawn from the (vertices of the)
items of that itemset to the (vertex of the) itemset.
d. Line thickness of each vertex is proportional to the number of items in the corre-
sponding itemset.
e. Color of each vertex reflects the value of a selected attribute for the corresponding
itemset.
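The design specifications above can be sketched as a small data structure. The supports, item names, and sizing constants below are illustrative only; the diameter formula follows Step 5 of the algorithm in Section 3:

```python
# Hypothetical frequent itemsets with illustrative supports (not case-study data).
F = {
    frozenset({"A"}): 0.30,
    frozenset({"B"}): 0.12,
    frozenset({"A", "B"}): 0.08,
}
min_sup, min_diameter = 0.05, 10.0

V, E = {}, []
for f, support in F.items():
    V[f] = {
        "diameter": min_diameter * support / min_sup,  # spec b / Step 5 of the algorithm
        "thickness": len(f),                           # spec d: line thickness = |f|
    }
for f in F:
    if len(f) > 1:                                     # spec c: edges from items to itemset
        for i in f:
            E.append((frozenset({i}), f))

print(V[frozenset({"A"})]["diameter"])   # 60.0
print(len(E))                            # 2
```

The edges run from the 1-itemset vertices A and B to the vertex of the 2-itemset {A, B}, mirroring spec c.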
The inputs for the algorithm, parameters to be decided before the algorithm run, and
additional definitions involving sets and functions are presented below:
Inputs
I: set of items; i ∈ I
D: set of transactions, containing only the itemset information
An : additional numerical attribute n introduced for re-mining; n = 1 . . . N. Any
non-numerical (categorical/ordinal) attribute can be converted into a nu-
merical attribute by computing the percentage of transactions for a specific
value of the categorical/ordinal attribute. For example, if the original at-
tribute is the gender of the person involved in the transaction, then let An be
the percentage of transactions containing the given itemset and involving a
female.
A: set of all attributes introduced for re-mining; A = ∪An
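The conversion of a categorical attribute into a numerical one, as described above for the gender example, can be sketched as follows (the transactions and attribute values are hypothetical toy data):

```python
# Hypothetical transactions paired with a categorical attribute (gender of the buyer).
transactions = [
    ({"latte", "cake"}, "F"),
    ({"latte"},         "M"),
    ({"latte", "cake"}, "F"),
    ({"cake"},          "M"),
]

def psi_categorical(itemset, value, transactions):
    """ψ(A_n, f) for a categorical attribute: share of transactions containing f
    whose attribute equals `value` (e.g. the fraction of females among buyers of f)."""
    containing = [a for t, a in transactions if set(itemset) <= t]
    if not containing:
        return 0.0
    return sum(1 for a in containing if a == value) / len(containing)

print(psi_categorical({"latte", "cake"}, "F", transactions))  # 1.0
print(psi_categorical({"latte"}, "F", transactions))          # 0.666...
```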
Parameters
min sup: minimum support required for frequent itemsets
min items: minimum number of items in the itemsets
max items: maximum number of items in the itemsets
min diameter: diameter for the item with min sup. Note that the diameter of an itemset may be smaller than this value, but that of an item cannot.
Definitions
F1 : set of frequent items (1-itemsets); f ∈ F1
F>1 : set of frequent k-itemsets (k > 1) that have positive association; f ∈ F>1
F: set of all frequent itemsets (k-itemsets, with k = 1 . . . max items); f ∈ F
min valuen : minimum value for attribute n, over all frequent itemsets F
max valuen : maximum value for attribute n, over all frequent itemsets F
R: set of records for re-mining, that contain positive associations; r ∈ R
G(V, E): association graph that represents the frequent itemsets with vertices V and
edges E
G: graph collection that will be used for visual exploration in the re-mining phase
Functions
apriori(D, min items, max items, min sup):
apriori algorithm that operates on D and generates frequent itemsets with mini-
mum of min items items, maximum of max items, and a minimum support value
of min sup
sup( f ):
support of itemset f
ψ(An , f ):
function that computes the value of attribute An for a given frequent itemset f ,
f ∈ F. Once the function runs for a particular itemset, it stores the computed value in the corresponding record record of( f ), and returns it from the record the next time it is run.
create new vertex(v) :
function that creates a new vertex v
vertex of( f ) :
function that returns the vertex that corresponds to itemset f
create new edge(e) :
function that creates a new edge e
record of( f ) :
function that returns the record that corresponds to itemset f
clone graph(G) :
function that creates a clone of graph G
compute color(δ ):
function that computes color based on darkness δ ∈ [0, 1]. Besides the argument δ ,
the RGB value for the computed color depends on the base RGB values for δ = 0
and δ = 1.
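A minimal sketch of compute color as linear RGB interpolation; the base colors for δ = 0 (white) and δ = 1 (a dark red) are assumptions here, since the paper leaves them as parameters:

```python
def compute_color(delta, base_light=(255, 255, 255), base_dark=(178, 34, 34)):
    """Linearly interpolate an RGB color between the δ=0 and δ=1 base colors."""
    assert 0.0 <= delta <= 1.0
    return tuple(round(l + delta * (d - l)) for l, d in zip(base_light, base_dark))

print(compute_color(0.0))  # (255, 255, 255), i.e. white
print(compute_color(1.0))  # (178, 34, 34), the dark base color
```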
The complete methodology is formalized as an algorithm below:
Algorithm: AssocGraphRM
1. Perform association mining.
F = apriori(D, 1, max items, min sup)
2. Define frequent items and itemsets.
F1 = { f ∈ F : | f | = 1}
F>1 = F − F1
3. Label the item associations accordingly and append them as new records.
R = {r : r = ( f ), ∀ f ∈ F}
4. Expand the records with the cardinality value for the itemsets, and additional at-
tributes for re-mining.
for all r = ( f ) ∈ R
r = ( f , | f |)
for n = 1 . . . N
r = (r, ψ(An , f ))
5. Create the vertices of the graph, with area being linearly proportional to the sup-
port, and thickness being proportional to the cardinality of the itemset.
V = {}
for all f ∈ F
create new vertex(v)
v.itemset = f
v.record = record of( f )
v.diameter = min diameter · sup( f ) / min sup
v.thickness = | f |
vertex of( f ) = v
V = V ∪ {v}
6. Create the edges of the graph, emanating from the items in the itemsets and termi-
nating at the itemsets.
E = {}
for all f ∈ F>1
for all i ∈ f
create new edge(e)
e. f rom = vertex of(i)
e.to = vertex of( f )
E = E ∪ {e}
7. Apply organic layout on G.
8. Compute the minimum and maximum values for each of the attributes.
min valuen = min f ∈F ψ(An , f ), for n = 1 . . . N
max valuen = max f ∈F ψ(An , f ), for n = 1 . . . N
9. Color the vertices in G with respect to each additional attribute, and add to the
graph collection G . The closer the value for attribute n gets to the maximum value
of that attribute max valuen , the darker the vertex will be colored.
for n = 1 . . . N
G = clone graph(G)
for all v ∈ V
f = v.itemset
v.darkness = ( ψ(An , f ) − min valuen ) / ( max valuen − min valuen ) ∈ [0, 1]
v.color = compute color(v.darkness)
G = G ∪ {G}
10. Perform visual re-mining through human-involved exploratory examination of the
graphs in G .
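Steps 8 and 9, the min-max normalization of an attribute's ψ values into darkness levels, can be sketched as follows (the ψ values and itemset labels below are illustrative):

```python
# Hypothetical ψ(A_n, f) values for one attribute over three frequent itemsets.
psi_values = {"F07": 0.20, "F08": 0.65, "F11": 0.50}

min_value = min(psi_values.values())
max_value = max(psi_values.values())

# Step 9: linear map of each attribute value to a darkness level in [0, 1].
darkness = {f: (v - min_value) / (max_value - min_value)
            for f, v in psi_values.items()}

print(darkness["F07"], darkness["F08"])  # 0.0 1.0
```

The itemset with the maximum attribute value receives darkness 1 and is colored darkest, as the text specifies.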
4 Case Study
The proposed methodology is demonstrated through a case study using a survey dataset
collected from the coffee retail industry. Several insights are discovered regarding the relationships among the frequent itemsets and personal attributes, and suggestions are made on how these insights might be used as operational policies.
4.1 Retail Industry
Recent research has positioned association mining as one of the most popular tools in
retail analytics [3]. Market basket analysis serves as a motivation for, and a test bed of, association mining algorithms. Additional data are readily available, either within the market basket data or as external data, thanks to loyalty cards in retailing, which enable linking transactions to personal data such as age, gender, county of residence, etc.
As of November 2010, the retail industry in the US alone generates monthly sales of $377.5 billion, with the food services and food retail industry constituting a 10% share of it [39]. Due to its gigantic size, the retail industry has been selected as the domain of the
case study. Starbucks is one of the best-known brands in food retail, and the best-known
brand in coffee retail / specialty eateries industry, with 137,000 employees and a global
monthly revenue of nearly $1 billion [39]. Due to the company’s visibility, the products
of Starbucks have been considered for constructing the survey data.
4.2 The Data
A survey was conducted with 644 respondents, with a nearly equal distribution of working people vs. students (all students were assumed to be non-working) and of women vs. men, spanning a multitude of universities and working environments. Each respondent
was questioned for 22 attributes that reflect their demographic characteristics and life
style preferences.
The fields in the dataset include demographic attributes (YearOfBirth, Gender,
EmploymentStatus, IncomeType, etc.), attributes related with life style (FavoriteColor,
SoccerTeam, FavoriteMusicGenre, etc.), educational background (University, En-
glishLevel, FrenchLevel, etc.), perceptional and intentional information (ReasonForGoing,
etc.), and physical status at the time the survey was conducted (HungerLevel, ThirstLevel).
The number of additional attributes to be used in Step 8 of the algorithm totaled 21.
As the transaction data, each respondent was also asked which items they would prefer from the menu of Starbucks Turkey stores if they had 15 Turkish Liras (approximately $10). The menu considered was the menu as of October 2009, and the respondents were limited to selecting at most four items without exceeding the budget
limit. While the original survey distinguished between the different sizes (tall, grande,
venti), in the data cleaning and preparation phase, the sizes for each type of item (such
as Cafe Latte or Capuchino) were aggregated and considered as a single item (group).
The original survey also contained preferences under a budget of 20 TL, but those pref-
erences were not analyzed in the case study.
Even though market basket analysis is carried out in the retail industry with transaction data that is logged through sales, preference data was collected in the survey and used instead of the transaction data. This is due to the well-known difficulty of obtaining
real world transactions data from companies, which they consider highly confidential,
even when masked.
4.3 The Process
Data was assembled in MS Excel and cleaned following the guidelines in the taxonomy
of dirty data by Kim et al. [20]. The transactions were given as input into Borgelt’s
apriori software [2], and the apriori algorithm was run with a support value of 2%.
The sizes of the vertices (based on the support values), the statistics for the frequent
itemsets, and the corresponding vertex colors were computed in MS Excel spreadsheet
software through distributed manual processing by 30 Sabancı University students, and
assembled through the EditGrid [13] online spreadsheet service. The association graph
was manually drawn in yEd Graph Editor software and an organic layout was applied.
yEd implements several types of graph drawing algorithms, including those that cre-
ate hierarchical, organic, orthogonal, and circular layouts, and allows customization of
the layouts through structure and parameter selections. Past experience with the software in applied research projects has shown that the Classic Organic Layout in yEd is especially suitable for displaying associations. The organic layout is generated based on force-directed placement algorithms [16] in graph drawing. This layout selection ends up placing items that belong to similar frequent itemsets in close proximity to each other. After constructing the base graph in yEd, the yEd graphml file was cloned, and
each copy was colored manually according to a different additional attribute. Then the
resulting collection of graphs were analyzed through brain-storming sessions and ac-
tionable insights, together with their policy implications, were determined.
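The cloning and coloring step in this process operated on yEd's graphml files. A minimal sketch of emitting plain GraphML for a toy association graph follows; yEd opens standard GraphML, although vertex sizes and colors would be set through yEd's own extension keys, which are omitted here, and the node and edge names are illustrative:

```python
from xml.sax.saxutils import escape

def to_graphml(vertices, edges):
    """Emit a minimal GraphML document with directed edges."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
             '  <graph id="G" edgedefault="directed">']
    for v in vertices:
        lines.append(f'    <node id="{escape(v)}"/>')
    for i, (src, dst) in enumerate(edges):
        lines.append(f'    <edge id="e{i}" source="{escape(src)}" target="{escape(dst)}"/>')
    lines += ['  </graph>', '</graphml>']
    return "\n".join(lines)

doc = to_graphml(["MosaicCake", "WhiteChocMocha", "F11"],
                 [("MosaicCake", "F11"), ("WhiteChocMocha", "F11")])
print(doc)
```

Each itemset vertex (here F11) receives incoming edges from the vertices of its member items, matching the graph construction in Section 3.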
4.4 Analysis and Results
Visual re-mining was performed on the colored association graphs in the collection
G through human-involved exploratory examination of the graphs. Figures 1-4 are se-
lected graphs from G that are constructed for this case study. In the figures, regions of
the graphs are highlighted for illustrating the insights and policies.
Figure 1 shows only the frequent item preferences of the participants in the case
study. In this figure, the large region shows that Mosaic Cake and White Chocolate
Mocha are selected by a large percentage of the people, and they are purchased together
frequently, as represented by the large size of the corresponding vertex F11. The small
region suggests that Chai Tea Latte and Lemon Cake are purchased frequently with
each other, but not with other items. In the context of retailing, these are referred to as
complementary items, and the classic operational policy is to use each of these items
for increasing the sale of the other(s):
Policy 1: “Bundle Mosaic Cake, White Chocolate Mocha and Water together,
and/or target cross-selling by suggesting the complementary item when the other is
ordered.”
Policy 2: “Bundle Chai Tea Latte and Lemon Cake together, and/or target cross-
selling by suggesting the complementary item when the other is ordered.”
Besides identifying the most significant and interdependent items, one can also ob-
serve items that are independent from all items but one. The central item is an attractor,
Fig. 1. Association graph that displays the frequent itemsets.
drawing attention to less popular items related to it. In the retail industry, attractor items are placed at visible locations (e.g., the aisle ends) to attract customers to the items in nearby but less visible locations (e.g., inside the aisles).
Another type of insight that can be derived from Figure 1 is the identification of
items which form frequent itemsets with the same item(s) but do not form any frequent
itemsets with each other. One such pattern is indicated with the dashed borderline.
Caramel Frappucino and Java Chip Chocolate each independently form frequent
itemsets with Mosaic Cake, but do not form frequent itemsets with one another. These
items may be substitute items and their relationship deserves further investigation.
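This substitute-candidate pattern, two items that each form a frequent pair with a common third item but never with each other, can be detected programmatically; the itemset names below are illustrative shorthand for the items in the figure:

```python
from itertools import combinations

# Hypothetical frequent 2-itemsets, mirroring the dashed-border pattern in Fig. 1.
F2 = [frozenset(p) for p in [("MosaicCake", "CaramelFrap"),
                             ("MosaicCake", "JavaChipChoc"),
                             ("ChaiLatte", "LemonCake")]]

# For each item, collect the items it forms a frequent pair with.
partners = {}
for f in F2:
    a, b = sorted(f)
    partners.setdefault(a, set()).add(b)
    partners.setdefault(b, set()).add(a)

# Substitute candidates: pairs sharing a partner but not frequent together.
substitutes = [
    (i, j) for i, j in combinations(sorted(partners), 2)
    if frozenset({i, j}) not in F2 and partners[i] & partners[j]
]
print(substitutes)  # [('CaramelFrap', 'JavaChipChoc')]
```

The detected pair matches the visual reading of the figure: both items associate with Mosaic Cake but not with one another.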
While Figure 1 illustrates behavioral patterns with regards to item purchases, it does
not tell us how these patterns relate to attributes of people. The mapping of attributes
to colors on the graph in the following figures solves this problem, and enables the
discovery of deeper additional insights in more dimensions.
Figure 2 displays the items and itemsets together with gender attribute (percentage
of females selecting that itemset, computed from the data field Gender) mapped to
color. Darker vertices (itemsets) indicate that, among the people that selected that itemset, the percentage of females is higher than that of males. Even though the vertex sizes
and locations are exactly the same as the earlier figure, new insights are derived due to
the coloring. The selected region shows that Mosaic Cake is purchased with either
Java Chip Chocolate, as represented by the itemset F08, or with Caramel Frappu-
cino, as represented by the itemset F07. However, F08 and F07 are colored clearly
differently, with F08 being darker. This means that the percentage of women among
those that prefer the itemset F08 (Mosaic Cake and Java Chip Chocolate) is higher.
Fig. 2. Gender attribute (percentage of females selecting each itemset) mapped to color.
So, if only the content of the selected region is considered, the following policy can be
applied:
Policy 3: “If a male customer orders Mosaic Cake, try to cross-sell to him Caramel
Frappucino by suggesting that item; otherwise, if a female customer orders it, try to
cross-sell to her Java Chip Chocolate.”
Figure 3 displays the knowledge of French language (FrenchLevel) mapped to
color. Darker vertices indicate that among the people that selected that itemset, the
level of knowledge of the French language is higher on average. The k-itemset
(k > 1) with the darkest color is F07, which is the preference for the items Mosaic
Cake and Caramel Frappucino together. Items that have the darkest colors, in order
of decreasing support, are Very Chocolate Muffin, Doppio Cookie, Orange Juice,
Caramel Machiato, and Java Chip. This discovery can be used in conjunction with
geographical location of the stores, as the next policy suggests:
Policy 4: “If the store is located near a university or high-school where the language
of instruction is French, then emphasize Very Chocolate Muffin, Doppio Cookie,
Orange Juice, Caramel Machiato and Java Chip as stand-alone products, and the
item pair Mosaic Cake and Caramel Frappucino as a bundle.”
Figure 4 displays the hunger level of the customer (HungerLevel) mapped to color,
where darker colors denote higher hunger level on the average. It is observed that none
Fig. 3. Knowledge of the French language (average French knowledge of those selecting each
itemset) mapped to color.
of the k-itemsets with k > 1 have white color. This is consistent with what would be
expected, since a person who is not hungry is unlikely to order many items. The items
that can be offered to a person, even if he is not hungry at all, are suggested in the next
policy:
Policy 5: “If, at the point of sale (POS), a customer does not seem to be hungry,
suggest Iced White Chocolate Mocha, Java Chip or Orange Juice.”
The coloring of the vertices revealed insights on the behavioral patterns, explaining
them through personal attributes. The fact that the association graphs can be interpreted
even by the least technical analysts is a big advantage and a great motivation for using
the methodology in the real world.
5 Validity
A fundamental question regarding the validity of the proposed AssocGraphRM methodology can be posed along the lines of the classical dilemma of statistical data analysis [29]: “Is the discovered relation a result of causality, or is it just correlation?” For example,
considering Policy 4 in Section 4, does a person who is fluent in French order Or-
ange Juice due to his knowledge of French, or due to some other reason, which would
explain both his language proficiency and preference for Orange Juice? For practi-
cal purposes, this is not a problem. Even if the underlying attribute for the behavior
Fig. 4. Hunger level attribute (average hunger level for those selecting each itemset) mapped to
color.
is hidden, the visible attribute FrenchLevel signals the presence of the detected spe-
cific behavior. The person should still be offered Orange Juice in the store (Policy 4),
especially if he does not seem to be hungry (Policy 5).
A notable shortcoming of the methodology, related to the above issue, is that it enables re-mining on only a single additional attribute. However, there may be conflicting outcomes, or interactions at deeper levels of analysis, that would make the policies suggested at single-level depth invalid. For example, Policy 4 in the case study suggests Orange Juice to a person who is fluent in French. However, Policy 5 suggests the same item to customers who are not hungry at all. So what should be done when a hungry French-speaking person arrives? Due to the FrenchLevel attribute, he should be offered Orange Juice, but due to the HungerLevel attribute, he should definitely not be offered that item. One way to resolve this conflict would be to offer only “safe” items, which do not create a conflict. For the described example, Doppio Cookie and Caramel Machiato are two items that cater both to French-speaking people and to hungry people. So these items can be suggested to the mentioned customer.
A threat to validity of the analysis in the case study is the validity of the collected
data. Preferences for food items are heavily influenced by the time of the day, as well
as the temperature and other conditions under which the data was collected. A French-speaking person would most probably prefer to drink Orange Juice on a warmer day, and a hot Caramel Machiato on a colder day, but Policy 4 does not currently differentiate between the two situations. The survey did not record all such conditions, and thus
the threat to the validity of the listed policies is indeed pertinent. However, the main
contribution of this study is the AssocGraphRM methodology, rather than the specific
policies, and this threat to validity does not affect the main contribution.
It is crucial that the policies obtained through AssocGraphRM be handled through
a scientific analysis-based approach: They should be put into computational models for
justifying their feasibility quantitatively. For example, Policy 3 in the case study of Sec-
tion 4 suggests that Caramel Frappucino should be offered to a male customer, rather
than Java Chip Chocolate, since he has a higher chance of accepting the former offer.
But what if the profit margin of the latter was much higher? Then it might be feasible to
offer the customer the same item that is offered to the female customer. So the superiority of this policy cannot be guaranteed by our methodology alone, without a formal fact-based numerical analysis. Thus, the policies should not be applied in isolation, but
taking into consideration other critical information, and their interactions.
6 Conclusion and Future Work
A novel methodology was introduced for knowledge discovery from association mining
results. The applicability of the methodology was illustrated through a market basket
analysis case study, where frequent itemsets derived from transactional preference data
were analyzed with respect to the personal attributes of survey participants. The theo-
retical contribution of the study is the methodology, which is formally described as an
algorithm. The practical contribution of the study is the proof-of-concept demonstration
of the methodology through a case study.
In every industry, and especially in the food retail industry, new products emerge and consumer preferences change at a fast pace. Thus, one would be interested in laying the foundation of an analysis framework that fits the dynamic nature of retailing data.
The presented methodology can be adapted for analysis of frequent itemsets and as-
sociation rules over time by incorporating latest research on evolving graphs [15] and
statistical tests for measuring the significance of changes over time.
The main motivation of the chapter is the discovery of behavioral patterns in relation
to the human that exhibited the behavior. The methodology can be applied to similar
data from different fields that study the behavior of agents individually and in relation
to each other, including psychology, sociology, behavioral economics, behavior-based
robotics, and ethology.
Acknowledgement
The authors thank İlhan Karabulut for her work that inspired this research, and Ahmet Şahinöz for creating colored graphs with the earlier datasets, which inspired the final form of the graphs. The authors also thank Samet Bilgen, Dilara Naibi, Ahmet Memişoğlu, and Namık Kerenciler for collecting the data used in the study, and Didem Cansu Kurada for her insightful suggestions regarding the study.
References
1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In
Proceedings of the 20th International Conference on Very Large Data Bases, pages 487–499.
Morgan Kaufmann Publishers Inc., 1994.
2. C. Borgelt. http://fuzzy.cs.uni-magdeburg.de/~borgelt/apriori.html, 2011.
3. T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. Building an association rules framework to
improve product assortment decisions. Data Mining and Knowledge Discovery, 8(1):7–23,
2004.
4. L. Cao. Behavior informatics and analytics: Let behavior talk. In Data Mining Workshops,
2008. ICDMW ’08. IEEE International Conference on, pages 87 –96, 2008.
5. L. Cao and C. Zhang. The evolution of KDD: Towards domain-driven data mining. Interna-
tional Journal of Pattern Recognition and Artificial Intelligence, 21(4):677–692, 2007.
6. A. Ceglar and J.F. Roddick. Association mining. ACM Computing Surveys (CSUR), 38(2):5,
2006.
7. S. W. Changchien and T.-C. Lu. Mining association rules procedure to support on-line rec-
ommendation by customers and products fragmentation. Expert Systems with Applications,
20(4):325–335, 2001.
8. M. Chatti, M. Jarke, T. Indriasari, and M. Specht. NetLearn: Social Network Analysis and
Visualizations for Learning. Learning in the Synergy of Multiple Disciplines, pages 310–324,
2009.
9. C. Chen. Information visualization. Wiley Interdisciplinary Reviews: Computational Statis-
tics, 2(4):387–403, 2010.
10. A. Demiriz. Enhancing product recommender systems on sparse binary data. Data Mining
and Knowledge Discovery, 9(2):147–170, 2004.
11. A. Demiriz, G. Ertek, T. Atan, and U. Kula. Re-mining positive and negative association
mining results. Lecture Notes in Computer Science, 6171:101–114, 2010.
12. A. Demiriz, G. Ertek, T. Atan, and U. Kula. Re-mining item associations: Methodology and
a case study in apparel retailing. Decision Support Systems, doi:10.1016/j.dss.2011.08.004,
2011.
13. EditGrid. http://www.editgrid.com, 2011.
14. G. Ertek and A. Demiriz. A framework for visualizing association mining results. Lecture
Notes in Computer Science, 4263:593–602, 2006.
15. C. Erten, P.J. Harding, S.G. Kobourov, K. Wampler, and G. Yee. GraphAEL: Graph ani-
mations with evolving layouts. In Lecture Notes in Computer Science, volume 2912, pages
98–110. Springer, 2004.
16. T.M.J. Fruchterman and E.M. Reingold. Graph drawing by force-directed placement. Soft-
ware: Practice and Experience, 21(11):1129–1164, 1991.
17. G. Grahne and J. Zhu. Fast algorithms for frequent itemset mining using fp-trees. IEEE
Transactions on Knowledge and Data Engineering, pages 1347–1362, 2005.
18. J. Han and M. Kamber. Data mining: concepts and techniques. Morgan Kaufmann, 2006.
19. I. Herman, G. Melançon, and M.S. Marshall. Graph visualization and navigation in infor-
mation visualization: A survey. IEEE Transactions on Visualization and Computer Graphics,
6(1):24–43, 2000.
20. W. Kim, B.J. Choi, E.K. Hong, S.K. Kim, and D. Lee. A taxonomy of dirty data. Data
Mining and Knowledge Discovery, 7(1):81–99, 2003.
21. S. Kimani, S. Lodi, T. Catarci, G. Santucci, and C. Sartori. VidaMine: a visual data mining
environment. Journal of Visual Languages & Computing, 15(1):37 – 67, 2004.
22. B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Pro-
ceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and
data mining, pages 125–134. ACM, 1999.
23. H. Ltifi, B. Ayed, A.M. Alimi, and S. Lepreux. Survey of information visualization tech-
niques for exploitation in KDD. In Computer Systems and Applications, 2009. AICCSA 2009.
IEEE/ACS International Conference on, pages 218–225. IEEE, 2009.
24. O.Z. Maimon and L. Rokach. Data mining and knowledge discovery handbook. Springer-
Verlag New York Inc, 2005.
25. G. Mansingh, K.M. Osei-Bryson, and H. Reichgelt. Using ontologies to facilitate post-
processing of association rules by domain experts. Information Sciences, 2010.
26. F. Mansmann, F. Fischer, D.A. Keim, and S.C. North. Visual support for analyzing network
traffic and intrusion detection events using treeMap and graph representations. In Proceed-
ings of the Symposium on Computer Human Interaction for the Management of Information
Technology, pages 19–28. ACM, 2009.
27. R. Mazza. Introduction to information visualization. Springer-Verlag New York Inc, 2009.
28. Microsoft. MS SQL Server, Analysis Services. http://tinyurl.com/6gudq23, 2011.
29. S. Nowak. Some problems of causal interpretation of statistical relationships. Philosophy of
science, 27(1):23–38, 1960.
30. S. OHare, S. Noel, and K. Prole. A graph-theoretic visualization approach to network risk
analysis. Visualization for Computer Security, pages 60–67, 2008.
31. G.A. Pavlopoulos, A.L. Wegener, and R. Schneider. A survey of visualization tools for
biological network analysis. Biodata mining, 1:12, 2008.
32. A. Perer and B. Shneiderman. Integrating statistics and visualization: case studies of gaining
clarity during exploratory data analysis. In Proceeding of the twenty-sixth annual SIGCHI
conference on Human factors in computing systems, pages 265–274. ACM, 2008.
33. R. Santamaría and R. Therón. Overlapping clustered graphs: Co-authorship networks visual-
ization. In Smart Graphics, pages 190–199. Springer, 2008.
34. SAS. http://www.sas.com, 2011.
35. Z. Shen and K.L. Ma. Mobivis: A visualization system for exploring mobile data. In IEEE
Pacific Visualization Symposium, 2008, PacificVIS’08, pages 175–182. IEEE, 2008.
36. R. Tamassia, B. Palazzi, and C. Papamanthou. Graph drawing for security visualization. In
Graph Drawing, pages 2–13. Springer Berlin/Heidelberg, 2009.
37. I. Tollis, P. Eades, G. Di Battista, and L. Tollis. Graph drawing: Algorithms for the visual-
ization of graphs. Prentice Hall, 1998.
38. M. Wattenberg. Visual exploration of multivariate graphs. In Proceedings of the SIGCHI
conference on Human Factors in computing systems, pages 811–819. ACM, 2006.
39. WolframAlpha. http://www.wolframalpha.com, 2011.
40. Y. Yao, Y. Zhao, and R. Maguire. Explanation-oriented association mining using a combina-
tion of unsupervised and supervised learning algorithms. Lecture Notes in Computer Science,
2671:527–532, 2003.
41. M.J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge
and Data Engineering, 12(3):372–390, 2002.
42. M.J. Zaki and C.J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice
structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462–478, 2005.
43. Y. Zhao, H. Zhang, L. Cao, C. Zhang, and H. Bohlscheid. Combined pattern mining: From
learned rules to actionable knowledge. In Proceedings of the 21st Australasian Joint Conference
on Artificial Intelligence (AI'08), pages 393–403, 2008.
44. P.D. McNicholas and Y. Zhao. Association rules: An overview. In Y. Zhao, C. Zhang, and
L. Cao, editors, Post-Mining of Association Rules: Techniques for Effective Knowledge
Extraction, pages 1–10. IGI Publishing, Hershey, PA, 2009.