PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK

Arti Deshpande¹, Anjali Mahajan² and A Thomas¹
¹Department of CSE, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, India
²Department of IT, Government Polytechnic, Nagpur, Maharashtra, India

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.1, January 2017
DOI: 10.5121/ijdkp.2017.7104
ABSTRACT
A retail company's data may be geographically spread across different locations owing to the huge volume of data and the rapid growth in transactions. For decision making, however, knowledge workers need the integrated data of all sites. The main challenge is therefore to obtain generalized patterns, or knowledge, from transactional data that is spread over various locations. Transporting data from those locations to the server site increases the cost of data transport, and at the same time mining patterns from the resulting huge dataset on the server increases the time and space complexity. Multi-database mining thus plays a vital role in extracting knowledge from different data sources. The proposed technique finds the patterns at the various sites and, instead of transporting the data, transports only the patterns from each location to the server, where the final deliverable patterns are produced. The technique uses a ranking algorithm to rank the items at each location based on their profit, date of expiry and available stock. Association rule mining (ARM) is then used to extract patterns based on the ranking of items. Finally, all the patterns discovered at the various locations are merged using a pattern merger algorithm. The proposed algorithm is implemented, and experimental results are reported both for classical association rule mining on the integrated data and for the datasets at the various sources. Finally, all patterns are combined to discover actionable patterns using the pattern merger algorithm given in section V.
KEYWORDS
Association Rule Mining, Pattern Discovery, Data Mining, Ranking
1. INTRODUCTION
The conventional method for mining multiple databases is to integrate the databases and then apply association mining to discover patterns. It is frequently difficult to integrate heterogeneous databases, or to move one entire database to another site, for reasons of both efficiency and privacy. In multi-database mining we therefore require efficient approaches that can deliver good mining results with a low cost of database communication.

Association rule mining finds application in the business sector for identifying frequently purchased items, unseen patterns, associated events and so on. Business experts are keen to recognize frequently bought items, so that profitable deals can be offered to customers. Retailers and supermarkets maintain large databases that contain valuable information about customers and the items they purchase. Customer buying patterns can be discovered from these large volumes of data, which are scattered across various locations. The transactional data of each supermarket outlet is local to its location, so every outlet maintains a local server for its day-to-day transactions. If the supermarket has outlets at 100 locations, 100 local servers are maintained, one at each site. But when the organization decides on a policy, campaign or promotion for a product, it is the same for all 100 locations. Demographic data and behavioural patterns vary from location to location, so the transactional data of all locations play a vital role in discovering the final patterns. When the
organization decides on an offer for a particular product, it basically checks for the most frequently purchased, most profitable, or close-to-expiry products. For that, the transactional data at all locations must be considered. If such data is transported to the organization's main server and association rule mining is then applied to discover frequent patterns, both the transportation cost and the space required increase. The proposed technique suggests that, instead of transporting data to the server, patterns can be discovered at each local site and then transported to the main server to discover the final patterns. A ranking algorithm is also used to assign a rank to each item appearing in the transaction list.
2. LITERATURE REVIEW
ARM was first introduced by Agrawal et al. in 1993 [1]. ARM aims to extract interesting patterns
from large datasets: based on minimum support and confidence, frequent itemsets are discovered
and association rules are generated. Market basket analysis, also called affinity analysis, is a
modelling technique that helps identify which items are likely to be purchased together.
The market-basket problem assumes some large number of items, e.g., "bread", "milk", etc.
Customers buy subsets of items as per their needs, and the marketer learns which items customers
have bought together; marketers can use this information to sell the items together or to run a
promotional campaign on them.
For example, someone who buys a packet of milk also tends to buy bread at the same time, which
is represented as Milk => Bread. Market basket analysis algorithms are straightforward;
difficulties arise mainly in dealing with large amounts of transactional data, where applying the
algorithm may produce a large number of rules, many of which are trivial. The problem of a large
volume of trivial results can be overcome with differential market basket analysis, which finds
the interesting results and eliminates the trivial bulk. Using differential analysis it is possible to
compare results between stores and between customers in different demographic groups.
Support, confidence, and lift are the measures that control the association rule mining process.
Support: the support of the rule, that is, the relative frequency of transactions that contain both x
and y [2]:
support(x → y) = support(x ∪ y)
Confidence: the confidence of the rule measures how often transactions that contain x also
contain y:
confidence(x → y) = support(x ∪ y) / support(x)
Lift: the lift of the rule is an additional interestingness measure, which can be used either to rank
rules by importance (or present a sorted list to the user) or as an additional pruning criterion:
lift(x → y) = confidence(x → y) / support(y)
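These three measures can be computed directly from a transaction list. Below is a minimal Python sketch (the paper's implementation is in Java; Python is used here for brevity, and the transactions and item names are invented purely for illustration):

```python
# Hypothetical mini transaction list, used only to illustrate the measures.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread", "butter"},
    {"milk", "bread"},
]

def support(itemset):
    """Relative frequency of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Fraction of transactions containing x that also contain y."""
    return support(set(x) | set(y)) / support(x)

def lift(x, y):
    """Confidence of x -> y relative to the baseline support of y."""
    return confidence(x, y) / support(y)

print(support({"milk", "bread"}))       # 3 of 5 transactions -> 0.6
print(confidence({"milk"}, {"bread"}))  # 0.6 / 0.8 = 0.75
print(lift({"milk"}, {"bread"}))        # 0.75 / 0.8 = 0.9375
```

A lift below 1, as here, indicates the two items co-occur slightly less often than independence would predict.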
Jia Hu et al. [3] presented a conceptual model with dynamic multi-level workflows corresponding
to a mining-grid-centric, multi-layer grid architecture that governs enterprise application
integration. The technique helps mine multiple data sources for customer segmentation based on
online behaviour; one-to-one marketing and personalization become possible with this model.
Mayssam Sayyadian [4] addressed the limitation that prediction models typically assume all the
data needed to build an accurate model resides in a single database. Tuples from various
autonomous and heterogeneous databases are combined to collect all the information required to
build a suitable classification model. The HeteroClass framework is proposed for effective
classification over heterogeneous databases; the authors also suggest a method for combining
schema matching and structure discovery techniques.
The authors of [5] proposed a specialised technique to improve the mining of multiple databases
using local pattern analysis. A new generalized technique is given for mining multiple large
databases that considerably improves the quality of the synthesized global patterns. In [7], the
proposed method synthesizes high-frequency rules from different data sources by calculating
weights for the sources based on their transaction populations; these weights can also be
allocated based on turnover, quantity of items sold, or profit gained. The authors also introduce a
global confidence measure to obtain the final global patterns.
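The weight-based synthesis in [7] can be illustrated with a short Python sketch. The source names, transaction counts, and supports below are invented for illustration, assuming (as described above) that each source's weight is proportional to its transaction population:

```python
# Hypothetical local supports of one itemset at two sources.
sources = {
    "S1": {"transactions": 4000, "support": 0.41},
    "S2": {"transactions": 6000, "support": 0.48},
}

# Weight each source by its share of the total transaction population.
total = sum(s["transactions"] for s in sources.values())
weights = {name: s["transactions"] / total for name, s in sources.items()}

# Synthesized (global) support = weighted sum of the local supports.
global_support = sum(weights[name] * sources[name]["support"] for name in sources)
print(round(global_support, 3))  # 0.4 * 0.41 + 0.6 * 0.48 = 0.452
```

The same weighting scheme could be applied to confidences to obtain a global confidence, as [7] suggests.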
Multi-database mining is surveyed by Shichao Zhang et al. [8]. The authors focus on mono- and
multi-database mining and introduce various types of patterns in multiple databases: (1) local
patterns, (2) high-vote patterns, (3) exceptional patterns, and (4) suggested patterns. They also
suggest a classification of multiple databases and discuss the problems of clustering them.
3. PROPOSED ARCHITECTURE
The transactional data is spread over the chain of retail stores at the various locations where the
stores operate. The transactional database consists of two sections: the first holds the "Sales"
binary data per transaction, depending on the items purchased by the customers; the second holds
the "Stock" data per item, which includes the total stock available and the number of days left
before the item expires at each site. The frequent itemset generation algorithm Apriori [6] is
applied to generate local patterns at each site. Instead of transporting the local transactional data,
only the patterns generated at each local site are transported to the server site. To get the top n
items, the Simple Additive Weighting (SAW) method [9] is applied on the server site to the
integrated patterns from the various locations. Finally, collaborative association rule mining
(CARM) is used to generate deliverable patterns based on the top n selected items. The proposed
architecture is shown in Figure 1.
Figure 1. Architecture of the proposed system
3.1 Frequent Itemset Generation
For each local site, the same minimum support is set and the Apriori [1] algorithm is applied to
the transactional data. Two support values are considered when generating frequent itemsets: the
idea is to find patterns that combine the most-purchased items with the least-purchased items.
The two support values are (1) Lower Support and (2) Upper Support; only items with support
less than or equal to the Lower Support, or greater than or equal to the Upper Support, are
considered when generating frequent itemsets, so the slowest-selling and fastest-selling items are
combined together. Instead of transporting transactional data, the frequent itemsets, together with
the stock and expiry days of each item, are transported to the server site. On the server site, the
ranking method SAW [9] is applied to rank each item, using criteria such as total stock available,
number of days to expiry, and profit per item. Only the top n items are selected based on rank,
and finally the frequent itemsets from each site that contain only items from those top n items are
retained.
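The dual-support filter described above can be sketched in Python (the paper's implementation is in Java; this brute-force version, with invented transactions, only illustrates the selection rule and does not do full level-wise Apriori pruning):

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, lower, upper, max_size=2):
    """Keep items whose support is <= lower or >= upper, then count every
    small itemset built only from those items, applying the same dual test."""
    n = len(transactions)
    item_support = Counter()
    for t in transactions:
        item_support.update(set(t))
    kept = {item for item, count in item_support.items()
            if count / n <= lower or count / n >= upper}
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(kept), size):
            sup = sum(set(combo) <= set(t) for t in transactions) / n
            if sup <= lower or sup >= upper:
                result[combo] = sup
    return result

# Invented toy transactions, for illustration only.
txns = [["Biscuits", "Bread"], ["Biscuits", "Bread", "Juice"],
        ["Biscuits"], ["Bread", "Juice"], ["Biscuits", "Bread"]]
print(frequent_itemsets(txns, lower=0.02, upper=0.40))
```

Pairs whose support falls strictly between the two thresholds, such as (Biscuits, Juice) here, are discarded.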
3.2 Proposed Algorithm
Input: transactional datasets D1, D2, ..., Dk for k sites; Lower Support; Upper Support
Algorithm:
1. For each site from 1 to k
   a. Generate frequent itemsets using Apriori, considering Lower Support and Upper Support
   b. Transport the frequent itemsets with their supports to the server site
   c. Transport the stock available, days to expiry, and profit for each item to the server site
   End For
2. Merge all frequent itemsets generated in step 1.
3. Apply SAW to the combined table of all items:
   a. Normalize the data using min-max normalization
   b. Assign weights to stock available, days to expiry, and profit for each item
   c. Assign a rank to each item
4. Get the top n items.
5. Filter the frequent itemsets from each site, keeping only those that contain items from the top n items.
6. Merge all filtered frequent itemsets.
Output: deliverable itemsets generated in step 6
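Step 3 (SAW) can be sketched in Python as follows. The items, figures, and criterion weights below are illustrative assumptions, since the paper does not state the exact weights it uses; expiry is treated as a cost criterion, so fewer days to expiry yields a higher (more urgent) score:

```python
# Hypothetical merged per-item data (stock, days to expiry, profit).
items = {
    "Chicken":  {"stock": 90, "expiry_days": 6, "profit": 2700.0},
    "Bread":    {"stock": 85, "expiry_days": 4, "profit": 850.0},
    "Biscuits": {"stock": 88, "expiry_days": 6, "profit": 1056.0},
}
weights = {"stock": 0.3, "expiry_days": 0.4, "profit": 0.3}  # assumed weights

def normalize(values, benefit=True):
    """Min-max normalization; cost criteria are inverted so that smaller
    raw values (e.g. fewer days to expiry) score higher."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if benefit else [1 - s for s in scaled]

names = list(items)
scores = dict.fromkeys(names, 0.0)
for crit, w in weights.items():
    benefit = crit != "expiry_days"  # expiry is the only cost criterion here
    col = normalize([items[n][crit] for n in names], benefit)
    for name, s in zip(names, col):
        scores[name] += w * s  # SAW: weighted sum of normalized criteria

ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # highest SAW score first
```

With these assumed figures, high stock, short expiry, and high profit push an item toward the top of the ranking, matching the intent of step 4.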
4. EXPERIMENTS AND RESULTS
For experimental purposes, two sites S1 and S2 are considered. Synthetic data for 20 items is
generated for both sites using SQL Data Generator [10]. The algorithms are implemented in
Java 7 with MySQL as the database. With an Upper Support of 40% and a Lower Support of 2%,
frequent itemsets are generated using Apriori for sites S1 and S2, as shown in Tables 1 and 2
respectively.
Number Elements Support
1 [Juice] 0.53
2 [Biscuits] 0.71
3 [Bread] 0.54
4 [Butter] 0.41
5 [Biscuits, Bread] 0.41
Table 1. Frequent Itemsets at Source 1 (S1)
Number Elements Support
1 [Biscuits] 0.72
2 [Bread] 0.59
3 [Milk] 0.41
4 [Oats] 0.41
5 [Biscuits, Bread] 0.48
Table 2. Frequent Itemsets at Source 2 (S2)
The stock available, expiry in days, and profit for each item are also transported from sites S1
and S2 to the server site, as shown in Tables 3 and 4.
Item     | Expiry in Days | Stock | Cost Price | Selling Price | Stock Value | Stock Sale Value | Profit per Item | Profit on Full Sale
Juice    | 8  | 25 | 80.0  | 99.0  | 2000.0 | 2475.0 | 19.0 | 475.0
Tea      | 15 | 62 | 42.0  | 58.0  | 2604.0 | 3596.0 | 16.0 | 992.0
Biscuits | 6  | 48 | 20.0  | 32.0  | 960.0  | 1536.0 | 12.0 | 576.0
Chicken  | 12 | 50 | 160.0 | 190.0 | 8000.0 | 9500.0 | 30.0 | 1500.0
Bread    | 5  | 35 | 20.0  | 30.0  | 700.0  | 1050.0 | 10.0 | 350.0
Table 3. Data Fetched from Site S1
Item     | Expiry in Days | Stock | Cost Price | Selling Price | Stock Value | Stock Sale Value | Profit per Item | Profit on Full Sale
Juice    | 10 | 50 | 80.0  | 99.0  | 4000.0 | 4950.0 | 19.0 | 950.0
Tea      | 12 | 90 | 42.0  | 58.0  | 3780.0 | 5220.0 | 16.0 | 1440.0
Biscuits | 7  | 40 | 20.0  | 32.0  | 800.0  | 1280.0 | 12.0 | 480.0
Chicken  | 6  | 40 | 160.0 | 190.0 | 6400.0 | 7600.0 | 30.0 | 1200.0
Bread    | 4  | 50 | 20.0  | 30.0  | 1000.0 | 1500.0 | 10.0 | 500.0
Table 4. Data Fetched from Site S2
The data from Tables 3 and 4 are integrated and normalized before applying the ranking method
SAW. A rank for each item is calculated using SAW based on the stock available, expiry in days,
and profit criteria. The top 6 items are shown in Table 5.
Item Value Rank
Chicken 0.83 1
Milk 0.54 2
Jam 0.53 3
Bread 0.46 4
Juice 0.4 5
Biscuits 0.39 6
Table 5. Top 6 items with their value and Rank
Only the itemsets consisting solely of items from Table 5 are selected from Tables 1 and 2, which
gives the filtered frequent itemset lists from sources S1 and S2. Table 6 shows the final
deliverable patterns.
Number | Elements          | Source
1      | [Juice]           | S1
2      | [Biscuits]        | S1, S2
3      | [Bread]           | S1, S2
4      | [Milk]            | S2
5      | [Biscuits, Bread] | S1, S2
Table 6. Combined Frequent Itemsets from Sources S1 and S2
From the above results, the item [Milk] is missing at site S1 but is still considered a final
deliverable; similarly, [Juice] is missing at site S2.
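Steps 5 and 6 of the algorithm (filtering each site's itemsets against the top-n list, then merging) can be reproduced on the data of Tables 1, 2, and 5 with a short Python sketch (Python rather than the paper's Java, for brevity):

```python
# Top-n items from Table 5, and the local frequent itemsets of Tables 1 and 2.
top_n = {"Chicken", "Milk", "Jam", "Bread", "Juice", "Biscuits"}
local = {
    "S1": {("Juice",): 0.53, ("Biscuits",): 0.71, ("Bread",): 0.54,
           ("Butter",): 0.41, ("Biscuits", "Bread"): 0.41},
    "S2": {("Biscuits",): 0.72, ("Bread",): 0.59, ("Milk",): 0.41,
           ("Oats",): 0.41, ("Biscuits", "Bread"): 0.48},
}

merged = {}
for site, itemsets in local.items():
    for itemset in itemsets:
        if set(itemset) <= top_n:  # every item must be in the top-n list
            merged.setdefault(itemset, []).append(site)

for itemset, sites in sorted(merged.items()):
    print(itemset, sites)
```

[Butter] and [Oats] are dropped because they are outside the top-n list, and the remaining itemsets with their contributing sites match Table 6.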
In the second scenario, all transactional data from sites S1 and S2 is transported to the server site
and association rule mining is applied to the integrated data. The frequent itemsets generated are
given in Table 7.
Number Item Support
1 [Biscuits] 0.72
2 [Bread] 0.56
3 [Biscuits, Bread] 0.44
Table 7. Frequent ItemSets after combining Transactional Data from site S1 and S2
Comparing Tables 6 and 7, it is observed that the frequent items [Juice] and [Milk] are missing
from Table 7 even though they are in the top 6 items list. Considering expiry date, stock, and
profit, these items should sell quickly, so a retail expert can decide to give a discount or offer on
these two items.
5. CONCLUSION
The proposed technique reduces the cost of transporting data from the client sites to the server
site. As association rule mining is applied at each local site, the space required and the execution
time of the algorithm are reduced drastically. This helps in finding patterns per geographical
location, as people in different places generally have different buying habits, while integrating
the patterns from the various sites yields the common patterns. The ranking method helps select
the most important items, which are expected to move fast based on their expiry date, stock, and
profit; filtering those items from the frequent itemset list helps the retailer clear that stock. We
found that mining the integrated transactional data misses some patterns that are crucial. The
algorithm can be further extended by considering the quantity purchased of each item, and
quantity-based association rule mining can then be used to obtain patterns. We are working on
quantitative association rule mining for multiple sources.
REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large
databases,” in ACM SIGMOD Record, vol. 22, no. 2. ACM, 1993, pp. 207-216.
[2] Deshpande, Arti, and Anjali Mahajan. "Domain Driven Multi-Feature Combined Mining for retail
dataset." In International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958.
[3] Hu, Jia, and Ning Zhong. "Organizing multiple data sources for developing intelligent e-business
portals." Data Mining and Knowledge Discovery 12, no. 2-3 (2006): 127-150.
[4] Sayyadian, Mayssam. "HeteroClass: a framework for effective classification from heterogeneous
databases." CS512 Project Report, University of Wisconsin, Madison (2006).
[5] Adhikari, Animesh, Pralhad Ramachandrarao, Bhanu Prasad, and Jhimli Adhikari. "Mining Multiple
Large Data Sources." Int. Arab J. Inf. Technol. 7, no. 3 (2010): 241-249.
[6] Borgelt, Christian. "Efficient implementations of apriori and eclat." In FIMI’03: Proceedings of the
IEEE ICDM workshop on frequent itemset mining implementations. 2003.
[7] Ramkumar, Thirunavukkarasu, and Rengaramanujam Srinivasan. "Modified algorithms for
synthesizing high-frequency rules from different data sources."Knowledge and information systems
17, no. 3 (2008): 313-334.
[8] Zhang, Shichao, Xindong Wu, and Chengqi Zhang. "Multi-database mining."IEEE Computational
Intelligence Bulletin 2, no. 1 (2003): 5-13.
[9] Savitha, K., and C. Chandrasekar. "Trusted network selection using SAW and TOPSIS algorithms
for heterogeneous wireless networks." arXiv preprint arXiv:1108.0141 (2011).
[10] http://www.red-gate.com/products/sql-development/sql-data-generator/
Authors
Arti Deshpande is an Assistant Professor in the Department of Computer Engineering at
Thadomal Shahani Engineering College, Mumbai, India. She received her Master of
Computer Engineering degree from Mumbai University, Maharashtra, India. Currently, she is a
PhD student in the Department of Computer Science and Engineering at G. H. Raisoni
College of Engineering, Nagpur, Maharashtra, India.
Dr. A. R. Mahajan is working as Head of the Information Technology Department, GP,
Nagpur. She holds BE and ME degrees in Computer Science and Engineering and has
completed her PhD in Computer Science and Engineering. She has presented thirty-four papers
in international journals and one paper in a national journal, and has published forty-three
papers in international conferences and five papers in national conferences. She has more than
twenty years of experience. Her areas of specialization are compiler optimization, artificial
intelligence, and parallel algorithms. She is a member of IEEE, ISTE, and CSI.
A. Thomas received her M.Tech (Computer Science & Engineering) in 2013 and completed her
M.Phil (Computer Science) in 2011. She is Head of the Computer Science Department at
G. H. Raisoni College of Engineering, Nagpur. Her areas of specialization are soft computing
and data mining.