Online mining of streaming data is one of the most important issues in data mining. In this paper, we propose an efficient one-pass algorithm to mine the set of all frequent itemsets in data streams over a transaction-sensitive sliding window. The proposed algorithm uses an effective bit-sequence representation of items to reduce the time and memory needed to slide the windows. Experiments show that the proposed algorithm not only attains highly accurate mining results but also runs significantly faster and consumes less memory than existing algorithms for mining frequent itemsets over recent data streams. Our theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable, and that it also performs well when mining the set of all maximal frequent itemsets over the entire history of the data streams.
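The bit-sequence idea is straightforward to sketch: each item keeps one bit per transaction in the window, the support of an itemset is the population count of the AND of its members' bit sequences, and sliding the window is a shift plus an append. A minimal Python sketch of this representation (the class and method names are illustrative, not the paper's):

```python
class BitSequenceWindow:
    """Transaction-sensitive sliding window using one bit sequence per item."""

    def __init__(self, window_size):
        self.w = window_size
        self.bits = {}        # item -> int used as a bit sequence
        self.count = 0        # transactions currently in the window

    def slide(self, transaction):
        """Append a new transaction; the oldest bit falls off once the window is full."""
        mask = (1 << self.w) - 1
        for item in self.bits:
            self.bits[item] = (self.bits[item] << 1) & mask
        for item in transaction:
            self.bits[item] = (self.bits.get(item, 0) | 1) & mask
        self.count = min(self.count + 1, self.w)

    def support(self, itemset):
        """Support = popcount of the bitwise AND of the members' sequences."""
        acc = (1 << self.w) - 1
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")

win = BitSequenceWindow(window_size=3)
for t in [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"b"}]:
    win.slide(t)
print(win.support({"b"}))       # 3: 'b' occurs in all 3 transactions still in the window
print(win.support({"a", "b"}))  # 1: only the third transaction in the window has both
```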
The document discusses locking techniques and timestamp ordering for concurrency control in database systems. It provides details on shared locks and exclusive locks for concurrency control. Timestamp ordering assigns each transaction a unique timestamp based on when it enters the system. The timestamp ordering protocol ensures serializability by only allowing schedules where transactions execute in timestamp order. It compares the timestamp of transactions to the read and write timestamps of data items to determine if operations can be executed or if transactions need to be aborted. While timestamp ordering ensures serializability and avoids deadlocks, it does not prevent cascading rollbacks when transactions rely on each other.
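The read/write rules of basic timestamp ordering are compact enough to state in code. A small Python sketch under the usual textbook rules, where TS(T) is fixed when the transaction starts and each item tracks the largest read and write timestamps (names are illustrative):

```python
class Aborted(Exception):
    pass

class DataItem:
    def __init__(self):
        self.read_ts = 0   # largest timestamp that has read this item
        self.write_ts = 0  # largest timestamp that has written this item

def read(item, ts):
    # A read is too late if a younger transaction already wrote the item.
    if ts < item.write_ts:
        raise Aborted("read rejected: item overwritten by a younger transaction")
    item.read_ts = max(item.read_ts, ts)

def write(item, ts):
    # A write is too late if a younger transaction already read or wrote the item.
    if ts < item.read_ts or ts < item.write_ts:
        raise Aborted("write rejected: a younger transaction read or wrote the item")
    item.write_ts = ts

x = DataItem()
write(x, ts=5)      # T5 writes x
read(x, ts=7)       # T7 reads x; read_ts becomes 7
try:
    write(x, ts=6)  # T6 arrives late: x was already read by T7, so T6 aborts
except Aborted as e:
    print(e)
```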
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect... - IOSR Journals
The document analyzes pattern transformation algorithms for sensitive knowledge protection in data mining. It discusses:
1) Three main classes of privacy-preserving techniques - heuristic-based, cryptography-based, and reconstruction-based. The proposed algorithms use heuristic-based techniques.
2) Four proposed heuristic-based algorithms - item-based Maxcover (IMA), pattern-based Maxcover (PMA), transaction-based Maxcover (TMA), and Sensitivity Cost Sanitization (SCS) - that modify sensitive transactions to decrease support of restrictive patterns.
3) Performance improvements including parallel and incremental approaches to handle large, dynamic databases while balancing privacy and utility.
This document discusses using a big data platform for real-time fraud detection through sequence mining. Hadoop is used to build a Markov chain model from transaction data to identify patterns. Storm then processes transactions in real-time, using the model to flag potentially fraudulent sequences. Redis is used as an input queue and to store the model and flagged transactions. The approach sequences transactions by customer to detect fraud patterns not found by analyzing individual transactions.
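The core of that approach, scoring a customer's transaction sequence against a Markov chain of "normal" behavior, fits in a few lines. A hedged Python sketch (the state names and training data are invented for illustration; the real pipeline trains the model on Hadoop and scores in Storm, with Redis in between):

```python
from collections import defaultdict
import math

def train_markov_chain(sequences):
    """Estimate transition probabilities from historical transaction state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev, nxt in counts.items():
        total = sum(nxt.values())
        model[prev] = {s: c / total for s, c in nxt.items()}
    return model

def sequence_log_likelihood(model, seq, floor=1e-6):
    """Very negative values flag sequences the model finds improbable."""
    ll = 0.0
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(model.get(prev, {}).get(cur, floor))
    return ll

# Toy states: (amount bucket, time-gap bucket) encoded as short strings.
history = [["low-short", "low-short", "mid-long"],
           ["low-short", "mid-long", "low-short"]]
model = train_markov_chain(history)
# An unusual run of high-amount, rapid transactions scores very low:
print(sequence_log_likelihood(model, ["low-short", "high-short", "high-short"]))
```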
Mining Of Big Data Using Map-Reduce Theorem - IOSR Journals
This document discusses using MapReduce to efficiently extract large and complex data from big data sources. It proposes a MapReduce theorem for big data mining that is more efficient than the Heterogeneous, Autonomous, Complex and Evolving (HACE) theorem. MapReduce libraries support different programming languages and platforms, allowing for portable big data processing. The document outlines how MapReduce connects to BigQuery to allow SQL queries to efficiently extract and analyze large datasets stored in the cloud. It also discusses data cleaning, sampling, and normalization as part of the big data mining process.
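The map/shuffle/reduce pattern the document relies on can be illustrated without a cluster. A minimal single-process Python sketch of counting items across transaction records (illustrative only; a real job would run on Hadoop or a comparable framework):

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (item, 1) for every item in a transaction record."""
    for item in record.split(","):
        yield item.strip(), 1

def reduce_phase(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)

records = ["milk,bread", "bread,eggs", "milk,bread,eggs"]

# Shuffle: group the mapped pairs by key, as the framework would do.
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

print(dict(reduce_phase(k, v) for k, v in groups.items()))
# {'milk': 2, 'bread': 3, 'eggs': 2}
```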
International Journal of Engineering Research and Development (IJERD) - IJERD Editor
CLOHUI: AN EFFICIENT ALGORITHM FOR MINING CLOSED+ HIGH UTILITY ITEMSETS FROM... - ijcsit
High-utility itemset mining (HUIM) is an important research topic in the data mining field, and numerous algorithms have been proposed. However, existing HUIM methods present too many high-utility itemsets (HUIs), which reduces not only the efficiency but also the effectiveness of mining, since users have to sift through a large number of HUIs to find useful ones. Recently a new representation, the closed+ high-utility itemset (CHUI), has been proposed; with this concept, the number of HUIs is reduced massively. Existing methods discover CHUIs from a transaction database in two phases. In phase I, each itemset is first checked for closedness; if it is closed, an overestimation technique sets an upper bound on its utility in the database, and the itemsets whose overestimated utilities are no less than a given threshold are selected as candidate CHUIs. In phase II, the candidate CHUIs generated in phase I are verified by computing their exact utilities in the database. These methods suffer from two problems: 1) the number of candidate CHUIs is usually huge, requiring extensive memory; and 2) computing closed itemsets is time-consuming. In this paper we therefore propose an efficient algorithm, CloHUI, for mining CHUIs from a transaction database. CloHUI generates no candidate CHUIs during the mining process and verifies closed itemsets directly on a tree structure, and we propose a strategy to make the verification faster. Extensive experiments on sparse and dense datasets compare CloHUI with the state-of-the-art algorithm CHUD; the results show that on dense datasets CloHUI significantly outperforms CHUD: it is more than an order of magnitude faster and consumes less memory.
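The phase-I overestimate referred to here is typically the transaction-weighted utility (TWU): the total utility of all transactions containing the itemset, which by construction never underestimates the itemset's true utility. A small Python sketch of the two-phase filter (the data and the minutil threshold are invented for illustration):

```python
# Each transaction: {item: quantity}; external utility: profit per unit.
transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"b": 2, "c": 1}]
profit = {"a": 5, "b": 4, "c": 1}

def transaction_utility(t):
    return sum(q * profit[i] for i, q in t.items())

def twu(itemset):
    """Phase I overestimate: total utility of transactions containing the itemset."""
    return sum(transaction_utility(t) for t in transactions
               if itemset <= t.keys())

def utility(itemset):
    """Phase II exact utility: only the itemset's own items are counted."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in transactions if itemset <= t.keys())

minutil = 12
candidate = {"a", "b"}
if twu(candidate) >= minutil:                          # cheap upper-bound test first
    print(candidate, "utility =", utility(candidate))  # then the exact check
```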
A Relative Study on Various Techniques for High Utility Itemset Mining from T... - IRJET Journal
This document summarizes research on techniques for mining high utility itemsets from transactional databases. It discusses how traditional frequent itemset mining focuses only on item frequency and not utility. High utility itemset mining considers both frequency and the utility (e.g. profit, quantity) of itemsets to find those with high total utility. The document reviews related work on frequent itemset mining and introduces high utility itemset mining. It defines key concepts like internal utility, external utility and discusses properties like the utility bound property. Finally, it surveys several algorithms for high utility itemset mining including Two-Phase, CTU-Mine, CTU-PRO and CTU-PROL.
This document discusses various data mining techniques, including artificial neural networks. It provides an overview of the knowledge discovery in databases process and the cross-industry standard process for data mining. It also describes techniques such as classification, clustering, regression, association rules, and neural networks. Specifically, it discusses how neural networks are inspired by biological neural networks and can be used to model complex relationships in data.
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK - IJDKP
A retail company's data may be geographically spread across different locations because of the huge amount of data and the rapid growth in transactions, yet for decision making, knowledge workers need integrated data from all sites. The main challenge is therefore to obtain generalized patterns or knowledge from transactional data spread over various locations. Transporting data from those locations to a server site increases the cost of moving the data, and mining patterns from the huge combined data on the server increases the time and space complexity, so multi-database mining plays a vital role in extracting knowledge from different data sources. The proposed technique finds patterns at the various sites and, instead of transporting the data, transports only the patterns from the various locations to the server to produce the final deliverable patterns. The technique uses a ranking algorithm to rank the items based on their profit, date of expiry, and stock available at each location; association rule mining (ARM) is then used to extract patterns based on the item ranking, and finally all the patterns discovered at the various locations are merged using a pattern-merger algorithm. The proposed algorithm is implemented, and experimental results are taken both for classical association rule mining on the integrated data and for the datasets at the various sources, after which all patterns are combined into actionable patterns using the pattern-merger algorithm, as sketched below.
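The site-then-merge flow is easy to sketch: each site ships only its locally frequent patterns with their counts, and the server merges them. A hedged Python sketch of such a merger (the data are invented, and the paper's actual merger and ranking step may differ):

```python
from collections import defaultdict

def local_patterns(site_transactions, min_support):
    """Each site mines its own 1- and 2-item patterns and ships only counts."""
    counts = defaultdict(int)
    for t in site_transactions:
        items = sorted(t)
        for i in items:
            counts[(i,)] += 1
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                counts[(items[a], items[b])] += 1
    n = len(site_transactions)
    return {p: c for p, c in counts.items() if c / n >= min_support}, n

def merge_patterns(site_results, min_support):
    """Server-side merger: add up the pattern counts shipped from every site."""
    total_counts, total_n = defaultdict(int), 0
    for patterns, n in site_results:
        total_n += n
        for p, c in patterns.items():
            total_counts[p] += c
    return {p: c for p, c in total_counts.items() if c / total_n >= min_support}

site1 = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]
site2 = [{"milk", "bread"}, {"milk", "bread"}]
merged = merge_patterns([local_patterns(s, 0.5) for s in (site1, site2)],
                        min_support=0.5)
print(merged)
# {('bread',): 4, ('milk',): 4}; ('bread', 'milk') was not locally frequent at
# site1, so its merged count (2 of 5) misses the global threshold -- merging
# only local patterns is an approximation, which is the price of not moving data
```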
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La... - IOSR Journals
This document presents an improved item-based maxcover algorithm to protect sensitive patterns in large databases. The algorithm aims to minimize information loss when sanitizing databases to hide sensitive patterns. It works by identifying sensitive transactions containing restrictive patterns. It then sorts these transactions by degree and size and selects victim items to remove based on which items have the maximum cover across multiple patterns. This is done with only one scan of the source database. Experimental results on real datasets show the algorithm achieves zero hiding failure and low misses costs between 0-2.43% while keeping the sanitization rate between 40-68% and information loss below 1.1%.
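The victim-selection idea, removing the item that "covers" the most restrictive patterns in a sensitive transaction, can be sketched briefly. A hedged Python sketch (the data and tie-breaking are illustrative; the published algorithm adds the degree/size sorting and one-scan bookkeeping described above):

```python
def choose_victim(transaction, restrictive_patterns):
    """Pick the item appearing in the most restrictive patterns (its maxcover)."""
    cover = {}
    for item in transaction:
        cover[item] = sum(1 for p in restrictive_patterns
                          if item in p and p <= transaction)
    return max(cover, key=cover.get)

def sanitize(transactions, restrictive_patterns):
    """Remove one victim item from each sensitive transaction."""
    cleaned = []
    for t in transactions:
        if any(p <= t for p in restrictive_patterns):   # sensitive transaction
            t = t - {choose_victim(t, restrictive_patterns)}
        cleaned.append(t)
    return cleaned

patterns = [{"a", "b"}, {"a", "c"}]
db = [{"a", "b", "c"}, {"b", "c"}, {"a", "b"}]
print(sanitize(db, patterns))
# In the first transaction 'a' covers both patterns, so removing it alone
# hides both with minimal information loss.
```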
Mining Fuzzy Association Rules from Web Usage Quantitative Data - csandit
Web usage mining is the process of extracting interesting patterns from Web usage log files. It is a subfield of data mining that applies various data mining techniques to produce association rules. Data mining techniques are typically used to generate association rules from transaction data, and most of the time these are boolean transactions, whereas Web usage data consists of quantitative values. To handle such real-world quantitative data, we use a fuzzy data mining algorithm to extract association rules from a quantitative Web log file. To generate fuzzy association rules, we first design a membership function, which is used to transform quantitative values into fuzzy terms. Experiments are carried out with different support and confidence values, and the experimental results show the performance of the algorithm under varied supports and confidences.
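A membership function of the kind described maps a raw quantity (say, time spent on a page) to degrees in fuzzy terms such as Low/Middle/High. A minimal Python sketch with triangular functions (the breakpoints are invented for illustration, not taken from the paper):

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_duration(seconds):
    """Map a page-view duration to degrees in three fuzzy terms."""
    return {
        "Low":    max(0.0, 1.0 - seconds / 60) if seconds < 60 else 0.0,
        "Middle": triangular(seconds, 30, 90, 150),
        "High":   min(1.0, (seconds - 120) / 60) if seconds > 120 else 0.0,
    }

print(fuzzify_duration(75))
# {'Low': 0.0, 'Middle': 0.75, 'High': 0.0} -- a 75-second visit is mostly 'Middle'
```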
The document discusses mining frequent patterns from object-relational data. It begins with an introduction to data mining and frequent pattern mining. It then reviews related work on data models, including relational, object-oriented, and object-relational data models. Several frequent pattern mining algorithms are described, including Apriori, FP-Growth, ECLAT and RElim. Two approaches for mining object-relational data are proposed: a fundamental approach that treats it similarly to transactional data, and a nested-relations approach to handle nested attributes. The document concludes by outlining parameters for applying frequent pattern mining algorithms to object-relational data.
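Of the algorithms listed, Apriori is the simplest to illustrate: frequent k-itemsets are extended only from frequent (k-1)-itemsets. A compact Python sketch (illustrative data; real implementations add smarter candidate pruning and data structures):

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise frequent itemset mining: extend only frequent itemsets."""
    items = {i for t in transactions for i in t}
    frequent, level = {}, [frozenset([i]) for i in sorted(items)]
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        # Candidate generation: union pairs of frequent k-itemsets into (k+1)-itemsets,
        # then keep only candidates whose every k-subset is frequent.
        keys = list(current)
        level = {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
        level = [c for c in level if all(frozenset(s) in current
                                         for s in combinations(c, len(c) - 1))]
    return frequent

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(apriori(db, min_count=2))
# All 1-itemsets (count 3) and all 2-itemsets (count 2) are frequent; {a,b,c} is not.
```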
Association mining is an important component of data mining. In real-world applications, the knowledge used to aid decision-making is always time-varying, yet most existing data mining approaches rely on the assumption that discovered knowledge is valid indefinitely. To support better decision making, it is desirable to identify the temporal features alongside the interesting patterns or rules. This paper presents a novel approach for mining Efficient Temporal Association Rules (ETAR). The basic idea of ETAR is to first partition the database into time periods and then progressively accumulate the occurrence count of each itemset based on the intrinsic partitioning characteristics. Notably, the execution time of ETAR is orders of magnitude smaller than those of schemes directly extended from existing methods, because it scans the database only once.
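The partition-and-accumulate idea can be shown compactly: bucket transactions by time period, count itemsets per period in one pass, then accumulate the counts over any period range of interest without rescanning. A hedged Python sketch (field names and data are illustrative, and only single items are counted here):

```python
from collections import defaultdict

def count_by_period(transactions):
    """One pass: per-period occurrence counts for each item."""
    counts = defaultdict(lambda: defaultdict(int))   # period -> item -> count
    for period, items in transactions:
        for item in items:
            counts[period][item] += 1
    return counts

def support_in_range(counts, item, start, end):
    """Accumulate a period range from the stored counts, no database rescan."""
    return sum(counts[p][item] for p in range(start, end + 1))

stream = [(1, {"a", "b"}), (1, {"a"}), (2, {"a", "c"}), (3, {"b"})]
counts = count_by_period(stream)
print(support_in_range(counts, "a", 1, 2))  # 3 occurrences of 'a' in periods 1-2
```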
Social navigation can be considered an effective approach for addressing information management issues, particularly the privacy and security concerns that prevail in the management of information systems. These security issues are just as critical for knowledge management systems. This paper outlines multiple privacy and security risks that apply to information systems in general; with examples from sectors such as health care, public information, and e-commerce, these management information issues provide an outline of the present situation. It also examines the key concerns of information system executives in these areas, highlighting and explaining regional differences and similarities. Focusing on a current issue in management information systems, the paper discusses possible underlying causes such as economic development, technological status, and the political/legal environment. The paper concludes by providing a revised framework for management information issues, formulated through an extensive literature search, that should support studies pursuing such reasoning in the future.
CELL TRACKING QUALITY COMPARISON BETWEEN ACTIVE SHAPE MODEL (ASM) AND ACTIVE ... - ijitcs
The aim of this paper is to introduce a comparison between cell tracking using the active shape model (ASM) and active appearance model (AAM) algorithms, comparing the tracking quality of the two methods on the mobility of living cells. A sensitive and accurate cell tracking system is essential to cell motility studies. The ASM and AAM algorithms have proved to be successful methods for matching statistical models. The experimental results indicate the ability of the (AAM) meth...
ADAPTIVE SYNCHRONIZER DESIGN FOR THE HYBRID SYNCHRONIZATION OF HYPERCHAOTIC ZH... - ijitcs
This paper derives new adaptive synchronizers for the hybrid synchronization of hyperchaotic Zheng systems (2010) and hyperchaotic Yu systems (2012). In the hybrid synchronization design of master and slave systems, one part of the systems, viz. their odd states, is completely synchronized (CS), while the other part, viz. their even states, is completely anti-synchronized (AS), so that CS and AS co-exist in the synchronization process. The problem becomes even more complicated when the parameters of the hyperchaotic systems are not known, and we handle this complication using adaptive control. The main results of this work are established via adaptive control theory and Lyapunov stability theory. MATLAB plots using the classical fourth-order Runge-Kutta method are depicted for the new adaptive hybrid synchronization results for the hyperchaotic Zheng and hyperchaotic Yu systems.
Making ISP business profitable using data mining - ijitcs
Data mining is a powerful new technology with great potential to extract hidden predictive information from large databases. Data mining tools scour databases for hidden patterns and predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. This paper analyzes an ISP's (Internet Service Provider) data to generate association rules for frequent patterns and applies the support and confidence criteria to identify the most important relationships. Here, the Apriori algorithm is used to mine the association rules. Furthermore, the conclusion points out business challenges and offers proposals for making the ISP business profitable.
A SERVICE ORIENTED DESIGN APPROACH FOR E-GOVERNANCE SYSTEMS - ijitcs
The document describes a service-oriented design approach for e-governance systems. It discusses key challenges in developing e-governance systems and proposes addressing these challenges through a service-oriented paradigm. The approach defines concepts like service types (readily available, composable, collaborative), service windows, service composition, and service collaboration. Service types depend on the complexity of processing required - readily available services require minimal processing, composable services may invoke other related services, and collaborative services require coordination across service windows. The approach aims to provide reusable, interoperable services and facilitate integration of existing applications in e-governance systems.
In cloud computing, application and desktop delivery are two emerging technologies that have reduced application and desktop computing costs and provided greater IT and user flexibility compared to traditional application and desktop management models. Among the various SaaS technologies is XenApp, which allows numerous end users to connect to their corporate applications from any device. XenApp enables organizations to improve application management by centralizing applications in the datacenter to reduce costs, controlling and encrypting access to data and applications to improve security, and delivering applications instantly to users anywhere through remote access or streaming. In the old architecture, Oracle or MS-SQL ran in the back end of XenApp on the Windows Server itself. This means paying for a database that serves no other purpose, and because Windows is GUI-based, running the database on it consumes far more resources; it can also lead to a single point of failure. This paper therefore proposes a scheme that removes this problem by using an HA failover cluster based SQL server (MySQL or Oracle) running on a Linux box with a VIP (Virtual IP).
An Efficient Algorithm to Calculate The Connectivity of Hyper-Rings Distribut... - ijitcs
The aim of this paper is to develop a software module that tests the connectivity of various odd-sized HRs and to address an open question: whether the node connectivity of an odd-sized HR equals its degree. We attempt to answer this question by explicitly testing the node connectivities of various odd-sized HRs. We also study the properties, constructions, and connectivity of hyper-rings. We use a graph to represent the architecture of an interconnection network, where nodes represent processors and edges represent communication links between pairs of processors. Although the number of edges in a hyper-ring is roughly twice that of a hypercube with the same number of nodes, the diameter of the hyper-ring is shorter and its connectivity larger than those of the corresponding hypercube. These properties make the hyper-ring desirable as an interconnection network. This paper discusses reliability in hyper-rings: one of the major goals in network design is to find the best way to increase a system's reliability, and the reliability of a distributed system depends on the reliabilities of its communication links and computing elements.
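The connectivity test itself is a standard minimum-node-cut computation. A hedged Python sketch using the networkx library, assuming the usual hyper-ring definition in which node i of HR(N) is adjacent to (i +/- 2^k) mod N for every power of two smaller than N (if the paper uses a different HR definition, only the graph builder changes):

```python
import networkx as nx

def hyper_ring(n):
    """Build HR(n): node i is adjacent to (i +/- 2**k) % n for each 2**k < n."""
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i in range(n):
        k = 1
        while k < n:
            g.add_edge(i, (i + k) % n)   # the (i - k) edge is added from the other end
            k *= 2
    return g

for n in (9, 11, 13):                    # odd sizes, per the open question
    g = hyper_ring(n)
    min_degree = min(d for _, d in g.degree())
    print(n, "node connectivity:", nx.node_connectivity(g), "min degree:", min_degree)
```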
The Internet Engineering Task Force (IETF) proposed Proxy Mobile IPv6 (PMIPv6), a very promising network-based mobility management protocol. Three entities, the LMA, the MAG, and an AAA server, are required for PMIPv6 to function properly. PMIPv6 networks have many disadvantages, including signaling overhead, handover latency, packet loss, and long authentication latency during handoff. We therefore propose a new mechanism, called SPAM, which performs an efficient authentication procedure globally at low computational cost. It also supports global access using a ticket-based algorithm that allows a user's mobile terminal to communicate with its neighbouring access points using only one ticket. The algorithm not only reduces the mobile terminal's computational cost but also provides user-ID protection to preserve user privacy. With this technique, a handover authentication protocol can be implemented that incurs less computation cost and communication delay than other existing methods.
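To make the ticket idea concrete, here is a hedged Python sketch of a generic HMAC-based ticket that one authenticator issues and any neighbouring access point holding a shared regional key can verify locally; this illustrates the general ticket mechanism only, not SPAM's actual message formats or keys:

```python
import hashlib
import hmac
import json
import time

REGION_KEY = b"shared-by-access-points-in-region"   # illustrative shared secret

def issue_ticket(user_pseudonym, lifetime=3600):
    """Issuer side: bind a pseudonym (not the real user ID) to an expiry time."""
    body = {"pid": user_pseudonym, "exp": int(time.time()) + lifetime}
    blob = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(REGION_KEY, blob, hashlib.sha256).hexdigest()
    return blob, tag

def verify_ticket(blob, tag):
    """Any access point with the region key verifies locally: no AAA round trip."""
    expected = hmac.new(REGION_KEY, blob, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    return json.loads(blob)["exp"] > time.time()

blob, tag = issue_ticket("pseudonym-42")
print(verify_ticket(blob, tag))   # True until the ticket expires
```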
ASSESSING THE ORGANIZATIONAL READINESS FOR IMPLEMENTING BI SYSTEMS - ijitcs
Implementation of business intelligence systems (BIS) is very complex and requires a great deal of resources and time. Business intelligence (BI) is a difficult concept with a multi-tier architecture, and its metadata is a major source of that complexity; this is why a BI readiness model is required. Several BI maturity models, such as the data warehousing (TDW) model, have been provided, but there are few frameworks for measuring readiness. Moreover, readiness frameworks often provide one general model for all organizations. Hence, the objective of this study was to examine whether the factors affecting organizational readiness for BI implementation are identical across all organizations. For this purpose, based on a comprehensive literature review, four factors, culture, people, strategy, and management, were extracted as the most important factors affecting the readiness and implementation of BI, and they were studied in three organizations: educational, commerce, and IT. Based on the findings, different factors affect different organizations, and using one general model is not advisable.
New developments in sensor, radar, and ultrasonic technologies have proved to be a boon for electronic travelling aids (ETAs). These devices are widely used by blind and physically challenged people. The C5 laser cane, the Mowat sensor, the belt-worn binaural sonic aid, and the NAV guide cane are among the popular electronic travelling aids used by blind people. For physically challenged persons, electric wheelchairs controlled by joystick, eye movement, and voice recognition are also available, but they have their own limitations in terms of operating complexity, noisy environments, and cost. Our paper proposes an innovative automated wheelchair controlled by the user's neck position. It uses simple LEDs, photo sensors, a motor, and a microcontroller to control the movement of the wheelchair.
ASSESSMENT OF ALTERNATIVE PRECISION POSITIONING SYSTEMS - ijitcs
The continuous evolution of technology, electronics, and software, along with the dramatic decrease in the cost of electronic devices, has led to the spread of sensing, surveillance, and control devices. The Internet-of-Things (IoT) benefits from this spread of devices (things) by processing device feeds using Machine-to-Machine (M2M) technologies. At the heart of M2M technologies lies the ability of devices (things) to identify their own location on the globe or relative to a known landmark. Since location awareness is fundamental to processing sensing and control feeds, researchers have sought ways to determine and improve location accuracy. The article looks at Global Positioning Systems (GPS) along with the enhancements and amendments that apply to satellite-based solutions. It also looks at medium- to short-range wireless solutions such as cellular, Wi-Fi, Dedicated Short-Range Communications (5.9 GHz DSRC), and similar solutions.
In recent years there has been exponential growth of e-Governance in India. It is growing to such a scale that it requires the full attention of the Government to ensure collaboration among different government departments, private sectors, and Non-Governmental Organisations (NGOs). In order to achieve successful e-Governance, the Government has to facilitate delivery of services to citizens, business houses, and other public or private organisations according to their requirements. In this paper, we propose integration of different government departments using a Service Oriented e-Governance (SOeGov) approach with web services technology and Service Oriented Architecture (SOA). The proposed approach can be effectively used for achieving integration and interoperability in an e-Governance system. We demonstrate the working of our approach through a case study in which several departments of the provincial Government of Odisha (India) are integrated.
Brain tumor detection and localization in magnetic resonance imaging - ijitcs
A tumor, also known as a neoplasm, is an abnormal growth of tissue that can be differentiated from the surrounding tissue by its structure. A tumor may lead to cancer, which is a major cause of death, responsible for around 13% of all deaths worldwide, and the cancer incidence rate is growing at an alarming rate around the world. Great knowledge and experience in radiology are required for accurate tumor detection in medical imaging, and automation of tumor detection is needed because there may be a shortage of skilled radiologists at a time of great need. We propose an automatic brain tumor detection and localization framework that can detect and localize brain tumors in magnetic resonance imaging. The proposed framework comprises five steps: image acquisition, pre-processing, edge detection, modified histogram clustering, and morphological operations. After the morphological operations, tumors appear as pure white regions on pure black backgrounds. We used 50 neuroimages to optimize our system and 100 out-of-sample neuroimages to test it, and the system was found to accurately detect and localize brain tumors in magnetic resonance imaging. The preliminary results demonstrate that a simple machine learning classifier with a set of simple image-based features can achieve high classification accuracy; they also demonstrate the efficacy and efficiency of our five-step approach and motivate us to extend this framework to detect and localize a variety of other tumor types in other kinds of medical imagery.
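The five-step pipeline maps naturally onto standard image operations. A hedged Python/OpenCV sketch of the flow (the specific functions, kernel sizes, and thresholds are illustrative stand-ins; in particular, Otsu thresholding is used here as a stand-in for the paper's modified histogram clustering):

```python
import cv2
import numpy as np

def detect_tumor(path):
    # 1. Image acquisition
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # 2. Pre-processing: denoise while keeping boundaries reasonably sharp
    smooth = cv2.GaussianBlur(img, (5, 5), 0)
    # 3. Edge detection (Canny) to delineate candidate region boundaries
    edges = cv2.Canny(smooth, 50, 150)
    # 4. Intensity clustering: Otsu threshold as a stand-in for the paper's
    #    modified histogram clustering
    _, mask = cv2.threshold(smooth, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 5. Morphological opening then closing: remove specks and fill holes,
    #    leaving candidate tumors as white blobs on a black background
    kernel = np.ones((7, 7), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return edges, mask

edges, mask = detect_tumor("mri_slice.png")   # hypothetical input file
cv2.imwrite("tumor_mask.png", mask)
```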
Load balancing using ant colony in cloud computing - ijitcs
Ants are very small insects that are capable of finding food even though they are completely blind. Ants live in their nest, and their job is to search for food when they are hungry. We are not interested in their living style, such as how they live or how they sleep, but in how they search for food and how they find the shortest path to it. This shortest-path technique is now being applied in cloud computing, where an ant colony approach to load balancing gives better performance.
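In load balancing, the ant analogy usually means picking a node with probability proportional to pheromone (past success) weighted by a heuristic (current free capacity), then reinforcing and evaporating pheromone. A hedged Python sketch of that selection rule (alpha, beta, and the evaporation rate are conventional ACO knobs chosen here for illustration):

```python
import random

class AntColonyBalancer:
    def __init__(self, nodes, alpha=1.0, beta=2.0, evaporation=0.1):
        self.pheromone = {n: 1.0 for n in nodes}
        self.alpha, self.beta, self.rho = alpha, beta, evaporation

    def pick_node(self, free_capacity):
        """Choose a node with probability ~ pheromone**alpha * capacity**beta."""
        weights = {n: (self.pheromone[n] ** self.alpha) *
                      (free_capacity[n] ** self.beta)
                   for n in self.pheromone}
        total = sum(weights.values())
        r, acc = random.uniform(0, total), 0.0
        for node, w in weights.items():
            acc += w
            if r <= acc:
                return node

    def update(self, node, reward):
        """Evaporate pheromone everywhere, then deposit it on the used node."""
        for n in self.pheromone:
            self.pheromone[n] *= (1 - self.rho)
        self.pheromone[node] += reward

lb = AntColonyBalancer(["vm1", "vm2", "vm3"])
node = lb.pick_node({"vm1": 0.2, "vm2": 0.7, "vm3": 0.5})
lb.update(node, reward=1.0)   # a fast response reinforces this node
print(node)
```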
PDHPE is a key learning area in the NSW primary school curriculum that helps children develop important life skills through physical development, health, and physical education. It supports the development of skills like communication, decision making, interaction, movement, and problem solving. PDHPE also helps children develop motor skills and an active lifestyle by encouraging at least 30 minutes of physical activity per day. Additionally, it provides an understanding of physical development, covers a wide range of health topics, and supports children's learning across other subjects by giving opportunities to be creative, active, and develop friendships.
MOBILE CLOUD COMPUTING APPLIED TO HEALTHCARE APPROACH - ijitcs
In the past few years it has become clear that mobile cloud computing, established by integrating mobile computing and cloud computing, adds both storage space and processing speed. Integrating healthcare applications and services is one of the vast data approaches that can be adapted to mobile cloud computing. This work proposes a framework for a global healthcare system that combines mobile computing and cloud computing. The approach integrates all of the required services and overcomes the barriers by facilitating both privacy and security.
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO... - IJDKP
The document describes a proposed algorithm called RAQ-FIG for mining frequent itemsets from transactional data streams. It operates using a sliding window model composed of basic windows. The algorithm has three phases: 1) initializing the sliding window by filling it with recent transactions from a buffer, 2) generating bit sequences for each basic window and finding frequent itemsets through bitwise operations, and 3) adapting the algorithm's processing based on available memory and quality metrics to ensure efficient resource usage and accurate results. The algorithm aims to account for computational resources and dynamically adjust the processing rate based on available memory while computing recent approximate frequent itemsets with a single pass.
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over... - Waqas Tariq
Sliding window is an interesting model for frequent pattern mining over data stream due to handling concept change by considering recent data. In this study, a novel approximate algorithm for frequent itemset mining is proposed which operates in both transactional and time sensitive sliding window model. This algorithm divides the current window into a set of partitions and estimates the support of newly appeared itemsets within the previous partitions of the window. By monitoring essential set of itemsets within incoming data, this algorithm does not waste processing power for itemsets which are not frequent in the current window. Experimental evaluations using both synthetic and real datasets shows the superiority of the proposed algorithm with respect to previously proposed algorithms.
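The estimation trick resembles Lossy Counting: when an itemset is first monitored in partition p, its unseen support in the p-1 earlier partitions is bounded by the support threshold, so that bound serves as the estimate. A hedged Python sketch of that bookkeeping (illustrative, not the paper's exact estimator):

```python
class PartitionedWindowCounter:
    def __init__(self, partition_size, min_support):
        self.psize = partition_size
        self.sigma = min_support
        self.current_partition = 0
        self.counts = {}     # itemset -> [observed count, error bound]

    def new_partition(self):
        self.current_partition += 1

    def observe(self, itemset):
        if itemset not in self.counts:
            # Had it been frequent earlier, it would already be monitored, so its
            # count per missed partition is below sigma * partition_size.
            error = int(self.sigma * self.psize) * (self.current_partition - 1)
            self.counts[itemset] = [0, error]
        self.counts[itemset][0] += 1

    def estimated_support(self, itemset):
        observed, error = self.counts.get(itemset, (0, 0))
        return observed + error      # upper-bound estimate over the window

c = PartitionedWindowCounter(partition_size=100, min_support=0.05)
c.new_partition(); c.new_partition()       # itemset first seen in partition 2
c.observe(frozenset({"a", "b"}))
print(c.estimated_support(frozenset({"a", "b"})))  # 1 observed + 5 bound = 6
```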
The challenges of mining frequent items over streaming data under a variable window size and low memory space are addressed in this research paper. To detect points of context change in the streaming transactions, we developed a two-level window structure that supports fixing the window size instantly, controls heterogeneity, and assures homogeneity among the transactions added to the window. To minimize memory utilization and computational cost and to improve process scalability, this design allows fixing the coverage or support at the window level. This paper introduces incremental mining of frequent item-sets from the window together with a context-variation analysis approach; the complete technique is named Mining Frequent Item-sets using Variable Window Size fixed by Context Variation Analysis (MFI-VWS-CVA). There are clear boundaries between frequent and infrequent item-sets in specific item-sets. In this design we used window-size change to represent concept drift in an information stream; in other words, whenever the window size cannot be set effectively, the item-set will be infrequent. The experiments we executed and documented show that the designed algorithm is much more efficient than existing ones.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
This document summarizes a research paper that proposes a new approach called CBSW (Chernoff Bound based Sliding Window) for mining frequent itemsets from data streams. CBSW uses concepts from the Chernoff bound to dynamically determine the window size for mining frequent itemsets. It monitors boundary movements in a synopsis data structure to detect changes in the data stream and adjusts the window size accordingly. Experimental results demonstrate the effectiveness of CBSW in mining frequent itemsets from high-speed data streams.
This document summarizes a research paper that proposes a new algorithm called ESW-FI to efficiently mine frequent itemsets from data streams using a sliding window model. The algorithm actively maintains potentially frequent itemsets in a compact data structure using only a single pass over the data, an improvement over existing algorithms that require multiple scans or keep all transaction data within the window. It guarantees output quality and bounds memory usage while processing streams of continuous, unpredictable data in a timely manner. The algorithm divides the sliding window into fixed-size segments and processes window slides by inserting new segments and removing old ones, avoiding reprocessing of all transactions on each slide.
Mining high utility itemsets in data streams based on the weighted sliding wi...IJDKP
This document proposes an algorithm for mining high utility itemsets from data streams based on the weighted sliding window model. The weighted sliding window model allows the user to specify the number of windows, window size, and weight for each window. The algorithm, called HUI_W, makes a single pass over the data stream and takes advantage of reusing stored information to efficiently discover high utility itemsets. It first identifies high transaction-weighted utilization (HTWU) 1-itemsets, then generates candidate itemsets and determines actual high utility itemsets based on a transaction-weighted downward closure property to prune the search space.
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsAllenWu
This paper proposes a novel tree structure, DSTree, which can handle stream data. The experiments show comparable performance in terms of accuracy and efficiency.
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesVenu Madhav
Frequent itemset mining and association rule generation are challenging tasks in data streams. Although various algorithms have been proposed to solve the issue, it has been found that frequency alone does not decide the significance or interestingness of the mined itemsets, and hence of the association rules. This has pushed algorithms toward mining association rules based on utility, i.e., the proficiency of the mined rules. However, few algorithms in the literature deal with utility, as most of them deal with reducing the complexity of frequent itemset and association rule mining. Also, those few algorithms consider only the overall utility of the association rules and not the consistency of the rules throughout a defined number of periods. To solve this issue, this paper proposes an enhanced association rule mining algorithm. The algorithm introduces a new weightage validation into the conventional association rule mining algorithms to validate the utility of the mined association rules and its consistency. The utility is validated by an integrated calculation of the cost/price efficiency of the itemsets and their frequency. The consistency validation is performed after every defined number of windows using the probability distribution function, assuming that the weights are normally distributed. Hence, the validated rules obtained are frequent and utility efficient, and their interestingness is distributed throughout the entire time period. The algorithm is implemented and the resulting rules are compared against the rules that can be obtained from conventional mining algorithms.
Mining Stream Data using k-Means clustering AlgorithmManishankar Medi
This document discusses using k-means clustering to analyze urban road traffic stream data. Stream data arrives continuously over time and is challenging to process due to its high volume, velocity and volatility. The document proposes using a sliding window technique with k-means clustering to analyze recent urban traffic data and visualize clusters in real-time to provide insights into traffic patterns and congested roads. This analysis could help travelers and authorities respond to traffic issues more quickly.
This document discusses an approach for mining frequent itemsets from data streams using the Chernoff bound and sliding window model. The proposed CB-based method approximates itemset counts from summary information without rescanning the stream, making it adaptive to streams with different distributions. Experiments showed the method performs better in optimizing memory usage and mining recent patterns in less time with accurate results. The document reviews related work on frequent itemset mining from data streams and motivates the need for an efficient model to handle time-sensitive items in uncertain streams.
1. The document discusses stream data mining and compares classification algorithms. It defines stream data and challenges in mining stream data.
2. It describes sampling techniques and classification algorithms for stream data mining including Naive Bayesian, Hoeffding Tree, VFDT, and CVFDT.
3. The algorithms are experimentally compared in terms of time, memory usage, accuracy, and ability to handle concept drift. VFDT and CVFDT are found to have advantages over Hoeffding Tree in accuracy while maintaining speed, but CVFDT can additionally detect and respond to concept drift.
Weighted frequent pattern mining is suggested to find more important frequent patterns by considering different weights for each item. Weighted frequent patterns are generated in weight-ascending and frequency-descending order using a prefix tree structure. These generated weighted frequent patterns are then applied to a maximal frequent itemset mining algorithm. Maximal frequent pattern mining can reduce the number of frequent patterns while keeping sufficient result information. In this paper, we propose an efficient algorithm to mine maximal weighted frequent patterns over data streams. New efficient data structures, a prefix tree and a conditional tree, are used to dynamically maintain the information of transactions. Three information mining strategies (incremental, interactive, and maximal) are presented, and the details of the algorithms are also discussed. Our study applies the approach to electronic-shop market basket analysis. Experimental studies are performed to evaluate the effectiveness of our algorithm.
Mining Regular Patterns in Data Streams Using Vertical FormatCSCJournals
The increasing prominence of data streams has led to the study of online mining in order to capture interesting trends, patterns, and exceptions. Recently, temporal regularity in the occurrence behavior of a pattern has been treated as an emerging topic in several online applications, such as network traffic, sensor networks, e-business, and stock market analysis. A pattern is said to be regular in a data stream if its occurrence behavior does not exceed the user-given regularity threshold. Although some effort has gone into finding regular patterns over stream data, no method has yet been developed using the vertical data format. Therefore, in this paper we develop a new method, called the VDSRP-method, to generate the complete set of regular patterns over a data stream at a user-given regularity threshold. Our experimental results show high efficiency in terms of execution time and memory consumption.
This document discusses sequential pattern mining, which aims to discover patterns or rules in sequential data where events are ordered by time. It provides background on sequential pattern mining and its applications. The document also discusses related work on mining sequential patterns and rules from time-series data and across multiple sequences. It describes algorithms for efficiently mining sequential patterns at scale from large databases.
A Quantified Approach for large Dataset Compression in Association MiningIOSR Journals
Abstract: With the rapid development of computer and information technology over the last several decades, an enormous amount of data in science and engineering is continuously being generated on a massive scale; data compression is needed to reduce the cost and storage space. Compression, together with discovering association rules by identifying relationships among sets of items in a transaction database, is an important problem in data mining. Finding frequent itemsets is computationally the most expensive step in association rule discovery and has therefore attracted significant research attention. However, existing compression algorithms are not appropriate for mining large data sets. In this research a new approach is described in which the original dataset is sorted in lexicographical order and the desired number of groups is formed to generate quantification tables. These quantification tables are used to generate the compressed dataset, which yields a more efficient algorithm for mining complete frequent itemsets. The experimental results show that the proposed algorithm performs better than the mining merge algorithm across different supports and execution times.
Keywords: Apriori Algorithm, mining merge Algorithm, quantification table
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...ITIIIndustries
This paper presents a new approach based on genetic algorithms (GAs) to generate maximal frequent itemsets (MFIs) from large datasets. The new algorithm, GeneticMax, is a heuristic that mimics natural selection to find MFIs in an efficient way. The search strategy of this algorithm uses a lexicographic tree that avoids level-by-level searching, which reduces the time required to mine the MFIs in a linear way. Our implementation of the search strategy includes a bitmap representation of the nodes in the lexicographic tree and identifies frequent itemsets (FIs) from superset-subset relationships between nodes. The algorithm uses the principles of GAs to perform global searches. Its time complexity is lower than that of many other algorithms since it uses a non-deterministic approach. We separate the effect of each step of this algorithm through experimental analysis on real datasets such as Tic-Tac-Toe, Zoo, and a 10000x8 dataset. Our experimental results show that this approach is efficient and scalable for different sizes of itemsets. It accesses the dataset to calculate support values for fewer nodes when finding the FIs, even when the search space is very large, dramatically reducing the search time. The proposed algorithm shows how an evolutionary method can be used on real datasets to find all the MFIs in an efficient way.
This document summarizes a survey on flexible approaches to mine high utility itemsets from transactional databases using an improved UP-Growth+ algorithm. It first provides background on mining high utility itemsets and limitations of existing methods. It then discusses using distributed programming and main memory computing to overcome these limitations and improve UP-Growth and UP-Growth+ performance. The document concludes that evaluating the proposed improvements against other methods can demonstrate effectiveness and efficiency.
An improved apriori algorithm for association rulesijnlc
There are several mining algorithms for association rules. One of the most popular is Apriori, which is used to extract frequent itemsets from large databases and to obtain the association rules for discovering knowledge. Based on this algorithm, this paper points out the limitation of the original Apriori algorithm, namely the time wasted scanning the whole database in search of frequent itemsets, and presents an improvement on Apriori that reduces that wasted time by scanning only some of the transactions. The paper shows, through experimental results with several groups of transactions and several values of minimum support applied to both the original Apriori and our implemented improved Apriori, that the improved Apriori reduces the time consumed by 67.38% in comparison with the original Apriori, making the Apriori algorithm more efficient and less time consuming.
1) The document discusses the Cyclic Model Analysis (CMA) technique for sequential pattern mining which aims to predict customer purchasing behavior.
2) CMA calculates the Trend Distribution Function from sequential patterns to model purchasing trends over time. It then uses Generalized Periodicity Detection and Trend Modeling to identify periodic patterns and construct an approximating model.
3) The Cyclic Model Analysis algorithm is applied to further analyze the patterns, dividing the domain into segments where the distribution function is increasing or decreasing and applying the other techniques recursively to fully model the cyclic behavior.
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding Window Techniques
International Journal of Information Technology Convergence and Services (IJITCS) Vol.3, No.2, April 2013
DOI: 10.5121/ijitcs.2013.3201
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding Window Techniques
Neeraj and Anuradha
Student, Software Engineering (M.Tech), ITM University, Gurgaon, 122017, India
neeraj.yadav28@gmail.com
Faculty, Software Engineering, ITM University, Gurgaon, 122017, India
anuradha@itmindia.edu
ABSTRACT
As we know, the online mining of streaming data is one of the most important issues in data mining. In this paper, we propose an efficient one-pass algorithm, MFI-TransSW (Mining Frequent Item sets within a Transaction-sensitive Sliding Window), to mine the set of all frequent item sets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. The experiments show that the proposed algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than existing algorithms for mining frequent item sets over recent data streams. Our theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable and performs better for mining the set of all maximum frequent item sets over the entire history of the data streams.
Keywords
Recently frequent item sets, Maximum frequent item set, Transaction sliding window, Data stream.
1. INTRODUCTION
Frequent item set mining plays an essential role in many data mining tasks. With years of research in this area, several data stream mining problems have been discussed, such as frequent item set mining [5, 6, 7, 8, 9] and closed frequent structure mining [19, 24, 29]. Frequent item set mining over data streams aims to find an approximate set of frequent item sets in transactions with respect to a given support threshold. It should support a flexible trade-off between processing time and mining accuracy, and it should be time efficient even when the user-specified minimum support threshold is small. The objective is to propose an effective algorithm which generates frequent patterns in very little time. Our approach has been developed based on the improvement and analysis of the MFI algorithm. Frequent patterns can be very meaningful in data streams. In network monitoring, frequent patterns can correspond to excessive traffic, which could be an indicator of a network attack. In sales transactions, frequent patterns relate to the top-selling products in a market, and possibly their relationships. Due to numerous applications of this sort, mining frequent patterns from data streams has been a hot research area. If we consider that the data stream consists of transactions, each being a set of items, then the problem of mining frequent patterns can be stated as: given a set of transactions, find all patterns with frequency above a threshold s. Traditional mining algorithms assume a finite dataset over which the algorithms are allowed to scan multiple times. Therefore it is necessary to modify these algorithms in order to apply them to transactional data streams. An important property that distinguishes data stream mining from traditional data mining
is the unbounded nature of stream data, which precludes multiple-scan algorithms. Traditional
frequent item sets mining algorithms are thus not applicable to a data stream [11].
Stream mining algorithms are being developed to discover useful knowledge from data as it streams in, a process that is not as straightforward as it might seem. Scientists have to deal with the fact that the data rate of the stream is not constant, leading to a condition called burstiness, and that the patterns of the data stream are continuously evolving. Online mining of frequent item sets over a stream sliding window is one of the most important problems in stream data mining, with broad applications. It is also a difficult issue, since streaming data possess some challenging characteristics, such as unknown or unbounded size, a possibly very fast arrival rate, the inability to backtrack over previously arrived transactions, and a lack of system control over the order in which the data arrive. In this paper, we propose an effective bit-sequence based, one-pass algorithm, called MFI-TransSW (Mining Frequent Item sets within a Transaction-sensitive Sliding Window), to mine the set of frequent item sets from data streams within a transaction-sensitive sliding window which consists of a fixed number of transactions [2, 28, 30].
Figure 1 shows that data streams are divided into three categories: landmark windows, sliding windows, and damped windows. Sliding windows are further divided into two types: transaction-sensitive sliding windows and time-sensitive sliding windows [2, 21, 22, 23].
Fig. 1: Categories of data streams.
The approach of Han, Pei, Yin, and Mao (2004) [3] is used to maintain the frequent item sets and their supports stored in tilted-time windows. In a sliding window model, knowledge discovery is performed over a fixed number of recently generated data elements, which is the target of data mining. Two types of sliding window, i.e., the transaction-sensitive sliding window and the time-sensitive sliding window, are used in mining data streams. The basic processing unit of window sliding for a transaction-sensitive sliding window is an expired transaction, while the basic unit for a time-sensitive sliding window is a time unit, such as a minute or an hour [3, 20, 25, 26, 27].
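The distinction is only in the eviction rule. As a rough illustration (our sketch, not code from the paper), assuming transactions are kept in a collections.deque and, for the time-sensitive case, arrive with a timestamp:

from collections import deque

def slide_transaction_sensitive(window, tx, w):
    # basic unit of sliding: one expired transaction
    window.append(tx)
    if len(window) > w:
        window.popleft()                 # the oldest transaction expires

def slide_time_sensitive(window, tx, now, span):
    # basic unit of sliding: a time unit (e.g., a minute or an hour)
    window.append((now, tx))
    while window and now - window[0][0] >= span:
        window.popleft()                 # entries outside the span expire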
According to the stream processing model, research on mining frequent item sets in data streams can be divided into three categories: snapshot windows, sliding windows, and landmark windows. In the snapshot window model, frequent items are detected within a fixed range of time, so that model is not used frequently. In the landmark window model, knowledge discovery is performed on the values between a specific timestamp, called the landmark, and the present. In the sliding window model, knowledge discovery is performed over a fixed number of recently generated data elements, which is the target of data mining. The sliding window model is widely used for frequent item set mining since it considers only recent transactions and forgets obsolete ones. For this reason, a large number of sliding window based algorithms have been devised [11][12][13][14][15][17]. However, only a few of these studies adaptively maintain and update the set of frequent item sets [13][14][15]; the others only store the sliding window transactions in an efficient way using a suitable synopsis structure and perform the mining task when the user requests it.
The purpose of this paper is to mine frequent item sets over online data streams with the help of a transaction-sensitive sliding window. The experiments show that the proposed algorithm, MFI-TransSW, not only attains highly accurate mining results but also runs faster and uses less memory than existing mining algorithms. The problem of mining frequent item sets is defined and the algorithm is proposed in this paper.
2. PROBLEM DEFINITION
One of the most important data mining problems is mining maximal frequent item sets from a large database [2, 6, 7, 14, 25]. The problem of mining maximal frequent item sets was first posed by Bayardo [4].
Let W = {i1, i2, ..., im} be a set of items. A transaction T = (tid, x1 x2 ... xn), with xi ∈ W for 1 ≤ i ≤ n, is a set of items, where n is called the size of the transaction and tid is the unique identifier of the transaction. An item set is a non-empty set of items; an item set of size k is called a k-item set. A transaction data stream TDS = T1, T2, ..., TN is a continuous sequence of transactions, where N is the tid of the latest incoming transaction TN. A transaction-sensitive sliding window (TransSW) in the transaction data stream is a window that slides forward for every transaction. The window at each slide has a fixed number, w, of transactions, and w is called the size of the window. Hence, the current transaction-sensitive sliding window is TransSW_{N-w+1} = [T_{N-w+1}, T_{N-w+2}, ..., T_N], where N-w+1 is the window identifier of the current TransSW. The support of an item set X over TransSW, denoted sup(X)_TransSW, is the number of transactions in TransSW containing X as a subset. An item set X is called a frequent item set (FI) if sup(X)_TransSW >= s * w, where s is a user-defined minimum support threshold (MST) in the range [0, 1]. The value s * w is called the frequent threshold of TransSW (FT_TransSW). Given a transaction-sensitive sliding window TransSW and an MST s, the problem of online mining of frequent item sets in recent transaction data streams is to mine the set of all frequent item sets by one scan of the TransSW.
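A direct check of this definition (a minimal sketch of ours, with transactions as Python frozensets) over the first three transactions of the example below:

def is_frequent(X, transsw, s):
    # sup(X) over the TransSW: transactions containing X as a subset
    sup = sum(1 for t in transsw if X <= t)
    return sup >= s * len(transsw)       # compare against s * w

transsw = [frozenset({1, 3, 4, 7}), frozenset({2, 3, 5, 6, 8}),
           frozenset({1, 2, 3, 5, 7})]
print(is_frequent(frozenset({1, 3}), transsw, 2 / 3))   # True: sup = 2 >= 2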
EXAMPLE: Suppose we have six transactions in a transaction data stream: <T1, (1 3 4 7)>, <T2, (2 3 5 6 8)>, <T3, (1 2 3 5 7)>, <T4, (2 5 8)>, <T5, (1 3 7 8)>, <T6, (2 3 7)>, where T1, T2, T3, T4, T5, T6 are the transactions and 1, 2, 3, 4, 5, 6, 7, 8 are the items in the transactions. Let the size of the sliding window be 3 and the minimum support be 2.
TID   Item sets
T1    1, 3, 4, 7
T2    2, 3, 5, 6, 8
T3    1, 2, 3, 5, 7
T4    2, 5, 8
T5    1, 3, 7, 8
T6    2, 3, 7

Frequent 1-item sets        Frequent 2-item sets        Frequent 3-item sets
Item set    Support         Item set    Support         Item set     Support
1           3               {1, 3}      3               {1, 3, 7}    3
2           4               {1, 7}      3
3           5               {2, 3}      3
5           3               {2, 5}      3
7           4               {3, 7}      4
8           3

Fig. 2: An example of a transaction data stream and the maximum frequent item sets found.
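The entries of Fig. 2 can be cross-checked by brute force; they correspond to a support count of at least 3 taken over all six transactions. This is a sketch of ours for verification only, not the paper's method, which uses bit sequences instead of rescanning:

from itertools import combinations

stream = [{1, 3, 4, 7}, {2, 3, 5, 6, 8}, {1, 2, 3, 5, 7},
          {2, 5, 8}, {1, 3, 7, 8}, {2, 3, 7}]
items = sorted(set().union(*stream))
for k in (1, 2, 3):
    for cand in combinations(items, k):
        sup = sum(1 for t in stream if set(cand) <= t)
        if sup >= 3:                     # reproduces the figure's entries
            print(set(cand), "support", sup)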
Online mining of frequent item sets within a transaction-sensitive sliding window:
In this part we describe our single-pass mining algorithm, called MFI-TransSW, and its bit representation of items with the help of an example. We compare our algorithm with existing sliding window based mining techniques and improve both speed and memory usage through dynamic maintenance of the current sliding window via an effective bit-sequence representation.
1) Bit-Sequence Representation of Items

Transactions            Item    Bit sequence
T1 <1, 3, 4, 7>         1       101
T2 <2, 3, 5, 6, 8>      2       011
T3 <1, 2, 3, 5, 7>      3       111
                        4       100
                        5       011
                        6       010
                        7       101
                        8       010

101 & 111 = 101: this is the bit-sequence representation for the item set {1, 3}.

Fig. 3: Bit-sequence representations of items.
Figure 3 above shows the bit-sequence representation and how an item set can be represented as a bit sequence. The computation of frequent item sets can be done efficiently by representing all the items in the transactions of the data stream as bit sequences [12]. If there are three transactions in the given window, then the length of each bit sequence is three; if there are four transactions, the length is four. The first bit represents the first transaction, the second bit represents the second transaction, and so on. The concept is illustrated in Figure 3, where the bit-sequence representation of the item set (1, 3) is obtained by performing a bitwise AND on the bit sequence of item 1 and the bit sequence of item 3. The same procedure is repeated to derive the bit-sequence representations of the other item sets of length 2. Once the 2-item sets are available, the bit representations of the 3-item sets can be derived just as easily, and so on. The bit-sequence representation makes the computation of frequent item sets easy, because counting the number of 1's in a bit sequence gives the exact number of transactions that contain the item set. This count can be compared with the support threshold to decide whether the item set is frequent.
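A minimal sketch of this construction (ours; string bit sequences, with the leftmost bit for the oldest transaction) on the window of Fig. 3:

window = [{1, 3, 4, 7}, {2, 3, 5, 6, 8}, {1, 2, 3, 5, 7}]   # T1, T2, T3

def bit_seq(item, window):
    # one bit per transaction in the window
    return "".join("1" if item in t else "0" for t in window)

b1, b3 = bit_seq(1, window), bit_seq(3, window)
b13 = format(int(b1, 2) & int(b3, 2), "0%db" % len(window))
print(b1, b3, b13)                 # 101 111 101
print(b13.count("1"))              # support of {1, 3} in the window: 2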
3. MFI-TransSW ALGORITHM
3.1 Window Initialization Phase
This phase is activated while the number of transactions in the data stream is less than or equal to the user-defined sliding window size. In this phase, every incoming transaction is converted into its bit-sequence representation.
3.2 Window Sliding Phase
ID    Item set
T1    1, 3, 4, 7
T2    2, 3, 5, 6, 8
T3    1, 2, 3, 5, 7
T4    2, 5, 8
T5    1, 3, 7, 8
T6    2, 3, 7

Time line: the window of size 3 slides over the stream, giving the windows W1 = (T1, T2, T3), W2 = (T2, T3, T4), W3 = (T3, T4, T5), and W4 = (T4, T5, T6).

Fig. 4: Running example with window size 3.
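When the window slides, each item's bit sequence can be updated in place: drop the leftmost (oldest) bit and append one bit for the incoming transaction. The paper does not spell the update rule out at this point, so the following is a sketch of ours under that assumption:

def slide_bits(bits, new_tx, w):
    # drop the oldest transaction's bit, append the new transaction's bit
    for x in sorted(set(bits) | set(new_tx)):
        old = bits.get(x, "0" * w)
        bits[x] = old[1:] + ("1" if x in new_tx else "0")

bits = {1: "101", 2: "011", 3: "111"}    # window T1, T2, T3 (items 1-3 only)
slide_bits(bits, {2, 5, 8}, 3)           # T1 expires, T4 = {2, 5, 8} arrives
print(bits)  # {1: '010', 2: '111', 3: '110', 5: '001', 8: '001'}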
Input: TDS (a transaction data stream), ms (a user-defined minimum support in the range [0, 1]), and ws (the user-defined sliding window size)
Output: FI-Output (the set of frequent item sets, from which the maximal ones are reported)

from itertools import combinations

def mfi_of_window(window, min_sup):
    # Candidate generation phase: find the frequent item sets of the
    # current window and return the maximal ones among them.
    bits = {}
    for i, t in enumerate(window):       # bit i of item x is set iff
        for x in t:                      # transaction i contains x
            bits[x] = bits.get(x, 0) | (1 << i)
    # frequent 1-item sets: support = number of 1's in the bit sequence
    freq = {frozenset([x]): b for x, b in bits.items()
            if bin(b).count("1") >= min_sup}
    all_freq = dict(freq)
    while freq:                          # level-wise step: FI(k) -> CG(k+1)
        next_freq = {}
        for a, b in combinations(list(freq), 2):
            cand = a | b                 # join two k-item sets sharing k-1 items
            if len(cand) != len(a) + 1 or cand in next_freq:
                continue
            bs = freq[a] & freq[b]       # bitwise AND yields sup(cand)
            if bin(bs).count("1") >= min_sup:
                next_freq[cand] = bs
        all_freq.update(next_freq)
        freq = next_freq
    # keep only the maximal frequent item sets
    return [set(s) for s in all_freq if not any(s < t for t in all_freq)]

def mine_mfi(tds, ms, ws):
    # One pass over the stream with a transaction-sensitive sliding window.
    min_sup = ms * ws                    # frequent threshold s * w
    window, result = [], []
    for t in tds:                        # for each incoming transaction
        window.append(frozenset(t))      # window initialization phase
        if len(window) > ws:
            window.pop(0)                # window sliding phase: the oldest
        if len(window) == ws:            # transaction expires
            result = mfi_of_window(window, min_sup)
    return result                        # MFI (maximum frequent item sets)
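As a quick check, running the procedure above on the six-transaction example of Fig. 2 (window size 3, minimum support 2, i.e., s = 2/3) yields the maximal frequent item sets of the final window (T4, T5, T6):

stream = [{1, 3, 4, 7}, {2, 3, 5, 6, 8}, {1, 2, 3, 5, 7},
          {2, 5, 8}, {1, 3, 7, 8}, {2, 3, 7}]
print(mine_mfi(stream, ms=2/3, ws=3))    # [{2}, {8}, {3, 7}]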
4. EXPERIMENTS
In this experiment we show the results of the proposed MFITSW (maximum frequent item set transaction-sensitive sliding window) algorithm. All programs were implemented in Microsoft Visual Studio 2010 (.NET Framework 4.0) and run on a 4 GHz Intel i3 PC with 512 MB of memory under Windows 7.
From the description of the algorithm, it is clear that mining the maximum frequent item sets online is very efficient, owing to the maintenance of the various item sets across the transactions. We implemented our sliding window algorithm in the .NET Framework 4.0 under Microsoft Visual Studio 2010, and we compared and tested frequent item set mining with the MFITSW algorithm using the IBM Synthetic Data Generator proposed by Agrawal and Srikant.
Fig. 5 shows the synthetic data stream, denoted T5.I4.D1000K, of 1 million transactions with an average transaction size of 5 items (T5) and an average maximal frequent item set size of 4 items (I4).
Fig. 5: Transaction data stream.
Fig. 6 shows that the algorithm significantly outperforms the alternatives in memory consumption and CPU cost and takes less time to extract the maximum frequent item sets of an online data stream. The sliding windows are based on the user's requirements, i.e., on the value the user gives for GWS (the given window size).
In this experiment we first download the synthetic data streams generated with the IBM synthetic data generator proposed by Agrawal and Srikant. We then convert the data stream into CSV format and process this CSV data stream in Microsoft Visual Studio 2010: the form is designed in the designer file by dragging the label, text box, and button from the toolbox, the code is written in the source file, and the program is then run. Selecting the CSV file of online transaction data streams and clicking the Submit button produces the maximum frequent item sets as output.
Fig. 6: Output as maximum frequent item sets.
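For illustration only, the load-and-run step described above could look like the following sketch; the file name T5.I4.D1000K.csv, the one-transaction-per-row layout, and the parameter values are assumptions, and mine_mfi is the routine from Section 3:

import csv

# Assumed layout: one transaction per row, items as integer ids.
with open("T5.I4.D1000K.csv") as f:          # file name is an assumption
    stream = [set(map(int, row)) for row in csv.reader(f) if row]
print(mine_mfi(stream, ms=0.01, ws=10000))   # example parameter values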
Experiment 1
Fig. 7: Number of transactions versus processing time in ms.
Figure 7 shows the graph of processing time in milliseconds against the number of incoming transactions. The new MFI-TransSW takes less time than the original MFI-TransSW.
Experiment 2
In experiment 2 we compare the algorithms on the basis of memory (MB) against the number of incoming sliding windows in the frequent item set generation phase. This experiment shows that the proposed algorithm takes somewhat less memory than the existing one. We implemented the algorithm in the .NET Framework; it improves processing speed and takes less time for generation.
Fig. 8: Incoming sliding windows versus memory use in the frequent item set generation phase.
5. CONCLUDING REMARKS
Finding the maximum frequent item sets of an online transaction data stream is a very important problem. This paper proposes a method for finding recently frequent item sets over a data stream based on the MFITSW (mining frequent item sets with a transaction-sensitive sliding window) algorithm. In support of the various needs of data stream analysis, the experiments show that the proposed algorithm not only attains highly accurate mining results but also runs significantly faster and consumes less memory than the existing algorithms for mining frequent item sets. In this paper the interesting recent range of a data stream is defined by the size of a window. The proposed method can be employed to monitor recent changes in the knowledge embedded in a data stream. The proposed algorithm gives a guarantee on the output quality and also a bound on the memory usage.
REFERENCES
Quest Data Mining Synthetic Data Generation Code. Available: http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/datamining/datasets/syndata.html
[1] B. Babcock, S. Babu, M. Datar, R. Motwani, and J.Widom. Models and Issues in Data Stream
Systems. In Proc. of the 2002 ACM Symposium on Principles of Database Systems (PODS 2002),
ACM Press, 2002.
[2] H.-F. Li, S.-Y. Lee, Mining frequent item sets over data streams using efficient window sliding techniques, Expert Systems with Applications 36 (2009) 1466–1477.
[3] Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: A
frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53–87.
[4] Roberto Bayardo. Efficiently Mining Long Patterns from Databases. In ACM SIGMOD Conference,
1998.
[5] J. Chang and W. Lee. Finding Recent Frequent Item sets Adaptively over Online Data Streams. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003.
[6] V. Ganti, J. Gehrke, and R. Ramakrishnan. Mining Data Streams under Block Evolution. SIGKDD
Exploration, 3(2):1-10, Jan. 2002.
[7] C. Giannella, J. Han, J. Pei, X. Yan and P. S. Yu. Mining Frequent Patterns in Data Streams at
Multiple Time Granularities. In Proc. of the NSF Workshop on Next Generation Data Mining, 2002.
[8] G. S. Manku and R. Motwani. Approximate Frequency Counts Over Data Streams. In Proc. of the
28th VLDB conference, 2002.
[9] W.G. Teng, M.-S. Chen and P. S. Yu. A Regression Based Temporal Pattern Mining Scheme for Data
Streams.
[10] International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.2, No.6,
November 2012.
[11] J. Han, H. Cheng, D. Xin, & X. Yan. (2007) “Frequent pattern mining: current status and future
directions”, Data Mining and Knowledge Discovery, vol. 15(1), pp. 55–86.
[12] Hua-Fu Li, Chin-Chuan Ho, Man-Kwan Shan, and Suh-Yin Lee, (October 8-11, 2006) ” Efficient
Maintenance and Mining of Frequent Item sets over Online Data Streams with a Sliding Window”,
IEEE international Conference on Systems, Man, and Cybernetics Taipei, Taiwan.
[13] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy,(2005) "Resource aware Mining of data streams”,
Journal of universal computer science,vol II ,pp 1440-1453.
[14] K.F. Jea, C.W. Li, T.P. Chang, "An efficient approximate approach to mining frequent item sets over high speed transactional data streams", Eighth International Conference on Intelligent Systems Design and Applications, DOI 10.1109/ISDA.2008.74.
[15] Jia Ren, Ke Li, (July 2008) "Online Data Stream Mining of Recent Frequent Item sets Based on Sliding Window Model", Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming.
[16] J.H. Chang and W.S. Lee, (2003).Finding recent frequent item sets adaptively over online data
streams. In Proc. of the 9th ACM SIGKDD, pp. 487-492.
[17] D Lee, W Lee. (2005). “Finding Maximal Frequent Item sets over Online Data Streams Adaptively”,
Fifth Intl. Conference on Data Mining.
[18] B. Mozafari, H. Thakkar, and C. Zaniolo. Verifying and mining frequent patterns from large windows over data streams. In Proc. Int. Conf. on Data Engineering, pages 179–188, Los Alamitos, CA, 2008. IEEE Computer Society.
[19] Caixia Meng, (2009)”An efficient algorithm for mining frequent patterns over high speed data
streams”, World congress on software engineering.
[20] H.-F. Li, S.-Y. Lee, M.-K. Shan, An efficient algorithm for mining frequent item sets over the entire
history of data streams, in: Proc. International workshop on Knowledge Discovery in Data Streams,
2004.
[21] C.K.-S. Leung, Q.I. Khan, DSTree: a tree structure for the mining of frequent sets from data streams, in: Proc. ICDM, 2006, pp. 928–932; C.K.-S. Leung, Q.I. Khan, Efficient mining of constrained frequent patterns from streams, in: Proc. 10th International Database Engineering and Applications Symposium, 2006.
[22] H.-F. Li, S.-Y. Lee, Mining frequent item sets over data streams using efficient window sliding
techniques, Expert Systems with Applications 36 (2009) 1466–1477.
[23] G.S. Manku, R. Motwani, Approximate frequency counts over data streams, in: Proc. VLDB, 2002,
pp. 346–357.
[24] B. Mozafari, H. Thakkar, C. Zaniolo, Verifying and mining frequent patterns from large windows
over data streams, in: Proc. ICDE, 2008 pp. 179–188.
[25] J. Li, D. Maier, K. Tufte, V. Papadimos, P.A. Tucker, No pane, no gain: efficient evaluation of sliding-window aggregates over data streams, SIGMOD Record 34 (1) (2005) 39–44.
[26] C.-H. Lin, D.-Y. Chiu, Y.-H. Wu, A.L.P. Chen, Mining Frequent item sets from data streams with a
time-sensitive sliding window, in: Proc. SIAM International Conference on Data Mining, 2005.
[27] J.X. Yu, Z. Chong, H. Lu, Z. Zhang, A. Zhou, A false-negative approach to mining frequent item sets from high speed transactional data streams, Information Sciences 176 (14) (2006) 1986–2015.
[28] J.X. Yu, Z. Chong, H. Lu, A. Zhou, False positive or false negative: mining frequent item sets from high speed transactional data streams, in: Proc. VLDB, 2004, pp. 204–215.
[29] S.K. Tanbeer, C.F. Ahmed, B.-S. Jeong, Y.-K. Lee, CP-tree: A tree structure for single-pass frequent pattern mining, in: T. Washio et al. (Eds.), Proc. PAKDD, 2008, pp. 1022–1027.
[30] Y.J. Tsay, J.Y. Chiang, An efficient cluster and decomposition algorithm for mining association rules,
Information Sciences 160(1–4) (2004) 161–171.
AUTHORS
Neeraj received a B.E. (Information Technology) from DAV College of Engineering and Technology, Kanina (Haryana), India, during 2006-2010, and is pursuing an M.Tech (Software Engineering) at ITM University, Gurgaon (India), during 2011-2013.