A
Project Report On
MARKET ANALYSIS AND SALES DEVELOPMENT
Submitted By
SUNNY BARKE UIN 091P041
AADIL CHOUDHARY UIN 112P014
ADEEL ANSARI UIN 112P012
SAYED MEHDI ABBAS UIN 112P002
Under the guidance of
Prof. DINESH DEORE
Submitted in partial fulfillment of
Bachelor of Engineering
B.E. (Semester VIII), COMPUTER
[2013 - 2014]
from
Rizvi College of Engineering
New Rizvi Educational Complex, Off-Carter Road,
Bandra(w), Mumbai - 400050
Affiliated to
University of Mumbai
CERTIFICATE
This is to certify that the project report entitled
“MARKET ANALYSIS AND SALES DEVELOPMENT”
Submitted By
SUNNY BARKE
AADIL CHOUDHARY
ADEEL ANSARI
SAYED MEHDI ABBAS
of Rizvi College of Engineering, Computer Department, has been approved in partial fulfillment of the
requirements for the degree of Bachelor of Engineering.
Prof. DINESH DEORE Prof. ———————
Internal Guide External Guide
Prof. DINESH DEORE Dr. Varsha Shah
Head of Department Principal
Prof. ———————– Prof. ————————
Internal Examiner External Examiner
Date:
Acknowledgement
I am profoundly grateful to Prof. DINESH DEORE for his expert guidance and continuous
encouragement, which saw this project through to its target from its commencement to its
completion.
I would like to express my deepest appreciation towards Dr. Varsha Shah, Principal, RCOE, Mumbai, and
Prof. DINESH DEORE, HoD, Computer Department, whose invaluable guidance supported
me in completing this project.
At last, I must express my sincere heartfelt gratitude to all the staff members of the Computer Department
who helped me directly or indirectly during this course of work.
SUNNY BARKE
AADIL CHOUDHARY
ADEEL ANSARI
SAYED MEHDI ABBAS
ABSTRACT
The proposed system is designed to find the most frequent combinations of items. It is based on
developing an efficient algorithm that outperforms the best available frequent-pattern algorithms on a
number of typical data sets. This will help in marketing and sales, as the technique can be used to uncover
interesting cross-sells and related products. Three different algorithms from association mining have
been implemented, and the best combination method is then utilized to find more interesting results. The analyst
can then perform the data mining and extraction, and finally conclude the result and make appropriate
decisions.
With the explosive growth of information sources available on the World Wide Web, it has become
increasingly necessary for users to utilize automated tools to find the desired information resources, and
to track and analyze their usage patterns. Association rule mining is an active data mining research
area. However, most ARM algorithms cater to a centralized environment. In contrast to previous ARM
algorithms, Optimized Distributed Association Rule Mining (ODARM) is a distributed algorithm for
geographically spread data sets that aims to reduce operational and communication costs. Recently, as
the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining
(DARM) algorithms have been developed. These algorithms assume that the databases are either hori-
zontally or vertically distributed. In the special case of databases populated from information extracted
from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations
between items in distributed textual documents that are neither vertically nor horizontally distributed,
but rather a hybrid of the two. Hence, this paper proposes a Distributed Count Association Rule Mining
Algorithm (DCARM), which is evaluated on real-time datasets obtained from the UCI Machine Learning
Repository.
We are given a large database of customer transactions. Each transaction consists of items purchased
by a customer in a visit. We present an efficient algorithm that generates all significant association rules
between items in the database. The algorithm incorporates buffer management and novel estimation and
pruning techniques. We also present results of applying this algorithm to sales data obtained from a
large retailing company, which show the effectiveness of the algorithm.
Keywords: Association rule mining, Optimized Distributed Association Rule Mining (ODARM),
Distributed Count Association Rule Mining Algorithm (DCARM)
Contents
1 Introduction 1
2 Literature Survey 3
2.1 PROBLEM STATEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 EXISTING SYSTEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2.1 Classification: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2.2 Clustering: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.3 PROPOSED SYSTEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 SYSTEM REQUIREMENT SPECIFICATION . . . . . . . . . . . . . . . . . . . . . . 5
2.4.1 ENVIRONMENTAL SPECIFICATION . . . . . . . . . . . . . . . . . . . . . . 5
3 TECHNOLOGIES 6
3.1 SOFTWARE ENVIRONMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4 SYSTEM DESIGN 8
4.1 SOFTWARE DESIGN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.1.1 Logical Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.1.2 Input Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4.1.3 Output Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.1.4 Data Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 FUNDAMENTAL DESIGN CONCEPTS . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2.1 Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2.2 Information Hiding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2.3 Modularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2.4 Concurrency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2.5 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3 DATA FLOW DIAGRAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5 IMPLEMENTATION 14
5.1 ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6 SYSTEM TESTING 21
6.1 Types of Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.1.1 Unit testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.1.2 Integration testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.1.3 Functional test: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.1.4 System Test: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.5 White Box Testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.6 Black Box Testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.7 Unit Testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.8 Integration Testing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7 SYSTEM STUDY 24
7.1 FEASIBILITY STUDY: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
7.2 ECONOMICAL FEASIBILITY: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
7.3 TECHNICAL FEASIBILITY: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
7.4 SOCIAL FEASIBILITY: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
8 PLAN OF WORK & PROJECT TIMELINE 25
9 Conclusion and Future Scope 28
9.1 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
9.2 FUTURE ENHANCEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
References 29
APPENDICES 29
A Project Hosting 30
List of Figures
1.1 Entity Relationship Diagram of Market-Basket Analysis . . . . . . . . . . . . . . . . . 2
4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3 KDD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.1 Frequency Itemset Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.3 Data Flow Diagram of Admins Function . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.4 Sequence Diagram of Manager, GUI & Application . . . . . . . . . . . . . . . . . . . . 20
8.1 Project Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
8.2 Gantt Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
8.3 Planned Gantt Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
8.4 Pert Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Chapter 1
Introduction
Data mining, the extraction of hidden predictive information from large databases, is a powerful new
technology with great potential to help companies focus on the most important information in their
data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make
proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining
move beyond the analyses of past events provided by retrospective tools typical of decision support
systems. Data mining tools can answer business questions that traditionally were too time consuming to
resolve. They scour databases for hidden patterns, finding predictive information that experts may miss
because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining techniques can
be implemented rapidly on existing software and hardware platforms to enhance the value of existing
information resources, and can be integrated with new products and systems as they are brought on-line.
When implemented on high performance client/server or parallel processing computers, data mining
tools can analyze massive databases to deliver answers to questions such as, "Which clients are most
likely to respond to my next promotional mailing, and why?"
Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery
and Data Mining, is the process of automatically searching large volumes of data for patterns using tools
such as classification, association rule mining, clustering, etc. Data mining is a complex topic and has
links with multiple core fields such as computer science and adds value to rich seminal computational
techniques from statistics, information retrieval, machine learning and pattern recognition.
Data mining techniques are the result of a long process of research and product development. This
evolution began when business data was first stored on computers, continued with improvements in data
access, and more recently, generated technologies that allow users to navigate through their data in real
time. Data mining takes this evolutionary process beyond retrospective data access and navigation to
prospective and proactive information delivery. Data mining is ready for application in the business
community because it is supported by three technologies that are now sufficiently mature:
• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms
Rizvi College of Engineering, Bandra, Mumbai. 1
Commercial databases are growing at unprecedented rates. A recent META Group survey of data
warehouse projects found that 19 percent of respondents are beyond the 50 gigabyte level, while 59
percent expect to be there by the second quarter of 1996. In some industries, such as retail, these numbers
can be much larger. The accompanying need for improved computational engines can now be met
in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms
embody techniques that have existed for at least 10 years, but have only recently been implemented as
mature, reliable, understandable tools that consistently outperform older statistical methods.
With the explosive growth of information sources available on the World Wide Web, it has become
increasingly necessary for users to utilize automated tools to find the desired information resources, and
to track and analyze their usage patterns. These factors give rise to the necessity of creating server-side
and client-side intelligent systems that can effectively mine for knowledge. Web mining can be broadly
defined as the discovery and analysis of useful information from the World Wide Web. This describes
the automatic search of information resources available online, i.e., Web content mining, and the
discovery of user access patterns from Web servers, i.e., Web usage mining.
Figure 1.1: Entity Relationship Diagram of Market-Basket Analysis
Chapter 2
Literature Survey
2.1 PROBLEM STATEMENT
To develop an efficient algorithm to find the desired information resources and their usage patterns, and
also to develop a distributed algorithm for geographically distributed data sets that reduces communication
costs and overhead.
Purpose
It has become increasingly necessary for users to utilize automated tools to find the desired information
resources, and to track and analyze their usage patterns. Association rule mining is an active data
mining research area. However, most ARM algorithms cater to a centralized environment. Distributed
Association Rule Mining (D-ARM) algorithms have been developed. These algorithms, however, as-
sume that the databases are either horizontally or vertically distributed. In the special case of databases
populated from information extracted from textual data, existing D-ARM algorithms cannot discover
rules based on higher-order associations between items in distributed textual documents that are neither
vertically nor horizontally distributed, but rather a hybrid of the two.
2.2 EXISTING SYSTEM
Data mining algorithms can be categorized into the following:
• Association Algorithm
• Classification
• Clustering Algorithm
2.2.1 Classification:
The process of dividing a dataset into mutually exclusive groups such that the members of each group
are as "close" as possible to one another, and different groups are as "far" as possible from one another,
where distance is measured with respect to the specific variable(s) you are trying to predict. For example, a
typical classification problem is to divide a database of companies into groups that are as homogeneous
as possible with respect to a creditworthiness variable with values "Good" and "Bad."
2.2.2 Clustering:
The process of dividing a dataset into mutually exclusive groups such that the members of each group
are as ”close” as possible to one another, and different groups are as ”far” as possible from one another,
where distance is measured with respect to all available variables. Given databases of sufficient size and
quality, data mining technology can generate new business opportunities by providing these capabilities:
• Automated prediction of trends and behaviors. Data mining automates the process of finding
predictive information in large databases. Questions that traditionally required extensive hands-on
analysis can now be answered directly from the data quickly. A typical example of a predictive
problem is targeted marketing. Data mining uses data on past promotional mailings to identify the
targets most likely to maximize return on investment in future mailings. Other predictive problems
include forecasting bankruptcy and other forms of default, and identifying segments of a population
likely to respond similarly to given events.
• Automated discovery of previously unknown patterns. Data mining tools sweep through databases
and identify previously hidden patterns in one step. DARM discovers rules from various geograph-
ically distributed data sets. However, the network connection between those data sets isn’t as fast
as in a parallel environment, so distributed mining usually aims to minimize communication costs.
2.3 PROPOSED SYSTEM
• Unlike other algorithms, ODAM offers better performance by minimizing candidate itemset generation
costs. It achieves this by focusing on two major DARM issues: communication and synchronization.
• Communication is one of the most important DARM objectives. DARM algorithms
will perform better if we can reduce communication (for example, message exchange size) costs.
• Synchronization forces each participating site to wait a certain period until globally frequent itemset
generation completes. Each site will wait longer if computing support counts takes more time. Hence,
we reduce the computation time of candidate itemsets' support counts.
• To reduce communication costs, we highlight several message optimization techniques. Based on
the message exchange method, we can divide the message optimization techniques into two methods:
direct and indirect support counts exchange.
• Each method has different aims, expectations, advantages, and disadvantages. For example, the first
method exchanges each candidate itemset’s support count to generate globally frequent itemsets of
that pass (CD and FDM are examples of this approach).
2.4 SYSTEM REQUIREMENT SPECIFICATION
2.4.1 ENVIRONMENTAL SPECIFICATION
The environmental specification specifies the hardware and software requirements for carrying out this
project. The following are the hardware and the software requirements.
Hardware:-
• 1 GB RAM.
• 320 GB HDD.
• Intel 2.4 GHz Processor core2duo
Software:-
• Windows XP Service Pack 2 / Windows 7
• Visual Studio 2008
• MS SQL Server 2005
• Windows Operating System
Chapter 3
TECHNOLOGIES
3.1 SOFTWARE ENVIRONMENT
ASP.NET
ASP.NET is more than the next version of Active Server Pages (ASP); it is a unified Web development
platform that provides the services necessary for developers to build enterprise-class Web applications.
While ASP.NET is largely syntax-compatible with ASP, it also provides a new programming model and
infrastructure that enables a powerful new class of applications. You can migrate your existing ASP
applications by incrementally adding ASP.NET functionality to them. ASP.NET is a compiled, .NET
Framework-based environment. You can author applications in any .NET Framework-compatible
language, including Visual Basic and Visual C#. Additionally, the entire .NET Framework platform is
available to any ASP.NET application. Developers can easily access the benefits of the .NET Frame-
work, which include a fully managed, protected, and feature-rich application execution environment,
simplified development and deployment, and seamless integration with a wide variety of languages.
VB.NET
Visual Basic is a programming language that is designed especially for Windows programming. This
section explains most of the tools available for implementing GUI-based programs. After introducing the
basic facilities and tools provided by Visual Basic, we apply our knowledge to implementing a small
VB program. Our program will implement a visual interface for a commonly known stack abstract data
type.
VB.NET is still the only language in VS.NET that includes background compilation, which means
that it can flag errors immediately, while you type. VB.NET is the only .NET language that supports
late binding. In the VS.NET IDE, VB.NET provides a dropdown list at the top of the code window with
all the objects and events; the IDE does not provide this functionality for any other language. VB.NET
is also unique for providing default values for optional parameters, and for having a collection of the
controls available to the developer.
Advantages of VB.NET:
• Build Robust Windows-based Applications :
With new Windows Forms, developers using Visual Basic .NET can build Windows-based
applications that leverage the rich user interface features available in the Windows operating system.
All the rapid application development (RAD) tools that developers have come to expect from Mi-
crosoft are found in Visual Basic .NET, including drag-and-drop design and code behind forms.
In addition, new features such as automatic control resizing eliminate the need for complex resize
code.
• Resolve Deployment and Versioning Issues Seamlessly:- Visual Basic .NET delivers the answer
to all of your application setup and maintenance problems. With Visual Basic .NET, issues with
Component Object Model (COM) registration and DLL overwrites are relics of the past. Side-by-
side versioning prevents the overwriting and corruption of existing components and applications.
• Microsoft SQL Server 2005: Business today demands a different kind of data management solution.
Performance, scalability, and reliability are essential, but businesses now expect more from their key
IT investment. SQL Server 2005 exceeds dependability requirements and provides innovative capabilities
that increase employee effectiveness, integrate heterogeneous IT ecosystems, and maximize
capital and operating budgets. SQL Server 2005 provides the enterprise data management platform
your organization needs to adapt quickly in a fast-changing environment. Benchmarked for
scalability, speed, and performance, SQL Server 2005 is a fully enterprise-class database product,
providing core support for Extensible Markup Language (XML) and Internet queries.
• Easy-to-use Business Intelligence (BI) Tools: Through rich data analysis and data mining capabilities
that integrate with familiar applications such as Microsoft Office, SQL Server 2005 enables
you to provide all of your employees with critical, timely business information tailored to their
specific information needs. Every copy of SQL Server 2005 ships with a suite of BI services.
• Self-Tuning and Management Capabilities: Revolutionary self-tuning and dynamic self-configuring
features optimize database performance, while management tools automate standard activities.
Graphical tools and performance wizards simplify setup, database design, and performance
monitoring, allowing database administrators to focus on meeting strategic business needs.
• Data Management Applications and Services: Unlike its competitors, SQL Server 2005 provides a
powerful and comprehensive data management platform. Every software license includes extensive
management and development tools; a powerful extraction, transformation, and loading (ETL)
tool; and business intelligence and analysis services such as Notification Services. The result is the
best overall business value available. Enterprise Edition includes the complete set of SQL Server
data management and analysis features, and is uniquely characterized by several features that
make it the most scalable and available edition of SQL Server 2005. It scales to the performance
levels required to support the largest Web sites, enterprise Online Transaction Processing (OLTP)
systems, and data warehousing systems. Its support for failover clustering also makes it ideal for
any mission-critical line-of-business application.
Chapter 4
SYSTEM DESIGN
4.1 SOFTWARE DESIGN
System design describes how to approach the creation of a system. This important phase
provides the understanding and procedural details necessary for implementing the system recommended
in the feasibility study. The design step produces a data design, an architectural design, and a procedural
design. The data design transforms the information domain model created during analysis into the data
structures that will be required to implement the software.
The architectural design defines the relationships among the major structural components of the software,
and the procedural design transforms those components into a procedural description. Source code is then
generated, and testing is conducted to integrate and validate the software. From a project management
point of view, software design is conducted in two steps. Preliminary design is concerned with the
transformation of requirements into data and software architecture. Detailed design focuses on
refinements to the architectural representation that lead to detailed data structure and algorithmic
representations of the software.
4.1.1 Logical Design
The logical design of an information system is analogous to an engineering blue print or conceptual
view of an automobile. It shows the major features and how they are related to one another. The outputs,
inputs and relationship between the variables are designed in this phase. The objectives of database are
accuracy, integrity and successful recover from failure, privacy and security of data and good overall
performance.
4.1.2 Input Design
The input design is the bridge between users and the information system. It specifies the manner in
which data enters the system for processing. It can ensure the reliability of the system and produce
reports from accurate data, or it may result in the output of erroneous information. Online data entry is
available, which accepts input from the keyboard; data is displayed on the screen for verification.
While designing, the following points have been taken into consideration. Input formats are designed as
per the user requirements.
a) Interaction with the user is maintained in simple dialogues.
b) Appropriate fields are locked thereby allowing only valid inputs.
4.1.3 Output Design
Each and every activity in this work is result-oriented. The most important feature of an information
system for users is the output. Efficient, intelligent output design improves the usability and acceptability of
the system and also helps in decision-making. Thus the following points are considered during output
design.
(1) What information to be present ?
(2) Whether to display or print the information ?
(3) How to arrange the information in an acceptable format ?
(4) How the status has to be maintained each and every time ?
(5) How to distribute the outputs to the recipients ?
The system, being user-friendly in nature, serves to fulfill the requirements of the users; suitable
screen designs are made and presented to the user for refinement. The main requirement for the user is
the retrieval of information related to a particular user.
4.1.4 Data Design
Data design is the first of the three design activities that are conducted during software engineering. The
impact of data structure on program structure and procedural complexity causes data design to have a
profound influence on software quality. The concepts of information hiding and data abstraction provide
the foundation for an approach to data design.
4.2 FUNDAMENTAL DESIGN CONCEPTS
4.2.1 Abstraction
During software design, abstraction allows us to organize and channel our process by postponing
structural considerations until the functional characteristics, data streams, and data stores have been
established. Data abstraction involves specifying legal operations on objects; representation and
manipulation details are suppressed.
4.2.2 Information Hiding
Information hiding is a fundamental design concept for software. When a software system is designed
using the information hiding approach, each module in the system hides the internal details of its
processing activities, and modules communicate only through well-defined interfaces. Information hiding
can be used as the principal design technique for the architectural design of a system.
4.2.3 Modularity
Modular systems incorporate collections of abstractions in which each functional abstraction, each data
abstraction and each control abstraction handles a local aspect of the problem being solved. Modular
system consists of well-defined interfaces among the units. Modularity enhances design clarity, which
in turn eases implementation, debugging and maintenance of the software product.
4.2.4 Concurrency
Software systems can be categorized as sequential or concurrent. In a sequential system, only one part
of the system is active at any given time. Concurrent systems have multiple processes that can be activated
simultaneously if multiple processors are available.
4.2.5 Verification
Design is the bridge between customer requirements and an implementation that satisfies those
requirements. This is typically done in two steps:
1. Verification that the software requirements definition satisfies the customer's needs.
2. Verification that the design satisfies the requirements definition.
4.3 DATA FLOW DIAGRAM
Figure 4.1:
Figure 4.2:
Overview of the System:
Association rule mining finds interesting associations and/or correlation relationships among large
sets of data items. Association rules show attribute-value conditions that occur frequently together in a
given dataset. A typical and widely used example of association rule mining is Market Basket Analysis.
For example, data are collected using bar-code scanners in supermarkets. Such market basket databases
consist of a large number of transaction records. Each record lists all items bought by a customer on a
single purchase transaction. Managers would be interested to know if certain groups of items are consis-
tently purchased together. They could use this data for adjusting store layouts (placing items optimally
with respect to each other), for cross-selling, for promotions, for catalog design and to identify customer
segments based on buying patterns.
Association rules provide information of this type in the form of ”if-then” statements. These rules
are computed from the data and, unlike the if-then rules of logic, association rules are probabilistic in
nature.
In addition to the antecedent (the ”if” part) and the consequent (the ”then” part), an association
rule has two numbers that express the degree of uncertainty about the rule. In association analysis the
antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any items in
common).
The first number is called the support for the rule. The support is simply the number of transactions
that include all items in the antecedent and consequent parts of the rule. (The support is sometimes
expressed as a percentage of the total number of records in the database.)
The other number is known as the confidence of the rule. Confidence is the ratio of the number of
transactions that include all items in the consequent as well as the antecedent (namely, the support) to
the number of transactions that include all items in the antecedent.
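The two measures just defined translate directly into code. The snippet below is an illustrative Python sketch only (the project itself targets VB.NET and SQL Server); the transaction data and item names are assumptions for demonstration.

```python
def support(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(transactions, antecedent, consequent):
    """support(antecedent union consequent) / support(antecedent)."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

# Hypothetical market-basket data, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "jam"},
    {"bread"},
    {"milk", "jam"},
]

# Rule: bread => milk
print(support(transactions, {"bread", "milk"}))             # 2
print(round(confidence(transactions, {"bread"}, {"milk"}), 2))  # 0.67
```

Expressing support as a raw transaction count, as here, matches the definition above; dividing it by `len(transactions)` gives the percentage form mentioned in parentheses.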
For example, if a supermarket database has 100,000 point-of-sale transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent. An association rule tells us about the association between two or more items. For example:
In 80% of the cases when people buy bread, they also buy milk. This tells us of the association between
bread and milk.
We represent it as: bread => milk | 80%
This should be read as: "Bread means or implies milk 80% of the time." Here 80% is the "confidence factor" of the rule.
Association rules can be between more than 2 items. For example:
bread, milk => jam | 60%
bread => milk, jam | 40%
Given any rule, we can easily find its confidence. For example, for the rule
bread, milk => jam
we count the number, say n1, of records that contain bread and milk. Of these, how many contain jam as well? Let this be n2. Then the required confidence is n2/n1.
This means that the user has to guess which rule is interesting and ask for its confidence. But our goal was to "automatically" find all interesting rules. This is difficult because the database is bound to be very large; we might have to go through the entire database many times to find all interesting rules.
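As a quick illustration, the counting just described can be sketched in a few lines (toy data; the transactions and item names are hypothetical):

```python
# Toy illustration (hypothetical transactions): counting n1 and n2 for the
# rule {bread, milk} => {jam} exactly as described above.

transactions = [
    {"bread", "milk", "jam"},
    {"bread", "milk"},
    {"bread", "jam"},
    {"bread", "milk", "jam"},
    {"milk"},
]

antecedent = {"bread", "milk"}
consequent = {"jam"}

# n1: records containing all antecedent items
n1 = sum(1 for t in transactions if antecedent <= t)
# n2: of those, records that also contain the consequent
n2 = sum(1 for t in transactions if (antecedent | consequent) <= t)

confidence = n2 / n1
print(n1, n2, round(confidence, 2))  # 3 2 0.67
```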
Rizvi College of Engineering, Bandra, Mumbai. 11
Chapter 4 SYSTEM DESIGN
Brute Force
The common-sense approach to solving this problem is as follows.
Let I = { i1, i2, ..., in } be a set of items, also called an itemset. The number of times this itemset appears in the database is called its "support". Note that we can speak of the support of an itemset and the confidence of a rule; the other combinations, support of a rule and confidence of an itemset, are not defined.
Now, if we know the support of I and all its subsets, we can calculate the confidence of all rules which involve these items. For example, the confidence of the rule i1, i2, i3 => i4, i5 is
support of { i1, i2, i3, i4, i5 } / support of { i1, i2, i3 }
So, the easiest approach would be to let I contain all items in the supermarket, then set up a counter for every subset of I to count all its occurrences in the database. At the end of one pass of the database, we would have all those counts and could find the confidence of all rules, then select the most "interesting" rules based on their confidence factors. The problem with this approach is that I will normally contain at least about 100 items, which means it can have 2^100 subsets. We would need to maintain that many counters; even at a single byte per counter, on the order of 10^21 GB would be required. Clearly this cannot be done.
Minimum Support
To make the problem tractable, we introduce the concept of minimum support. The user has to specify this parameter; let us call it minsupport. Then any rule i1, i2, ..., in => j1, j2, ..., jn needs to be considered only if the set of all items in this rule, { i1, i2, ..., in, j1, j2, ..., jn }, has support greater than minsupport.
The idea is that in the rule
bread, milk => jam
if the number of people buying bread, milk and jam together is very small, then this rule is hardly worth consideration (even if it has high confidence).
Our problem now becomes: find all rules that have a given minimum confidence and involve itemsets whose support is more than minsupport. Clearly, once we know the supports of all these itemsets, we can easily determine the rules and their confidences. Hence we need to concentrate on the problem of finding all itemsets which have minimum support. We call such itemsets frequent itemsets.
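A brute-force sketch of this frequent-itemset search on toy data (hypothetical transactions; impractical at scale, since ~100 items would mean 2^100 candidate subsets):

```python
from itertools import combinations

# Brute-force frequent-itemset search (toy data, hypothetical item names):
# keep every itemset whose support count reaches the user-specified minsupport.

transactions = [
    {"bread", "milk", "jam"},
    {"bread", "milk"},
    {"bread", "jam"},
    {"milk", "jam"},
]
minsupport = 2  # absolute transaction count

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(1 for t in transactions if set(cand) <= t)
        if count >= minsupport:
            frequent[cand] = count

print(frequent)
```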
Some Properties of Frequent Itemsets
The methods used to find frequent itemsets are based on the following properties -
1. Every subset of a frequent itemset is also frequent. Algorithms make use of this property in the following way: we need not find the count of an itemset if any of its subsets is not frequent. So, we can first find the counts of some short itemsets in one pass of the database, then consider longer and longer itemsets in subsequent passes. When we consider a long itemset, we can make sure that all its subsets are frequent, because we already have the counts of all those subsets from previous passes.
2. Let us divide the tuples of the database into partitions, not necessarily of equal size. Then an itemset can be frequent only if it is frequent in at least one partition. This property enables us to apply divide-and-conquer algorithms: we can divide the database into partitions and find the frequent itemsets in each partition. To see that this is true, consider k partitions of sizes n1, n2, ..., nk.
Let the minimum support fraction be s.
Consider an itemset which does not have minimum support in any partition. Then its counts in the partitions must be less than s·n1, s·n2, ..., s·nk respectively. Therefore its total count must be less than the sum of all these counts, which is s(n1 + n2 + ... + nk). This equals s times the size of the database, so the itemset is not frequent in the entire database. This property carries over to distributed databases.
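Property 2 can be checked mechanically on a toy example (the partitions and itemset below are hypothetical):

```python
# Checking property 2 on a toy example (hypothetical partitions): if an
# itemset is frequent in the whole database at support fraction s, it must
# be frequent in at least one partition (contrapositive of the proof above).

s = 0.5
partitions = [
    [{"bread", "milk"}, {"bread"}],             # n1 = 2
    [{"bread", "milk"}, {"milk"}, {"bread"}],   # n2 = 3
]
itemset = {"bread"}

db = [t for p in partitions for t in p]
globally_frequent = sum(itemset <= t for t in db) >= s * len(db)
locally_frequent = [sum(itemset <= t for t in p) >= s * len(p) for p in partitions]

# globally frequent implies locally frequent somewhere
assert (not globally_frequent) or any(locally_frequent)
print(globally_frequent, locally_frequent)  # True [True, True]
```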
Use Case Diagrams
Figure 4.3: KDD
Chapter 5
IMPLEMENTATION
5.1 ALGORITHMS
Association Rule Mining
Association rule mining finds interesting associations and/or correlation relationships among large sets of data items. Association rules show attribute-value conditions that occur frequently together in a given dataset. A typical and widely used example of association rule mining is Market Basket Analysis.
For example, data are collected using bar-code scanners in supermarkets. Such market basket databases
consist of a large number of transaction records. Each record lists all items bought by a customer on a
single purchase transaction. Association rules provide information of this type in the form of "if-then" statements. These rules are computed from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature. In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule.
• Support
• Confidence
Support: In association analysis the antecedent and consequent are sets of items (called itemsets) that
are disjoint (do not have any items in common). The first number is called the support for the rule. The
support is simply the number of transactions that include all items in the antecedent and consequent
parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in
the database.)
Confidence: The other number is known as the confidence of the rule. Confidence is the ratio of the
number of transactions that include all items in the consequent as well as the antecedent (namely, the
support) to the number of transactions that include all items in the antecedent.
Let us see an example based on these two association rule numbers:
If a supermarket database has 100,000 point-of-sale transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent.
An association rule tells us about the association between two or more items. For example: in 80% of the cases when people buy bread, they also buy milk. This tells us of the association between bread and milk.
We represent it as: bread => milk | 80%
This should be read as: "Bread means or implies milk 80% of the time." Here 80% is the "confidence factor" of the rule. Association rules can be between more than 2 items. For example:
bread, milk => jam | 60%
bread => milk, jam | 40%
Given any rule, we can easily find its confidence. For example, for the rule
bread, milk => jam
we count the number, say n1, of records that contain bread and milk. Of these, how many contain jam as well? Let this be n2. Then the required confidence is n2/n1. This means that the user has to guess which rule is interesting and ask for its confidence. But our goal was to "automatically" find all interesting rules. This is difficult because the database is bound to be very large; we might have to go through the entire database many times to find all interesting rules.
Apriori Algorithm
Apriori is designed to operate on databases containing transactions, for example, collections of items bought by customers or details of website visits. As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom-up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. Apriori uses breadth-first search and a hash tree structure to count candidate itemsets efficiently.
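A compact sketch of this level-wise scheme on toy data (a real implementation would count candidates with a hash tree rather than rescanning the transactions):

```python
from itertools import combinations

# Apriori sketch (toy data): level-wise candidate generation with pruning via
# the property "every subset of a frequent itemset is frequent".

def apriori(transactions, minsupport):
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= minsupport}
    frequent = set(current)
    k = 2
    while current:
        # join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: drop candidates with an infrequent (k-1)-subset, then count
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= minsupport
        }
        frequent |= current
        k += 1
    return frequent

result = apriori(
    [{"bread", "milk", "jam"}, {"bread", "milk"}, {"bread", "jam"}, {"milk", "jam"}],
    minsupport=2,
)
print(sorted(sorted(s) for s in result))
```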
Brute Force
The common-sense approach to solving this problem is as follows.
Let I = { i1, i2, ..., in } be a set of items, also called an itemset. The number of times this itemset appears in the database is called its "support". Note that we can speak of the support of an itemset and the confidence of a rule; the other combinations, support of a rule and confidence of an itemset, are not defined.
Now, if we know the support of I and all its subsets, we can calculate the confidence of all rules which involve these items. For example, the confidence of the rule i1, i2, i3 => i4, i5 is
support of { i1, i2, i3, i4, i5 } / support of { i1, i2, i3 }
So, the easiest approach would be to let I contain all items in the supermarket, then set up a counter for every subset of I to count all its occurrences in the database. At the end of one pass of the database, we would have all those counts and could find the confidence of all rules, then select the most "interesting" rules based on their confidence factors. The problem with this approach is that I will normally contain at least about 100 items, which means it can have 2^100 subsets. We would need to maintain that many counters; even at a single byte per counter, on the order of 10^21 GB would be required. Clearly this cannot be done.
Minimum Support
To make the problem tractable, we introduce the concept of minimum support. The user has to specify this parameter; let us call it minsupport. Then any rule
i1, i2, ..., in => j1, j2, ..., jn
needs to be considered only if the set of all items in this rule, { i1, i2, ..., in, j1, j2, ..., jn }, has support greater than minsupport.
The idea is that in the rule
bread, milk => jam
if the number of people buying bread, milk and jam together is very small, then this rule is hardly worth consideration (even if it has high confidence).
Our problem now becomes: find all rules that have a given minimum confidence and involve itemsets whose support is more than minsupport. Clearly, once we know the supports of all these itemsets, we can easily determine the rules and their confidences. Hence we need to concentrate on the problem of finding all itemsets which have minimum support. We call such itemsets frequent itemsets.
Some Properties of Frequent Itemsets
The methods used to find frequent itemsets are based on the following properties -
1. Every subset of a frequent itemset is also frequent. Algorithms make use of this property in the following way: we need not find the count of an itemset if any of its subsets is not frequent. So, we can first find the counts of some short itemsets in one pass of the database, then consider longer and longer itemsets in subsequent passes. When we consider a long itemset, we can make sure that all its subsets are frequent, because we already have the counts of all those subsets from previous passes.
2. Let us divide the tuples of the database into partitions, not necessarily of equal size. Then an itemset can be frequent only if it is frequent in at least one partition. This property enables us to apply divide-and-conquer algorithms: we can divide the database into partitions and find the frequent itemsets in each partition. To see that this is true, consider k partitions of sizes n1, n2, ..., nk.
Let the minimum support fraction be s.
Consider an itemset which does not have minimum support in any partition. Then its counts in the partitions must be less than s·n1, s·n2, ..., s·nk respectively. Therefore its total count must be less than the sum of all these counts, which is s(n1 + n2 + ... + nk). This equals s times the size of the database, so the itemset is not frequent in the entire database. This property carries over to distributed databases.
Figure 5.1: Frequency Itemset Generation
MODULES:
Network Connections Management
Client-server computing or networking is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters (clients). Often clients and servers operate over a computer network on separate hardware. A server machine is a high-performance host that runs one or more server programs which share their resources with clients. A client does not share its resources, but requests a server's content or service; clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.
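A minimal sketch of such a session (loopback only; the one-message protocol and the "count:9" payload are hypothetical):

```python
import socket
import threading

# Minimal client-server session: the server listens for incoming requests,
# the client initiates the communication session.

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve_once():
    conn, _ = srv.accept()       # await (listen for) an incoming request
    conn.recv(1024)              # read the client's request
    conn.sendall(b"count:9")     # reply, e.g. with a local support count
    conn.close()

t = threading.Thread(target=serve_once)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))  # client initiates the session
cli.sendall(b"support?bread")
reply = cli.recv(1024)
cli.close()
t.join()
srv.close()
print(reply.decode())  # count:9
```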
Database Management
The distributed database in our model is a horizontally partitioned database, which means the database schema of all the partitions is the same. However, a distributed database also has an intrinsic data skewness property: the distributions of the itemsets in different partitions are not identical, and many items occur more frequently in some partitions than in others. As a result, many itemsets may be large locally at some sites but not at others. This skewness property poses a new requirement on the design of the mining algorithm.
ARM Module:
Association rule mining is an active data mining research area, and most ARM algorithms cater to a centralized environment. However, adapting centralized data mining to discover useful patterns in a distributed database isn't always feasible, because merging data sets from different sites incurs huge network communication costs. Our research therefore develops a distributed algorithm for geographically distributed data sets that reduces communication costs.
EDMA Module:
In this project we developed an efficient association rule mining algorithm for distributed databases, called EDMA. We found that many candidate sets generated by applying the Apriori-gen function are not needed in the search for frequent itemsets. In fact, there is a natural and effective method for every site to generate its own set of candidate sets, which is typically much smaller than the set of all candidate sets. Following that, every site only needs to find the frequent itemsets among these candidate sets. The following lemma illustrates the above observations.
Results and Statistics
Then 2-itemsets are formed from the globally large 1-itemsets returned to the particular site, and the local count is calculated. The process is repeated until no sets are formed or returned.
Figure 5.2:
Global support threshold: (50/100) × 12 = 6
[The global support count is calculated by adding only the counts of the locally large itemsets.]
Global support counts:
Bread: 9
Peanutbutter: 6
Milk: 5
Beer: 3
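The global-count step can be sketched as follows (the per-site local counts are hypothetical, chosen to match the totals above):

```python
import math

# Global-count step: with a 50% threshold over 12 transactions, an item is
# globally large if its summed locally-large counts reach (50/100)*12 = 6.

total_transactions = 12
threshold = math.ceil(0.5 * total_transactions)  # = 6

local_counts = {  # item -> counts from the sites where it is locally large
    "Bread": [5, 4],
    "Peanutbutter": [3, 3],
    "Milk": [5],
    "Beer": [3],
}

globally_large = {
    item: sum(counts)
    for item, counts in local_counts.items()
    if sum(counts) >= threshold
}
print(globally_large)  # {'Bread': 9, 'Peanutbutter': 6}
```

Milk (5) and Beer (3) fall below the threshold of 6, matching the counts listed above.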
Messages: [Considering site 3 as receiver site]
Site 1:
Messages sent = 2
Messages received= 2
Site 2:
Messages sent = 3
Messages received = 1
Site 3:
Messages sent = 3
Messages received = 5
TOTAL SENT TO SITE 3 = 3
TOTAL RECEIVED FROM SITE 3 = 5
TOTAL MESSAGES = 8
Figure 5.3: Data Flow Diagram of Admins Function
Figure 5.4: Sequence Diagram of Manager,GUI & Application
Chapter 6
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test; each test type addresses a specific testing requirement.
6.1 Types of Testing
6.1.1 Unit testing:
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application and is done after the completion of an individual unit, before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
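As a sketch, a unit test for one small piece of the system might look like this (the confidence helper below is hypothetical, not taken from the project code):

```python
import unittest

# Unit-test sketch: each test exercises one unique path with clearly defined
# inputs and expected results, including rejection of invalid input.

def confidence(n_both, n_antecedent):
    """Confidence = support(antecedent and consequent) / support(antecedent)."""
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs")
    return n_both / n_antecedent

class TestConfidence(unittest.TestCase):
    def test_valid_input_accepted(self):
        # the 800/2000 example from the text: confidence should be 40%
        self.assertAlmostEqual(confidence(800, 2000), 0.4)

    def test_invalid_input_rejected(self):
        with self.assertRaises(ValueError):
            confidence(10, 0)

if __name__ == "__main__":
    unittest.main(argv=["confidence_test"], exit=False)
```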
6.1.2 Integration testing:
Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
6.1.3 Functional test:
Functional tests provide systematic demonstrations that functions tested are available as specified by
the business and technical requirements, system documentation, and user manuals. Functional testing is
centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedure : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
6.1.4 System Test:
System testing ensures that the entire integrated software system meets requirements. It tests a con-
figuration to ensure known and predictable results. An example of system testing is the configuration
oriented system integration test. System testing is based on process descriptions and flows, emphasizing
pre-driven process links and integration points.
6.1.5 White Box Testing:
White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black-box level.
6.1.6 Black Box Testing:
Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot see into it. The tests provide inputs and respond to outputs without considering how the software works.
6.1.7 Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle,
although it is not uncommon for coding and unit testing to be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be written in detail.
Test objectives:
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested:
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
6.1.8 Integration Testing:
Software integration testing is the incremental integration testing of two or more integrated software components on a single platform to produce failures caused by interface defects. The task of the integration test is to check that components or software applications, e.g. components in a software system or, one step up, software applications at the company level, interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing: User Acceptance Testing is a critical phase of any project and requires signifi-
cant participation by the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Chapter 7
SYSTEM STUDY
7.1 FEASIBILITY STUDY:
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. Three key considerations involved in the feasibility analysis are:
• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY
7.2 ECONOMICAL FEASIBILITY:
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available.
7.3 TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
7.4 SOCIAL FEASIBILITY:
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised, as he is the final user of the system.
Chapter 8
PLAN OF WORK & PROJECT TIMELINE
Figure 8.1: Project Timeline
Gantt Charts
The Gantt chart shows planned and actual progress for a number of tasks displayed against a horizontal time scale. It is an effective and easy-to-read method of indicating the current status of each task in a set compared to the planned progress for each activity. Gantt charts provide a clear picture of the current state of the project.
Figure 8.2: Gantt Charts
Figure 8.3: Planned Gantt Charts
Figure 8.4: Pert Charts
Chapter 9
Conclusion and Future Scope
9.1 CONCLUSION
Distributed ARM algorithms must reduce communication costs so that generating global association
rules costs less than combining the participating sites’ datasets into a centralized site. We have developed
an efficient algorithm for mining association rules in distributed databases.
• Reduces the size of message exchanges through novel local and global pruning.
• Reduces the time to scan partitioned databases for support counts by using a compressed matrix (CMatrix), which is very effective in increasing performance.
• Uses a central site to manage all message exchanges; to obtain all globally frequent itemsets, only O(n) messages are needed for support-count exchange. This is much less than a straight adaptation of Apriori, which requires O(n^2) messages for support-count exchange.
9.2 FUTURE ENHANCEMENT
EDMA can be applied to the mining of association rules in a large centralized database by partitioning the database across the nodes of a distributed system. This is particularly useful if the data set is too large for sequential mining. In the future, since users of our communication network pay varying degrees of attention to different alarms, how to decide the weight of each alarm needs further consideration.
References
[1] D.W. Cheung et al., "A Fast Distributed Algorithm for Mining Association Rules," Proc. Parallel and Distributed Information Systems, IEEE CS Press, 1996, pp. 31-42.
[2] M.J. Zaki and Y. Pin, "Introduction: Recent Developments in Parallel and Distributed Data Mining," J. Distributed and Parallel Databases, vol. 11, no. 2, 2002, pp. 123-127.
[3] D.W. Cheung et al., "Efficient Mining of Association Rules in Distributed Databases," IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, 1996, pp. 911-922.
[4] A. Schuster and R. Wolff, "Communication-Efficient Distributed Mining of Association Rules," Proc. ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2001, pp. 473-484.
[5] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data, May 1993.
[6] M.Z. Ashrafi, "ODAM: An Optimized Distributed Association Rule Mining Algorithm," IEEE Distributed Systems Online, ISSN 1541-4922, 2004.
[7] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edn., John Wiley & Sons, New York, 2002.
[8] Y. Ma, B. Liu, and C.K. Wong, "Web for Data Mining: Organizing and Interpreting the Discovered Rules Using the Web," SIGKDD Explorations, vol. 2, no. 1, ACM Press, 2000, pp. 16-23.
[9] M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New Algorithms for Fast Discovery of Association Rules," Technical Report, University of Rochester, 1997, http://cs.aue.aau.dk/contribution/projects/datamining/papers/tr651.pdf
Appendix A
Project Hosting
The project is hosted at Google Code. The complete source code along with the manual to operate the
project and supplementary files are uploaded.
Project Link : https://code.google.com/p/proquiz
QR CODE:
 
97 In Text Citation Poetry Mla
97 In Text Citation Poetry Mla97 In Text Citation Poetry Mla
97 In Text Citation Poetry MlaGina Rizzo
 
Heart Template - 6 Inch - TimS Printables - Free He
Heart Template - 6 Inch - TimS Printables - Free HeHeart Template - 6 Inch - TimS Printables - Free He
Heart Template - 6 Inch - TimS Printables - Free HeGina Rizzo
 
5 Components Of Fitness Worksheet
5 Components Of Fitness Worksheet5 Components Of Fitness Worksheet
5 Components Of Fitness WorksheetGina Rizzo
 
Cursive Alphabet Zaner Bloser AlphabetWorksheetsFree.Com
Cursive Alphabet Zaner Bloser AlphabetWorksheetsFree.ComCursive Alphabet Zaner Bloser AlphabetWorksheetsFree.Com
Cursive Alphabet Zaner Bloser AlphabetWorksheetsFree.ComGina Rizzo
 
How To Start Your Introduction For A Research Paper. How To Write
How To Start Your Introduction For A Research Paper. How To WriteHow To Start Your Introduction For A Research Paper. How To Write
How To Start Your Introduction For A Research Paper. How To WriteGina Rizzo
 
Custom Admission Essay Dnp A Writing Service Wi
Custom Admission Essay Dnp A Writing Service WiCustom Admission Essay Dnp A Writing Service Wi
Custom Admission Essay Dnp A Writing Service WiGina Rizzo
 
Blank Torn White Paper Template Premium Image
Blank Torn White Paper Template Premium ImageBlank Torn White Paper Template Premium Image
Blank Torn White Paper Template Premium ImageGina Rizzo
 
Green, Yellow, Red The Keys To The Perfect Persua
Green, Yellow, Red The Keys To The Perfect PersuaGreen, Yellow, Red The Keys To The Perfect Persua
Green, Yellow, Red The Keys To The Perfect PersuaGina Rizzo
 
FCE Exam Writing Samples - My Hometown Essay Writi
FCE Exam Writing Samples - My Hometown Essay WritiFCE Exam Writing Samples - My Hometown Essay Writi
FCE Exam Writing Samples - My Hometown Essay WritiGina Rizzo
 
Referencing Essay
Referencing EssayReferencing Essay
Referencing EssayGina Rizzo
 
How To Teach Opinion Writing Tips And Resources Artofit
How To Teach Opinion Writing Tips And Resources ArtofitHow To Teach Opinion Writing Tips And Resources Artofit
How To Teach Opinion Writing Tips And Resources ArtofitGina Rizzo
 
Fantasy Space Writing Paper By Miss Cleve Tea
Fantasy Space Writing Paper By Miss Cleve TeaFantasy Space Writing Paper By Miss Cleve Tea
Fantasy Space Writing Paper By Miss Cleve TeaGina Rizzo
 

More from Gina Rizzo (20)

How To Write An Empathy Essay By Jones Jessica - I
How To Write An Empathy Essay By Jones Jessica - IHow To Write An Empathy Essay By Jones Jessica - I
How To Write An Empathy Essay By Jones Jessica - I
 
Rocket Outer Space Lined Paper Lined Paper, Writin
Rocket Outer Space Lined Paper Lined Paper, WritinRocket Outer Space Lined Paper Lined Paper, Writin
Rocket Outer Space Lined Paper Lined Paper, Writin
 
College Research Paper Writing S
College Research Paper Writing SCollege Research Paper Writing S
College Research Paper Writing S
 
Research Paper Executive Summary How To Write
Research Paper Executive Summary How To WriteResearch Paper Executive Summary How To Write
Research Paper Executive Summary How To Write
 
Hypothesis Experiment 4
Hypothesis Experiment 4Hypothesis Experiment 4
Hypothesis Experiment 4
 
Descriptive Essay Introduction Sa
Descriptive Essay Introduction SaDescriptive Essay Introduction Sa
Descriptive Essay Introduction Sa
 
Writing A Personal Letter - MakeMyAssignments Blog
Writing A Personal Letter - MakeMyAssignments BlogWriting A Personal Letter - MakeMyAssignments Blog
Writing A Personal Letter - MakeMyAssignments Blog
 
How To Write Better Essays Pdf - BooksFree
How To Write Better Essays Pdf - BooksFreeHow To Write Better Essays Pdf - BooksFree
How To Write Better Essays Pdf - BooksFree
 
97 In Text Citation Poetry Mla
97 In Text Citation Poetry Mla97 In Text Citation Poetry Mla
97 In Text Citation Poetry Mla
 
Heart Template - 6 Inch - TimS Printables - Free He
Heart Template - 6 Inch - TimS Printables - Free HeHeart Template - 6 Inch - TimS Printables - Free He
Heart Template - 6 Inch - TimS Printables - Free He
 
5 Components Of Fitness Worksheet
5 Components Of Fitness Worksheet5 Components Of Fitness Worksheet
5 Components Of Fitness Worksheet
 
Cursive Alphabet Zaner Bloser AlphabetWorksheetsFree.Com
Cursive Alphabet Zaner Bloser AlphabetWorksheetsFree.ComCursive Alphabet Zaner Bloser AlphabetWorksheetsFree.Com
Cursive Alphabet Zaner Bloser AlphabetWorksheetsFree.Com
 
How To Start Your Introduction For A Research Paper. How To Write
How To Start Your Introduction For A Research Paper. How To WriteHow To Start Your Introduction For A Research Paper. How To Write
How To Start Your Introduction For A Research Paper. How To Write
 
Custom Admission Essay Dnp A Writing Service Wi
Custom Admission Essay Dnp A Writing Service WiCustom Admission Essay Dnp A Writing Service Wi
Custom Admission Essay Dnp A Writing Service Wi
 
Blank Torn White Paper Template Premium Image
Blank Torn White Paper Template Premium ImageBlank Torn White Paper Template Premium Image
Blank Torn White Paper Template Premium Image
 
Green, Yellow, Red The Keys To The Perfect Persua
Green, Yellow, Red The Keys To The Perfect PersuaGreen, Yellow, Red The Keys To The Perfect Persua
Green, Yellow, Red The Keys To The Perfect Persua
 
FCE Exam Writing Samples - My Hometown Essay Writi
FCE Exam Writing Samples - My Hometown Essay WritiFCE Exam Writing Samples - My Hometown Essay Writi
FCE Exam Writing Samples - My Hometown Essay Writi
 
Referencing Essay
Referencing EssayReferencing Essay
Referencing Essay
 
How To Teach Opinion Writing Tips And Resources Artofit
How To Teach Opinion Writing Tips And Resources ArtofitHow To Teach Opinion Writing Tips And Resources Artofit
How To Teach Opinion Writing Tips And Resources Artofit
 
Fantasy Space Writing Paper By Miss Cleve Tea
Fantasy Space Writing Paper By Miss Cleve TeaFantasy Space Writing Paper By Miss Cleve Tea
Fantasy Space Writing Paper By Miss Cleve Tea
 

Recently uploaded

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A BeĂąa
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 

Recently uploaded (20)

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 

A Project Report On MARKET ANALYSIS AND SALES DEVELOPMENT

  • 4. ABSTRACT
The proposed system is designed to find the most frequent combinations of items. It is based on developing an efficient algorithm that outperforms the best available frequent pattern algorithms on a number of typical data sets, which will help in marketing and sales. The technique can be used to uncover interesting cross-sells and related products. Three different association mining algorithms have been implemented, and the best-performing method is then used to find more interesting results. The analyst can then perform the data mining and extraction, interpret the results, and make appropriate decisions. With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools to find the desired information resources, and to track and analyze their usage patterns. Association rule mining is an active data mining research area. However, most ARM algorithms cater to a centralized environment. In contrast to previous ARM algorithms, Optimized Distributed Association Rule Mining (ODARM) is a distributed algorithm for geographically spread data sets that aims to reduce operational and communication costs. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (DARM) algorithms have been developed. These algorithms assume that the databases are either horizontally or vertically distributed. In the special case of databases populated from information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two. Hence, this project proposes a Distributed Count Association Rule Mining Algorithm (DCARM), which is evaluated on real datasets obtained from the UCI Machine Learning Repository.
We are given a large database of customer transactions, where each transaction consists of the items purchased by a customer in one visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present the results of applying this algorithm to sales data obtained from a large retailing company, which demonstrate the effectiveness of the algorithm.
Keywords: Association rule mining, Optimized Distributed Association Rule Mining (ODARM), Distributed Count Association Rule Mining Algorithm (DCARM)
  • 5. Contents
1 Introduction
2 Literature Survey
  2.1 Problem Statement
  2.2 Existing System
    2.2.1 Classification
    2.2.2 Clustering
  2.3 Proposed System
  2.4 System Requirement Specification
    2.4.1 Environmental Specification
3 Technologies
  3.1 Software Environment
4 System Design
  4.1 Software Design (Logical, Input, Output, and Data Design)
  4.2 Fundamental Design Concepts (Abstraction, Information Hiding, Modularity, Concurrency, Verification)
  4.3 Data Flow Diagram
5 Implementation
  5.1 Algorithms
6 System Testing
  6.1 Types of Testing (Unit, Integration, Functional, System, White Box, Black Box)
7 System Study
  7.1 Feasibility Study
  7.2 Economical Feasibility
  7.3 Technical Feasibility
  7.4 Social Feasibility
8 Plan of Work & Project Timeline
9 Conclusion and Future Scope
  9.1 Conclusion
  9.2 Future Enhancement
References
Appendices
  A Project Hosting
  • 7. List of Figures
1.1 Entity Relationship Diagram of Market-Basket Analysis
4.1, 4.2 (untitled)
4.3 KDD
5.1 Frequency Itemset Generation
5.2 (untitled)
5.3 Data Flow Diagram of Admin's Functions
5.4 Sequence Diagram of Manager, GUI & Application
8.1 Project Timeline
8.2 Gantt Charts
8.3 Planned Gantt Charts
8.4 Pert Charts
  • 8. Chapter 1: Introduction
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel-processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"
Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, and clustering. Data mining is a complex topic with links to multiple core fields such as computer science, and it draws on seminal computational techniques from statistics, information retrieval, machine learning, and pattern recognition. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently produced technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:
• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms
Rizvi College of Engineering, Bandra, Mumbai.
  • 9. Commercial databases are growing at unprecedented rates. A recent META Group survey of data warehouse projects found that 19 percent of respondents were beyond the 50-gigabyte level, while 59 percent expected to be there by the second quarter of 1996. In some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods. With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools to find the desired information resources, and to track and analyze their usage patterns. These factors give rise to the necessity of creating server-side and client-side intelligent systems that can effectively mine for knowledge. Web mining can be broadly defined as the discovery and analysis of useful information from the World Wide Web. This covers the automatic search of information resources available online, i.e., Web content mining, and the discovery of user access patterns from Web servers, i.e., Web usage mining.
Figure 1.1: Entity Relationship Diagram of Market-Basket Analysis
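The market-basket analysis introduced above rests on finding itemsets that recur across many transactions. The report's implementation is in VB.NET/ASP.NET; as a language-neutral illustration only, the following Python sketch shows the level-wise (Apriori-style) idea of growing frequent itemsets one item at a time. The basket data is invented for the example.

```python
def frequent_itemsets(transactions, min_support):
    """Return every itemset whose support (fraction of baskets containing
    it) is at least min_support, built level by level (Apriori-style)."""
    n = len(transactions)
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c / n >= min_support}
    frequent = {s: counts[s] / n for s in current}
    k = 2
    while current:
        # Join step: k-item candidates from unions of frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Count step: a candidate's support is the share of baskets containing it.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update({c: counts[c] / n for c in current})
        k += 1
    return frequent

baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
]
result = frequent_itemsets(baskets, min_support=0.5)
```

With these four baskets, every single item and every pair appears in half the baskets or more, while the triple {bread, milk, butter} appears in only one basket and is pruned.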
  • 10. Chapter 2: Literature Survey
2.1 PROBLEM STATEMENT
To develop an efficient algorithm to find the desired information resources and their usage patterns, and also to develop a distributed algorithm for geographically spread data sets that reduces communication cost and communication overhead.
Purpose: It has become increasingly necessary for users to utilize automated tools to find the desired information resources, and to track and analyze their usage patterns. Association rule mining is an active data mining research area. However, most ARM algorithms cater to a centralized environment. Distributed Association Rule Mining (D-ARM) algorithms have been developed; these algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated from information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two.
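The association rules the problem statement targets are conventionally filtered by two measures, support and confidence. As background only (the toy baskets below are invented, not from the report's datasets), a minimal Python sketch of both:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(A => B) = support(A union B) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
]
s = support(frozenset({"bread", "butter"}), baskets)                  # 2 of 4 baskets
c = confidence(frozenset({"bread"}), frozenset({"butter"}), baskets)  # 0.5 / 0.75
```

A rule such as bread => butter is reported only if both its support and its confidence clear user-chosen thresholds; this is the filter every ARM variant discussed in this chapter applies.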
2.2 EXISTING SYSTEM
Data mining algorithms can be categorized as follows:
• Association algorithms
• Classification algorithms
• Clustering algorithms
2.2.1 Classification: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to the specific variable(s) you are trying to predict. For example, a typical classification problem is to divide a database of companies into groups that are as homogeneous as possible with respect to a creditworthiness variable with values "Good" and "Bad."
2.2.2 Clustering: The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another,
  • 11. Chapter 2 Literature Survey where distance is measured with respect to all available variables given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities • Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events. • Automated discovery of previously unknown patterns. Data mining tools sweep through databases and identify previously hidden patterns in one step. DARM discovers rules from various geograph- ically distributed data sets. However, the network connection between those data sets isn’t as fast as in a parallel environment, so distributed mining usually aims to minimize communication costs. 2.3 PROPOSED SYSTEM • Unlike other algorithms, ODAM offers better performance by minimizing candidate itemset gen- eration costs. It achieves this by focusing on two major DARM issues communication and syn- chronization. Communication is one of the most important DARM objectives. DARM algorithms will perform better if we can reduce communication (for example, message exchange size) costs. Synchronization forces • Each participating site to wait a certain period until globally frequent itemset generation completes. Each site will wait longer if computing support counts takes more time. Hence, we reduce the computation time of candidate itemsets’ support counts. 
• To reduce communication costs, we highlight several message optimization techniques. Depending on the ARM algorithm and on the message exchange method, we can divide the message optimization techniques into two categories: direct and indirect support-count exchange.

• Each method has different aims, expectations, advantages, and disadvantages. For example, the first method exchanges each candidate itemset's support count to generate the globally frequent itemsets of that pass (CD and FDM are examples of this approach).
2.4 SYSTEM REQUIREMENT SPECIFICATION

2.4.1 ENVIRONMENTAL SPECIFICATION

The environmental specification specifies the hardware and software requirements for carrying out this project. The following are the hardware and software requirements.

Hardware:
• 1 GB RAM
• 320 GB HDD
• Intel Core 2 Duo 2.4 GHz processor

Software:
• Windows operating system (Windows XP Service Pack 2 / Windows 7)
• Visual Studio 2008
• MS SQL Server 2005
Chapter 3 TECHNOLOGIES

3.1 SOFTWARE ENVIRONMENT

ASP.NET
ASP.NET is more than the next version of Active Server Pages (ASP); it is a unified Web development platform that provides the services necessary for developers to build enterprise-class Web applications. While ASP.NET is largely syntax-compatible with ASP, it also provides a new programming model and infrastructure that enables a powerful new class of applications. You can migrate your existing ASP applications by incrementally adding ASP.NET functionality to them. ASP.NET is a compiled, .NET Framework-based environment. You can author applications in any .NET Framework-compatible language, including Visual Basic and Visual C#. Additionally, the entire .NET Framework platform is available to any ASP.NET application. Developers can easily access the benefits of the .NET Framework, which include a fully managed, protected, and feature-rich application execution environment, simplified development and deployment, and seamless integration with a wide variety of languages.

VB.NET
Visual Basic is a programming language designed especially for Windows programming. This section explains most of the tools available for implementing GUI-based programs. After introducing the basic facilities and tools provided by Visual Basic, we apply our knowledge to implementing a small VB program. Our program implements a visual interface for the commonly known stack abstract data type. VB.NET is still the only language in VS.NET that includes background compilation, which means that it can flag errors immediately, while you type. VB.NET is the only .NET language that supports late binding. In the VS.NET IDE, VB.NET provides a dropdown list at the top of the code window with all the objects and events; the IDE does not provide this functionality for any other language.
VB.NET is also unique in providing default values for optional parameters and a collection of the controls available to the developer.

Advantages of VB.NET:

• Build Robust Windows-based Applications: With new Windows Forms, developers using Visual Basic .NET can build Windows-based applications that leverage the rich user interface features available in the Windows operating system. All the rapid application development (RAD) tools that developers have come to expect from Microsoft are found in Visual Basic .NET, including drag-and-drop design and code behind forms. In addition, new features such as automatic control resizing eliminate the need for complex resize code.
• Resolve Deployment and Versioning Issues Seamlessly: Visual Basic .NET delivers the answer to your application setup and maintenance problems. With Visual Basic .NET, issues with Component Object Model (COM) registration and DLL overwrites are relics of the past. Side-by-side versioning prevents the overwriting and corruption of existing components and applications.

Microsoft SQL Server 2005
Business today demands a different kind of data management solution. Performance, scalability, and reliability are essential, but businesses now expect more from their key IT investment. SQL Server 2005 exceeds dependability requirements and provides innovative capabilities that increase employee effectiveness, integrate heterogeneous IT ecosystems, and maximize capital and operating budgets. SQL Server 2005 provides the enterprise data management platform your organization needs to adapt quickly in a fast-changing environment. Benchmarked for scalability, speed, and performance, SQL Server 2005 is a fully enterprise-class database product, providing core support for Extensible Markup Language (XML) and Internet queries.

• Easy-to-use Business Intelligence (BI) Tools: Through rich data analysis and data mining capabilities that integrate with familiar applications such as Microsoft Office, SQL Server 2005 enables you to provide all of your employees with critical, timely business information tailored to their specific information needs. Every copy of SQL Server 2005 ships with a suite of BI services.

• Self-Tuning and Management Capabilities: Revolutionary self-tuning and dynamic self-configuring features optimize database performance, while management tools automate standard activities. Graphical tools and performance wizards simplify setup, database design, and performance monitoring, allowing database administrators to focus on meeting strategic business needs.
• Data Management Applications and Services: Unlike its competitors, SQL Server 2005 provides a powerful and comprehensive data management platform. Every software license includes extensive management and development tools; a powerful extraction, transformation, and loading (ETL) tool; and business intelligence and analysis services such as Notification Services. The result is the best overall business value available. Enterprise Edition includes the complete set of SQL Server data management and analysis features and is uniquely characterized by several features that make it the most scalable and available edition of SQL Server 2005. It scales to the performance levels required to support the largest Web sites, enterprise Online Transaction Processing (OLTP) systems, and data warehousing systems. Its support for failover clustering also makes it ideal for any mission-critical line-of-business application.
Chapter 4 SYSTEM DESIGN

4.1 SOFTWARE DESIGN

System design is a solution to how to approach the creation of a system. This important phase provides the understanding and procedural details necessary for implementing the system recommended in the feasibility study. The design step produces a data design, an architectural design, and a procedural design. The data design transforms the information domain model created during analysis into the data structures that will be required to implement the software. The architectural design defines the relationships among the major structural components, and the procedural design transforms structural components into a procedural description of the software. Source code is then generated and testing is conducted to integrate and validate the software. From a project management point of view, software design is conducted in two steps. Preliminary design is concerned with the transformation of requirements into data and software architecture. Detailed design focuses on refinements to the architectural representation that lead to detailed data structure and algorithmic representations of software.

4.1.1 Logical Design
The logical design of an information system is analogous to an engineering blueprint or conceptual view of an automobile. It shows the major features and how they are related to one another. The outputs, inputs, and relationships between the variables are designed in this phase. The objectives of the database are accuracy, integrity, successful recovery from failure, privacy and security of data, and good overall performance.

4.1.2 Input Design
The input design is the bridge between users and the information system. It specifies the manner in which data enters the system for processing. It can ensure the reliability of the system and produce reports from accurate data, or it may result in the output of erroneous information. Online data entry is available, which accepts input from the keyboard; data is displayed on the screen for verification.
While designing, the following points have been taken into consideration:
a) Input formats are designed as per the user requirements.
b) Interaction with the user is maintained in simple dialogues.
c) Appropriate fields are locked, thereby allowing only valid inputs.
4.1.3 Output Design
Each and every activity in this work is result-oriented. The most important feature of an information system for users is the output. Efficient, intelligent output design improves the usability and acceptability of the system and also helps in decision-making. Thus the following points are considered during output design:
(1) What information is to be presented?
(2) Whether to display or print the information?
(3) How to arrange the information in an acceptable format?
(4) How is the status to be maintained each and every time?
(5) How to distribute the outputs to the recipients?
The system, being user-friendly in nature, serves to fulfill the requirements of the users; suitable screen designs are made and presented to the user for refinement. The main requirement for the user is the retrieval of information related to a particular user.

4.1.4 Data Design
Data design is the first of the three design activities that are conducted during software engineering. The impact of data structure on program structure and procedural complexity causes data design to have a profound influence on software quality. The concepts of information hiding and data abstraction provide the foundation for an approach to data design.

4.2 FUNDAMENTAL DESIGN CONCEPTS

4.2.1 Abstraction
During software design, abstraction allows us to organize and channel our process by postponing structural considerations until the functional characteristics, data streams, and data stores have been established. Data abstraction involves specifying legal operations on objects; representation and manipulation details are suppressed.

4.2.2 Information Hiding
Information hiding is a fundamental design concept for software. When a software system is designed using the information hiding approach, each module in the system hides the internal details of its processing activities, and modules communicate only through well-defined interfaces.
Information hiding can be used as the principal design technique for the architectural design of a system.

4.2.3 Modularity
Modular systems incorporate collections of abstractions in which each functional abstraction, each data abstraction, and each control abstraction handles a local aspect of the problem being solved. A modular system consists of well-defined interfaces among the units. Modularity enhances design clarity, which in turn eases implementation, debugging, and maintenance of the software product.
4.2.4 Concurrency
Software systems can be categorized as sequential or concurrent. In a sequential system, only one part of the system is active at any given time. Concurrent systems have independent processes that can be activated simultaneously if multiple processors are available.

4.2.5 Verification
Design is the bridge between customer requirements and an implementation that satisfies those requirements. This is typically done in two steps:
1. Verification that the software requirements definition satisfies the customer's needs.
2. Verification that the design satisfies the requirements definition.

4.3 DATA FLOW DIAGRAM

Figure 4.1:
Figure 4.2:
Overview of the System:
Association rule mining finds interesting associations and/or correlation relationships among large sets of data items. Association rules show attribute-value conditions that occur frequently together in a given dataset. A typical and widely used example of association rule mining is Market Basket Analysis. For example, data are collected using bar-code scanners in supermarkets. Such market basket databases consist of a large number of transaction records. Each record lists all items bought by a customer in a single purchase transaction. Managers would be interested to know if certain groups of items are consistently purchased together. They could use this data for adjusting store layouts (placing items optimally with respect to each other), for cross-selling, for promotions, for catalog design, and to identify customer segments based on buying patterns. Association rules provide information of this type in the form of "if-then" statements. These rules are computed from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature. In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule. In association analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (they do not have any items in common). The first number is called the support for the rule. The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.) The other number is known as the confidence of the rule.
Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent. For example, if a supermarket database has 100,000 point-of-sale transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent. An association rule tells us about the association between two or more items. For example: in 80% of the cases when people buy bread, they also buy milk. This tells us of the association between bread and milk. We represent it as:

bread => milk | 80%

This should be read as "bread means or implies milk, 80% of the time." Here 80% is the "confidence factor" of the rule. Association rules can involve more than two items. For example:

bread, milk => jam | 60%
bread => milk, jam | 40%

Given any rule, we can easily find its confidence. For example, for the rule bread, milk => jam, we count the number, say n1, of records that contain bread and milk. Of these, how many contain jam as well? Let this be n2. Then the required confidence is n2/n1. This means that the user has to guess which rule is interesting and ask for its confidence. But our goal was to "automatically" find all interesting rules. This is going to be difficult because the database is bound to be very large.
We might have to go through the entire database many times to find all interesting rules.
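The support and confidence computation described above can be sketched in a few lines of code. The following is an illustrative Python sketch (not the project's actual VB.NET/SQL Server implementation); the miniature transaction list is hypothetical, scaled down from the 100,000-transaction example so that the same 0.8% support and 40% confidence fall out.

```python
# Illustrative sketch: computing support and confidence for the rule {A, B} => {C}.

def support_count(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical mini-database standing in for the 100,000-transaction example:
# 20 transactions contain A and B, of which 8 also contain C.
transactions = (
    [{"A", "B", "C"}] * 8 +   # transactions with A, B and C
    [{"A", "B"}] * 12 +       # transactions with A and B but not C
    [{"A"}] * 980             # filler transactions
)

rule_support = support_count(transactions, {"A", "B", "C"})
antecedent_support = support_count(transactions, {"A", "B"})

print(rule_support)                       # 8
print(rule_support / len(transactions))   # support as a fraction: 0.008 (0.8%)
print(rule_support / antecedent_support)  # confidence: 0.4 (40%)
```

Note that the support counts the whole rule's itemset, while the confidence divides that count by the antecedent's count, exactly as in the n2/n1 formulation above.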
Brute Force
The common-sense approach to solving this problem is as follows. Let I = { i1, i2, ..., in } be a set of items, also called an itemset. The number of times this itemset appears in the database is called its "support". Note that we can speak about the support of an itemset and the confidence of a rule. The other combinations, the support of a rule and the confidence of an itemset, are not defined. Now, if we know the support of I and all its subsets, we can calculate the confidence of all rules which involve these items. For example, the confidence of the rule i1, i2, i3 => i4, i5 is

support of { i1, i2, i3, i4, i5 } / support of { i1, i2, i3 }

So the easiest approach would be to let I contain all items in the supermarket, then set up a counter for every subset of I to count all its occurrences in the database. At the end of one pass of the database, we would have all those counts and we could find the confidence of all rules, then select the most "interesting" rules based on their confidence factors. How easy. The problem with this approach is that, normally, I will contain at least about 100 items. This means that it can have 2^100 subsets, and we would need to maintain that many counters. If each counter is a single byte, then about 10^20 GB would be required. Clearly this can't be done.

Minimum Support
To make the problem tractable, we introduce the concept of minimum support. The user has to specify this parameter; let us call it minsupport. Then any rule i1, i2, ..., in => j1, j2, ..., jn needs to be considered only if the set of all items in this rule, which is { i1, i2, ..., in, j1, j2, ..., jn }, has support greater than minsupport. The idea is that in the rule bread, milk => jam, if the number of people buying bread, milk, and jam together is very small, then this rule is hardly worth consideration (even if it has high confidence).
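The brute-force storage estimate above can be sanity-checked with a quick back-of-the-envelope computation (an illustrative sketch; the exact figure depends on the counter size assumed):

```python
# Back-of-the-envelope check of the brute-force estimate: one 1-byte counter
# per subset of a 100-item set.
n_items = 100
n_subsets = 2 ** n_items            # number of possible itemsets
bytes_needed = n_subsets            # one byte per counter
gigabytes = bytes_needed / 2 ** 30  # convert bytes to GB

print(f"{gigabytes:.2e}")           # about 1.18e+21 GB
```

The result, on the order of 10^20 to 10^21 GB, confirms that maintaining one counter per subset is hopeless, which is exactly why a pruning criterion such as minimum support is needed.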
Our problem now becomes: find all rules that have a given minimum confidence and involve itemsets whose support is more than minsupport. Clearly, once we know the supports of all these itemsets, we can easily determine the rules and their confidences. Hence we need to concentrate on the problem of finding all itemsets which have minimum support. We call such itemsets frequent itemsets.

Some Properties of Frequent Itemsets
The methods used to find frequent itemsets are based on the following properties:
1. Every subset of a frequent itemset is also frequent. Algorithms make use of this property in the following way: we need not find the count of an itemset if all its subsets are not frequent. So, we can first find the counts of some short itemsets in one pass of the database, then consider longer and longer itemsets in subsequent passes. When we consider a long itemset, we can make sure that all its subsets are frequent. This can be done because we already have the counts of all those subsets from previous passes.
2. Let us divide the tuples of the database into partitions, not necessarily of equal size. Then an itemset can be frequent only if it is frequent in at least one partition. This property enables us to apply divide-and-conquer type algorithms. We can divide the database into partitions and find the frequent itemsets in each partition. An itemset can be frequent only if it is frequent in at least one of these partitions. To see that this is true, consider k partitions of sizes n1, n2, ..., nk.
Let the minimum support be s (as a fraction of the database size). Consider an itemset which does not have minimum support in any partition. Then its count in each partition must be less than s·n1, s·n2, ..., s·nk respectively. Therefore its total count must be less than the sum of all these counts, which is s·(n1 + n2 + ... + nk). This is equal to s·(size of database). Hence the itemset is not frequent in the entire database. This is extended to distributed databases.

Use Case Diagrams

Figure 4.3: KDD
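The partition argument above can be demonstrated on a toy example. The following Python sketch is purely illustrative (the partitions and items are hypothetical): an itemset that fails the minimum-support fraction s in every partition necessarily falls below s over the whole database.

```python
# Sketch of the partition property: if an itemset is not locally frequent in
# any partition, it cannot be frequent in the entire database.

def count_in(partition, itemset):
    """Count transactions in `partition` that contain every item of `itemset`."""
    return sum(1 for t in partition if itemset <= t)

s = 0.5  # minimum support as a fraction of the database size
partitions = [
    [{"bread", "milk"}, {"bread"}, {"milk"}, {"jam"}],  # count 1 of 4 -> below s
    [{"bread", "milk", "jam"}, {"jam"}, {"beer"}],      # count 1 of 3 -> below s
]
itemset = {"bread", "milk"}

local_counts = [count_in(p, itemset) for p in partitions]
# The itemset is not locally frequent in any partition ...
assert all(c < s * len(p) for c, p in zip(local_counts, partitions))
# ... so its global count stays below s * (size of database).
total = sum(local_counts)
db_size = sum(len(p) for p in partitions)
print(total, s * db_size)  # 2 3.5 -> not frequent in the entire database
```

This is the property that lets a distributed miner check itemsets per site and discard those that are not large anywhere.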
Chapter 5 IMPLEMENTATION

5.1 ALGORITHMS

Association Rule Mining
Association rule mining finds interesting associations and/or correlation relationships among large sets of data items. Association rules show attribute-value conditions that occur frequently together in a given dataset. A typical and widely used example of association rule mining is Market Basket Analysis. For example, data are collected using bar-code scanners in supermarkets. Such market basket databases consist of a large number of transaction records. Each record lists all items bought by a customer in a single purchase transaction. Association rules provide information of this type in the form of "if-then" statements. These rules are computed from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature. In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule:
• Support
• Confidence

Support: In association analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (they do not have any items in common). The first number is called the support for the rule. The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)

Confidence: The other number is known as the confidence of the rule. Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent.
Let us see an example based on these two association rule numbers. If a supermarket database has 100,000 point-of-sale transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent. An association rule tells us about the association between two or more items. For example: in 80% of the cases when people buy bread, they also buy milk. This tells us of the association between bread and milk.
We represent it as:

bread => milk | 80%

This should be read as "bread means or implies milk, 80% of the time." Here 80% is the "confidence factor" of the rule. Association rules can involve more than two items. For example:

bread, milk => jam | 60%
bread => milk, jam | 40%

Given any rule, we can easily find its confidence. For example, for the rule bread, milk => jam, we count the number, say n1, of records that contain bread and milk. Of these, how many contain jam as well? Let this be n2. Then the required confidence is n2/n1. This means that the user has to guess which rule is interesting and ask for its confidence. But our goal was to "automatically" find all interesting rules. This is going to be difficult because the database is bound to be very large. We might have to go through the entire database many times to find all interesting rules.

Apriori Algorithm
Apriori is designed to operate on databases containing transactions, for example, collections of items bought by customers or details of website visits. As is common in association rule mining, given a set of itemsets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets. Apriori uses a "bottom-up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. Apriori uses breadth-first search and a tree structure.

Brute Force
The common-sense approach to solving this problem is as follows. Let I = { i1, i2, ..., in } be a set of items, also called an itemset. The number of times this itemset appears in the database is called its "support". Note that we can speak about the support of an itemset and the confidence of a rule. The other combinations, the support of a rule and the confidence of an itemset, are not defined.
Now, if we know the support of I and all its subsets, we can calculate the confidence of all rules which involve these items. For example, the confidence of the rule i1, i2, i3 => i4, i5 is

support of { i1, i2, i3, i4, i5 } / support of { i1, i2, i3 }

So the easiest approach would be to let I contain all items in the supermarket, then set up a counter for every subset of I to count all its occurrences in the database. At the end of one pass of the database, we would have all those counts and we could find the confidence of all rules, then select the most "interesting" rules based on their confidence factors. How easy. The problem with this approach is that, normally, I will contain at least about 100 items. This means that it can have 2^100 subsets, and we would need to maintain that many counters. If each counter is a single byte, then about 10^20 GB would be required. Clearly this can't be done.

Minimum Support
To make the problem tractable, we introduce the concept of minimum support. The user has to specify this parameter; let us call it minsupport. Then any rule
i1, i2, ..., in => j1, j2, ..., jn needs to be considered only if the set of all items in this rule, which is { i1, i2, ..., in, j1, j2, ..., jn }, has support greater than minsupport. The idea is that in the rule bread, milk => jam, if the number of people buying bread, milk, and jam together is very small, then this rule is hardly worth consideration (even if it has high confidence). Our problem now becomes: find all rules that have a given minimum confidence and involve itemsets whose support is more than minsupport. Clearly, once we know the supports of all these itemsets, we can easily determine the rules and their confidences. Hence we need to concentrate on the problem of finding all itemsets which have minimum support. We call such itemsets frequent itemsets.

Some Properties of Frequent Itemsets
The methods used to find frequent itemsets are based on the following properties:
1. Every subset of a frequent itemset is also frequent. Algorithms make use of this property in the following way: we need not find the count of an itemset if all its subsets are not frequent. So, we can first find the counts of some short itemsets in one pass of the database, then consider longer and longer itemsets in subsequent passes. When we consider a long itemset, we can make sure that all its subsets are frequent. This can be done because we already have the counts of all those subsets from previous passes.
2. Let us divide the tuples of the database into partitions, not necessarily of equal size. Then an itemset can be frequent only if it is frequent in at least one partition. This property enables us to apply divide-and-conquer type algorithms. We can divide the database into partitions and find the frequent itemsets in each partition. An itemset can be frequent only if it is frequent in at least one of these partitions. To see that this is true, consider k partitions of sizes n1, n2, ..., nk. Let the minimum support be s (as a fraction of the database size).
Consider an itemset which does not have minimum support in any partition. Then its count in each partition must be less than s·n1, s·n2, ..., s·nk respectively. Therefore its total count must be less than the sum of all these counts, which is s·(n1 + n2 + ... + nk). This is equal to s·(size of database). Hence the itemset is not frequent in the entire database. This is extended to distributed databases.

Figure 5.1: Frequency Itemset Generation
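The bottom-up, level-wise search that Apriori performs, using the "every subset of a frequent itemset is frequent" property to prune candidates, can be sketched as follows. This is an illustrative Python sketch of the classic algorithm, not the project's VB.NET implementation; the toy transaction database is hypothetical.

```python
# Minimal Apriori sketch: generate frequent itemsets bottom-up, pruning any
# candidate that has an infrequent (k-1)-subset before counting it.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets found in one pass over the database.
    current = {frozenset([i]) for i in items
               if sum(i in t for t in transactions) >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count the surviving candidates against the database.
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) >= min_support}
        frequent |= current
        k += 1
    return frequent

db = [{"bread", "milk"}, {"bread", "milk", "jam"}, {"bread", "jam"}, {"milk"}]
result = apriori(db, min_support=2)
print(frozenset({"bread", "milk"}) in result)  # True
print(frozenset({"milk", "jam"}) in result)    # False (appears only once)
```

Each iteration corresponds to one pass of the database, matching the "longer and longer itemsets in subsequent passes" description above.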
MODULES:

Network Connections Management
Client-server computing, or networking, is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters, called clients. Often clients and servers operate over a computer network on separate hardware. A server machine is a high-performance host that runs one or more server programs which share their resources with clients. A client does not share any of its resources; clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.

Database Management
The distributed database in our model is a horizontally partitioned database, which means the database schemas of all the partitions are the same. However, a distributed database also has an intrinsic data skewness property. The distributions of the itemsets in different partitions are not identical, and many items occur more frequently in some partitions than in others. As a result, many itemsets may be large locally at some sites but not necessarily at the other sites. This skewness property poses a new requirement in the design of the mining algorithm.

ARM Module:
Association rule mining is an active data mining research area, and most ARM algorithms cater to a centralized environment. However, adapting centralized data mining to discover useful patterns in a distributed database isn't always feasible because merging data sets from different sites incurs huge network communication costs. Therefore, our research is to develop a distributed algorithm for geographically distributed data sets that reduces communication costs.

EDMA Module:
In this project, we developed an efficient association rule mining algorithm for distributed databases called EDMA. We have found that many candidate sets generated by applying the Apriori-gen function are not needed in the search for frequent itemsets.
In fact, there is a natural and effective method for every site to generate its own set of candidate sets, which is typically much smaller than the set of all the candidate sets. Following that, every site only needs to find the frequent itemsets among these candidate sets. The following lemma is described to illustrate the above observations.
Results and Statistics
Then 2-itemsets are formed from the returned globally large 1-itemsets at the particular site, and the local count is calculated. The process is repeated till no sets are formed or returned.

Figure 5.2: Global support threshold: (50/100) * 12 = 6 [The global support count is calculated only by adding the counts of locally large itemsets]
Global support counts:
Bread         9
Peanutbutter  6
Milk          5
Beer          3

Messages [considering site 3 as the receiver site]:
Site 1: messages sent = 2, messages received = 2
Site 2: messages sent = 3, messages received = 1
Site 3: messages sent = 3, messages received = 5
TOTAL SENT TO SITE 3 = 3
TOTAL RECEIVED FROM SITE 3 = 5
TOTAL MESSAGES = 8

Figure 5.3: Data Flow Diagram of Admin's Functions
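The global-count merge in the example above can be sketched in code. The per-site counts below are hypothetical (only the merged totals — Bread 9, Peanutbutter 6, Milk 5, Beer 3 — appear in the example); the sketch shows how locally reported support counts are summed and compared against the global threshold of (50/100) * 12 = 6.

```python
# Illustrative sketch: merging local support counts from three sites and
# keeping the globally large 1-itemsets. Per-site counts are hypothetical.
from collections import Counter

site_counts = [
    Counter({"Bread": 4, "Peanutbutter": 2, "Milk": 2, "Beer": 1}),  # site 1
    Counter({"Bread": 3, "Peanutbutter": 2, "Milk": 2, "Beer": 1}),  # site 2
    Counter({"Bread": 2, "Peanutbutter": 2, "Milk": 1, "Beer": 1}),  # site 3
]

global_threshold = (50 / 100) * 12  # 50% of 12 total transactions = 6

# Sum the counts reported by each site.
global_counts = sum(site_counts, Counter())
globally_large = {item for item, c in global_counts.items()
                  if c >= global_threshold}

print(dict(global_counts))      # Bread 9, Peanutbutter 6, Milk 5, Beer 3
print(sorted(globally_large))   # ['Bread', 'Peanutbutter']
```

Only Bread (9) and Peanutbutter (6) meet the threshold of 6, so only they proceed to 2-itemset generation.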
Figure 5.4: Sequence Diagram of Manager, GUI & Application
Chapter 6 SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests. Each test type addresses a specific testing requirement.

6.1 Types of Testing

6.1.1 Unit Testing:
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit, before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.

6.1.2 Integration Testing:
Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event-driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
6.1.3 Functional test: Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals. Functional testing is centered on the following items: Valid Input: identified classes of valid input must be accepted. Invalid Input: identified classes of invalid input must be rejected. Functions: identified functions must be exercised. Output: identified classes of application outputs must be exercised. Systems/Procedures: interfacing systems or procedures must be invoked. Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage of business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
  • 29. 6.1.4 System Test: System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points. 6.1.5 White Box Testing: White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black-box level. 6.1.6 Black Box Testing: Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot see into it. The test provides inputs and responds to outputs without considering how the software works. 
6.1.7 Unit Testing: Unit testing is usually conducted as part of a combined code-and-unit-test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases. Test strategy and approach: field testing will be performed manually, and functional tests will be written in detail. Test objectives: • All field entries must work properly. • Pages must be activated from the identified link. • The entry screen, messages and responses must not be delayed. Features to be tested: • Verify that the entries are of the correct format. • No duplicate entries should be allowed. • All links should take the user to the correct page.
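The unit-test objectives listed above (correct field formats, no duplicate entries) can be sketched as a small test case. The validator and entry helper below are illustrative assumptions, not code taken from the project; the UIN format is inferred from the identifiers on the title page.

```python
import re
import unittest

# Hypothetical field validator for the data-entry screen described above.
def is_valid_uin(uin: str) -> bool:
    """A UIN is assumed to be three digits, one uppercase letter, three digits (e.g. 091P041)."""
    return re.fullmatch(r"\d{3}[A-Z]\d{3}", uin) is not None

def add_entry(entries: list, uin: str) -> bool:
    """Reject malformed or duplicate entries, as the test objectives require."""
    if not is_valid_uin(uin) or uin in entries:
        return False
    entries.append(uin)
    return True

class EntryUnitTest(unittest.TestCase):
    def test_correct_format_accepted(self):
        self.assertTrue(add_entry([], "091P041"))

    def test_wrong_format_rejected(self):
        self.assertFalse(add_entry([], "91-P41"))

    def test_duplicates_rejected(self):
        entries = []
        self.assertTrue(add_entry(entries, "112P014"))
        self.assertFalse(add_entry(entries, "112P014"))
```

Such a suite would be run with `python -m unittest`, exercising each decision branch of the validator as the unit-testing section prescribes.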
  • 30. 6.1.8 Integration Testing: Software integration testing is the incremental integration testing of two or more integrated software components on a single platform, intended to expose failures caused by interface defects. The task of the integration test is to check that components or software applications, e.g. components in a software system or, one step up, software applications at the company level, interact without error. Test Results: all the test cases mentioned above passed successfully; no defects were encountered. Acceptance Testing: User Acceptance Testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements. Test Results: all the test cases mentioned above passed successfully; no defects were encountered.
  • 31. Chapter 7 SYSTEM STUDY 7.1 FEASIBILITY STUDY: The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. Three key considerations involved in the feasibility analysis are • ECONOMICAL FEASIBILITY • TECHNICAL FEASIBILITY • SOCIAL FEASIBILITY 7.2 ECONOMICAL FEASIBILITY: This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available. 7.3 TECHNICAL FEASIBILITY: This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system has modest requirements, as only minimal or no changes are required for implementing this system. 7.4 SOCIAL FEASIBILITY: This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised, as he is the final user of the system.
  • 32. Chapter 8 PLAN OF WORK & PROJECT TIMELINE Figure 8.1: Project Timeline
  • 33. Gantt Charts The Gantt chart shows planned and actual progress for a number of tasks displayed against a horizontal time scale. It is an effective and easy-to-read method of indicating the actual current status of each task in a set compared to the planned progress for each activity. Gantt charts provide a clear picture of the current state of the project. Figure 8.2: Gantt Charts Figure 8.3: Planned Gantt Charts
  • 34. Figure 8.4: PERT Charts
  • 35. Chapter 9 Conclusion and Future Scope 9.1 CONCLUSION Distributed ARM algorithms must reduce communication costs so that generating global association rules costs less than combining the participating sites' datasets into a centralized site. We have developed an efficient algorithm for mining association rules in distributed databases. • It reduces the size of message exchanges through novel local and global pruning. • It reduces the time taken to scan partition databases for support counts by using a compressed matrix (CMatrix), which is very effective in increasing performance. • It uses a central site to manage all message exchanges when obtaining the globally frequent itemsets, so only O(n) messages are needed for support-count exchange. This is much less than a straight adaptation of Apriori, which requires O(n²) messages for support-count exchange. 9.2 FUTURE ENHANCEMENT EDMA can be applied to the mining of association rules in a large centralized database by partitioning the database across the nodes of a distributed system. This is particularly useful if the data set is too large for sequential mining. In the future, since users of our communication network pay varying degrees of attention to different alarms, how to decide the weight of each alarm remains to be considered further.
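The central-site exchange summarized above can be sketched in a few lines: each of the n sites sends one message containing its local support counts to a central site, which sums them into global counts, giving O(n) messages instead of the O(n²) all-to-all broadcast of a naive distributed Apriori. This is a simplified single-process simulation; the data, function names, and minimum-support threshold are illustrative assumptions, not the project's CMatrix implementation.

```python
from collections import Counter
from itertools import combinations

def local_supports(transactions, size=2):
    """Count candidate itemsets of the given size in one site's partition."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts

def central_site(site_messages, min_support):
    """Sum the per-site messages and keep the globally frequent itemsets."""
    global_counts = Counter()
    for msg in site_messages:          # one message per site: n messages total
        global_counts.update(msg)
    return {s: c for s, c in global_counts.items() if c >= min_support}

# Two hypothetical site partitions of a market-basket database.
site1 = [["bread", "milk"], ["bread", "butter"], ["bread", "milk"]]
site2 = [["bread", "milk"], ["milk", "butter"]]

messages = [local_supports(site1), local_supports(site2)]   # O(n) messages
frequent = central_site(messages, min_support=3)
print(frequent)   # {('bread', 'milk'): 3}
```

The pruning steps the conclusion mentions would discard locally infrequent candidates before the message is sent, shrinking each message further without changing the O(n) exchange pattern.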
  • 36. References [1] D.W. Cheung et al., "A Fast Distributed Algorithm for Mining Association Rules", Proc. Parallel and Distributed Information Systems, IEEE CS Press, 1996, pp. 31-42. [2] M.J. Zaki and Y. Pin, "Introduction: Recent Developments in Parallel and Distributed Data Mining", J. Distributed and Parallel Databases, vol. 11, no. 2, 2002, pp. 123-127. [3] D.W. Cheung et al., "Efficient Mining of Association Rules in Distributed Databases", IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, 1996, pp. 911-922. [4] A. Schuster and R. Wolff, "Communication-Efficient Distributed Mining of Association Rules", Proc. ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2001, pp. 473-484. [5] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases", Proc. ACM SIGMOD Int'l Conf. Management of Data, May 1993. [6] M.Z. Ashrafi, "An Optimized Distributed Association Rule Mining Algorithm (ODAM)", IEEE Distributed Systems Online, ISSN 1541-4922, 2004. [7] R. Kimball and M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edn., John Wiley & Sons, New York, 2002. [8] Y. Ma, B. Liu, and C.K. Wong, "Web for Data Mining: Organizing and Interpreting the Discovered Rules Using the Web", SIGKDD Explorations, vol. 2, no. 1, ACM Press, 2000, pp. 16-23. [9] M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New Algorithm for Fast Discovery of Association Rules", Technical Report No. 261, University of Rochester, 1997, http://cs.aue.aau.dk/contribution/projects/datamining/papers/tr651.pdf
  • 37. Appendix A Project Hosting The project is hosted at Google Code. The complete source code, along with the manual to operate the project and supplementary files, is uploaded. Project Link: https://code.google.com/p/proquiz QR Code: