The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
1. Mining internal sources of data
Data mining is the process of discovering interesting knowledge, such
as patterns, associations, changes, anomalies and significant
structures, from large amounts of data stored in databases and data
warehouses. Technically, data mining is the process of finding
correlations or patterns among dozens of fields in large relational
databases.
Data warehouse:
A data warehouse is a central repository for all
or significant parts of the data that an enterprise's various business
systems collect.
What can data mining do?
Data mining is primarily used today by companies with a strong
consumer focus - retail, financial, communication, and marketing
organizations. It enables these companies to determine
relationships among "internal" factors such as price, product
positioning, or staff skills, and "external" factors such as economic
indicators, competition, and customer demographics. It also
enables them to determine the impact of these factors on sales,
customer satisfaction, and corporate profits. Finally, it enables them
to "drill down" into summary information to view detailed
transactional data.
2. Steps of Data mining
Mining data involves several steps.
Data Integration: First, the data are collected and
integrated from all the different sources.
Data Selection: We may not need all the data we collected
in the first step, so in this step we select only the data
that we think are useful for data mining.
Data Cleaning: The data we have collected are not clean
and may contain errors, missing values, and noisy or
inconsistent data, so we need to apply different techniques to
get rid of such anomalies.
Data Transformation: Even after cleaning, the data are not
ready for mining; we need to transform them into forms
appropriate for mining. The techniques used to accomplish
this include smoothing, aggregation and normalization.
Data Mining: Now we are ready to apply data mining
techniques on the data to discover the interesting patterns.
Techniques like clustering and association analysis are among
the many different techniques used for data mining.
Pattern Evaluation and Knowledge Presentation: This
step involves visualizing and transforming the patterns we
generated and removing redundant ones.
Decisions / Use of Discovered Knowledge: This step
helps users apply the knowledge acquired to make
better decisions.
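The preparation steps above (integration, selection, cleaning,
transformation) can be sketched as a small pipeline. This is a minimal
illustration only: the record layout, the field names and the choice of
min-max normalization as the transformation are assumptions made for
the example, not something the steps prescribe.

```python
# Hypothetical records from two internal sources (names and values are made up).
store_sales = [
    {"customer": "A", "amount": 120.0, "region": "north"},
    {"customer": "B", "amount": None, "region": "south"},  # missing value
]
web_sales = [
    {"customer": "C", "amount": 80.0, "region": "north"},
    {"customer": "D", "amount": 200.0, "region": "south"},
]


def integrate(*sources):
    """Data integration: combine records from all sources into one list."""
    return [row for source in sources for row in source]


def select(rows, fields):
    """Data selection: keep only the fields considered useful for mining."""
    return [{f: row[f] for f in fields} for row in rows]


def clean(rows):
    """Data cleaning: drop rows that contain missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def normalize(rows, field):
    """Data transformation: min-max normalize one numeric field to [0, 1]."""
    values = [row[field] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**row, field: (row[field] - lo) / span} for row in rows]


combined = integrate(store_sales, web_sales)
selected = select(combined, ["customer", "amount"])
prepared = normalize(clean(selected), "amount")
```

After these calls, `prepared` holds three rows (customer B is dropped
for its missing amount), with amounts rescaled so the smallest is 0 and
the largest is 1. The mining step itself would then run on `prepared`.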
3. Evolution of data mining
Data mining is a direct result of the increasing use of computer
databases in order to store and retrieve information. Data collection
technology existed in a primitive form starting in the 1960s. It was
used to find out basic information about how much a company
earned over a given period of time.
At this time, the primary methods of storage were tapes, disks, and
some computers. The computers at this time had very little storage
capacity, and only the largest companies or organizations could
afford them. By the 1980s, computers had become smaller, faster,
and cheaper, and they also had more storage capabilities. By this
time, data access was used to find out how many product sales
occurred within a given period of time.
It was during the 1980s that true computerized databases began to
be widely used for the first time. The introduction of computerized
databases allowed data warehouses to be created for the first time.
The databases used for this were called multidimensional
databases. It was during the late 1980s and 1990s that data mining
began to exist in the form that is present today. Instead of simply
finding out how many sales occurred within a given period of time,
companies could now find out more about the customers who
contributed to those sales. Computers are now faster and cheaper
than ever before, and they also have high storage capabilities.
Data mining techniques and sources
Several core techniques that are used in data mining describe the
type of mining and data recovery operation.
Let's look at some key techniques and examples of how they can be
used to build a data mining process.
4. Association:
Association (or relation) is probably the best-known,
most familiar and most straightforward data mining technique. Here,
you make a simple correlation between two or more items, often of
the same type, to identify patterns. For example, when tracking
people's buying habits, you might identify that a customer always
buys chips when they buy cold drinks, and therefore suggest that
the next time they buy cold drinks they might also want to buy
chips.
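One common way to quantify such an association is with support (how
often the items appear together) and confidence (how often the second
item appears given the first). The sketch below computes both for
item pairs; the basket data and the function name `pair_rules` are
hypothetical, and real association miners handle larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (made-up data for illustration).
baskets = [
    {"cold drink", "chips"},
    {"cold drink", "chips", "bread"},
    {"cold drink", "chips"},
    {"bread", "milk"},
    {"cold drink", "bread"},
]


def pair_rules(baskets):
    """Return support and confidence for every rule 'antecedent -> consequent'."""
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        frozenset(pair)
        for basket in baskets
        for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for pair, count in pair_counts.items():
        for antecedent in pair:
            (consequent,) = pair - {antecedent}
            rules[(antecedent, consequent)] = {
                # share of all baskets that contain both items
                "support": count / len(baskets),
                # share of baskets with the antecedent that also hold the consequent
                "confidence": count / item_counts[antecedent],
            }
    return rules


rules = pair_rules(baskets)
cold_drink_to_chips = rules[("cold drink", "chips")]
```

A rule with high confidence, such as cold drink -> chips in this toy
data, is the kind of pattern that would justify suggesting chips to a
customer who buys cold drinks.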
Clustering:
Clustering is a data mining technique that automatically
groups objects with similar characteristics into meaningful or
useful clusters. To make the concept clearer, we can take book
management in a library as an example. In a library, a wide range
of books on various topics is available. The challenge is how to
arrange those books so that readers can pick up several books on a
particular topic without hassle. By using a clustering technique,
we can keep books that share some kind of similarity in one cluster,
or on one shelf, and label it with a meaningful name. Readers who
want books on that topic then only have to go to that shelf
instead of searching the entire library.
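As a rough sketch of automatic grouping, the code below uses k-means,
one widely used clustering algorithm (the text above does not name a
specific one). Each book is represented by two made-up numeric
features, and the first k points are used as starting centroids purely
to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point
    to its nearest centroid and recomputing the centroids."""
    centroids = list(points[:k])  # assumption: simple deterministic start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # squared Euclidean distance to each centroid
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Hypothetical features per book: (share of science terms, share of history terms)
books = [
    (0.9, 0.1), (0.1, 0.9), (0.8, 0.2),
    (0.2, 0.8), (0.85, 0.05), (0.15, 0.95),
]
shelves = kmeans(books, k=2)
```

Each list in `shelves` plays the role of one shelf: books whose
features are close end up together, and a label can then be chosen for
each group.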
Prediction:
Prediction, as its name implies, is a data mining
technique that discovers relationships among independent
variables and between dependent and independent
variables. For instance, prediction analysis can be used in sales
to predict future profit: if we treat sales as the independent
variable, profit can be the dependent variable. Then,
based on historical sales and profit data, we can draw a fitted
regression curve that is used for profit prediction.
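A minimal sketch of this idea, assuming the fitted curve is a straight
line estimated by ordinary least squares, with slope and intercept

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}.$$

The sales and profit figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales (independent), profit (dependent).
sales = [100.0, 200.0, 300.0, 400.0]
profit = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(sales, profit)
predicted_profit = intercept + slope * 500.0  # forecast for sales of 500
```

With real data the points will not lie exactly on a line, so the fit
should be checked (for example by inspecting residuals) before the
prediction is trusted.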
5. Sequential Patterns:
Often used over longer-term data,
sequential patterns are a useful method for identifying trends,
or regular occurrences of similar events. For example, with
customer data you can identify that customers buy a
particular collection of products together at different times of
the year. In a shopping basket application, you can use this
information to automatically suggest that certain items be
added to a basket based on their frequency and past
purchasing history.
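One simple way to sketch this is to count, across customers, how often
one item is bought at some point after another, and to suggest the
frequent followers. The purchase histories, function names and the
`min_customers` threshold below are all hypothetical; dedicated
sequential-pattern algorithms handle longer sequences and time windows.

```python
from collections import Counter

# Hypothetical purchase histories, each in chronological order.
histories = {
    "cust1": ["tent", "sleeping bag", "stove"],
    "cust2": ["tent", "stove"],
    "cust3": ["sleeping bag", "tent", "stove"],
}


def sequential_pairs(histories):
    """Count how many customers bought `later` some time after `earlier`."""
    counts = Counter()
    for purchases in histories.values():
        seen = set()  # count each ordered pair once per customer
        for i, earlier in enumerate(purchases):
            for later in purchases[i + 1:]:
                if earlier != later:
                    seen.add((earlier, later))
        counts.update(seen)
    return counts


def suggest(counts, item, min_customers=2):
    """Items that at least `min_customers` customers bought after `item`."""
    followers = [
        (later, n)
        for (earlier, later), n in counts.items()
        if earlier == item and n >= min_customers
    ]
    # most frequent first, ties broken alphabetically
    return [later for later, _ in sorted(followers, key=lambda x: (-x[1], x[0]))]


pair_counts = sequential_pairs(histories)
suggestions = suggest(pair_counts, "tent")
```

In a shopping basket application, `suggest` would be called with the
item just added to the basket, and its result offered to the customer.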
Decision trees:
Related to most of the other techniques
(primarily classification and prediction), the decision tree can
be used either as a part of the selection criteria, or to support
the use and selection of specific data within the overall
structure. Within the decision tree, you start with a simple
question that has two (or sometimes more) answers. Each
answer leads to a further question to help classify or identify
the data so that it can be categorized, or so that a prediction
can be made based on each answer.
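The question-and-answer structure can be sketched as a nested
dictionary that is walked from the root until a leaf is reached. The
tree here is written by hand with made-up questions and labels; in
practice a tree is usually learned from data rather than typed in.

```python
# Hypothetical hand-built tree: inner nodes ask a question, leaves carry a label.
tree = {
    "question": "bought_before",
    "answers": {
        True: {
            "question": "spent_over_100",
            "answers": {
                True: {"label": "loyal customer"},
                False: {"label": "occasional customer"},
            },
        },
        False: {"label": "new customer"},
    },
}


def classify(node, record):
    """Follow the answer to each question until a leaf label is reached."""
    while "label" not in node:
        answer = record[node["question"]]
        node = node["answers"][answer]
    return node["label"]


label = classify(tree, {"bought_before": True, "spent_over_100": False})
```

Each question may have more than two answers; the `answers` mapping
simply gets one entry per possible answer.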
Classification:
Stored data is used to locate data in
predetermined groups. You can use classification to build up
an idea of the type of customer, item, or object by describing
multiple attributes to identify a particular class. For example,
you can easily classify cars into different types (sedan, 4x4,
convertible) by identifying different attributes (number of
seats, car shape, driven wheels). Given a new car, you might
assign it to a particular class by comparing its attributes
with the known definitions. You can apply the same principles
to customers, for example by classifying them by age and
social group.
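The car example can be sketched as a comparison of a new car's
attributes against known class definitions, choosing the class with
the most matching attributes. The attribute values and the matching
rule are assumptions made for illustration; real classifiers are
normally trained on labelled examples.

```python
# Hypothetical class definitions using the attributes named above.
CAR_CLASSES = {
    "sedan": {"seats": 5, "shape": "three-box", "driven_wheels": 2},
    "4x4": {"seats": 5, "shape": "two-box", "driven_wheels": 4},
    "convertible": {"seats": 2, "shape": "open-top", "driven_wheels": 2},
}


def classify_car(car):
    """Return the class whose definition shares the most attributes with `car`."""

    def matches(definition):
        return sum(car.get(attr) == value for attr, value in definition.items())

    # on a tie, the class listed first in CAR_CLASSES wins
    return max(CAR_CLASSES, key=lambda name: matches(CAR_CLASSES[name]))


new_car = {"seats": 5, "shape": "two-box", "driven_wheels": 4}
car_class = classify_car(new_car)
```

The same pattern applies to customers: replace the car attributes with
fields such as age band and social group, and the class names with
customer segments.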