Data mining and machine learning used to be two cousins with different parents. Now they increasingly resemble each other, almost like twins, and people often refer to data mining as machine learning. The field of machine learning grew out of the effort to build artificial intelligence; its major concern is making a machine learn and adapt to new information. The field of data mining grew out of knowledge discovery in databases. Data mining focuses on better understanding the characteristics and patterns among variables in large databases using a variety of statistical and analytical tools. It is used to identify relationships among variables in large data sets and to understand the hidden patterns they may contain.
http://nguyenngocbinhphuong.com/supervised-learning-vs-unsupervised-learning/
2. Data mining is focused on better understanding
of characteristics and patterns among variables
in large databases using a variety of statistical
and analytical tools.
◦ It is used to identify relationships among variables in
large data sets and understand hidden patterns that
they may contain.
◦ XLMiner software implements many basic data
mining procedures in a spreadsheet environment.
4. In supervised data mining techniques, there is a
dependent variable the method is trying to predict.
◦ The classification and prediction/forecasting methods
are supervised data mining techniques.
In unsupervised data mining techniques, there is
no dependent variable. Instead, these techniques
search for patterns and structure among all of the
variables.
◦ One popular unsupervised method is association
analysis (known in marketing as market basket analysis).
◦ The most common unsupervised method is clustering
(known in marketing as segmentation).
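As a rough illustration of association analysis, the sketch below counts how often pairs of items appear together in the same basket. The basket contents and the helper name `pair_support` are made up for this example, and the calculation is only the simplest "support" measure, not a full market basket algorithm. Note that there is no dependent variable: every item is treated the same way.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data, not from the slides).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def pair_support(transactions):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}

support = pair_support(baskets)
for pair, value in sorted(support.items()):
    print(pair, value)
```

Pairs with high support are candidates for "bought together" rules; a real analysis would also look at measures such as confidence and lift.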
[Figure labels: Descriptive analytics; Predictive analytics]
5. Suppose you had a basket and
filled it with different kinds of fruits.
6. We have four types of fruits:
APPLE
BANANA
GRAPE
CHERRY
7. From your previous work you have already learned the
physical characteristics of each fruit, so putting fruits
of the same type together is easy.
In data mining terminology, that earlier work is called
training data. You can learn from the training data
because it includes a response variable: the known name
of each fruit.
8. Suppose you take a new fruit from the basket. You
look at the size, color, and shape of that particular
fruit.
◦ If the size is big, the color is red, and the shape is
round with a depression at the top, you identify the
fruit as an apple and put it in the apple group.
Learning from training data first and then applying
that knowledge to test data (the new fruit) is called
supervised learning.
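The fruit-sorting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the apple's features come from the slide, while the feature values for the other fruits, the names `training_data` and `classify`, and the "most matching features" rule are invented for the example and stand in for a real classification method.

```python
# Training data: each fruit's features plus its known name (the response variable).
# Feature values for banana, grape, and cherry are illustrative assumptions.
training_data = [
    ({"size": "big", "color": "red", "shape": "round with top depression"}, "apple"),
    ({"size": "big", "color": "green", "shape": "long and curved"}, "banana"),
    ({"size": "small", "color": "green", "shape": "round to oval"}, "grape"),
    ({"size": "small", "color": "red", "shape": "round with stem"}, "cherry"),
]

def classify(fruit, examples):
    """Return the label of the training example sharing the most feature values."""
    def matches(example_features):
        return sum(fruit.get(k) == v for k, v in example_features.items())
    _, best_label = max(examples, key=lambda ex: matches(ex[0]))
    return best_label

# Test data: a new fruit whose name we do not know yet.
new_fruit = {"size": "big", "color": "red", "shape": "round with top depression"}
print(classify(new_fruit, training_data))
```

The key point is that every training example carries a label, so the new fruit can be assigned to an existing, named group.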
9. This time you don’t know anything about the
fruits; this is the first time you have seen them,
and you have no clue what they are.
So how will you arrange them? What will you do
first?
You will pick up each fruit
and arrange it according to
one of its physical
characteristics.
10. Suppose you consider color first. You then arrange
the fruits using color as the grouping criterion, and
the groups look something like this:
◦ RED COLOR GROUP: apples & cherries.
◦ GREEN COLOR GROUP: bananas & grapes.
Next, you consider another physical characteristic,
such as size.
◦ RED COLOR AND BIG SIZE: apples.
◦ RED COLOR AND SMALL SIZE: cherries.
◦ GREEN COLOR AND BIG SIZE: bananas.
◦ GREEN COLOR AND SMALL SIZE: grapes.
11. Here you did not learn anything beforehand: there is
no training data and no response variable.
In data mining, this kind of learning is known as
unsupervised learning.
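The two-step grouping from the fruit example (first by color, then by color and size) can be sketched as follows. The observations and the helper name `group_by` are assumptions made for illustration, and grouping on hand-picked features is a simplification: a clustering method such as k-means would discover the groups from the data itself. What the sketch does share with unsupervised learning is that no fruit names appear anywhere in the input.

```python
from collections import defaultdict

# Unlabeled observations: only physical features, no fruit names.
# Quantities and feature values are illustrative assumptions.
fruits = [
    {"color": "red", "size": "big"},
    {"color": "green", "size": "small"},
    {"color": "red", "size": "small"},
    {"color": "green", "size": "big"},
    {"color": "red", "size": "big"},
]

def group_by(items, keys):
    """Group items by the tuple of values they have for the given feature keys."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[k] for k in keys)].append(item)
    return dict(groups)

by_color = group_by(fruits, ["color"])
by_color_and_size = group_by(fruits, ["color", "size"])

for key, members in sorted(by_color_and_size.items()):
    print(key, len(members))
```

The resulting groups are identified only by their feature values, such as ("red", "big"); attaching a name like "apple" to a group would require outside knowledge.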