3. Follow Us:3
Introduction
What is Data?
Facts and statistics collected together for
reference or analysis.
As per Moore’s Law,
The information density on silicon integrated
circuits double every 18 to 24 months.
4. Follow Us:4
There has been a huge increase in the amount
of data being stored in database over past
twenty years.
All that user wants is more sophisticated
information.
This surges up the demand of Data mining.
5. Follow Us:5
What is Data mining?
Data mining refers to extracting or “mining”
Knowledge from large amount of data.
- By Jiawei Han, Micheline Kamber,
6. Follow Us:6
Data mining focuses on extraction of information
from a large set of data and transforms it into an
easily interpretable structure for further use.
As data is growing at very remarkable rate, there
comes a need to analyze large, complex and
information rich data sets to gain the hidden
information. This may result into greater customer
satisfaction and remarkable turn over for the firm.
7. Follow Us:
Why Data mining?
7
I am not able to find the data I need.
(Data is dispersed over the network)
The data I have access to is poorly documented.
(Proper information is missing)
I am not able to use the data I have.
(Unexpected results)
8. Follow Us:
What is Data Warehouse?
8
Data Warehouse is an integrated, subject oriented
and time-varying collection of data that is used
primarily in organizational decision making.
-Bill Inmon, Building the Data Warehouse 1996
Data Warehousing:
Data Warehousing is a process of managing data
from various sources in order of answering the
Business questions.
9. Follow Us:9
Examples of Data warehousing:
Wal-Mart – Terabytes (10^12 bytes)
Geographic Information System – Peta bytes(10^15 bytes)
National Medical Records – Extra bytes (10^18 bytes)
Weather Images- Zettabyte(10^21 bytes)
Intelligent video agency- Yottabyte (10^24 bytes)
10. Follow Us:
How data mining works with
warehouse data?
10
Data warehouse avails enterprise with memory.
Data warehouse avails enterprise with intelligence.
11. Follow Us:
Example of Data mining in IT:
11
A company that offers online services in several
countries faces a problem in times of loading the
homepage. Even IT department employees could
not solve the bug. Based on customer feedback and
last mile measurements, company gets to know that
IT services are suffering from insufficient time for
loading the homepage for some users.
12. Follow Us:12
Management has the high interest in solving
the problem because the response time of the
site directly affects the online sales, IT
department has already looked in to log files
with the help of mining tool.
It has been discovered that there’s no such
single type of typical problem. Instead data
mining program discovers that there are
various problem clusters.
13. Follow Us:13
The patterns for this problem is determined by a
combination of country code , operating system, and
the version of a browser that the customers use.
One another pattern shows that the same bad
performance in loading times of the website is
caused by a different problem.
14. Follow Us:14
Speed of internet and cookie settings also lead
to greater time for loading the site. This is how
data mining helps in indentifying the problem
and hence solving the problem becomes quick.
This is just one example of how data mining
can be so useful.
15. Follow Us:15
Data mining Models and Tasks
Data mining is widely divided into two
parts:
Predictive Data mining
Descriptive Data mining
17. Follow Us:17
Predictive Data mining
The objective of predictive tasks is to use the
values of some variable to predict the values
of other variable.
Ex: Web mining is used by the online
marketers to predict the purchase by online
user on a website.
18. Follow Us:18
Classification is used to map data in a predefined
groups.
Regression maps a data item to a real valued
prediction variable.
Clustering forms a similar data together.
Summarization is used to map data in a subsets.
Link Analysis defines relationships among data.
19. Follow Us:19
Descriptive Data mining
The objective of descriptive tasks is to find
human readable patterns which describes the
relationships between data.
20. Follow Us:20
Architectures of a data mining
system
There are four Data mining architecture:
I. No- Coupling
II. Loose Coupling
III.Semi tight Coupling
IV.Tight coupling
21. Follow Us:21
• In this architecture, data mining system doesn’t use
any functionality of a database or data warehouse
system.
• Data is retrieved from data sources like file system and
processed using data mining algorithms which are
stored into file system.
• This architecture is considered as a poor architecture
for data mining system as it does not take any
advantages of database or data warehouse.
• However it is used for simple data mining
processes
No-coupling:
22. Follow Us:22
The loose coupling data mining system uses
database or data warehouse for data retrieval.
In this architecture, data mining system retrieves
data from database or data warehouse, processes
data using data mining algorithms and stores the
result in those systems.
Loose coupling architecture is for memory-based
data mining system which does not require high
scalability and high performance.
Loose Coupling:
23. Follow Us:23
In semi-tight coupling data mining architecture, it
not only links it to database or data warehouse
system, but it also uses several features of
database or data warehouse systems which
perform some data mining tasks like sorting and
indexing etc. Moreover the intermediate result can
also be stored in database or data warehouse
system for better performance.
Semi-tight Coupling:
24. Follow Us:24
Tight-coupling:
In this architecture, database or data warehouse is
treated as an information retrieval component.
Tight-coupling data mining architecture provides
scalability, high performance and integrated
information.
25. Follow Us:25
Three tiers of Tight-coupling
Data mining Architecture:
i. Data Layer
ii. Data mining application layer
iii.Front-end Layer
26. Follow Us:26
Data layer can be database or data warehouse
systems which is an interface for all data
sources. It stores results in data layer so it can
be presented to end-user in form of reports.
Datalayer
27. Follow Us:27
This layer is used to retrieve data from
database. And by performing transformation
routine, we can get data into desired format.
Then data is processed using various data
mining algorithms.
Data mining application layer
28. Follow Us:28
Front-end layer provides visceral and user
friendly interface for end-user to interact with
data mining system.
The result of data mining is presented in
visualization form in front-end layer.
Front-end layer:
29. Follow Us:29
Although data mining is still in its early stage;
companies in a wide range of industries such
as retail, heath care, manufacturing, finance,
and transportation are already using data
mining techniques to take advantage of
historical data.
Data mining helps analysts to recognize
significant facts, relationships, trends, patterns
and anomalies which might go unnoticed
otherwise.
Application Area of Data mining:
30. Follow Us:30
Data mining uses pattern recognition technologies
and mathematical techniques to sift through
warehoused information.
In business, data mining is useful for discovering
patterns and relationships in data to help make
better decisions.
Data mining helps in developing smarter
marketing campaigns and to predict customer
loyalty.
31. Follow Us:31
Advantages of Data Mining
Marketing /Retail :
Data mining helps marketing companies to build
models based on historical data which will precisely
predict responders to the new marketing campaigns.
Marketers will have appropriate approach for targeted
customers.
Data mining helps retail companies as well. By using
market basket analysis, a store can have an
appropriate arrangement in such a way that customers
can purchase frequent buying products together with
pleasant. It also helps the retail companies to offer
certain discounts which will attract more customers.
32. Follow Us:32
Finance / Banking
By building a model from historical customer’s
data of loans, the bank officials and financial
institution can determine good and bad loans.
Data mining also helps banks to detect fraudulent
credit card transactions.
33. Follow Us:33
Manufacturing
Data mining is useful in operational engineering
data which can detect faulty equipments and
determines optimal control parameters.
Data mining can determine the range of control
parameters which leads to the production of perfect
product. Hence optimal control parameters can
provide the desired quality.
34. Follow Us:34
Governments
Data mining helps government agency to analyze
records of financial transaction which will help in
building patterns that can detect money laundering
or criminal activities.
35. Follow Us:35
Market segmentation - Data mining helps to
identify the common characteristics of customers
who buy the same products from your company.
Customer anticipation - It helps to predict which
customers may leave your company and go to a
competitor.
Fraud detection - It indentifies which transactions
are most likely to be fraudulent.
Specific use of data mining
include:
36. Follow Us:36
Direct marketing - Direct marketing identifies
which prospects should be included to obtain the
highest response rate.
Interactive marketing - It is useful for predicting
what each user on a Web site is most likely
interested in seeing.
Market basket analysis - It helps to understand
what products or services are commonly
purchased together.
Trend analysis - Trend analysis identifies the
difference between a typical customer this month
and last.
37. Follow Us:37
Disadvantages of data mining
Privacy Issues
The internet is booming with social networks, e-
commerce, blogs etc, the concerns about the
personal privacy has been increasing.
This worries the users as the information might be
collected and used in unethical way which can
potentially cause a lot of troubles.
Businesses collect the information of its users for
setting up the marketing strategies but there are
chances that business might be taken by other firms
or gets shut down and that’s where a concern of
misusing or leaking the personal information arises.
38. Follow Us:38
Security issues
Security is the biggest concern in data mining.
Businesses own all the information of their
employees which even includes personal and
financial information, there are the chances of
misusing data by hackers and which cause
serious trouble to the organization and its
employees.
39. Follow Us:39
Misuse of information/inaccurate information
The information collected by the data mining may
be exploited by unethical people or businesses in
order to take benefits of vulnerable people.
Data mining techniques is not totally accurate. So
inaccurate information may lead the wrong
decision-making which may cause serious
consequences.
40. Follow Us:40
Challenges of Data Mining
Scalability: Scalable techniques are needed for
handling massive size of datasets that are now
being created.
Poor efficiency: Such large datasets require the
use of efficient methods for storing, indexing, and
retrieving data from secondary or even tertiary
storage systems.
In order to perform the data mining task in timely
manner, data mining techniques are needed.
41. Follow Us:41
Complexity: Such techniques can dramatically
increase the size of the datasets which can be
handled and for that it requires new designs and
algorithms.
Dimensionality: Some domains have number of
dimensions which are very large and makes the data
analyzing difficult. That is called as 'curse of
dimensionality’. For example, bioinformatics.