This document provides an overview of business intelligence, data mining, and predictive analytics. It defines business intelligence as information used to support decision making, and notes that data mining and predictive analytics fall under the business intelligence field. Data mining is defined as the process of discovering patterns in large datasets using methods from artificial intelligence, machine learning, statistics, and databases. The overall goal of data mining is to extract useful information from data to support predictive analytics like forecasting. Data analytics focuses on making inferences from existing data to verify or disprove models, while predictive analytics predicts outcomes at the individual level. The document discusses various data mining techniques and applications in domains like marketing, fraud detection, and medicine. It also covers advantages and challenges of using data
2. What is Business Intelligence
• Basic definition: Information that people use to
support their decision-making efforts. Data
mining and data analytics/predictive analytics
fall within this field.
3. What is Data Mining
Data mining (the analysis step of the “Knowledge
Discovery in Databases” process, or KDD), an
interdisciplinary subfield of computer science, is
the computational process of discovering patterns
in large data sets involving methods at the
intersection of artificial intelligence, machine
learning, statistics, and database management
systems.
4. Basic Definitions of Data Mining
• The discovery of new, non-obvious, valuable information from
a large collection of raw data
• Data Mining (DM) is the core of the KDD [Knowledge Discovery in
Databases] process, involving the inferring of algorithms that explore
the data, develop the model and discover previously unknown
patterns.
• The set of activities used to find new, hidden or unexpected
patterns in data
5. Definitions of Data Mining
The detection of patterns from existing data.
pattern n. (păt’ ərn)
1. A consistent trait, feature, or method.
2. Any combination of values that contains meaning within the
context or domain for which it is being reviewed
6. Data Mining (continued)
The overall goal of the data mining process is to
extract information from a data set and transform it
into an understandable structure for further use
(predictive analytics):
discovering meaningful new correlations, patterns,
and trends.
Example: Forecasting
7. Data Analytics/Predictive Analytics
Data analytics (DA) is the science of
examining raw data with the purpose of
drawing conclusions about that information.
Data analytics is used in many industries to
allow companies and organizations to make
better business decisions, and in the sciences
to verify or disprove existing models or
theories.
8. Data analytics is distinguished from Data
mining by the scope, purpose and focus of the
analysis. Data miners sort through huge data sets
using sophisticated software to identify
undiscovered patterns and establish hidden
relationships. Data analytics focuses on inference,
the process of deriving a conclusion based solely
on what is already known by the researcher.
9. Predictive analytics - Focus
Uses a lower level of granularity, meaning it looks at
the individual level. Forecasting asks which
candidate will win the presidential election in the
state of Ohio. Predictive analytics instead asks
which way each person will vote.
It predicts which individuals can be persuaded, which
ones will not change, etc. With this
information we can change the outcome of the race.
The Obama campaign used this technique very well.
10. Emerging Technology
Data mining is one of the “10 emerging technologies
that will change the world” listed by the MIT
Technology Review (Larose).
It is no surprise that many firms embrace data mining
in their operations. An article in Information Systems
Management points out that “data mining has become a
widely accepted process for organizations to enhance
their organizational performance and gain a competitive
advantage.”
11. Data Mining: Business
• What is it?
• Decision making
• Marketing
• Detecting Fraud
• This technology is popular with many businesses
because it allows them to learn more about their
customers, prevent fraud and identity theft, and
make smart marketing decisions.
12. Keys to a Successful Data Mining Project
• Credible source of data
• Knowledgeable personnel
• Appropriate algorithms
13. Primary Tasks of Data Mining
• Classification: classify a data item into one of
several predefined classes
• Regression: map a data item to a real-valued
prediction variable
• Clustering: identify a finite set of
categories or clusters to describe the data
• Summarization: find a compact description for a
set (or subset) of data
• Dependency Modeling: describe significant dependencies
between variables or between the values of a feature
• Change and Deviation Detection: discover the most
significant changes
14. Some of the commonly used data
mining methods are:
• Statistical Data Analysis
• Cluster Analysis
• Decision Trees and Decision Rules
• Association Rules
• Artificial Neural Networks
• Genetic Algorithms
• Fuzzy Sets and Fuzzy Logic
15. Data Mining Applications
In direct marketing, a company saves much time
by marketing to the prospects likely to have the
highest reply rate. Instead of selecting customers
at random for its mailings or surveys, a
company can use data mining to find the
“correct” customers to contact.
16. Direct Marketing - Example
1 million mailers at $0.40 each to ship = $400,000 cost
Conversion is 1 percent without data mining
17. Direct Marketing using data mining gives us 3%
conversion
• Identifies a smaller group, for example ¼ of the
population, and gets a higher conversion rate, 3%
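The arithmetic behind slides 16 and 17 can be sketched as follows. The mailer count, postage, and conversion rates come from the slides. Treating the 3% rate as applying to the ¼ subgroup is an assumption, and the function name campaign is hypothetical.

```python
# Figures from the slides: 1,000,000 mailers at $0.40 each, 1% conversion
# without data mining. Assumption: with data mining, only 1/4 of the
# population is mailed and 3% of that subgroup converts.

def campaign(mailers, cost_per_mailer, conversion_rate):
    """Return (total cost, number of responses, cost per response)."""
    total_cost = mailers * cost_per_mailer
    responses = mailers * conversion_rate
    return total_cost, responses, total_cost / responses

POPULATION = 1_000_000
COST_PER_MAILER = 0.40

mass = campaign(POPULATION, COST_PER_MAILER, 0.01)
targeted = campaign(POPULATION // 4, COST_PER_MAILER, 0.03)

for label, (cost, responses, per_response) in [
    ("Mass mailing", mass),
    ("Targeted mailing", targeted),
]:
    print(f"{label}: cost ${cost:,.0f}, responses {responses:,.0f}, "
          f"cost per response ${per_response:,.2f}")
```

Under these assumptions the mass mailing costs $400,000 for 10,000 responses ($40 each), while the targeted mailing costs $100,000 for 7,500 responses (about $13.33 each). The targeted campaign reaches fewer buyers in total but at roughly a third of the cost per response.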
18. Data Mining Applications
Market segmentation uses data mining to
identify the common characteristics of
customers who buy a company's
products.
With market segmentation, you can
find behaviors that are common among your
customers. Knowing these customer trends
helps a company identify customer needs and
improve its business.
19. Data Mining Applications
Customer churn analysis predicts which
customers are likely to leave
your company and join a
competitor. Although
customer churn hurts the
business, predicting it allows the company to find
the problem it is facing and create
solutions.
21. Data Mining Applications
Market basket analysis involves researching
customer characteristics with respect to their
purchase patterns.
Example: Ralphs Club Card (cereal and milk)
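One common way to quantify a market basket pattern such as cereal and milk is with support, confidence, and lift. The sketch below is illustrative only: the baskets are invented, not real club card data, and the helper names support and rule_metrics are hypothetical.

```python
# Hypothetical baskets, invented for illustration; not real loyalty-card data.
baskets = [
    {"cereal", "milk", "bread"},
    {"cereal", "milk"},
    {"milk", "eggs"},
    {"cereal", "milk", "eggs"},
    {"bread", "eggs"},
    {"cereal", "bread"},
    {"milk", "bread"},
    {"cereal", "milk", "juice"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, baskets)
    confidence = both / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return both, confidence, lift

rule_support, rule_confidence, rule_lift = rule_metrics({"cereal"}, {"milk"}, baskets)
print(f"cereal -> milk: support {rule_support:.2f}, "
      f"confidence {rule_confidence:.2f}, lift {rule_lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent. That is the kind of signal a retailer might use for shelf placement or coupons.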
23. Prediction Based on Data
Mining/Predictive Analytics
• Real-life examples:
• Target can predict which customers are likely to be
pregnant
• Hospitals can predict which patients may need
to be admitted
• Credit card companies can predict which customers may
miss a payment based on where the card is
used. Example: bar (alcohol) purchases = missed payments
25. Data Mining Applications
Class identification consists of
mathematical taxonomy and concept clustering.
Mathematical taxonomy focuses on what makes the
members of a certain class similar, as opposed to
differentiating one class from another.
For example, Ralphs can classify its customers based
on their income or past purchases.
26. Data Mining Applications
Concept clustering determines clusters according to
attribute similarity.
Consider a pattern discovered by data mining: among high-income
customers, a purchase of toys for the 3–5 age group
is followed by the purchase of a kid's bicycle within 6
months about 90% of the time. The company can
identify prospective customers for kids' bicycles based
on toy purchase details and adjust the mail catalog
accordingly.
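A pattern like "toy purchase, then bicycle within 6 months" can be checked against a purchase log by measuring how often the second purchase follows the first inside the time window. The sketch below uses an invented log, so it does not reproduce the 90% figure. The item names, the 183-day window, and the function names are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical purchase log: (customer_id, item, purchase_date).
# Item names and dates are invented for illustration.
PURCHASES = [
    (1, "toy_3_5", date(2023, 1, 10)),
    (1, "kids_bicycle", date(2023, 4, 2)),
    (2, "toy_3_5", date(2023, 2, 5)),
    (2, "kids_bicycle", date(2023, 12, 20)),  # outside the 6-month window
    (3, "toy_3_5", date(2023, 3, 15)),
    (3, "kids_bicycle", date(2023, 6, 1)),
    (4, "kids_bicycle", date(2023, 5, 5)),    # no toy purchase on record
    (5, "toy_3_5", date(2023, 6, 30)),
]

def sequence_confidence(purchases, first, then, window_days=183):
    """Share of `first` buyers who bought `then` within the window afterwards."""
    first_dates = {}
    for customer, item, day in purchases:
        if item == first:
            # Keep each customer's earliest purchase of `first`.
            first_dates[customer] = min(day, first_dates.get(customer, day))
    if not first_dates:
        return 0.0
    followers = set()
    for customer, item, day in purchases:
        start = first_dates.get(customer)
        if item == then and start is not None:
            if timedelta(0) <= day - start <= timedelta(days=window_days):
                followers.add(customer)
    return len(followers) / len(first_dates)

def prospects(purchases, first, then):
    """Customers who bought `first` but have no recorded purchase of `then`."""
    bought_first = {c for c, item, _ in purchases if item == first}
    bought_then = {c for c, item, _ in purchases if item == then}
    return sorted(bought_first - bought_then)

confidence = sequence_confidence(PURCHASES, "toy_3_5", "kids_bicycle")
targets = prospects(PURCHASES, "toy_3_5", "kids_bicycle")
print(f"confidence: {confidence:.2f}, catalog targets: {targets}")
```

The customers returned by prospects are the ones a company might add to the bicycle mailing. A real analysis would also filter by income segment, as the pattern in the slide does.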
27. Data mining Applications
Deviation analysis: A deviation can be fraud or a
change. In the past, such deviations were difficult
to detect in time to take corrective action. Data
mining tools help identify such deviations.
For example, a higher-than-normal purchase
on a credit card can be fraud or a genuine
purchase by the customer. Once a deviation has
been identified as fraud, the company takes
steps to prevent such fraud and initiates
corrective action.
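A very simple form of deviation detection flags purchases that lie far from a cardholder's usual spending. The sketch below uses a z-score rule on invented amounts. The threshold of 3 standard deviations and the function name flag_deviations are assumptions, and real fraud systems use far richer models.

```python
from statistics import mean, stdev

def flag_deviations(history, new_amounts, threshold=3.0):
    """Return new amounts more than `threshold` standard deviations
    away from the mean of the historical amounts."""
    mu = mean(history)
    sigma = stdev(history)
    return [amount for amount in new_amounts
            if abs(amount - mu) > threshold * sigma]

# Hypothetical purchase amounts for one cardholder, invented for illustration.
history = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1, 58.6]
new_amounts = [49.9, 63.0, 980.0]

flagged = flag_deviations(history, new_amounts)
print("Flagged for review:", flagged)
```

A flagged amount is only a candidate. As the slide notes, it may be fraud or a genuine purchase, so it still needs review before corrective action is taken.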
29. Privacy
• Sensitive information
• Data mining increases incentives to get
more sensitive data
• Seeing into the private future: Target
• Do we have the right?
• Employers try to predict churn
30. Response Bias Issues
• Types
o Coverage or frame error
o Sampling error
o Nonresponse error
o Measurement error
• Flawed data
31. Data Mining in Medicine
The most recent and most promising use of data
mining has been the development of data mining
tools for the medical sector. The use of
data mining to extract patterns from medical data
provides near endless opportunities for symptom
trend detection, earlier detection of illness, DNA
trend analysis and improved patient reactions to
medicines. These many advantages allow doctors and
hospitals to be more effective and more efficient.
32. Advantages of Data Mining: Medicine
• Earlier detection of illness
• Symptom trends
• Data analysis
• Improved drug reactions
33. Disadvantages of Data Mining: Medicine
• No uniform medical language
• Incomplete records
• Privacy
34. Data mining - Medical
How data mining is actually used to analyze
individual data can become quite complex due to the
data. The goal of the process is to take the medical
data, which contain many attributes, and determine
which ones are actually relevant to the diagnosis,
symptom or result.
Two methods used in medical data mining are
clustering, discussed previously, and biclustering.
35. BiClustering
A new form of clustering called biclustering is
now being used to help associate certain genetic
patterns with illnesses. The advantage of biclustering for genetic
research is that it does not simply assign a sample
to a certain classification across the board. It takes
into account other variables, which increases
accuracy, since not all genetic traits are active all
the time and special conditions are sometimes
necessary for traits to surface.
36. Study shows automated data mining surveillance helps detect blood culture
contamination.(Clinical report)
World Disease Weekly, Jul 11, 2006.
MedMined, Inc., a medical information technology company, announced results of a study
that showed how automated data mining surveillance helped in detecting and reducing an
endemic issue of blood culture contamination in a hospital's hematology/oncology unit.
The study was conducted at Hendrick Medical Center, a 453-bed hospital in Abilene,
Texas, that began using MedMined's Data Mining Surveillance service in June 2005.
After incorporating data mining surveillance house-wide, the hospital discovered that, in
patients who had a hospital stay of 3 or more days, blood isolates caused primarily from
skin flora occurred with the third most frequency. Further, the unit with the highest rate of
blood isolates was hematology/oncology.
After implementing process improvements, results from October to December 2005
demonstrated a 50% decrease in the overall number of isolates from patients on the unit,
with a 61.5% decrease in isolates of skin flora.
MedMined is a Birmingham, Alabama-based company founded in 2000 to provide data
mining analysis and related technical, clinical, and financial consulting services to the
healthcare community. MedMined's patented infection-tracking technology and proprietary
service are being used by nearly 200 hospitals.
This article was prepared by World Disease Weekly editors from staff and other reports.
Copyright 2006, World Disease Weekly via NewsRx.com.
Data mining is the process of detecting patterns in existing databases. For a successful data
mining project, a credible source, knowledgeable personnel, and an appropriate methodology are the
key factors. Today data mining is widely implemented in many fields. In business, data mining
enables firms to better understand their customers, prevent theft and fraud, and make better
business decisions. However, there are issues regarding privacy, security, and misuse of
information as well. In addition, marketers find data mining helpful in strategy development,
class identification, and deviation analysis, even though privacy issues and response bias are still
problems. Furthermore, data mining provides new opportunities for symptom trend
detection, early detection of new illnesses, DNA analysis, and the study of patients’ reactions to
medicines. Using data mining in the medical industry, on the other hand, has a number of
disadvantages, including lack of uniform code, outdated translation, huge amount of information,
and privacy issues.