2. Introduction
What is Data Mining?
Data mining is the process of searching and analyzing a
large batch of raw data in order to identify meaningful
patterns and trends and extract useful information.
3. Process of Data Mining
The data mining process breaks down into four
steps:
• Data Gathering: Data is collected and loaded into data
warehouses on-site or on a cloud service.
• Data Preparation: Business analysts, management teams, and
information technology professionals access the data and
determine how they want to organize it.
• Data Mining: Custom application software sorts and organizes
the data.
• Data Analysis & Interpretation: The end user presents the data
in an easy-to-share format, such as a graph or table.
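The four steps above can be sketched as a tiny Python pipeline. This is only an illustration: the sample records, the field names, and the function names (gather, prepare, mine, present) are assumptions, and a real project would read from a data warehouse rather than an in-memory list.

```python
# Hypothetical raw records standing in for data loaded into a warehouse.
RAW_SALES = [
    {"region": "North", "product": "soap", "amount": "120"},
    {"region": "north ", "product": "Soap", "amount": "80"},
    {"region": "South", "product": "shampoo", "amount": None},  # missing value
    {"region": "South", "product": "soap", "amount": "50"},
]

def gather():
    """Step 1: collect the data (here, just copy the in-memory sample)."""
    return list(RAW_SALES)

def prepare(records):
    """Step 2: drop incomplete rows and normalise text and numeric fields."""
    cleaned = []
    for row in records:
        if row["amount"] is None:
            continue
        cleaned.append({
            "region": row["region"].strip().title(),
            "product": row["product"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned

def mine(records):
    """Step 3: sort and organise the data, here as total sales per region."""
    totals = {}
    for row in records:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

def present(totals):
    """Step 4: render the result as a plain-text table."""
    return "\n".join(
        f"{region:<10}{amount:>10.2f}" for region, amount in sorted(totals.items())
    )

totals = mine(prepare(gather()))
report = present(totals)
print(report)
```

Each function maps to one step, so any stage can be swapped out (for example, a different mining step) without touching the others.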
6. Data Mining Techniques
Data mining uses algorithms and various other techniques to
convert large collections of data into useful output. The most
popular types of data mining techniques include:
• Association rules, also referred to as market basket analysis,
search for relationships between variables. For example,
association rules would search a company’s sales history to
see which products are most commonly purchased together;
with this information, stores can plan, promote, and forecast.
• Classification assigns objects to predefined classes.
These classes describe the characteristics of items or
represent what the data points have in common with each other.
This data mining technique allows the underlying data to be
more neatly categorized and summarized across similar
features or product lines.
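As a rough illustration of association rules, the sketch below counts how often pairs of items are bought together and keeps the rules that pass a support and a confidence threshold. The baskets, the function name pair_rules, and the threshold values are assumptions chosen for the example; production tools use more scalable algorithms such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales history: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # Confidence: of the baskets with the antecedent, how many
            # also contain the consequent.
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)

rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

With this sample data, only the bread and butter pair passes both thresholds, which is the kind of "purchased together" signal a store could use for promotions.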
7. Data Mining Techniques
• Clustering is similar to classification. However, instead of using
predefined classes, clustering identifies similarities between objects,
then groups those items based on what sets them apart from other
items. While classification may result in groups such as "shampoo,"
"conditioner," "soap," and "toothpaste," clustering may identify
groups such as "hair care" and "dental health."
• Decision Trees are used to classify or predict an outcome based on
a set list of criteria or decisions. A decision tree asks a series of
cascading questions and sorts the dataset based on the answers given.
Sometimes depicted as a tree-like visual, a decision tree allows for
specific direction and user input when drilling deeper into the data.
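To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. The customer data (visits per week, average basket value), the starting centroids, and the function names are assumptions for illustration; k-means is only one of several clustering methods.

```python
import math

def nearest_index(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, initial_centroids, iterations=10):
    """Basic k-means: alternate between assigning points and recentring."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; leave it alone if empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids

# Hypothetical customers: (visits per week, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (8, 88)]
centroids = kmeans(customers, initial_centroids=[customers[0], customers[3]])
labels = [nearest_index(c, centroids) for c in customers]
print(labels)
```

No class names are supplied up front: the two groups emerge from the data, which is the key difference from classification.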
8. Data Mining Techniques
• K-Nearest Neighbor (KNN) is an algorithm that classifies data
based on its proximity to other data. The basis for KNN is rooted in
the assumption that data points that are close to each other are
more similar to each other than other bits of data.
• Neural networks process data through layers of nodes. Each node
is composed of inputs, weights, and an output. The network is
typically trained through supervised learning, and its interconnected
structure is loosely modeled on the human brain.
• Predictive analysis uses historical information to build
graphical or mathematical models that forecast future
outcomes. Overlapping with regression analysis, this technique aims
to estimate an unknown future figure based on the current
data on hand.
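The KNN idea described above can be sketched in a few lines of Python. The training points, the labels "low" and "high", and the function name knn_predict are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (feature vector, class label).
train = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"), ((7.0, 6.0), "high"), ((6.5, 7.0), "high"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.2)))
```

The choice of k and of the distance measure both affect the result; k=3 and Euclidean distance are used here only as simple defaults.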
9. Uses of Data Mining
Basket Analysis
In its most basic application, retailers use basket analysis to analyze
what consumers buy or put in their “baskets”. This is a form of the
association technique, giving retailers insight into buying habits and
allowing them to recommend other purchases.
Sales Forecasting
Sales forecasting is a form of predictive analysis to which businesses
are devoting more of their budgets. Data mining can help businesses
project sales and set targets by examining historical data such as
sales records, financial indicators (e.g., consumer price index, S&P
500, inflation markers), consumer spending habits, sales attributed to
a specific time of year, and trends which may impact standard
assumptions about the business.
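As a very simple sketch of sales forecasting, the Python below fits a straight-line trend to past monthly sales with ordinary least squares and projects the next month. The sales figures and the function name fit_line are assumptions; a realistic forecast would also account for seasonality and the external indicators listed above.

```python
def fit_line(ys):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    variance = sum((t - t_mean) ** 2 for t in ts)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope

# Hypothetical monthly sales history (oldest first).
monthly_sales = [100, 110, 120, 130, 140, 150]

intercept, slope = fit_line(monthly_sales)
# Project one step past the last observed month.
forecast = intercept + slope * len(monthly_sales)
print(f"trend: {slope:.1f} per month, next month: {forecast:.1f}")
```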
10. Uses of Data Mining
Database Marketing
Businesses build large databases of consumer data that they use to
shape and focus their marketing efforts. These businesses need
ways to manage and harness this data to develop targeted,
personalized marketing communications. Data mining helps
businesses understand consumer behaviors, track contact
information and leads, and engage more customers in their marketing
databases.
Inventory Planning
Data mining can provide businesses with up-to-date information
regarding product inventory, delivery schedules, and production
requirements. Data mining also can help remove some of the
uncertainty that comes with simple supply-and-demand issues within
the supply chain. The speed with which data mining can discern
patterns and devise projections helps companies better manage their
product stock and operate more efficiently.
11. Uses of Data Mining
Customer Loyalty
Businesses — particularly retailers — generate an enormous amount of
data through loyalty programs. Data mining allows these businesses to
build and enhance customer relationships through that data. For
example, by clustering customers according to basket totals, shopping
frequency, and likely grocery spend per week, retailers can offer
customers discounts to “ratchet” them up to a higher spending level (e.g.,
spend ₹50, get ₹5 off; spend ₹75, get ₹10 off). This not only gives the
customer an incentive to shop, but also helps retain spending that
competitors are targeting.
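The "ratchet" offer in the example can be expressed as a small lookup. The tier values come from the example above; the function names and the idea of reporting the gap to the next tier are assumptions added for illustration.

```python
# Tiers from the example in the text: (minimum spend, discount), in rupees,
# listed from the highest threshold to the lowest.
TIERS = [(75, 10), (50, 5)]

def discount_for(basket_total):
    """Return the discount for the highest tier the basket reaches."""
    for threshold, discount in TIERS:
        if basket_total >= threshold:
            return discount
    return 0

def next_tier_gap(basket_total):
    """Return how much more must be spent to reach the next tier, or None."""
    gaps = [threshold - basket_total for threshold, _ in TIERS if threshold > basket_total]
    return min(gaps) if gaps else None

for total in (40, 60, 80):
    print(total, discount_for(total), next_tier_gap(total))
```

A retailer could use the gap value to prompt a customer who is close to the next tier, which is the nudge the "ratchet" is meant to create.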
12. Difference Between Data Mining & Machine Learning
Data mining and machine learning are distinct processes that are often
considered synonymous. However, while they are both useful for detecting
patterns in large data sets, they operate very differently.
Data mining is the process of finding patterns in data. The beauty of data
mining is that it helps to answer questions we didn’t know to ask by proactively
identifying non-intuitive data patterns through algorithms (e.g., consumers who
buy peanut butter are more likely to buy paper towels). However, the
interpretation of these insights and their application to business decisions still
require human involvement.
Machine learning, meanwhile, is the process of teaching a computer to learn
from data, much as humans learn from experience. With machine learning,
computers learn how to determine probabilities and make predictions based
on their data analysis. And, while machine learning sometimes uses data
mining as part of its process, it ultimately doesn’t require ongoing human
involvement (e.g., a self-driving car relies on data mining to determine
where to stop, accelerate, and turn).
13. Benefits of Data Mining
• More effective marketing and sales: Data mining helps marketers
better understand customer behavior and preferences, which helps
them create targeted marketing and advertising campaigns. Similarly,
sales teams can use data mining results to improve lead conversion
rates and sell additional products and services to existing customers.
• Better customer service: Data mining helps companies identify
potential customer service issues more promptly and give contact
center agents up-to-date information to use in calls and online chats
with customers.
• Improved supply chain management (SCM): Organizations can spot market
trends and forecast product demand more accurately, enabling them to
better manage inventories of goods and supplies. Supply chain managers
can also use information from data mining to optimize warehousing,
distribution, and other logistics operations.
14. Benefits of Data Mining
• Increased production uptime: Mining operational data from
sensors on manufacturing machines and other industrial
equipment supports predictive maintenance applications to
identify potential problems before they occur, helping to avoid
unscheduled downtime.
• Stronger risk management: Risk managers and business
executives can better assess financial, legal, cybersecurity and other
risks to a company and develop plans for managing them.
• Lower costs: Data mining helps improve cost savings through
operational efficiencies in business processes and reduces
redundancy and waste in corporate spending.
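The predictive maintenance point can be illustrated with a simple threshold check on sensor readings. This sketch flags readings that sit more than a chosen number of standard deviations from a baseline average; the temperature values, the threshold, and the function name flag_anomalies are assumptions, and real predictive maintenance systems typically use richer models.

```python
from statistics import mean, stdev

def flag_anomalies(baseline, readings, threshold=3.0):
    """Return indices of readings far from the baseline mean.

    A reading is flagged when it lies more than `threshold` sample
    standard deviations away from the mean of the baseline values.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [i for i, value in enumerate(readings) if abs(value - mu) > threshold * sigma]

# Hypothetical machine temperatures recorded during normal operation.
baseline = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 70.0, 69.7]
# Hypothetical new readings to check.
new_readings = [70.1, 70.4, 73.5, 69.9]

flagged = flag_anomalies(baseline, new_readings)
print(flagged)
```

A flagged index could trigger an inspection before the machine fails, which is how this kind of check helps avoid unscheduled downtime.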
16. Applications of Data Mining
• Retail: Online retailers mine customer data and internet clickstream
records to help them target marketing campaigns, ads and promotional
offers to individual shoppers. Data mining and predictive modeling also
power the recommendation engines that suggest possible purchases to
website visitors, as well as inventory and SCM activities.
• Financial services: Banks and credit card companies use data mining tools
to build financial risk models, detect fraudulent transactions, and vet loan
and credit applications. Data mining also plays a key role in marketing and
identifying potential upselling opportunities with existing customers.
• Insurance: Insurers rely on data mining to aid in pricing insurance policies
and deciding whether to approve policy applications, as well as for risk
modeling and managing prospective customers.
• Manufacturing: Data mining applications for manufacturers include
efforts to improve uptime and operational efficiency in production plants,
supply chain performance and product safety.
17. Applications of Data Mining
• Entertainment: Streaming services analyze what users are watching or
listening to and make personalized recommendations based on their
viewing and listening habits. Separately, individuals might data mine
software itself to learn more about it.
• Healthcare: Data mining helps doctors diagnose medical conditions, treat
patients, and analyze X-rays and other medical imaging results. Medical
research also depends heavily on data mining, machine learning and other
forms of analytics.
• HR: HR departments typically work with large amounts of data, including
retention, promotion, salary, and benefit data. Data mining
analyzes and compares this data to support HR processes.
• Social media: Social media companies use data mining to gather large
amounts of data about users and their online activities. Controversially,
this data is either used for targeted advertising or sold to third
parties.
18. Data Mining and Social Media
• One of the most lucrative applications of data mining has been
undertaken by social media companies. Platforms like Facebook,
TikTok, Instagram, and X (formerly Twitter) gather reams
of data about their users, based on their online activities.
• That data can be used to make inferences about their preferences.
Advertisers can target their messages to the people who appear to
be most likely to respond positively.
• Data mining on social media has become a big point of contention,
with several investigative reports and exposés showing just how
intrusive mining users' data can be. At the heart of the issue, users
may agree to a site's terms and conditions without realizing
how their personal information is being collected or to whom it
is being sold.
20. Pros of Data Mining
• Data mining ensures a company is collecting and analyzing reliable data. It
is often a more rigid, structured process that formally identifies a problem,
gathers data related to the problem, and strives to formulate a solution.
Therefore, data mining helps a business become more profitable, more
efficient, or operationally stronger.
• Data mining can look very different across applications, but the overall
process can be used with almost any new or legacy application. Essentially
any type of data can be gathered and analyzed, and almost every business
problem that relies on qualifiable evidence can be tackled using data
mining.
• The end goal of data mining is to take raw bits of information and
determine if there is cohesion or correlation among the data. This benefit of
data mining allows a company to create value with the information they
have on hand that would otherwise not be overly apparent. Though data
models can be complex, they can also yield fascinating results, unearth
hidden trends, and suggest unique strategies.
21. Cons of Data Mining
• The complexity of data mining is one of its greatest disadvantages. Data
analytics often requires technical skill sets and certain software tools. Smaller
companies may find this to be a barrier to entry too difficult to overcome.
• Data mining doesn't always guarantee results. A company may perform
statistical analysis, draw conclusions based on strong data, implement
changes, and still not reap any benefits. Because of inaccurate findings,
market changes, model errors, or inappropriate data populations, data
mining can only guide decisions and not ensure outcomes.
• There is also a cost component to data mining. Data tools may require costly
subscriptions, and some data may be expensive to obtain. Security
and privacy concerns can be addressed, though the additional IT infrastructure
may be costly as well. Data mining may also be most effective when using huge
data sets; however, these data sets must be stored and require heavy
computational power to analyze.