Data Mining and Big Data
Analytics
March 2023
Bahir Dar Institute of Technology
Bahir Dar University
Objectives of the course
 To introduce advanced concepts in data mining;
 To understand the strengths and limitations of various data
mining models;
 To provide hands-on experience in applying these concepts to
real-world applications.
What is Data Mining
 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown,
and potentially useful) patterns or knowledge from huge amounts
of data
 Data mining is a business process for exploring large amounts of
data to discover meaningful patterns and rules.
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
DM definition
 Data mining as a business process
 DM is a business process that interacts with other processes in an
organization.
 Businesses that want to grow build in data collection, data analysis for
long-term benefit, and action on the results.
 Market research and customer relationship management are compatible with
DM.
 Large amounts of Data
 How much is a lot of data?
 In the 1960s and 1970s, data was scarce.
 The availability of computing power makes large amounts of data an
advantage instead of a handicap.
 Many DM techniques work better on large amounts of data than on small
amounts.
 Meaningful patterns and rules
 The goal of DM is not to find just any patterns in data, but to find patterns that
are useful for the business and can help routine business operations.
 For example, consider a call center that assigns colors to customers: Green
(a nice and loyal customer), Yellow (a customer who is probably valuable but also carries
some risk), and Red (no special treatment, because the customer is highly risky).
Finding patterns, in this case, means targeting retention campaigns to
customers who are most likely to leave.
 Companies develop business models using DM.
A company helps retailers make recommendations on the web and gets
paid when web shoppers click on its recommendations.
LinkedIn provides premium services by aggregating data to get a more
complete customer picture. This helps recruiters find the right candidates
for their vacant positions.
In all these cases, the goal is to direct products and services to the people
who are most likely to need them, making the process of buying and
selling more efficient for everyone involved.
Knowledge Discovery from Databases (KDD)
[Figure: the knowledge discovery from databases process (Fayyad et al.)]
DM and CRM
 Forward-looking companies are moving toward understanding each
customer individually, including the value of each customer, so that they know
which customers are worth investing money and effort to hold on to, and which ones to
let go.
 Turning a product-oriented organization into a customer-centric one takes more
than data mining.
 In a narrow sense, DM is a collection of tools and techniques, one of the
many technologies required to support a customer-centric enterprise.
 In a broader sense, DM is an attitude that business action should be based on
learning, and that informed decisions are better than uninformed decisions.
 DM is also a process and a methodology for applying analytic tools and
techniques.
 To form a learning relationship with customers, a company must be able to:
 Notice what its customers are doing
 Remember what it and its customers have done over time
 Learn from what it has remembered
 Act on what it has learned to make customers more profitable
 Learning can’t take place in a vacuum. Hence:
 Transaction Processing System: captures customer interactions
 Data Warehouse: stores historical customer behavior information
 Data Mining: translates history into plans for future action and
customer strategy
Integration of Multiple Technologies
[Figure: Data Mining as the integration of Machine Learning, Database Management, Artificial Intelligence, Statistics, Visualization, and Algorithms]
Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining as the confluence of Database Systems, Statistics, Machine Learning, Visualization, Algorithm, and Other Disciplines]
DM techniques
 Descriptive mining tasks characterize properties of the
data in a target data set. For example: data
summarization, clustering, association rules, and sequential
pattern analysis.
 Predictive mining tasks perform induction on the current
data in order to make predictions. For example:
classification.
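The difference between the two task types can be sketched in a few lines of Python. This is only an illustrative sketch: the customer tuples, the "stayed"/"left" labels, and the function names `describe` and `predict` are hypothetical and not part of the course material. Data summarization stands in for the descriptive tasks, and a simple k-nearest-neighbour vote stands in for classification.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: (monthly_calls, monthly_spend) per customer.
customers = [(5, 20.0), (7, 25.0), (6, 22.0), (40, 90.0), (45, 95.0), (42, 100.0)]
# Hypothetical known outcomes for the same customers (used only by the predictive task).
labels = ["stayed", "stayed", "stayed", "left", "left", "left"]


def describe(rows):
    """Descriptive task: summarize each attribute of the data set."""
    columns = list(zip(*rows))
    return [{"min": min(c), "max": max(c), "mean": mean(c)} for c in columns]


def squared_distance(a, b):
    """Squared Euclidean distance between two attribute tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(rows, row_labels, new_row, k=3):
    """Predictive task: classify new_row by majority vote of its k nearest rows."""
    nearest = sorted(range(len(rows)), key=lambda i: squared_distance(rows[i], new_row))[:k]
    votes = Counter(row_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


summary = describe(customers)                      # describes the data we already have
prediction = predict(customers, labels, (41, 92.0))  # infers a label for unseen data
```

The descriptive call only reports properties of the existing data, while the predictive call uses labeled history to infer an outcome for a customer it has not seen.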
Why Data Mining Now
 DM has attracted attention since the 1990s due to the
convergence of several factors:
 Data is being produced
 Data is being warehoused
 Computing power is affordable
 Interest in customer relationship management is strong
 Commercial DM software products are readily available
 Data is being produced
 Data mining makes the most sense where large volumes of data are available. In
fact, most data mining algorithms require fairly large amounts of data to
build and train models.
 A single person browsing a website can generate tens of kilobytes of data in a
day.
 Telephone companies and credit card companies were the first to work with
terabyte-sized databases.
 Data is available, and in large volumes, but how do you make any sense out of it?
 Data is being warehoused
 Data warehousing brings together data from many different sources in a common
format with consistent definitions for keys and fields.
 The data warehouse should be designed exclusively for decision support, which
can simplify the job of the data miner.
 Computing power is affordable
 DM algorithms require multiple passes over huge quantities of data, which is
computationally expensive. The dramatic decline in the prices of computer
hardware has made once-costly techniques accessible to ordinary businesses.
 Interest in CRM
 Many companies are moving to a customer-centric business model, and customer
information is one of their key assets.
 Commercial DM software products are readily available
 Many DM algorithms are incorporated into commercial DM software.
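The warehousing point above, bringing data from different sources into a common format with consistent keys and fields, can be sketched as follows. This is a minimal illustration under stated assumptions: the two source systems, their field names (`cust_id`, `CustomerNo`, and so on), and the target fields are all hypothetical.

```python
# Hypothetical records from two source systems that name the same things differently.
billing_rows = [
    {"cust_id": "17", "amount_usd": "25.50"},
    {"cust_id": "42", "amount_usd": "10.00"},
]
support_rows = [{"CustomerNo": 17, "tickets": 3}]


def from_billing(row):
    """Map a billing record onto the common key and field definitions."""
    return {"customer_id": int(row["cust_id"]), "total_spend": float(row["amount_usd"])}


def from_support(row):
    """Map a support record onto the common key and field definitions."""
    return {"customer_id": int(row["CustomerNo"]), "open_tickets": int(row["tickets"])}


def integrate(billing, support):
    """Build one record per customer_id with the same fields for every customer.

    Later records overwrite earlier values for the same field; a real
    warehouse load would need an explicit rule for such conflicts.
    """
    warehouse = {}
    records = [from_billing(r) for r in billing] + [from_support(r) for r in support]
    for record in records:
        merged = warehouse.setdefault(
            record["customer_id"],
            {"customer_id": record["customer_id"], "total_spend": 0.0, "open_tickets": 0},
        )
        merged.update(record)
    return warehouse


warehouse = integrate(billing_rows, support_rows)
```

Once every source uses the same `customer_id` key and field names, a mining step can read one table instead of reconciling formats itself.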
What kind of data can be mined?
 The basic forms of data for mining applications are:
 Database data
 Data warehouse data
 Transactional data
 Data streams
 Ordered sequence data
 Graph or networked data
 Spatial data
 Text data
 Multimedia data
 And the WWW
Skills needed by a data miner
 Statistical skills
 Familiarity with DM techniques. Understanding when and how to use them is
important.
 A willingness to work with large data sets, data warehouses, and analytic
sandboxes
 The abilities to work with other people, communicate results, and recognize what
is really needed are critical skills for a data miner.
