By
Dr.T.GopiKrishna
Assistant Professor
Dept of Computer Science and Engineering
What is Data Mining?
Extracting or ‘mining’ knowledge from large amounts of data (big data).
(or)
Non-trivial extraction of implicit, previously unknown and potentially useful
information from data.
(or)
Exploration and analysis, by automatic or semi-automatic means, of large
quantities of data in order to discover meaningful patterns.
“Gold mining from rock or sand” is analogous to “knowledge mining from data”.
Other terms for Data Mining:
o Knowledge Mining
o Knowledge Extraction
o Pattern Analysis
o Data Archaeology
Data Mining is not the same as KDD (Knowledge Discovery from Data):
Data Mining is one step in the KDD process.
Why Mine Data? Commercial Viewpoint
Why Mine Data? Scientific Viewpoint
• There is often information “hidden” in the data that is
not readily evident.
• Human analysts may take weeks to discover useful information.
• Much of the data is never analyzed at all.
• Huge Volume of data
• Major sources of abundant data:
o Business – Web, e-commerce, transactions, stocks
o Science – remote sensing, bioinformatics, scientific simulation
o Society and everyone – news, digital cameras, YouTube
• Need for turning data into knowledge – drowning in data, but starving
for knowledge
• Applications that use data mining:
o Market analysis
o Fraud detection
o Customer retention
o Production control
o Scientific exploration
• Data rich and information poor situation
[Figure: Data Mining shown alongside Machine Learning/Pattern Recognition, Statistics/AI and Database systems]
• Mostly reads
• Queries are long and complex
• GB to TB of data
• History
• Lots of scans
• Summarized, reconciled data
• Hundreds of users (e.g., decision-makers, analysts)
Data Warehouse: Operational data is typically spread across several
databases that are physically located at numerous sites. A data warehouse
is a repository of multiple databases under a single schema, and it resides
at a single site.
• Machine learning is a field of artificial intelligence that uses
statistical techniques to give computer systems the ability to "learn".
• Machine learning explores the study and construction
of algorithms that can learn from and make predictions on data.
• Machine learning is closely related to (and often overlaps
with) computational statistics, which also focuses on
prediction-making through the use of computers.
“Machine Learning is the science of getting
computers to learn and act like humans do,
and improve their learning over time in
autonomous fashion, by feeding them data
and information in the form of observations
and real-world interactions.”
• Statistics – “Learning from Data” or “Turning data into
information”.
• Data – crude information that does not make sense on its own; what we
capture and store
e.g. customer data, store data, demographic data,
geographical data
• Information – relates items of data; relevant to the decision
problem
e.g. X lives in Z; S is Y years old; X and S moved; W has money
in Z
• Facts – information becomes fact when data can support it
• Knowledge – what we know or infer; relates items of information
e.g. a quantity Q of product A is used in region Z; customers of
class L use N% of C in period D
• Databases
• Data Warehousing
• Statistics
• Machine Learning
• Information Retrieval
• Image and Signal Processing
• Pattern Recognition
• Neural Networks
• Data Visualization
• Spatial / Temporal Data Analysis
Database-oriented data sets and applications
o Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
o Data streams and sensor data
o Time-series data, temporal data, sequence data (incl. bio-sequences)
o Structured data, graphs, social networks and multi-linked data
o Object-relational databases
o Heterogeneous databases and legacy databases
o Spatial data and spatiotemporal data
o Multimedia databases
o Text databases
o The World-Wide Web
• Prediction Methods
Use some variables to predict unknown or
future values of other variables.
• Description Methods
Find human-interpretable patterns that
describe the data.
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
• Data mining uncovers in-depth
business intelligence by using
advanced analytical and modelling
techniques.
• With data mining, you can ask far
more sophisticated questions of your
data than you can with conventional
querying methods.
In a CRM (Customer Relationship Management)
setting, data mining is the extraction of
information that is already present in the
CRM system, for use in marketing, customer
service, customer information services and
similar applications.
• Data mining tools ease and automate the process of
discovering useful information from large stores of data.
• Data mining can identify patterns in company data,
for example, in records of supermarket purchases.
If customers buy product A and product B,
which product C are they most likely to buy as well?
Accurate answers to questions like these are invaluable
aids to marketing strategies.
• Data mining can identify the characteristics of a known
group of customers, for example, those who have a proven
record as poor credit risks.
• Relational Databases:
A relational database system consists of a database (a collection of
interrelated data) and a set of software programs to manage and access the data.
The database is a collection of tables.
Each table has a set of attributes (columns / fields) and a large set of tuples
(records or rows).
• Transactional Databases:
Consists of a file with records, where each record is a transaction.
Each transaction has a unique transaction ID and a list of the items that make
up the transaction.
• Object-Relational Databases
• Temporal Databases, Sequence Databases and Time-Series Databases
• Spatial Databases and Spatiotemporal Databases
• Text Databases and Multimedia Databases
• Heterogeneous Databases and Legacy Databases
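The relational and transactional layouts described above can be sketched in a few lines of Python. This is only an illustration: the `Customer` table, its attributes, the transaction IDs and the items are all hypothetical, and a real system would use a database engine rather than in-memory lists.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    """One tuple (row) of a hypothetical customer table; the fields are its attributes."""
    customer_id: int
    name: str
    city: str

# A relational table is a collection of tuples that share the same attributes.
customers = [
    Customer(1, "Asha", "Pune"),
    Customer(2, "Ravi", "Delhi"),
    Customer(3, "Meena", "Pune"),
]

# A transactional database: each record has a unique transaction ID
# and the list of items that make up the transaction.
transactions = {
    "T100": ["bread", "milk"],
    "T200": ["beer", "crisps", "bread"],
    "T300": ["milk"],
}

def customers_in(city, table):
    """Relational-style selection: rows whose city attribute matches."""
    return [row for row in table if row.city == city]

def transactions_with(item, db):
    """IDs of the transactions whose item list contains the given item."""
    return [tid for tid, items in db.items() if item in items]

print([row.name for row in customers_in("Pune", customers)])
print(transactions_with("bread", transactions))
```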
Stages of Data Mining Process
KDD Process
Brief explanation of data mining stages
Several major data mining techniques have been
developed and used in recent data mining projects,
including
• association,
• classification,
• clustering,
• prediction,
• sequential patterns and
• decision trees.
Data Mining Techniques (Association)
• Association is one of the best-known data mining techniques. In
association, a pattern is discovered based on a relationship
between items in the same transaction.
• That is why the association technique is also known
as the relation technique. It is used in market
basket analysis to identify sets of products that customers
frequently purchase together.
• Retailers use the association technique to research customers’
buying habits. Based on historical sales data, retailers might find
that customers often buy crisps when they buy beer and
can therefore put beer and crisps next to each other to save
time for the customer and increase sales.
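As a minimal sketch of market basket analysis, the code below counts how often items appear together and derives the usual support and confidence measures. The baskets are made-up data, and the function names (`count`, `support`, `confidence`, `frequent_pairs`) are illustrative choices, not part of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales data: each set is the items in one transaction (basket).
transactions = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"beer", "crisps", "milk"},
    {"bread", "milk"},
    {"beer", "bread"},
]

def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if set(items) <= basket)

def support(baskets, items):
    """Fraction of all baskets that contain `items`."""
    return count(baskets, items) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(baskets, both) / count(baskets, antecedent)

def frequent_pairs(baskets, min_count):
    """Item pairs bought together in at least `min_count` baskets."""
    pairs = Counter()
    for basket in baskets:
        pairs.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in pairs.items() if n >= min_count}

print(frequent_pairs(transactions, min_count=3))
print(support(transactions, {"beer", "crisps"}))
print(confidence(transactions, {"beer"}, {"crisps"}))
```

In this toy data, beer and crisps share three of the five baskets, and three of the four beer baskets also contain crisps, which is the kind of evidence a retailer would use before shelving the two together.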
Classification
• Classification is a classic data mining technique based on
machine learning. It is used to assign each item in a set of
data to one of a predefined set of classes or groups.
• Classification methods make use of mathematical techniques
such as decision trees, linear programming, neural networks
and statistics.
• In classification, we develop software that can learn how
to classify the data items into groups.
For example, we can apply classification to the problem:
“given the records of all employees who left the company,
predict who will probably leave the company in a future period.”
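To make the employee example concrete, here is a small sketch that uses a k-nearest-neighbour vote, chosen only because it fits in a few lines; the slides do not prescribe this method. The two features (years at the company, weekly overtime hours), the records and the labels are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training records: (years at company, weekly overtime hours) -> class label.
training = [
    ((1.0, 12.0), "left"),
    ((2.0, 10.0), "left"),
    ((1.5, 15.0), "left"),
    ((8.0, 2.0), "stayed"),
    ((6.0, 1.0), "stayed"),
    ((10.0, 3.0), "stayed"),
]

def classify(record, labelled, k=3):
    """Assign `record` to the majority class among its k nearest labelled records."""
    nearest = sorted(labelled, key=lambda pair: math.dist(record, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((1.2, 11.0), training))
print(classify((9.0, 2.5), training))
```

The classes ("left", "stayed") are fixed in advance, which is what distinguishes classification from clustering.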
Clustering
• Clustering is a data mining technique that automatically
groups objects with similar characteristics into
meaningful or useful clusters.
• Clustering defines the classes itself and then puts
objects in each class, whereas in classification
objects are assigned to predefined classes.
• To make the concept clearer, take book management
in a library as an example. A library holds a wide
range of books on various topics.
• The challenge is how to arrange those books so
that readers can pick up several books on a particular
topic without hassle.
• By using clustering, we can keep books that are
similar in some way in one cluster (one shelf) and
label it with a meaningful name. Readers who want
books on that topic then only have to go to that
shelf instead of searching the entire library.
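The library example can be sketched with k-means, one common clustering algorithm (the slides do not name a specific one). Each book is described by two made-up numbers, the share of maths content and the share of history content, and the starting centroids are picked by hand to keep the run deterministic.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate an assignment step and a centroid update step."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical books as (share of maths content, share of history content).
books = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05), (0.1, 0.9), (0.2, 0.8), (0.05, 0.95)]
centroids, shelves = k_means(books, centroids=[books[0], books[3]])
print(shelves)
```

No labels are supplied: the two "shelves" emerge from the data, and naming them is left to the librarian.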
Prediction
• Prediction, as its name implies, is a data mining
technique that discovers the relationships among
independent variables and the relationships
between dependent and independent variables.
• For instance, prediction analysis can be used in
sales to predict future profit: if we treat sales as
the independent variable, profit can be the
dependent variable.
• Then, based on historical sales and profit data,
we can fit a regression curve that is used
for profit prediction.
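A minimal sketch of the sales-to-profit example, assuming the simplest case of a straight-line fit by ordinary least squares. The sales and profit figures are invented and lie exactly on a line so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales is the independent variable, profit the dependent one.
sales = [100.0, 200.0, 300.0, 400.0]
profit = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 500.0 + intercept  # forecast for a future sales level
print(slope, intercept, predicted_profit)
```

Real sales data would not fall on a perfect line, so the fitted curve would be an approximation and its error should be checked before it is used for forecasting.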
Sequential Patterns
• Sequential pattern analysis is a data mining
technique that seeks to discover or identify similar
patterns, regular events or trends in transaction data
over a business period.
• In sales, with historical transaction data, businesses
can identify sets of items that customers buy
together at different times of the year.
• Businesses can then use this information to
recommend those items to customers with better deals,
based on their past purchasing frequency.
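One simple way to read this is to count how often one item is bought before another across customers' purchase histories. The sketch below does that for ordered pairs only; the customer IDs, items and the `min_support` threshold are hypothetical, and full sequential-pattern algorithms handle longer sequences and time windows.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each ordered by purchase date.
histories = {
    "c1": ["phone", "case", "charger"],
    "c2": ["phone", "charger"],
    "c3": ["case", "phone", "charger"],
    "c4": ["laptop", "mouse"],
}

def sequential_pairs(sequences, min_support):
    """Ordered pairs (a, b), with a bought before b, seen in at least `min_support` sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so (a, b) means a came before b.
        seen = {pair for pair in combinations(seq, 2) if pair[0] != pair[1]}
        counts.update(seen)  # each customer counts at most once per pair
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(sequential_pairs(histories.values(), min_support=3))
```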
Decision trees
A decision tree is one of the most commonly used data mining
techniques because its model is easy for users to understand.
In the decision tree technique, the root of the tree is a simple
question or condition that has multiple answers.
Each answer then leads to a further set of questions or conditions that
help us narrow down the data so that we can make the final decision.
For example, a decision tree can be used to determine whether
or not to play tennis.
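The slide's own tree diagram is not available in this text, so the sketch below assumes the widely used textbook version of the play-tennis tree (outlook at the root, then humidity or wind); the actual figure may differ. It shows how a tree reads as nested questions.

```python
def play_tennis(outlook, humidity, wind):
    """Walk an assumed play-tennis tree from the root question down to a decision."""
    # Root question: what is the outlook?
    if outlook == "overcast":
        return True
    if outlook == "sunny":
        # Second-level question on the sunny branch.
        return humidity == "normal"
    if outlook == "rain":
        # Second-level question on the rainy branch.
        return wind == "weak"
    raise ValueError(f"unknown outlook: {outlook!r}")

print(play_tennis("sunny", "high", "weak"))
print(play_tennis("rain", "normal", "weak"))
```

Each `if` corresponds to one node of the tree, and each `return` to a leaf, which is why users find the model easy to follow.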
Knowledge Representation
Knowledge representation is the presentation of
knowledge to the user for visualization in terms
of trees, tables, rules, graphs, charts, matrices,
etc.
For example: histograms
Histograms
• A histogram represents the distribution of
values of a single attribute.
• It consists of a set of rectangles that reflect the counts
or frequencies of the classes present in the given data.
Example: a histogram of the electricity bills generated over 4
months.
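A minimal sketch of the counting behind a histogram: values of one attribute are grouped into equal-width classes and the size of each class gives the height of its rectangle. The bill amounts and the bin width of 10 are assumptions made for illustration, not the data from the slide's diagram.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall in each class [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical bill amounts for a single attribute.
bills = [42, 55, 61, 48, 75, 58, 66, 51]

for start, n in histogram(bills, 10).items():
    # One text "rectangle" per class; its length is the frequency.
    print(f"{start}-{start + 9}: {'#' * n}")
```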
Data Visualization
It deals with the representation of data in a
graphical or pictorial format.
Patterns in the data can be spotted easily
with data visualization techniques.
Pixel-oriented visualization technique
• In pixel-oriented visualization techniques, each
attribute has its own sub-window, and each attribute
value is represented by one colored pixel.
• The color mapping of the pixel is decided on the basis
of data characteristics and visualization tasks.
Geometric projection visualization technique
i. Scatter-plot matrices
A scatter-plot matrix consists of scatter plots of all possible pairs of
variables in a dataset.
ii. Hyperslice
It is an extension of scatter-plot matrices. It represents a
multi-dimensional function as a matrix of orthogonal two-dimensional slices.
iii. Parallel coordinates
The axes are drawn as separate, parallel vertical lines.
A point in Cartesian coordinates corresponds to a polyline in parallel
coordinates.
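The point-to-polyline mapping of parallel coordinates can be sketched as follows. The function name `to_polyline` and the min/max normalisation of each axis are assumptions made for this illustration; plotting libraries may scale the axes differently.

```python
def to_polyline(point, minima, maxima):
    """Return one (axis index, normalised height) vertex per dimension of `point`."""
    vertices = []
    for axis, (value, lo, hi) in enumerate(zip(point, minima, maxima)):
        # Scale the value to [0, 1] along its own vertical axis.
        height = (value - lo) / (hi - lo) if hi > lo else 0.5
        vertices.append((axis, height))
    return vertices

# A hypothetical 3-dimensional point and the value range of each dimension.
line = to_polyline((5.0, 20.0, 0.0), minima=(0.0, 0.0, 0.0), maxima=(10.0, 40.0, 1.0))
print(line)
```

Joining the returned vertices in order gives the polyline for that point; drawing one polyline per record produces the full parallel-coordinates plot.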
3. Icon-based visualization techniques
Icon-based visualization techniques are also known as iconic display
techniques.
Each multidimensional data item is mapped to an icon.
This technique allows visualization of large amounts of data.
The most commonly used technique is Chernoff faces.
Chernoff faces
Each data dimension is mapped to a facial feature, for example the face
width, the length of the mouth and the length of the nose.
Visualization techniques
Hierarchical visualization techniques
• Hierarchical visualization techniques partition
the dimensions into subsets.
• These subsets are visualized in a
hierarchical manner.
Some of the visualization techniques are:
i. Dimensional stacking
In dimensional stacking, the n-dimensional attribute space is
partitioned into 2-dimensional subspaces.
Attribute values are partitioned into various classes.
Each element is a two-dimensional space in the form of an xy
plot.
The important attributes should be identified and used on
the outer levels.
ii. Mosaic plot
A mosaic plot gives a graphical representation of successive
decompositions.
Rectangles are used to represent the counts of categorical data,
and at every stage the rectangles are split in parallel.
Tree map visualization
• Tree maps are well suited to displaying large amounts of
hierarchically structured data.
• The visualization space is divided into multiple rectangles
that are ordered according to a quantitative variable.
• The levels in the hierarchy are shown as rectangles containing
other rectangles.
• Each set of rectangles on the same level in the hierarchy
represents a category, a column or an expression in a data set.
Visualizing complex data and relations
• This technique is used to visualize non-numeric data,
for example text, pictures, blog entries and product reviews.
Expert systems
Rely on domain experts for decision making, using their knowledge and intuition
o Time consuming, costly, error prone, biased
So the solution is to use data mining tools, which
o perform data analysis
o find data patterns
Knowledge Base:
Domain knowledge is used to guide the search and to evaluate the
interestingness of patterns.
Includes concept hierarchies, user beliefs, thresholds and metadata
Database / Data warehouse Server:
Responsible for fetching relevant data based on data mining
request.
Data Mining Engine:
Consists of modules for characterization, association, correlation analysis,
classification, cluster analysis, prediction, outlier analysis and evolution
analysis.
Pattern Evaluation Module:
Interacts with data mining modules. Focuses the search
towards interesting patterns.
Pattern evaluation module may be integrated with mining module
to confine the search.
User Interface:
Communicates between users and the data mining system
Lets the user specify a data mining query to focus the search
Uses intermediate data mining results to perform exploratory data mining
Major Issues in Data Mining:
Mining Methodology Issues:
o Mining different kinds of knowledge in databases.
o Incorporation of background knowledge
o Handling noisy or incomplete data
o Pattern Evaluation – Interestingness Problem
User Interaction Issues:
o Interactive mining of knowledge at multiple levels of abstraction
o Data mining query languages and ad-hoc data mining.
o Presentation and visualization of data mining results.
Performance Issues:
o Efficiency and Scalability of Data Mining Algorithms.
o Parallel, distributed and incremental mining algorithms.
Issues related to diversity of data types:
o Handling of relational and complex types of data.
o Mining information from heterogeneous databases and global
information systems.
Review Questions
1. What motivated Data Mining? Why is it
important?
2. What is Data Mining?
3. Explain the steps in the Knowledge Discovery
Process.
4. Describe the architecture of data mining
systems with a suitable diagram.
5. Explain the various data mining functionalities.
6. Discuss the major issues in data mining.

Week-1-Introduction to Data Mining.pptx

  • 1. By Dr.T.GopiKrishna Assistant Professor Dept of Computer Science and Engineering
  • 2. What is Data Mining? Extracting and ‘mining’ knowledge from large amounts of data (big data). (Or) Non-trivial extraction of implicit, previously unknown and potentially useful information from data. (Or) Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns. “Gold mining from rock or sand” is analogous to “knowledge mining from data”. Other terms for Data Mining: o Knowledge Mining o Knowledge Extraction o Pattern Analysis o Data Archaeology. Data Mining is not the same as KDD (Knowledge Discovery from Data); Data Mining is one step in KDD.
  • 3. Why Mine Data? Commercial Viewpoint
  • 4. Why Mine Data? Scientific Viewpoint
  • 5. • There is often information “hidden” in the data that is not readily evident. • Human analysts may take weeks to discover useful information. • Much of the data is never analyzed at all. • Huge volume of data • Major sources of abundant data: - Business: Web, e-commerce, transactions, stocks - Science: remote sensing, bioinformatics, scientific simulation - Society and everyone: news, digital cameras, YouTube • Need for turning data into knowledge: drowning in data, but starving for knowledge • Applications that use data mining: - Market Analysis - Fraud Detection - Customer Retention - Production Control - Scientific Exploration • Data-rich but information-poor situation
  • 6.
  • 8. • Mostly reads • Queries are long and complex • GB to TB of data • Historical data • Lots of scans • Summarized, reconciled data • Hundreds of users (e.g., decision-makers, analysts) Data Warehouse: Data is often spread across several databases that are physically located at numerous sites. A data warehouse is a repository of multiple databases under a single schema that resides at a single site.
  • 9. • Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn". • Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. • Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on prediction-making through the use of computers.
  • 10. “Machine Learning is the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions.”
  • 11. • Statistics: “Learning from data” or “turning data into information”. • Data: crude information that does not make sense on its own; what we capture and store, e.g. customer data, store data, demographic data, geographical data. • Information: relates items of data and is relevant to the decision problem, e.g. X lives in Z; S is Y years old; X and S moved; W has money in Z. • Facts: information becomes fact when data can support it. • Knowledge: what we know or infer; relates items of information, e.g. a quantity Q of product A is used in region Z; customers of class L use N% of C in period D.
  • 12. • Databases • Data Warehousing • Statistics • Machine Learning • Information Retrieval • Image and Signal Processing • Pattern Recognition • Neural Networks • Data Visualization • Spatial / Temporal Data Analysis
  • 13. Database-oriented data sets and applications o Relational database, data warehouse, transactional database Advanced data sets and advanced applications o Data streams and sensor data o Time-series data, temporal data, sequence data (incl. bio-sequences) o Structured data, graphs, social networks and multi-linked data o Object-relational databases o Heterogeneous databases and legacy databases o Spatial data and spatiotemporal data o Multimedia databases o Text databases o The World-Wide Web
  • 14. • Prediction Methods: Use some variables to predict unknown or future values of other variables. • Description Methods: Find human-interpretable patterns that describe the data.
  • 15. Classification [Predictive] Clustering [Descriptive] Association Rule Discovery [Descriptive] Sequential Pattern Discovery [Descriptive] Regression [Predictive] Deviation Detection [Predictive]
  • 16. • Data mining uncovers in-depth business intelligence by using advanced analytical and modelling techniques. • With data mining, you can ask far more sophisticated questions of your data than you can with conventional querying methods.
  • 17. In a CRM (Customer Relationship Management) setting, data mining is the extraction of information that is already present in the CRM system, for use in marketing, customer service, customer information services and similar applications.
  • 18. • Data mining tools ease and automate the process of discovering this kind of information from large stores of data. • Data mining can identify patterns in company data, for example, in records of supermarket purchases. If customers buy product A and product B, which product C are they most likely to buy as well? Accurate answers to questions like these are invaluable aids to marketing strategies. • Data mining can identify the characteristics of a known group of customers, for example, those who have a proven record as poor credit risks.
  • 19. • Relational Databases: A relational database system consists of a database (a collection of interrelated data) and a set of software programs to manage and access the data. The data is a collection of tables; each table has a set of attributes (columns or fields) and a large set of tuples (records or rows). • Transactional Databases: Consist of a file of records in which each record is a transaction. Each transaction has a unique transaction ID and a list of the items that make up the transaction. • Object-Relational Databases • Temporal Databases, Sequence Databases and Time-Series Databases • Spatial Databases and Spatiotemporal Databases • Text Databases and Multimedia Databases • Heterogeneous Databases and Legacy Databases
  • 20. Stages of Data Mining Process
  • 22. Brief explanation of data mining stages
  • 23. Several major data mining techniques have been developed and used in recent data mining projects, including • association, • classification, • clustering, • prediction, • sequential patterns and • decision trees.
  • 24. Data Mining Techniques (Association) • Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction. • That is why the association technique is also known as the relation technique. It is used in market basket analysis to identify a set of products that customers frequently purchase together. • Retailers use the association technique to research customers' buying habits. Based on historical sales data, retailers might find that customers always buy crisps when they buy beer, and can therefore put beer and crisps next to each other to save the customer time and increase sales.
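The beer-and-crisps idea on the association slide can be made concrete with the two usual rule measures, support and confidence. The following is a minimal sketch in plain Python, not a full Apriori implementation; the baskets, the function names and the 0.5 support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; each set is one customer's transaction.
transactions = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"beer", "crisps", "milk"},
    {"bread", "crisps"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """Item pairs whose support is at least `min_support` (brute-force count)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

beer_crisps_support = support({"beer", "crisps"}, transactions)
beer_to_crisps_conf = confidence({"beer"}, {"crisps"}, transactions)
pairs = frequent_pairs(transactions, min_support=0.5)
```

In this toy data every basket with beer also contains crisps, so the rule beer → crisps has confidence 1.0, which is the situation the slide describes.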
  • 25. Classification • Classification is a classic data mining technique based on machine learning. It is used to assign each item in a data set to one of a predefined set of classes or groups. • Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. • In classification, we develop software that can learn how to classify data items into groups. For example, we can apply classification to the task: “given all records of employees who left the company, predict who will probably leave the company in a future period.”
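To illustrate the employee example on the classification slide, here is a small sketch that learns from labelled records and assigns a new record to a predefined class. It uses a 1-nearest-neighbour rule, chosen only because it fits in a few lines; the slide itself names decision trees, neural networks and statistics. The features, values and labels are hypothetical.

```python
import math

# Hypothetical labelled history: (years_at_company, satisfaction score 0-1) -> class.
training = [
    ((1.0, 0.2), "left"),
    ((2.0, 0.3), "left"),
    ((6.0, 0.8), "stayed"),
    ((8.0, 0.9), "stayed"),
]

def predict(record, training):
    """Return the class of the training record closest to `record` (1-nearest neighbour)."""
    nearest_features, nearest_label = min(
        training, key=lambda pair: math.dist(pair[0], record)
    )
    return nearest_label

new_employee = (1.5, 0.25)
prediction = predict(new_employee, training)
```

The classes ("left", "stayed") are fixed in advance, which is what separates classification from clustering. In practice the features would usually be rescaled first so that one of them does not dominate the distance.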
  • 26. Clustering • Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. • The clustering technique defines the classes and puts objects in each class, whereas in classification objects are assigned to predefined classes. • To make the concept clearer, take book management in a library as an example. A library holds a wide range of books on various topics. • The challenge is how to arrange those books so that readers can pick up several books on a particular topic without hassle. • By using clustering, we can keep books that are similar in some way in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on that topic only have to go to that shelf instead of searching the entire library.
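The contrast with classification (no predefined classes) can be seen in a short k-means sketch. k-means is one common clustering algorithm and is used here as an assumption, since the slide does not name a specific method. The points and starting centroids are made up, with each point standing for a book described by two numeric features.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D feature vectors, e.g. two topic scores per book.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The two groups ("shelves") emerge from the data alone; naming them is left to a person, as in the library example.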
  • 27. Prediction • Prediction, as its name implies, is a data mining technique that discovers relationships among independent variables and between dependent and independent variables. • For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit is the dependent variable. • Then, based on historical sales and profit data, we can fit a regression curve and use it to predict profit.
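The fitted regression curve mentioned on the prediction slide can be sketched with ordinary least squares for a straight line, profit ≈ slope × sales + intercept. A straight line is an assumption here (the slide only says "curve"), and the sales and profit figures are invented.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales is the independent variable, profit the dependent one.
sales = [100.0, 200.0, 300.0, 400.0]
profit = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 500.0 + intercept  # forecast for a future sales level
```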
  • 28. Sequential Patterns • Sequential pattern analysis is a data mining technique that seeks to discover or identify similar patterns, regular events or trends in transaction data over a business period. • In sales, businesses can use historical transaction data to identify sets of items that customers buy together at different times of the year. • Businesses can then use this information to recommend those items to customers, with better deals, based on their past purchasing frequency.
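One simple way to look for order-dependent patterns is to count how many customers bought item a at some point before item b. The sketch below does only that; it is a simplification rather than a full sequential pattern mining algorithm, and the purchase histories are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories, each list ordered by time.
sequences = {
    "c1": ["laptop", "mouse", "laptop bag"],
    "c2": ["laptop", "laptop bag"],
    "c3": ["mouse", "laptop"],
}

def ordered_pair_counts(sequences):
    """For each ordered pair (a, b), count customers who bought a before b."""
    counts = Counter()
    for items in sequences.values():
        seen = set()  # count each pair at most once per customer
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                if a != b:
                    seen.add((a, b))
        counts.update(seen)
    return counts

pair_counts = ordered_pair_counts(sequences)
```

Unlike the association measure, order matters here: ("laptop", "laptop bag") and ("laptop bag", "laptop") are counted separately.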
  • 29. Decision trees A decision tree is one of the most commonly used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers. Each answer then leads to a further set of questions or conditions that narrow down the data, so that we can make a final decision based on it. For example, we use the following decision tree to determine whether or not to play tennis:
  • 30. Knowledge Representation Knowledge representation is the presentation of knowledge to the user for visualization in terms of trees, tables, rules, graphs, charts, matrices, etc. For example: histograms
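The tennis diagram itself is not reproduced in this transcript. The sketch below therefore assumes the commonly used textbook version of the play-tennis tree (root question on outlook, then humidity or wind); the attribute names and values are that assumption, not taken from the slide.

```python
def play_tennis(outlook, humidity, wind):
    """Walk an assumed play-tennis decision tree from the root to a leaf."""
    if outlook == "overcast":   # root answer 1: leaf, always play
        return True
    if outlook == "sunny":      # root answer 2: next question is humidity
        return humidity == "normal"
    if outlook == "rain":       # root answer 3: next question is wind
        return wind == "weak"
    raise ValueError(f"unknown outlook: {outlook!r}")

decision = play_tennis("sunny", humidity="high", wind="weak")
```

Each `if` corresponds to one question in the tree, which is why such models are easy to read and explain.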
  • 31. Histograms • A histogram represents the distribution of values of a single attribute. • It consists of a set of rectangles that reflect the counts or frequencies of the classes present in the given data. Example: Histogram of an electricity bill generated for 4 months, as shown in the diagram given below.
  • 32. Data Visualization It deals with the representation of data in a graphical or pictorial format. Patterns in the data are easily spotted by using data visualization techniques. Pixel-oriented visualization technique: In pixel-based visualization techniques, there is a separate sub-window for each attribute, and each attribute value is represented by one colored pixel.
  • 33. Pixel-oriented visualization technique • The color mapping of the pixels is decided on the basis of the data characteristics and the visualization task.
  • 34. Geometric projection visualization technique i. Scatter-plot matrices: These consist of scatter plots of all possible pairs of variables in a dataset. ii. Hyper slice: An extension of scatter-plot matrices that represents a multi-dimensional function as a matrix of orthogonal two-dimensional slices. iii. Parallel co-ordinates: Separated, parallel vertical lines define the axes. A point in Cartesian coordinates corresponds to a polyline in parallel coordinates. 3. Icon-based visualization techniques Icon-based visualization techniques are also known as iconic display techniques. Each multidimensional data item is mapped to an icon. This technique allows visualization of large amounts of data. The most commonly used technique is Chernoff faces.
  • 35. Chernoff faces Data values are mapped to facial features, for example the face width, the length of the mouth and the length of the nose, as shown in the following diagram.
  • 36. Visualization techniques Hierarchical visualization techniques • Hierarchical visualization techniques partition all dimensions into subsets. • These subsets are visualized in a hierarchical manner.
  • 37. Some of the visualization techniques are: i. Dimensional stacking: In dimensional stacking, the n-dimensional attribute space is partitioned into 2-dimensional subspaces. Attribute values are partitioned into various classes. Each element is a two-dimensional space in the form of an xy plot. The important attributes should be identified and used on the outer levels. ii. Mosaic plot: A mosaic plot gives a graphical representation of successive decompositions. Rectangles are used to represent the counts of categorical data, and at every stage the rectangles are split in parallel.
  • 38. Tree maps visualization • Tree maps are well suited for displaying large amounts of hierarchically structured data. • The visualization space is divided into multiple rectangles that are ordered according to a quantitative variable. • The levels in the hierarchy are seen as rectangles containing other rectangles. • Each set of rectangles on the same level in the hierarchy represents a category, a column or an expression in a data set. • Visualizing complex data and relations • This technique is used to visualize non-numeric data, for example text, pictures, blog entries and product reviews.
  • 39. Expert systems Rely on domain experts for decision making, using their knowledge and intuition o Time consuming, costly, error prone, biased So the solution is to use data mining tools, which perform data analysis and find data patterns
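The counts behind the rectangles of a histogram can be computed by grouping values into fixed-width bins. This is a small sketch; the bill amounts and the bin width of 50 are invented for illustration and do not come from the slide's diagram.

```python
def histogram(values, bin_width):
    """Map each bin's lower edge to the number of values in [edge, edge + bin_width)."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical electricity bill amounts for one attribute.
bills = [120, 135, 180, 210, 95, 140, 205, 160]
bins = histogram(bills, bin_width=50)

# Text rendering: one row per bin, one '#' per value counted.
for edge, count in bins.items():
    print(f"{edge:>4}-{edge + 49:<4} {'#' * count}")
```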
  • 40.
  • 41. Knowledge Base: Domain knowledge is used to guide the search and to evaluate the interestingness of patterns. Includes concept hierarchies, user beliefs, thresholds and metadata. Database / Data Warehouse Server: Responsible for fetching relevant data based on the data mining request. Data Mining Engine: Consists of modules for characterization, association, correlation analysis, classification, cluster analysis, prediction, outlier analysis and evolution analysis. Pattern Evaluation Module: Interacts with the data mining modules and focuses the search towards interesting patterns. It may be integrated with the mining module to confine the search. User Interface: Communicates between users and the data mining system. Specifies the data mining query to focus the search. Uses intermediate data mining results to perform exploratory
  • 42. Major Issues in Data Mining: Mining Methodology Issues: o Mining different kinds of knowledge in databases. o Incorporation of background knowledge. o Handling noisy or incomplete data. o Pattern evaluation: the interestingness problem. User Interaction Issues: o Interactive mining of knowledge at multiple levels of abstraction. o Data mining query languages and ad-hoc data mining. o Presentation and visualization of data mining results. Performance Issues: o Efficiency and scalability of data mining algorithms. o Parallel, distributed and incremental mining algorithms. Issues related to diversity of data types: o Handling of relational and complex types of data. o Mining information from heterogeneous databases and global information systems.
  • 43. Review Questions 1. What motivated Data Mining? Why is it important? 2. What is Data Mining? 3. Explain the steps in the Knowledge Discovery Process. 4. Describe the architecture of Data Mining Systems with a suitable diagram. 5. Explain the various Data Mining functionalities. 6. Discuss the major issues in data mining.