DATA
MINING
PRESENTED BY:
KINZA RAZZAQ
BSIT-13-F072
AGENDA
• A brief introduction to Data Mining
• What can Data Mining do
• Supervised vs. Unsupervised Learning
“There are things that we know that we know (Known knowns)…
There are things that we know that we don’t know (Known unknowns)…
There are things that we don’t know we don’t know (Unknown unknowns)…
There are things that we don’t know we know (Unknown knowns)”
Data mining is relevant to the fourth point, in red.
It is the art of digging out exactly what we don’t know but must
know in our business.
The methodology is to first convert “unknown unknowns” into “known
unknowns” and then finally into “known knowns”.
DATA WAREHOUSING
VS.
DATA MINING
Data Warehousing provides the
Enterprise with a memory
Data Mining provides the
Enterprise with intelligence
Data Mining works with Data
Warehouse
What is Data Mining?
• Knowledge Discovery in Databases (KDD).
• Data mining digs out valuable, non-trivial information from large,
multidimensional, apparently unrelated databases.
• It is the integration of business knowledge, people, information,
algorithms, statistics and computing technology.
• It finds useful hidden patterns and relationships in data.
Why Data Mining?
HUGE VOLUME: THERE IS WAY TOO MUCH DATA, AND IT IS GROWING!
• Bridging the gap
• Supply & Demand
• To minimize the volume
Example of growing DATA
• Data is collected much faster than it can be processed or managed.
NASA's Earth Observing System (EOS) alone collected 15 petabytes by
2007 (15,000,000,000,000,000 bytes).
• Much of it won't be used, ever!
• Much of it won't be seen, ever!
• Why not?
• There is so much volume that the usefulness of some of it will
never be discovered.
Solution to the Problem of Growing
Data
Reduce the volume and/or raise the information
content by structuring, querying, filtering,
summarizing, aggregating, and mining the data.
Claude Shannon's information theory:
more volume, less information
From raw data to decision support (lowest level first):
• Raw Data: raw data has the maximum volume.
• Indexed Data: we have found shortcuts to reach desired points in
the voluminous sea of data, rather than conventional scanning.
• Information: the next level is the aggregated/summarized data.
• Knowledge: the next level is where the machine discovers and
learns rules.
• Decision Support: the next level is where the machine supports the
decision-making process by helping to select appropriate pre-defined
rules.
The amount of digital data recorded and stored has exploded during
the past decade,
BUT
the number of scientists, engineers, and analysts available to
analyze the data has not grown correspondingly.
• Limitations of OLTP systems
• Massive data sets
• High dimensionality
• New data types
• Multiple heterogeneous data sources
Conventional systems couldn’t keep pace with the ever-changing and
ever-growing data sets.
• Data mining algorithms are built
How is Data Mining different?
• Data Warehouses (Data-driven exploration)
• Data Mining (Knowledge-driven exploration)
• Traditional Database (Transactions)
• Knowledge Discovery (KDD)
Data Mining vs. Statistics
Formal statistical inference is assumption-driven, i.e. a hypothesis
is formed and validated against the data.
Data mining is discovery-driven, i.e. patterns and hypotheses are
automatically extracted from data.
Knowledge extraction using statistics
[Figure: Inflation vs. stock index increase. x-axis: Inflation (%), 1.6 to 6; y-axis: Stock increase (%), 0 to 40]
Q: What will the stock increase be when inflation is 6%?
A: Model the non-linear relationship using a line, y = mx + c.
Hence the answer is 13%.
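The straight-line approach in the answer can be sketched as an ordinary least-squares fit. This is only an illustration: the (inflation, stock increase) pairs below are hypothetical values, not read from the chart, so the prediction is not expected to reproduce the 13% quoted on the slide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical, mildly non-linear data (NOT taken from the slide).
inflation = [1.0, 2.0, 3.0, 4.0, 5.0]
stock_increase = [2.0, 3.0, 5.0, 8.0, 12.0]

m, c = fit_line(inflation, stock_increase)
prediction_at_6 = m * 6.0 + c
print(f"y = {m}x + {c}; predicted stock increase at 6% inflation: {prediction_at_6}%")
```

Because the line ignores the curvature in the data, the prediction at 6% is only as good as the linear assumption.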
[Figure: two plots (x-axis 0 to 35, y-axis up to 70,000), one with a fitted degree-6 polynomial trend line:]
y = -0.0127x^6 + 1.5029x^5 - 63.627x^4 + 1190.3x^3 - 9725.3x^2 + 31897x - 29263
Failure of regression models
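As a rough illustration of why such a fit is fragile, the degree-6 trend line from the slide can be evaluated directly. The coefficients are copied from the slide (presumably rounded for display, so the values are approximate); the sample points, including x = 40 outside the plotted range, are our own choice.

```python
# Coefficients of the slide's trend line, highest power first.
COEFFS = [-0.0127, 1.5029, -63.627, 1190.3, -9725.3, 31897.0, -29263.0]


def poly6(x):
    """Evaluate the degree-6 polynomial at x using Horner's rule."""
    result = 0.0
    for coeff in COEFFS:
        result = result * x + coeff
    return result


# x = 40 lies beyond the plotted 0-35 range (an extrapolation point we picked).
for x in (0, 10, 20, 30, 35, 40):
    print(f"x = {x:>2}  y = {poly6(x):,.1f}")
```

At x = 0 the curve equals its constant term, -29263, which already falls below the lowest tick on the chart's vertical axis (-10000).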
Data Mining is…
• Decision Trees
If. . . . .
Then. . .
• Rule Induction
• Clustering
• Genetic Algorithms
• Neural Networks
What can Data Mining Do
• Classification
• Estimation
• Prediction
• Market Basket Analysis
• Clustering
• Description
Market Basket Analysis
98% of people who purchased items A and B also purchased item C
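A statement of this kind is the confidence of the rule {A, B} → C: among baskets containing A and B, the fraction that also contain C. A minimal sketch, using a small hypothetical list of baskets (so the result is not the 98% quoted above):

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0  # rule never fires; confidence is undefined, report 0
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


# Hypothetical shopping baskets (illustration only).
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "C"},
]

rule_confidence = confidence(baskets, {"A", "B"}, {"C"})
print(f"confidence({{A, B}} -> C) = {rule_confidence:.0%}")
```

Here four baskets contain both A and B, and three of those also contain C.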
Clustering
Segmenting a heterogeneous population into a number of more
homogeneous sub-groups or clusters.
How many clusters?
How many clusters, now?
How many clusters, finally?
Description
Knowing what is happening in our databases is beneficial: move the
cube to different angles to get to the information of interest.
Comparing Methods
Accuracy
Speed
Robustness
Scalability
Interpretability
Where does Data Mining fit in?
Data mining: the core of the knowledge discovery process.
Databases → Data Cleaning and Data Integration → Data Warehouse →
Selection → Task-relevant Data → Data Mining → Pattern Evaluation
Data Structures in Data Mining
• Data matrix
– Table or database
– n records and m attributes
– n >> m

C1,1  C1,2  C1,3  …  C1,m
C2,1  C2,2  C2,3  …  C2,m
C3,1  C3,2  C3,3  …  C3,m
 …     …     …    …   …
Cn,1  Cn,2  Cn,3  …  Cn,m

1     S1,2  S1,3  …  S1,n
S2,1  1     S2,3  …  S2,n
S3,1  S3,2  1     …  S3,n
 …     …     …    …   …
Sn,1  Sn,2  Sn,3  …  1

• Similarity matrix
– Symmetric square matrix
– n x n or m x m
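The step from a data matrix to a similarity matrix can be sketched as follows. The slides do not say which similarity measure is used; the choice of 1 / (1 + Euclidean distance) below is an assumption, picked because it gives 1 on the diagonal and a symmetric n x n result.

```python
import math


def similarity_matrix(data):
    """Build an n x n similarity matrix from an n x m data matrix (rows = records)."""
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            sim[i][j] = 1.0 / (1.0 + dist)  # assumed measure: 1 when records coincide
    return sim


# Hypothetical data matrix: n = 3 records, m = 2 attributes.
records = [
    [1.0, 2.0],
    [1.0, 2.0],
    [4.0, 6.0],
]

S = similarity_matrix(records)
for row in S:
    print([round(value, 3) for value in row])
```

Computing similarities between attributes instead of records (the m x m case) only requires transposing the data matrix first.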
Main types of DATA MINING
Supervised: type and number of classes are known in advance
• Bayesian Modeling
• Decision Trees
• Neural Networks
• Etc.
Unsupervised: type and number of classes are NOT known in advance
• One-way Clustering
• Two-way Clustering
Clustering: Min-Max Distance
[Figure: scatter plot of Salary against Age (20 to 60), showing clusters and one outlier]
• Inter-cluster distances are maximized
• Intra-cluster distances are minimized
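The two quantities can be measured on a toy example. The (age, salary) points below are hypothetical, and using the mean pairwise distance within a cluster and the distance between centroids across clusters is one common choice among several, not something the slide specifies.

```python
import math
from itertools import combinations


def mean_intra_distance(cluster):
    """Average pairwise Euclidean distance between points of one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


# Hypothetical (age, salary in thousands) points, already grouped into two clusters.
young = [(20, 30), (22, 30), (20, 32)]
older = [(58, 80), (60, 80), (60, 82)]

intra_young = mean_intra_distance(young)
intra_older = mean_intra_distance(older)
inter = math.dist(centroid(young), centroid(older))
print(f"intra-cluster: {intra_young:.2f}, {intra_older:.2f}; inter-cluster: {inter:.2f}")
```

A good clustering keeps the intra-cluster figures small relative to the inter-cluster one.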
One-way clustering example
[Figure: INPUT and OUTPUT matrices]
• Black spots are noise
• White spots are missing data
Data Mining Agriculture data
[Figure: INPUT and clustered OUTPUT, showing the clusters]
Created a similarity matrix using farm area, cotton variety and
pesticide used
How does Classification work?
Inputs (unseen data) → Classifier (model) → Output (which class?)
with a confidence level (accuracy)
Classification: Model Construction
Training Data (observations, measurements, etc.)

NAME   Time  Items  Gender
Moin   10    2      M
Munir  16    3      M
Meher  15    1      F
Javed  5     1      M
Mahin  20    1      F
Akram  20    4      M

Training Data → Classification Algorithms → Classifier (Model):
IF time/items >= 6
THEN gender = ‘F’

Relationship between shopping time and items bought
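The learned rule can be checked against the training table with a few lines of code. The slide only states the 'F' branch; returning 'M' otherwise is an assumption, as are the function and variable names.

```python
# Training data from the slide: (name, shopping time, items bought, gender).
TRAINING = [
    ("Moin", 10, 2, "M"),
    ("Munir", 16, 3, "M"),
    ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"),
    ("Mahin", 20, 1, "F"),
    ("Akram", 20, 4, "M"),
]


def classify(time, items):
    """Slide rule: IF time/items >= 6 THEN 'F'. The ELSE 'M' branch is assumed."""
    return "F" if time / items >= 6 else "M"


training_hits = sum(classify(t, i) == g for _, t, i, g in TRAINING)
print(f"{training_hits} of {len(TRAINING)} training records match the rule")
```

Fitting the training data perfectly says nothing yet about how the rule behaves on records it has not seen.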
Classification: Use in Prediction
Testing Data

NAME    Time  Items  Gender
Tahir   20    1      M
Younas  11    2      M
Yasin   3     1      M

Unseen Data: (Addan, Time = 15, Items = 1) → Classifier → Gender?
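Applying the rule from the model construction slide (IF time/items >= 6 THEN gender = 'F') to the testing data gives an accuracy estimate, and the same rule then labels the unseen record. As before, treating every other case as 'M' is an assumption, and the names are illustrative.

```python
# Testing data from the slide: (name, shopping time, items bought, gender).
TESTING = [
    ("Tahir", 20, 1, "M"),
    ("Younas", 11, 2, "M"),
    ("Yasin", 3, 1, "M"),
]


def predict_gender(time, items):
    """Slide rule: 'F' when time/items >= 6; the 'M' fallback is assumed."""
    return "F" if time / items >= 6 else "M"


test_hits = sum(predict_gender(t, i) == g for _, t, i, g in TESTING)
test_accuracy = test_hits / len(TESTING)
print(f"test accuracy: {test_hits}/{len(TESTING)} = {test_accuracy:.2f}")

# Unseen record from the slide: Addan, Time = 15, Items = 1.
addan_prediction = predict_gender(15, 1)
print(f"Addan -> {addan_prediction}")
```

Tahir (20/1 = 20) is labelled 'F' although the table says 'M', so the rule gets two of the three test records right; that fraction is what the classification slide calls the confidence level (accuracy).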
Clustering vs. Cluster Detection
• In one-way clustering, reordering of rows (or
columns) assembles clusters.
• If the clusters are NOT assembled, they are very
difficult to detect
First you cluster your data and then detect
clusters in the clustered data
Example
[Figure: panels A and B]
The K-Means Clustering
k-means clustering aims to partition ‘n’ observations
into ‘k’ clusters in which each observation belongs to
the cluster with the nearest mean.
The k-means algorithm is implemented in 4 steps.
Step 4: Go back to Step 2; stop when there are no more new
assignments.
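A minimal sketch of the iteration, assuming the standard formulation of k-means: assign each point to the nearest centroid, recompute each centroid as the mean of its points, and repeat until no assignment changes. The sample points, the starting centroids, and the function name are hypothetical; the exact wording of steps 1 to 3 on the slides is not reproduced here.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Return (centroids, assignment) after iterating until assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to the index of its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no new assignment: stop
        assignment = new_assignment
        # Recompute each centroid as the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two obvious groups, with k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print(centroids)
print(assignment)
```

Passing the starting centroids explicitly keeps the run deterministic; in practice they are often chosen at random, and different starts can lead to different clusters.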
Example
[Figure: four scatter-plot panels labeled A, B, C and D; both axes run from 0 to 10]
Data Mining is FRUITFUL..!!
  • 3. “There are things that we know that we know (Known knowns)… There are things that we know that we don’t know (Known unknowns)… There are things that we don’t know we don’t know (Unknown unknowns)… There are things that we don’t know we know (Unknown knowns)”
  • 5. Data mining relates to the fourth point, shown in red on the slide. It is the art of digging out exactly what we don’t know but must know in our business. The methodology is to first convert “unknown unknowns” into “known unknowns” and then finally into “known knowns”.
  • 7. Data Warehousing provides the Enterprise with a memory. Data Mining provides the Enterprise with intelligence. Data Mining works with the Data Warehouse.
  • 8. What is Data Mining? • Also known as Knowledge Discovery in Databases (KDD). • Data mining digs out valuable, non-trivial information from large, multidimensional, apparently unrelated databases. • It is the integration of business knowledge, people, information, algorithms, statistics and computing technology. • It finds useful hidden patterns and relationships in data.
  • 11. HUGE VOLUME: THERE IS WAY TOO MUCH DATA, AND IT IS GROWING! • Bridging the gap • Supply & Demand • To minimize the volume
  • 12. Example of growing DATA • Data is collected much faster than it can be processed or managed. NASA's Earth Observing System (EOS) alone had collected 15 petabytes by 2007 (15,000,000,000,000,000 bytes). • Much of it will never be used. • Much of it will never be seen. • Why not? • There is so much volume that the usefulness of some of it will never be discovered.
  • 13. Solution to the Problem of Growing Data: reduce the volume and/or raise the information content by structuring, querying, filtering, summarizing, aggregating, and mining the data.
  • 14. Claude Shannon's information theory: more volume, less information.
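One rough way to see why extra volume need not carry extra information is to compress repetitive data. The sketch below is only an illustration: it uses compressed size as a crude proxy for information content (an assumption, not something the slide specifies), and the log record is hypothetical.

```python
import zlib

# Hypothetical log record; any highly repetitive data behaves similarly.
record = b"2007-01-01,station=42,temp=21.5\n"

small = record * 10        # low volume
large = record * 10_000    # 1000 times the volume, but no new content

# Compressed size is used here as a crude stand-in for information content.
small_packed = len(zlib.compress(small))
large_packed = len(zlib.compress(large))

print(f"raw bytes:        {len(small):>7} -> {len(large):>7}")
print(f"compressed bytes: {small_packed:>7} -> {large_packed:>7}")
```

The raw volume grows a thousandfold while the compressed size grows far less, which is one reading of the slide's point.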
  • 15. Levels, from top to bottom: • Decision Support: the level where the machine supports the decision-making process by helping to select appropriate pre-defined rules. • Knowledge: the level where the machine discovers and learns rules. • Information: the aggregated/summarized data. • Indexed Data: shortcuts for reaching desired points in the voluminous sea of data, rather than conventional scanning. • Raw Data: the level with the maximum volume.
  • 16. The amount of digital data recorded and stored has exploded during the past decade, BUT the number of scientists, engineers, and analysts available to analyze the data has not grown correspondingly.
  • 17. • Limitations of OLTP systems • Massive data sets • High dimensionality • New data types • Multiple heterogeneous data resources. Conventional systems could not keep pace with the ever-changing and ever-growing data sets. • Data mining algorithms are built to handle them.
  • 18. How is Data Mining different? • Data Warehouses (data-driven exploration) • Data Mining (knowledge-driven exploration) • Traditional Database (transactions) • Knowledge Discovery (KDD)
  • 19. Data Mining vs. Statistics: formal statistical inference is assumption driven, i.e. a hypothesis is formed and validated against the data. Data mining is discovery driven, i.e. patterns and hypotheses are automatically extracted from the data.
  • 20. Knowledge extraction using statistics. [Chart: Inflation vs. stock index increase; axes: Inflation (%), ranging from 1.6 to 6, and Stock increase (%), ranging from 0 to 40.] Q: What will be the stock increase when inflation is 6%? A: Model the non-linear relationship using a line y = mx + c. Hence the answer is 13%.
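The line-fitting step can be sketched as an ordinary least-squares fit of y = mx + c. The inflation values below are the x-axis readings from the slide; the stock-increase values are hypothetical placeholders, since the chart's y-values did not survive in the transcript, so the result is not expected to reproduce the slide's 13%.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# x-axis readings taken from the slide
inflation = [1.6, 1.7, 1.8, 1.85, 1.9, 1.95, 2.0, 2.9, 3.0, 3.3, 4.2, 4.4, 5.0]
# HYPOTHETICAL y-values, for illustration only
stock_increase = [1.0, 1.2, 1.5, 1.6, 1.8, 2.0, 2.2, 4.5, 5.0, 6.0, 8.5, 9.0, 10.5]

m, c = fit_line(inflation, stock_increase)
predicted = m * 6.0 + c  # extrapolate the fitted line to 6% inflation
print(f"y = {m:.2f}x + {c:.2f}; predicted stock increase at 6%: {predicted:.1f}%")
```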
  • 21. Failure of regression models. [Two charts with the x-axis running from 0 to 35 and the y-axis up to 70000, fitted with a sixth-degree polynomial:] $y = -0.0127x^{6} + 1.5029x^{5} - 63.627x^{4} + 1190.3x^{3} - 9725.3x^{2} + 31897x - 29263$
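One plausible reading of this slide (an assumption, since the transcript keeps only the title and the fitted equation) is that a high-degree polynomial can follow the plotted points yet give implausible values elsewhere. The sketch evaluates the slide's own polynomial at x = 0, where it equals the constant term -29263, and at x = 40, a hypothetical point just beyond the plotted range of 0 to 35.

```python
# Coefficients of the slide's sixth-degree polynomial, highest power first.
COEFFS = [-0.0127, 1.5029, -63.627, 1190.3, -9725.3, 31897, -29263]

def slide_polynomial(x):
    """Evaluate the fitted polynomial at x using Horner's method."""
    result = 0.0
    for coeff in COEFFS:
        result = result * x + coeff
    return result

# x = 0 is the edge of the chart; x = 40 is a hypothetical point
# outside the plotted range.
for x in (0, 40):
    print(f"x = {x:>2}: y = {slide_polynomial(x):,.0f}")
```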
  • 22. Data Mining is… • Decision Trees (If … Then …) • Rule Induction • Clustering • Genetic Algorithms • Neural Networks
  • 24. What can Data Mining Do: • Classification • Estimation • Prediction • Market Basket Analysis • Clustering • Description
  • 27. What can Data Mining Do: Market Basket Analysis. Example: 98% of people who purchased items A and B also purchased item C.
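A statement such as "98% of people who purchased A and B also purchased C" is usually called the confidence of the rule {A, B} → C. A minimal sketch of how it could be computed follows; the transactions are hypothetical, and the function name `rule_confidence` is introduced here for illustration only.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    antecedent = set(antecedent)
    consequent = set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)

# HYPOTHETICAL shopping baskets
baskets = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C", "D"},
]

confidence = rule_confidence(baskets, {"A", "B"}, {"C"})
print(f"confidence of {{A, B}} -> C: {confidence:.0%}")
```

Here three baskets contain both A and B, and two of those also contain C, so the confidence is two thirds.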
  • 28. What can Data Mining Do: Clustering. Segmenting a heterogeneous population into a number of more homogeneous subgroups or clusters.
  • 31. How many clusters, finally?
  • 32. What can Data Mining Do: Description. Knowing what is happening in our databases is beneficial; move the cube through different angles to get to the information of interest.
  • 38. Where does Data Mining fit in? Data mining is the core of the knowledge discovery process: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation.
  • 39. • A brief introduction to Data Mining • What can Data Mining do • Supervised vs. Unsupervised Learning
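  • 42. Data Structures in Data Mining
    • Data matrix: a table or database with n records and m attributes, where n >> m.
      $$\begin{pmatrix} C_{1,1} & C_{1,2} & C_{1,3} & \cdots & C_{1,m} \\ C_{2,1} & C_{2,2} & C_{2,3} & \cdots & C_{2,m} \\ C_{3,1} & C_{3,2} & C_{3,3} & \cdots & C_{3,m} \\ \vdots & \vdots & \vdots & & \vdots \\ C_{n,1} & C_{n,2} & C_{n,3} & \cdots & C_{n,m} \end{pmatrix}$$
    • Similarity matrix: a symmetric square matrix, n x n or m x m.
      $$\begin{pmatrix} 1 & S_{1,2} & S_{1,3} & \cdots & S_{1,n} \\ S_{2,1} & 1 & S_{2,3} & \cdots & S_{2,n} \\ S_{3,1} & S_{3,2} & 1 & \cdots & S_{3,n} \\ \vdots & \vdots & \vdots & & \vdots \\ S_{n,1} & S_{n,2} & S_{n,3} & \cdots & 1 \end{pmatrix}$$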
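As a sketch of how an n x n similarity matrix might be derived from a data matrix, the code below scores each pair of records with 1 / (1 + Euclidean distance). That measure is an assumption chosen for illustration (the slide does not name one); it gives 1 on the diagonal and a symmetric result, matching the layout shown. The sample records are hypothetical.

```python
import math

def similarity_matrix(data):
    """Build an n x n similarity matrix from n records (rows) of m attributes."""
    n = len(data)
    sim = [[1.0] * n for _ in range(n)]  # a record is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed measure: similarity shrinks as Euclidean distance grows.
            s = 1.0 / (1.0 + math.dist(data[i], data[j]))
            sim[i][j] = s
            sim[j][i] = s  # keep the matrix symmetric
    return sim

# HYPOTHETICAL data matrix: 4 records, 2 attributes each
records = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (9.0, 9.5)]

sim = similarity_matrix(records)
for row in sim:
    print("  ".join(f"{value:.2f}" for value in row))
```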
  • 43. Main types of DATA MINING • Supervised (type and number of classes are known in advance): Bayesian Modeling, Decision Trees, Neural Networks, etc. • Unsupervised (type and number of classes are NOT known in advance): One-way Clustering, Two-way Clustering.
  • 44. Clustering: Min-Max Distance. [Scatter plot of Age and Salary showing clusters and one outlier.] Inter-cluster distances are maximized; intra-cluster distances are minimized.
  • 45. One-way clustering example. [Images: INPUT and OUTPUT.] Black spots are noise; white spots are missing data.
  • 46. Data Mining Agriculture data. [Images: INPUT, Clustered OUTPUT, clusters.] A similarity matrix was created using farm area, cotton variety and pesticide used.
  • 50. Classification: Model Construction. Training data (observations, measurements, etc.) on the relationship between shopping time and items bought:
      NAME    Time  Items  Gender
      Moin    10    2      M
      Munir   16    3      M
      Meher   15    1      F
      Javed   5     1      M
      Mahin   20    1      F
      Akram   20    4      M
    Classification algorithms produce the classifier (model): IF time/items >= 6 THEN gender = ‘F’
  • 51. Classification: Use in Prediction. The classifier is first applied to the testing data:
      NAME    Time  Items  Gender
      Tahir   20    1      M
      Younas  11    2      M
      Yasin   3     1      M
    and then to unseen data: (Addan, Time = 15, Items = 1), Gender?
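The two slides above can be traced in a few lines of code. This is a sketch: the rule and the data are taken from the slides, while the function names and the choice of ‘M’ whenever the rule does not fire are assumptions, since the slides state only the ‘F’ branch.

```python
def classify(time, items):
    """Slide's rule: IF time/items >= 6 THEN 'F'. Returning 'M' otherwise is an assumption."""
    return "F" if time / items >= 6 else "M"

def accuracy(rows):
    """Share of (name, time, items, gender) rows whose gender the rule predicts correctly."""
    correct = sum(classify(time, items) == gender for _, time, items, gender in rows)
    return correct / len(rows)

training = [
    ("Moin", 10, 2, "M"), ("Munir", 16, 3, "M"), ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"), ("Mahin", 20, 1, "F"), ("Akram", 20, 4, "M"),
]
testing = [("Tahir", 20, 1, "M"), ("Younas", 11, 2, "M"), ("Yasin", 3, 1, "M")]

print("training accuracy:", accuracy(training))
print("testing accuracy: ", accuracy(testing))
print("Addan (Time=15, Items=1) ->", classify(15, 1))
```

The rule fits all six training rows but misclassifies Tahir in the testing data (20 / 1 = 20, so it predicts ‘F’), and it predicts ‘F’ for the unseen record Addan.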
  • 52. Clustering vs. Cluster Detection • In one-way clustering, reordering the rows (or columns) assembles the clusters. • If the clusters are NOT assembled, they are very difficult to detect. First you cluster your data, and then you detect clusters in the clustered data.
  • 54. K-Means Clustering: k-means clustering aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean.
  • 55. k-means algorithm is implemented in 4 steps 1 2 3 4
  • 56. k-means algorithm is implemented in 4 steps 1
  • 57. k-means algorithm is implemented in 4 steps 2
  • 58. k-means algorithm is implemented in 4 steps 3
  • 59. k-means algorithm is implemented in 4 steps. Step 4: go back to Step 2; stop when there are no more new assignments.
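The transcript preserves only the wording of the last step, so the sketch below fills in steps 1 to 3 with one common textbook formulation of k-means. That choice, the round-robin initial partition, and the sample points are all assumptions made for illustration.

```python
import math

def k_means(points, k, max_iter=100):
    """Partition points into k clusters; returns (labels, centroids). Needs len(points) >= k."""
    # Step 1 (assumed): start from an arbitrary partition into k non-empty groups.
    labels = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(max_iter):
        # Step 2 (assumed): compute the mean (centroid) of each current cluster.
        centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:            # guard: re-seed an empty cluster
                members = [points[c]]
            centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        # Step 3 (assumed): assign each point to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 4 (from the slide): go back to Step 2; stop when nothing is reassigned.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# HYPOTHETICAL 2-D points forming two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, k=2)
print("labels:   ", labels)
print("centroids:", centroids)
```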
  • 60. Example. [Figure: four scatter-plot panels labelled A, B, C and D, each with both axes running from 0 to 10.]
  • 61. Data Mining is FRUITFUL..!!