From Data to Wisdom
• Data: the raw material of information
• Information: data organized and presented by someone
• Knowledge: information read, heard or seen, then understood and integrated
• Wisdom: distilled knowledge and understanding which can lead to decisions
[Figure: The Information Hierarchy, a pyramid with Data at the base, then Information, Knowledge, and Wisdom at the top]
Why Data Mining?
• The explosive growth of data: from terabytes to petabytes
• Data collection and data availability
  - Automated data collection tools, database systems, Web, computerized society
• Major sources of abundant data
  - Business: Web, e-commerce, transactions, stocks, …
  - Science: remote sensing, bioinformatics, scientific simulation, …
  - Society and everyone: news, images, video, documents
  - Internet …
[Figure. Source: Intel]
How much data?
• Google: ~20-30 PB a day
• Wayback Machine: ~4 PB + 100-200 TB/month
• Facebook: ~3 PB of user data + 25 TB/day
• eBay: ~7 PB of user data + 50 TB/day
• CERN’s Large Hadron Collider generates 15 PB a year
• In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB
“640K ought to be enough for anybody.”
Big Data Growing
• The untapped data gap: most of the useful data will not be tagged or analyzed, partly due to a skill shortage
• IDC predicts: from 2005 to 2020, the digital universe will roughly double every 2 years and grow from 130 exabytes to 40,000 exabytes, or 5,200 GB per person in 2020
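A quick arithmetic check of the figures quoted on these slides, as a sketch only. It assumes decimal units (1 EB = 10^9 GB) and takes the quoted numbers at face value; the variable names are illustrative. Under those assumptions, growing from 130 EB to 40,000 EB in 15 years implies a doubling time a little under 2 years, and 5,200 GB per person corresponds to a population of roughly 7.7 billion.

```python
import math

GB_PER_EB = 10**9  # assumption: decimal units, 1 EB = 10^18 bytes, 1 GB = 10^9 bytes

# "In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB"
enterprise_2010_gb = 7 * GB_PER_EB

# IDC figures as quoted: 130 EB in 2005, 40,000 EB in 2020
start_eb, end_eb = 130, 40_000
years = 2020 - 2005

growth_factor = end_eb / start_eb        # about 308x overall
doublings = math.log2(growth_factor)     # number of doublings needed for that growth
doubling_time_years = years / doublings  # implied time per doubling

# Population implied by 5,200 GB per person in 2020
implied_population = end_eb * GB_PER_EB / 5_200

print(f"7 EB = {enterprise_2010_gb:,} GB")
print(f"implied doubling time: {doubling_time_years:.2f} years")
print(f"implied population: {implied_population / 1e9:.1f} billion")
```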
What Is Data Mining?
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
Data Mining: A Definition
The non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data in large data repositories
• Non-trivial: obvious knowledge is not useful
• Implicit: hidden, difficult-to-observe knowledge
• Previously unknown
• Potentially useful: actionable; easy to understand
Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, surrounded by machine learning, statistics, applications, algorithms, pattern recognition, high-performance computing, visualization, and database technology]
Data Mining’s Virtuous Cycle
1. Identifying the problem
2. Mining data to transform it into actionable information
3. Acting on the information
4. Measuring the results
The Knowledge Discovery Process
• Data Mining vs. Knowledge Discovery in Databases (KDD)
  - DM and KDD are often used interchangeably
  - strictly speaking, DM is only one part of the KDD process
[Figure: The KDD Process]
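To make "DM is only part of the KDD process" concrete, here is a minimal sketch of a pipeline in which the mining step is one function among several. The stage names follow the commonly cited five-stage description of KDD (selection, preprocessing, transformation, data mining, interpretation/evaluation); that breakdown is an assumption here, since the slide's figure did not survive extraction. The records and function names are hypothetical.

```python
# Hypothetical raw records: (customer, age, monthly_spend); None marks a missing value.
raw = [
    ("ann", 34, 120.0),
    ("bob", None, 80.0),
    ("cat", 51, 300.0),
    ("dan", 45, None),
    ("eve", 29, 95.0),
]

def select(rows):
    """Selection: keep only the fields relevant to the question."""
    return [(age, spend) for _name, age, spend in rows]

def preprocess(rows):
    """Preprocessing: drop records with missing values."""
    return [row for row in rows if None not in row]

def transform(rows):
    """Transformation: bucket age into a coarse band."""
    return [("under_40" if age < 40 else "40_plus", spend) for age, spend in rows]

def mine(rows):
    """Data mining: the pattern-finding step (here, average spend per band)."""
    by_band = {}
    for band, spend in rows:
        by_band.setdefault(band, []).append(spend)
    return {band: sum(values) / len(values) for band, values in by_band.items()}

def interpret(patterns):
    """Interpretation/evaluation: state the pattern in a form someone can act on."""
    best = max(patterns, key=patterns.get)
    return f"highest average spend: {best}"

patterns = mine(transform(preprocess(select(raw))))
summary = interpret(patterns)
print(patterns)
print(summary)
```

Only `mine` is data mining in the narrow sense; the other stages decide what it sees and what is done with its output.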
Types of Knowledge Discovery
• Two kinds of knowledge discovery: directed and undirected
• Directed knowledge discovery
  - Purpose: explain the value of some field in terms of all the others (goal-oriented)
  - Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
  - Examples:
    - which products show increased sales when cream cheese is discounted
    - which banner ad to use on a web page for a given user coming to the site
• Undirected knowledge discovery
  - Purpose: find patterns in the data that may be interesting (no target field)
  - Method: clustering, affinity grouping
  - Examples:
    - which products in the catalog often sell together
    - market segmentation (find groups of customers/users with similar characteristics or behavioral patterns)
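The contrast between the two kinds of discovery can be sketched on a toy data set. The records, field names, and both functions below are hypothetical illustrations, not methods prescribed by the slides: the directed function is given a target field to predict, while the undirected one is given no target and simply groups similar values (a one-dimensional k-means with k = 2).

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (age_band, visits_per_month, bought_premium)
records = [
    ("young", 2, "no"),
    ("young", 3, "no"),
    ("young", 9, "yes"),
    ("senior", 8, "yes"),
    ("senior", 10, "yes"),
    ("senior", 1, "no"),
]

def directed_rule(rows, feature_index, target_index):
    """Directed: for each value of one field, learn the most common target value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature_index]][row[target_index]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}

def two_means(values, iterations=10):
    """Undirected: split numbers into two groups with no target field.

    Assumes at least two distinct values, so neither group is ever empty.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            index = 1 if abs(v - high) < abs(v - low) else 0
            groups[index].append(v)
        low = sum(groups[0]) / len(groups[0])
        high = sum(groups[1]) / len(groups[1])
    return sorted(groups[0]), sorted(groups[1])

rule = directed_rule(records, feature_index=0, target_index=2)
segments = two_means([visits for _band, visits, _bought in records])
print(rule)
print(segments)
```

The directed rule answers a question that was posed in advance (who buys premium?); the segments are simply structure found in the visit counts, which still has to be interpreted.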
From Data Mining to Data Science
Data Mining: On What Kinds of Data?
• Database-oriented data sets and applications
  - Relational database, data warehouse, transactional database
  - Object-relational databases, heterogeneous databases and legacy databases
• Advanced data sets and advanced applications
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data (incl. bio-sequences)
  - Structure data, graphs, social networks and information networks
  - Spatial data and spatiotemporal data
  - Multimedia database
  - Text databases
  - The World-Wide Web
Data Mining: What Kind of Data?
• Structured databases
  - relational, object-relational, etc.
  - can use SQL to perform parts of the process
  - e.g., SELECT category, COUNT(*) FROM Items WHERE type = 'video' GROUP BY category
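A runnable sketch of the SQL example above, using Python's built-in `sqlite3` module with an in-memory database. The `Items` schema and the rows are hypothetical; only the table name and the `type` and `category` columns come from the slide. Note that `'video'` is quoted as a string literal and `category` is selected so that each count is labeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (name TEXT, type TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        ("Alien", "video", "sci-fi"),
        ("Solaris", "video", "sci-fi"),
        ("Amelie", "video", "comedy"),
        ("Dune", "book", "sci-fi"),  # not a video, so the WHERE clause filters it out
    ],
)

# Count the video items in each category
rows = conn.execute(
    "SELECT category, COUNT(*) FROM Items "
    "WHERE type = 'video' GROUP BY category ORDER BY category"
).fetchall()
conn.close()

print(rows)
```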
Data Mining: What Kind of Data?
• Flat files
  - most common data source
  - can be text (or HTML) or binary
  - may contain transactions, statistical data, measurements, etc.
• Transactional databases
  - set of records, each with a transaction id, time stamp, and a set of items
  - may have an associated “description” file for the items
  - typical source of data used in market basket analysis
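A minimal sketch of how transactional records feed market basket analysis: count how often each pair of items appears in the same transaction and keep the pairs whose support (the fraction of transactions containing the pair) reaches a threshold. The transactions and the 0.5 threshold are hypothetical, and this is plain pair counting rather than a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional records: (transaction id, time stamp, set of items)
transactions = [
    (1, "2024-01-05T10:00", {"bagels", "cream cheese", "coffee"}),
    (2, "2024-01-05T10:07", {"bagels", "cream cheese"}),
    (3, "2024-01-05T11:30", {"coffee", "milk"}),
    (4, "2024-01-06T09:15", {"bagels", "coffee"}),
]

# Count every unordered pair of items bought together
pair_counts = Counter()
for _tid, _timestamp, items in transactions:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Support = share of all transactions that contain the pair
support = {pair: count / len(transactions) for pair, count in pair_counts.items()}

min_support = 0.5  # assumed threshold, chosen only for this toy data
frequent_pairs = sorted(pair for pair, s in support.items() if s >= min_support)
print(frequent_pairs)
```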
Data Mining: What Kind of Data?
• Other types of databases
  - legacy databases
  - multimedia databases (usually very high-dimensional)
  - spatial databases (containing geographical information, such as maps or satellite imaging data)
  - time-series and temporal data (time-dependent information such as stock market data; usually very dynamic)
• World Wide Web
  - basically a large, heterogeneous, distributed database
  - need for new or additional tools and techniques
    - information retrieval, filtering and extraction
    - agents to assist in browsing and filtering
    - Web content, usage, and structure (linkage) mining tools
• The “social Web”
  - user-generated metadata, social networks, shared resources, etc.
What Can Data Mining Do?
• Many data mining tasks
  - often inter-related
  - often need to try different techniques/algorithms for each task
  - each task may require different types of knowledge discovery
• What are some data mining tasks?
  - Classification
  - Prediction
  - Clustering
  - Affinity grouping / association discovery
  - Sequence analysis
  - Characterization
  - Discrimination
Some Applications of Data Mining
• Business data analysis and decision support
  - Focused marketing
    - Recognizing specific market segments that respond to particular characteristics
    - Return on mailing campaign (target marketing)
  - Customer profiling
    - Segmentation of customers for marketing strategies and/or product offerings
    - Customer behavior understanding
    - Customer retention and loyalty
    - Mass customization / personalization
Some Applications of Data Mining
• Business data analysis and decision support (cont.)
  - Market analysis and management
    - Provide summary information for decision-making
    - Market basket analysis, cross-selling, market segmentation
    - Resource planning
  - Risk analysis and management
    - "What if" analysis
    - Forecasting
    - Pricing analysis, competitive analysis
    - Time-series analysis (e.g., stock market)
Some Applications of Data Mining
• Fraud detection
  - Detecting telephone fraud
    - Telephone call model: destination of the call, duration, time of day or week
    - Analyze patterns that deviate from an expected norm
    - British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud scheme
  - Detecting credit-card fraud
  - Detecting suspicious money transactions (money laundering)
• Text mining
  - Message filtering (e-mail, newsgroups, etc.)
  - Newspaper article analysis
  - Text and document categorization
• Web mining
  - Mining patterns from the content, usage, and structure of Web resources
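"Analyze patterns that deviate from an expected norm" can be sketched as a simple z-score test: estimate the norm from an account's past call durations, then flag new calls that fall far from it. The durations, the threshold of 3 standard deviations, and the function name are all hypothetical; real fraud models would also use destination and time of day, as the call model above suggests.

```python
from statistics import mean, stdev

# Hypothetical past call durations (minutes) for one account: the "expected norm"
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5]

# Hypothetical new calls to score against that norm
new_calls = [4.2, 58.0]

def flag_deviations(baseline, candidates, threshold=3.0):
    """Return candidates more than `threshold` standard deviations from the baseline mean.

    Assumes the baseline has at least two values that are not all identical.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [c for c in candidates if abs(c - mu) / sigma > threshold]

suspicious = flag_deviations(history, new_calls)
print(suspicious)
```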
Types of Web Mining
[Figure: Web mining divides into Web content mining, Web structure mining, and Web usage mining]
Types of Web Mining: Web Content Mining
Applications:
• document clustering or categorization
• topic identification / tracking
• concept discovery
• focused crawling
• content-based personalization
• intelligent search tools
Types of Web Mining: Web Usage Mining
Applications:
• user and customer behavior modeling
• Web site optimization
• e-customer relationship management
• Web marketing
• targeted advertising
• recommender systems
Types of Web Mining: Web Structure Mining
Applications:
• document retrieval and ranking (e.g., Google)
• discovery of “hubs” and “authorities”
• discovery of Web communities
• social network analysis
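As an illustration of discovering “hubs” and “authorities” from link structure, here is a minimal sketch of the iterative hub/authority scoring idea (commonly known as HITS): a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The link graph and page names are hypothetical, and the sketch assumes every link target also appears as a key in the graph.

```python
import math

# Hypothetical link graph: page -> pages it links to
links = {
    "portal": ["docs", "blog", "shop"],
    "directory": ["docs", "blog"],
    "docs": [],
    "blog": ["docs"],
    "shop": [],
}

def normalize(scores):
    """Scale scores so that their squares sum to 1."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    return {page: value / norm for page, value in scores.items()}

def hubs_and_authorities(graph, iterations=20):
    hubs = {page: 1.0 for page in graph}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in
        auths = {page: 0.0 for page in graph}
        for page, targets in graph.items():
            for target in targets:
                auths[target] += hubs[page]
        auths = normalize(auths)
        # Hub score: sum of the authority scores of the pages linked to
        hubs = normalize(
            {page: sum(auths[t] for t in targets) for page, targets in graph.items()}
        )
    return hubs, auths

hubs, auths = hubs_and_authorities(links)
best_hub = max(hubs, key=hubs.get)
best_authority = max(auths, key=auths.get)
print(best_hub, best_authority)
```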
The Knowledge Discovery Process
[Figure: The KDD Process]
• Next: we first focus on understanding the data and on data preparation/transformation

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Data mining and knowledge discovery

  • 1. From Data to Wisdom. Data: the raw material of information. Information: data organized and presented by someone. Knowledge: information read, heard or seen, and understood and integrated. Wisdom: distilled knowledge and understanding which can lead to decisions. The Information Hierarchy: Wisdom, Knowledge, Information, Data.
  • 2. Why Data Mining? The explosive growth of data: from terabytes to petabytes. Data collection and data availability: automated data collection tools, database systems, Web, computerized society. Major sources of abundant data. Business: Web, e-commerce, transactions, stocks, … Science: remote sensing, bioinformatics, scientific simulation, … Society and everyone: news, images, video, documents, Internet, …
  • 4. How much data? Google: ~20-30 PB a day. Wayback Machine: ~4 PB + 100-200 TB/month. Facebook: ~3 PB of user data + 25 TB/day. eBay: ~7 PB of user data + 50 TB/day. CERN’s Large Hadron Collider generates 15 PB a year. In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB. “640K ought to be enough for anybody.”
  • 5. Big Data Growing. The Untapped Data Gap: most of the useful data will not be tagged or analyzed, partly due to skill shortage. IDC predicts: from 2005 to 2020, the digital universe will double every 2 years and grow from 130 exabytes to 40,000 exabytes, or 5,200 GB per person in 2020.
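The IDC figures on this slide can be checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide and the decimal convention 1 EB = 10^9 GB (the same one the previous slide uses); the variable names are illustrative. It suggests that growing from 130 EB to 40,000 EB in 15 years implies a doubling time of about 1.8 years, slightly faster than a strict 2-year doubling.

```python
import math

# Figures quoted on the slide (IDC forecast).
start_eb = 130          # exabytes in 2005
end_eb = 40_000         # exabytes in 2020
years = 2020 - 2005

# Number of doublings needed to grow from start_eb to end_eb.
doublings = math.log2(end_eb / start_eb)

# Average doubling time implied by the two endpoints.
doubling_time = years / doublings

# Size a strict "double every 2 years" rule would reach by 2020.
strict_two_year_eb = start_eb * 2 ** (years / 2)

# Population implied by the "5,200 GB per person" figure,
# assuming 1 EB = 1e9 GB.
gb_per_person = 5_200
implied_population = end_eb * 1e9 / gb_per_person

print(f"doublings needed:        {doublings:.2f}")
print(f"implied doubling time:   {doubling_time:.2f} years")
print(f"strict 2-year doubling:  {strict_two_year_eb:,.0f} EB")
print(f"implied population:      {implied_population:,.0f}")
```

The per-person figure is consistent with a world population of a little under 8 billion.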
  • 6. What Is Data Mining? We are drowning in data, but starving for knowledge! “Necessity is the mother of invention”: data mining, the automated analysis of massive data sets. Data Mining: A Definition. The non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data in large data repositories. Non-trivial: obvious knowledge is not useful. Implicit: hidden, difficult-to-observe knowledge. Previously unknown. Potentially useful: actionable; easy to understand.
  • 7. Data Mining: Confluence of Multiple Disciplines. Data Mining: Machine Learning, Statistics, Applications, Algorithm, Pattern Recognition, High-Performance Computing, Visualization, Database Technology.
  • 8. Data Mining’s Virtuous Cycle. 1. Identifying the problem. 2. Mining data to transform it into actionable information. 3. Acting on the information. 4. Measuring the results.
  • 9. The Knowledge Discovery Process. Data Mining v. Knowledge Discovery in Databases (KDD): DM and KDD are often used interchangeably; actually, DM is only part of the KDD process. Figure: The KDD Process.
  • 10. Types of Knowledge Discovery. Two kinds of knowledge discovery: directed and undirected. Directed Knowledge Discovery. Purpose: explain the value of some field in terms of all the others (goal-oriented). Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances. Examples: which products show increased sales when cream cheese is discounted; which banner ad to use on a web page for a given user coming to the site. Undirected Knowledge Discovery. Purpose: find patterns in the data that may be interesting (no target field). Method: clustering, affinity grouping. Examples: which products in the catalog often sell together; market segmentation (find groups of customers/users with similar characteristics or behavioral patterns).
  • 11. From Data Mining to Data Science
  • 12. Data Mining: On What Kinds of Data? Database-oriented data sets and applications: relational database, data warehouse, transactional database; object-relational databases, heterogeneous databases and legacy databases. Advanced data sets and advanced applications: data streams and sensor data; time-series data, temporal data, sequence data (incl. bio-sequences); structure data, graphs, social networks and information networks; spatial data and spatiotemporal data; multimedia database; text databases; the World-Wide Web.
  • 13. Data Mining: What Kind of Data? Structured databases: relational, object-relational, etc. Can use SQL to perform parts of the process, e.g., SELECT category, count(*) FROM Items WHERE type = 'video' GROUP BY category
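To make the SQL example on this slide concrete, here is a minimal sketch that runs it against an in-memory SQLite database. The `Items` table layout and the sample rows are assumptions made for illustration; the slide gives only the query. The query is lightly adapted: the string literal is quoted and the grouping column is selected so each count is labeled.

```python
import sqlite3

# In-memory database with a hypothetical Items table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (id INTEGER PRIMARY KEY, type TEXT, category TEXT)"
)

# Made-up sample rows: (type, category).
rows = [
    ("video", "comedy"),
    ("video", "comedy"),
    ("video", "drama"),
    ("book", "drama"),
]
conn.executemany("INSERT INTO Items (type, category) VALUES (?, ?)", rows)

# Count video items per category; ORDER BY makes the output deterministic.
query = """
    SELECT category, COUNT(*)
    FROM Items
    WHERE type = 'video'
    GROUP BY category
    ORDER BY category
"""
counts = conn.execute(query).fetchall()
conn.close()

print(counts)  # [('comedy', 2), ('drama', 1)]
```

Aggregations like this are typically a data-preparation step; the mining itself happens on the summarized result.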
  • 14. Data Mining: What Kind of Data? Flat files: most common data source; can be text (or HTML) or binary; may contain transactions, statistical data, measurements, etc. Transactional databases: set of records, each with a transaction id, time stamp, and a set of items; may have an associated “description” file for the items; typical source of data used in market basket analysis.
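The transactional record described on this slide (transaction id, time stamp, set of items) can be sketched as a small data structure, together with the simplest market-basket statistic: how often each pair of items appears in the same transaction. The transactions below are invented for illustration, and pair counting is only a first step, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (transaction id, time stamp, set of items).
transactions = [
    (1, "2024-01-05T09:12", {"bread", "milk"}),
    (2, "2024-01-05T09:40", {"bread", "butter", "milk"}),
    (3, "2024-01-05T10:03", {"butter", "milk"}),
    (4, "2024-01-05T10:15", {"bread", "milk"}),
]

# Count every unordered item pair that co-occurs in a transaction.
pair_counts = Counter()
for _tid, _timestamp, items in transactions:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of transactions containing both items.
n = len(transactions)
support = {pair: count / n for pair, count in pair_counts.items()}

for pair, s in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, s)
```

Sorting each item set before forming pairs ensures that `("bread", "milk")` and `("milk", "bread")` are counted as the same pair.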
  • 15. Data Mining: What Kind of Data? Other types of databases: legacy databases; multimedia databases (usually very high-dimensional); spatial databases (containing geographical information, such as maps or satellite imaging data); time-series and temporal data (time-dependent information such as stock market data; usually very dynamic). World Wide Web: basically a large, heterogeneous, distributed database; need for new or additional tools and techniques: information retrieval, filtering and extraction; agents to assist in browsing and filtering; Web content, usage, and structure (linkage) mining tools. The “social Web”: user-generated meta-data, social networks, shared resources, etc.
  • 16. What Can Data Mining Do? Many data mining tasks: often inter-related; often need to try different techniques/algorithms for each task; each task may require different types of knowledge discovery. What are some data mining tasks? Classification; Prediction; Clustering; Affinity Grouping / Association discovery; Sequence Analysis; Characterization; Discrimination.
  • 17. Some Applications of Data Mining. Business data analysis and decision support. Marketing focalization. Recognizing specific market segments that respond to particular characteristics. Return on mailing campaign (target marketing). Customer profiling. Segmentation of customers for marketing strategies and/or product offerings. Customer behavior understanding. Customer retention and loyalty. Mass customization / personalization.
  • 18. Some Applications of Data Mining. Business data analysis and decision support (cont.). Market analysis and management. Provide summary information for decision-making. Market basket analysis, cross selling, market segmentation. Resource planning. Risk analysis and management. "What if" analysis. Forecasting. Pricing analysis, competitive analysis. Time-series analysis (e.g., stock market).
  • 19. Some Applications of Data Mining. Fraud detection. Detecting telephone fraud: telephone call model (destination of the call, duration, time of day or week); analyze patterns that deviate from an expected norm; British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud scheme. Detection of credit-card fraud. Detecting suspicious money transactions (money laundering). Text mining: message filtering (e-mail, newsgroups, etc.); newspaper article analysis; text and document categorization. Web mining: mining patterns from the content, usage, and structure of Web resources.
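"Analyze patterns that deviate from an expected norm" can be illustrated with a very simple rule: flag a call whose duration is many standard deviations away from a caller's historical mean. This z-score check is a stand-in chosen for brevity, not the method used in the British Telecom case; the call durations, the threshold of 3, and the function name `is_unusual` are all assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical call durations (minutes) from one caller's history.
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]

# The "expected norm": mean and sample standard deviation of past calls.
mu = mean(history)
sigma = stdev(history)


def is_unusual(duration, threshold=3.0):
    """Return True if the duration is more than `threshold` standard
    deviations away from the historical mean."""
    return abs(duration - mu) / sigma > threshold


# Two new calls: one ordinary, one far outside the usual range.
new_calls = [4.0, 45.0]
flags = [is_unusual(d) for d in new_calls]

print(flags)  # [False, True]
```

A realistic call model would also use the other fields the slide lists (destination, time of day or week) rather than duration alone.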
  • 20. Types of Web Mining. Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining.
  • 21. Types of Web Mining. Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining. Applications: document clustering or categorization; topic identification / tracking; concept discovery; focused crawling; content-based personalization; intelligent search tools.
  • 22. Types of Web Mining. Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining. Applications: user and customer behavior modeling; Web site optimization; e-customer relationship management; Web marketing; targeted advertising; recommender systems.
  • 23. Types of Web Mining. Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining. Applications: document retrieval and ranking (e.g., Google); discovery of “hubs” and “authorities”; discovery of Web communities; social network analysis.
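The “hubs” and “authorities” mentioned on this slide are usually associated with the HITS algorithm: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The sketch below is a minimal power-iteration version on a tiny made-up link graph; the page names, the graph, and the fixed iteration count are assumptions for illustration.

```python
import math

# Hypothetical link graph: page -> pages it links to.
links = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}
pages = list(links)

# Start with equal hub and authority scores.
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}


def normalize(scores):
    """Scale scores to unit Euclidean length (unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {p: v / norm for p, v in scores.items()}


for _ in range(50):
    # Authority score: sum of hub scores of the pages linking to p.
    auth = normalize(
        {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    )
    # Hub score: sum of authority scores of the pages p links to.
    hub = normalize({p: sum(auth[q] for q in links[p]) for p in pages})

best_authority = max(auth, key=auth.get)
print("authority scores:", auth)
print("hub scores:", hub)
print("top authority:", best_authority)
```

In this graph D receives links from every other page and ends up as the top authority, while A and B, which link to both C and D, score highest as hubs.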
  • 24. The Knowledge Discovery Process. Figure: The KDD Process. Next: we first focus on understanding the data and data preparation/transformation.