From Data to Wisdom
• Data
  - The raw material of information
• Information
  - Data organized and presented by someone
• Knowledge
  - Information read, heard or seen, and understood and integrated
• Wisdom
  - Distilled knowledge and understanding which can lead to decisions
The Information Hierarchy (top to bottom): Wisdom, Knowledge, Information, Data
Why Data Mining?
The Explosive Growth of Data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems, Web, computerized society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation, …
Society and everyone: news, images, video, documents
Internet …
[Figure not reproduced. Source: Intel]
How much data?
• Google: ~20-30 PB a day
• Wayback Machine has ~4 PB + 100-200 TB/month
• Facebook: ~3 PB of user data + 25 TB/day
• eBay: ~7 PB of user data + 50 TB/day
• CERN’s Large Hadron Collider generates 15 PB a year
• In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB
“640K ought to be enough for anybody.”
Big Data Growing
The Untapped Data Gap: most of the useful data will not be tagged or analyzed, partly due to a skills shortage.
IDC predicts: from 2005 to 2020, the digital universe will double every 2 years and grow from 130 exabytes to 40,000 exabytes, or 5,200 GB per person in 2020.
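As a quick arithmetic check of the IDC figures above, the following sketch computes the doubling time implied by growth from 130 to 40,000 exabytes over 15 years, and the population implied by 5,200 GB per person. It assumes decimal units (1 EB = 10^9 GB); the variable names are illustrative only.

```python
import math

EB_IN_GB = 10**9  # assumption: decimal units, 1 exabyte = 1,000,000,000 gigabytes

start_eb, end_eb = 130, 40_000  # digital universe in 2005 and 2020 (from the slide)
years = 2020 - 2005

# Number of doublings needed to grow from start to end, and the implied doubling time.
doublings = math.log2(end_eb / start_eb)
doubling_time_years = years / doublings

# What a strict doubling every 2 years would give instead.
strict_two_year_eb = start_eb * 2 ** (years / 2)

# Population implied by 5,200 GB per person.
implied_population = end_eb * EB_IN_GB / 5_200

print(f"doublings: {doublings:.2f}")
print(f"implied doubling time: {doubling_time_years:.2f} years")
print(f"strict 2-year doubling would reach about {strict_two_year_eb:,.0f} EB")
print(f"implied population: {implied_population / 1e9:.2f} billion")
```

The implied doubling time comes out a little under two years, so "doubles every 2 years" reads as a rounded description of the same growth.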
What Is Data Mining?
We are drowning in data, but starving for knowledge!
“Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
Data Mining: A Definition
The non-trivial extraction of implicit, previously unknown and potentially useful knowledge from data in large data repositories
• Non-trivial: obvious knowledge is not useful
• Implicit: hidden, difficult-to-observe knowledge
• Previously unknown
• Potentially useful: actionable; easy to understand
Data Mining: Confluence of Multiple Disciplines
Data mining draws on machine learning, statistics, applications, algorithms, pattern recognition, high-performance computing, visualization, and database technology.
Data Mining’s Virtuous Cycle
1. Identifying the problem
2. Mining data to transform it into actionable information
3. Acting on the information
4. Measuring the results
The Knowledge Discovery Process
• Data Mining vs. Knowledge Discovery in Databases (KDD)
  - DM and KDD are often used interchangeably
  - Strictly speaking, DM is only one part of the KDD process
[Figure: The KDD Process]
Types of Knowledge Discovery
• Two kinds of knowledge discovery: directed and undirected
• Directed Knowledge Discovery
  - Purpose: explain the value of some field in terms of all the others (goal-oriented)
  - Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
  - Examples:
      which products show increased sales when cream cheese is discounted
      which banner ad to use on a web page for a given user coming to the site
• Undirected Knowledge Discovery
  - Purpose: find patterns in the data that may be interesting (no target field)
  - Method: clustering, affinity grouping
  - Examples:
      which products in the catalog often sell together
      market segmentation (find groups of customers/users with similar characteristics or behavioral patterns)
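The contrast between the two kinds of discovery can be sketched on a toy data set. The example below is a minimal illustration, not a method prescribed by the slides: the customer records, the `responded` target field, and the choice of a nearest-centroid classifier (directed) and k-means with k=2 (undirected) are all assumptions made for the sketch.

```python
from statistics import mean

# Hypothetical customer records: (monthly visits, average basket value).
features = [(1, 10.0), (2, 12.0), (1, 11.0), (9, 48.0), (10, 52.0), (8, 50.0)]
# Target field, used only in the directed case: did the customer respond to an offer?
responded = [False, False, False, True, True, True]

def centroid(points):
    """Component-wise mean of a list of points."""
    return tuple(mean(axis) for axis in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Directed: learn one centroid per value of the target field,
# then classify new instances by the nearest centroid.
centroids_by_label = {
    label: centroid([x for x, y in zip(features, responded) if y == label])
    for label in set(responded)
}

def predict(x):
    return min(centroids_by_label, key=lambda label: sq_dist(x, centroids_by_label[label]))

# Undirected: k-means groups the same records without using any target field.
def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

segments = kmeans(features, centers=[features[0], features[-1]])

print(predict((9, 45.0)))          # predicted value of the target field for a new customer
print([len(s) for s in segments])  # sizes of the discovered segments
```

In the directed case the algorithm is told which field to explain; in the undirected case it only receives the feature columns and the segments have to be interpreted afterwards.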
From Data Mining to Data Science
Data Mining: On What Kinds of Data?
• Database-oriented data sets and applications
  - Relational databases, data warehouses, transactional databases
  - Object-relational databases, heterogeneous databases and legacy databases
• Advanced data sets and advanced applications
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data (incl. bio-sequences)
  - Structure data, graphs, social networks and information networks
  - Spatial data and spatiotemporal data
  - Multimedia databases
  - Text databases
  - The World-Wide Web
Data Mining: What Kind of Data?
Structured Databases
relational, object-relational, etc.
can use SQL to perform parts of the process
e.g., SELECT category, COUNT(*) FROM Items WHERE type = 'video' GROUP BY category
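The SQL step on this slide can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `Items` schema and the sample rows are assumptions, and the query is written with the string literal quoted and the grouping column selected so that it runs as standard SQL.

```python
import sqlite3

# In-memory database with a hypothetical Items table (the schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (name TEXT, type TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        ("A", "video", "comedy"),
        ("B", "video", "comedy"),
        ("C", "video", "drama"),
        ("D", "book", "drama"),
    ],
)

# Count video items per category; the string literal 'video' must be quoted,
# otherwise SQL treats it as a column name.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM Items "
    "WHERE type = 'video' GROUP BY category ORDER BY category"
).fetchall()
conn.close()

print(rows)
```

Aggregations like this cover the selection and summarization parts of the process; the pattern-finding steps usually need tools beyond plain SQL.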
Data Mining: What Kind of Data?
• Flat Files
  - most common data source
  - can be text (or HTML) or binary
  - may contain transactions, statistical data, measurements, etc.
• Transactional databases
  - set of records, each with a transaction ID, time stamp, and a set of items
  - may have an associated “description” file for the items
  - typical source of data used in market basket analysis
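To make the transactional format concrete, here is a small sketch of market basket analysis over such records. The transactions, item names, and time stamps are invented for illustration, and support and confidence are used as the (standard) measures of how often items sell together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (transaction id, time stamp, set of items).
transactions = [
    (1, "2024-01-01T09:00", {"bagels", "cream cheese", "coffee"}),
    (2, "2024-01-01T09:05", {"bagels", "cream cheese"}),
    (3, "2024-01-01T09:10", {"bagels", "coffee"}),
    (4, "2024-01-01T09:20", {"milk", "coffee"}),
]

n = len(transactions)
item_counts = Counter()
pair_counts = Counter()
for _tid, _timestamp, items in transactions:
    item_counts.update(items)
    # Sort so each unordered pair is always counted under the same key.
    pair_counts.update(combinations(sorted(items), 2))

def support(a, b):
    """Fraction of all transactions that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / n

def confidence(a, b):
    """Fraction of the transactions containing a that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("bagels", "cream cheese"))
print(confidence("cream cheese", "bagels"))
print(confidence("bagels", "cream cheese"))
```

Confidence is directional: in this toy data every basket with cream cheese also has bagels, but not every basket with bagels has cream cheese.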
Data Mining: What Kind of Data?
• Other Types of Databases
  - legacy databases
  - multimedia databases (usually very high-dimensional)
  - spatial databases (containing geographical information, such as maps or satellite imaging data)
  - time-series and temporal data (time-dependent information such as stock market data; usually very dynamic)
  - World Wide Web
    - basically a large, heterogeneous, distributed database
    - need for new or additional tools and techniques
        information retrieval, filtering and extraction
        agents to assist in browsing and filtering
        Web content, usage, and structure (linkage) mining tools
    - The “social Web”
        user-generated metadata, social networks, shared resources, etc.
What Can Data Mining Do
Many Data Mining Tasks
• often inter-related
• often need to try different techniques/algorithms for each task
• each task may require different types of knowledge discovery
What are some data mining tasks?
• Classification
• Prediction
• Clustering
• Affinity Grouping / Association discovery
• Sequence Analysis
• Characterization
• Discrimination
Some Applications of Data Mining
• Business data analysis and decision support
  - Marketing focalization
      Recognizing specific market segments that respond to particular characteristics
      Return on mailing campaign (target marketing)
  - Customer profiling
      Segmentation of customers for marketing strategies and/or product offerings
      Customer behavior understanding
      Customer retention and loyalty
      Mass customization / personalization
Some Applications of Data Mining
• Business data analysis and decision support (cont.)
  - Market analysis and management
      Provide summary information for decision-making
      Market basket analysis, cross-selling, market segmentation
      Resource planning
  - Risk analysis and management
      "What if" analysis
      Forecasting
      Pricing analysis, competitive analysis
      Time-series analysis (e.g., stock market)
Some Applications of Data Mining
• Fraud detection
  - Detecting telephone fraud:
      Telephone call model: destination of the call, duration, time of day or week
      Analyze patterns that deviate from an expected norm
      British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud scheme
  - Detection of credit-card fraud
  - Detecting suspicious money transactions (money laundering)
• Text mining:
  - Message filtering (e-mail, newsgroups, etc.)
  - Newspaper article analysis
  - Text and document categorization
• Web Mining
  - Mining patterns from the content, usage, and structure of Web resources
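The fraud-detection idea above, flagging patterns that deviate from an expected norm, can be sketched with a simple z-score test on call durations. The call history, the new calls, and the threshold of 3 standard deviations are all assumptions for illustration; a real call model would also use destination and time of day or week.

```python
from statistics import mean, stdev

# Hypothetical call durations (minutes) for one account.
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]  # past behaviour: the "expected norm"
new_calls = [3.2, 41.0, 4.8]                         # calls to be screened

mu = mean(history)
sigma = stdev(history)

def z_score(duration):
    """How many standard deviations a call is from the account's usual duration."""
    return (duration - mu) / sigma

THRESHOLD = 3.0  # assumed cut-off; tuning it trades false alarms against missed fraud
flagged = [d for d in new_calls if abs(z_score(d)) > THRESHOLD]

print(flagged)
```

A flagged call is only a candidate for review: a deviation from the norm is evidence of unusual behaviour, not proof of fraud.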
Types of Web Mining
• Web Content Mining
• Web Structure Mining
• Web Usage Mining
Applications of Web Content Mining:
• document clustering or categorization
• topic identification / tracking
• concept discovery
• focused crawling
• content-based personalization
• intelligent search tools
Applications of Web Usage Mining:
• user and customer behavior modeling
• Web site optimization
• e-customer relationship management
• Web marketing
• targeted advertising
• recommender systems
Applications of Web Structure Mining:
• document retrieval and ranking (e.g., Google)
• discovery of “hubs” and “authorities”
• discovery of Web communities
• social network analysis
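The discovery of hubs and authorities mentioned above can be illustrated with a HITS-style mutual-reinforcement iteration, one well-known way to compute such scores from link structure. This is a minimal sketch on an invented five-page link graph, not necessarily the method the slides had in mind; the page names and iteration count are assumptions.

```python
# Hypothetical link graph: page -> pages it links to.
links = {
    "portal": ["docs", "blog", "shop"],
    "directory": ["docs", "blog"],
    "docs": [],
    "blog": ["docs"],
    "shop": [],
}

def hits(links, iterations=50):
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5
        h_norm = sum(v * v for v in hub.values()) ** 0.5
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits(links)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)

print(best_hub, best_authority)
```

Good hubs point to many good authorities and good authorities are pointed to by many good hubs, so the two scores are refined together until they stabilize.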
The Knowledge Discovery Process
[Figure: The KDD Process]
• Next: we first focus on understanding the data and on data preparation/transformation

More Related Content

What's hot

Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationDr. Abdul Ahad Abro
 
Application of KDD & its future scope
Application of KDD & its future scopeApplication of KDD & its future scope
Application of KDD & its future scopeTanmay Sethi
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an IntroductionAli Abbasi
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining Phi Jack
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningSi Krishan
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introductionbutest
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CSThanveen
 
Data mining-2
Data mining-2Data mining-2
Data mining-2Nit Hik
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process Shuvra Ghosh
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Data miningppt378
Data miningppt378Data miningppt378
Data miningppt378nitttin
 

What's hot (20)

Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
Introduction to DataMining
Introduction to DataMiningIntroduction to DataMining
Introduction to DataMining
 
Data Mining Overview
Data Mining OverviewData Mining Overview
Data Mining Overview
 
Application of KDD & its future scope
Application of KDD & its future scopeApplication of KDD & its future scope
Application of KDD & its future scope
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an Introduction
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Introduction data mining
Introduction data miningIntroduction data mining
Introduction data mining
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
 
Introduction
IntroductionIntroduction
Introduction
 
Data mining and knowledge Discovery
Data mining and knowledge DiscoveryData mining and knowledge Discovery
Data mining and knowledge Discovery
 
Ch 1 intro_dw
Ch 1 intro_dwCh 1 intro_dw
Ch 1 intro_dw
 
Data mining-2
Data mining-2Data mining-2
Data mining-2
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process
 
Kdd process
Kdd processKdd process
Kdd process
 
Lecture 01 Data Mining
Lecture 01 Data MiningLecture 01 Data Mining
Lecture 01 Data Mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Data miningppt378
Data miningppt378Data miningppt378
Data miningppt378
 

Viewers also liked

similarity measure
similarity measure similarity measure
similarity measure ZHAO Sam
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial Salah Amean
 

Viewers also liked (6)

KDD e Data Mining
KDD e Data MiningKDD e Data Mining
KDD e Data Mining
 
Seminar datawarehouse @ Universitas Multimedia Nusantara
Seminar datawarehouse @ Universitas Multimedia NusantaraSeminar datawarehouse @ Universitas Multimedia Nusantara
Seminar datawarehouse @ Universitas Multimedia Nusantara
 
similarity measure
similarity measure similarity measure
similarity measure
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 

Similar to Data mining and knowledge discovery

Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningRohit Kumar
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhardeepikakaler1
 
6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhianadeepikakaler1
 
6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhardeepikakaler1
 
6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhianadeepikakaler1
 
Introduction.ppt
Introduction.pptIntroduction.ppt
Introduction.pptbommaiah
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptPadmajaLaksh
 
Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)yesheeka
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.pptadmsoyadm4
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1DanWooster1
 

Similar to Data mining and knowledge discovery (20)

Data mining 1
Data mining 1Data mining 1
Data mining 1
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar
 
6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana
 
6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar
 
6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana
 
Introduction.ppt
Introduction.pptIntroduction.ppt
Introduction.ppt
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.ppt
 
Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
Data mining
Data miningData mining
Data mining
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
data mining
data miningdata mining
data mining
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1Upstate CSCI 525 Data Mining Chapter 1
Upstate CSCI 525 Data Mining Chapter 1
 
Data Mining
Data MiningData Mining
Data Mining
 

More from Luis Goldster

Ruby on rails evaluation
Ruby on rails evaluationRuby on rails evaluation
Ruby on rails evaluationLuis Goldster
 
Ado.net & data persistence frameworks
Ado.net & data persistence frameworksAdo.net & data persistence frameworks
Ado.net & data persistence frameworksLuis Goldster
 
Multithreading models.ppt
Multithreading models.pptMultithreading models.ppt
Multithreading models.pptLuis Goldster
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data miningLuis Goldster
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data miningLuis Goldster
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherenceLuis Goldster
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cacheLuis Goldster
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching worksLuis Goldster
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsLuis Goldster
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysisLuis Goldster
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with javaLuis Goldster
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithmsLuis Goldster
 

More from Luis Goldster (20)

Ruby on rails evaluation
Ruby on rails evaluationRuby on rails evaluation
Ruby on rails evaluation
 
Design patterns
Design patternsDesign patterns
Design patterns
 
Lisp and scheme i
Lisp and scheme iLisp and scheme i
Lisp and scheme i
 
Ado.net & data persistence frameworks
Ado.net & data persistence frameworksAdo.net & data persistence frameworks
Ado.net & data persistence frameworks
 
Multithreading models.ppt
Multithreading models.pptMultithreading models.ppt
Multithreading models.ppt
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
Cache recap
Cache recapCache recap
Cache recap
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Api crash
Api crashApi crash
Api crash
 
Object model
Object modelObject model
Object model
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Abstract class
Abstract classAbstract class
Abstract class
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Data mining and knowledge discovery

  • 1. 1 From Data to Wisdom  Data  The raw material of information  Information  Data organized and presented by someone  Knowledge  Information read, heard or seen and understood and integrated  Wisdom  Distilled knowledge and understanding which can lead to decisions Wisdom Knowledge Information Data The Information Hierarchy
  • 2. Why Data Mining? The Explosive Growth of Data: from terabytes to petabytes Data collection and data availability Automated data collection tools, database systems, Web, computerized society Major sources of abundant data Business: Web, e-commerce, transactions, stocks, … Science: Remote sensing, bioinformatics, scientific simulation, … Society and everyone: news, images, video, documents Internet … 2
  • 4. How much data?
      • Google: ~20-30 PB a day
      • Wayback Machine: ~4 PB, plus 100-200 TB/month
      • Facebook: ~3 PB of user data, plus 25 TB/day
      • eBay: ~7 PB of user data, plus 50 TB/day
      • CERN’s Large Hadron Collider generates 15 PB a year
      • In 2010, enterprises stored 7 exabytes = 7,000,000,000 GB
      • “640K ought to be enough for anybody.”
  • 5. Big Data Growing
      • The Untapped Data Gap: most of the useful data will not be tagged or analyzed, partly due to a skill shortage
      • IDC predicts: from 2005 to 2020, the digital universe will double every 2 years and grow from 130 exabytes to 40,000 exabytes, or 5,200 GB per person in 2020
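The growth figures on slide 5 can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check and assumes decimal units (1 EB = 10^9 GB); the function names are illustrative. Strict doubling every two years from 130 EB gives roughly 23,500 EB by 2020, so the quoted 40,000 EB corresponds to a doubling period of about 1.8 years, which suggests "every 2 years" is a rounded figure.

```python
from math import log2

START_EB = 130      # digital universe in 2005, as quoted on the slide
END_EB = 40_000     # projection for 2020, as quoted on the slide
YEARS = 2020 - 2005
GB_PER_EB = 10**9   # assumption: decimal (SI) units


def projected_size(start, years, doubling_period):
    """Size after `years` if the quantity doubles every `doubling_period` years."""
    return start * 2 ** (years / doubling_period)


def implied_doubling_period(start, end, years):
    """Doubling period that takes `start` to `end` in `years`."""
    return years / log2(end / start)


strict_doubling_eb = projected_size(START_EB, YEARS, 2)
implied_period_years = implied_doubling_period(START_EB, END_EB, YEARS)
# Population implied by the quoted 5,200 GB per person.
implied_population = END_EB * GB_PER_EB / 5_200

print(f"Strict 2-year doubling: {strict_doubling_eb:,.0f} EB")
print(f"Implied doubling period: {implied_period_years:.2f} years")
print(f"Implied population: {implied_population:,.0f}")
```

The per-person figure is consistent with the total: 40,000 EB spread over 5,200 GB each implies a population of roughly 7.7 billion.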
  • 6. What Is Data Mining?
      • We are drowning in data, but starving for knowledge!
      • “Necessity is the mother of invention”: data mining, the automated analysis of massive data sets
      • Data Mining: A Definition
          • The non-trivial extraction of implicit, previously unknown, and potentially useful knowledge from data in large data repositories
          • Non-trivial: obvious knowledge is not useful
          • Implicit: hidden, difficult-to-observe knowledge
          • Previously unknown
          • Potentially useful: actionable; easy to understand
  • 7. Data Mining: Confluence of Multiple Disciplines
      • Disciplines shown in the diagram: Machine Learning, Statistics, Applications, Algorithm, Pattern Recognition, High-Performance Computing, Visualization, Database Technology
  • 8. Data Mining’s Virtuous Cycle
      1. Identifying the problem
      2. Mining data to transform it into actionable information
      3. Acting on the information
      4. Measuring the results
  • 9. The Knowledge Discovery Process - The KDD Process
      • Data Mining vs. Knowledge Discovery in Databases (KDD)
          • DM and KDD are often used interchangeably
          • Strictly, DM is only one part of the KDD process
  • 10. Types of Knowledge Discovery
      • Two kinds of knowledge discovery: directed and undirected
      • Directed Knowledge Discovery
          • Purpose: explain the value of some field in terms of all the others (goal-oriented)
          • Method: select the target field based on some hypothesis about the data; ask the algorithm to tell us how to predict or classify new instances
          • Examples: which products show increased sales when cream cheese is discounted; which banner ad to use on a web page for a given user coming to the site
      • Undirected Knowledge Discovery
          • Purpose: find patterns in the data that may be interesting (no target field)
          • Method: clustering, affinity grouping
          • Examples: which products in the catalog often sell together; market segmentation (finding groups of customers/users with similar characteristics or behavioral patterns)
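The contrast on slide 10 can be made concrete with a small sketch. The records, field names, and the two algorithms chosen here (a nearest-centroid classifier for the directed case, plain k-means for the undirected case) are illustrative assumptions, not methods named in the deck. The directed model uses the target field `responded`; the undirected one never sees it.

```python
from statistics import mean

# Hypothetical toy records: (visits_per_month, avg_basket_value, responded)
RECORDS = [
    (2, 15.0, "no"), (3, 18.0, "no"), (1, 12.0, "no"),
    (9, 60.0, "yes"), (11, 72.0, "yes"), (10, 65.0, "yes"),
]


def centroid(points):
    """Component-wise mean of equal-length numeric tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_directed(records):
    """Directed: group by the target field and keep one centroid per label."""
    groups = {}
    for *features, label in records:
        groups.setdefault(label, []).append(tuple(features))
    return {label: centroid(pts) for label, pts in groups.items()}


def predict(model, features):
    """Classify a new instance by its nearest label centroid."""
    return min(model, key=lambda label: sq_dist(model[label], features))


def cluster_undirected(points, k=2, iterations=10):
    """Undirected: plain k-means; no target field is involved."""
    centers = list(points[:k])  # naive deterministic initialisation
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda i: sq_dist(centers[i], p)) for p in points
        ]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = centroid(members)
    return assignment


model = fit_directed(RECORDS)
segments = cluster_undirected([r[:2] for r in RECORDS])
print(predict(model, (8, 55.0)))
print(segments)
```

On data this cleanly separated, the clusters found without labels coincide with the labelled groups; on real data the two views usually differ, which is why the slide treats them as separate kinds of discovery.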
  • 11. From Data Mining to Data Science
  • 12. Data Mining: On What Kinds of Data?
      • Database-oriented data sets and applications
          • Relational databases, data warehouses, transactional databases
          • Object-relational databases, heterogeneous databases, and legacy databases
      • Advanced data sets and advanced applications
          • Data streams and sensor data
          • Time-series data, temporal data, sequence data (incl. bio-sequences)
          • Structure data, graphs, social networks and information networks
          • Spatial data and spatiotemporal data
          • Multimedia databases
          • Text databases
          • The World-Wide Web
  • 13. Data Mining: What Kind of Data?
      • Structured databases
          • relational, object-relational, etc.
          • can use SQL to perform parts of the process, e.g., SELECT category, COUNT(*) FROM Items WHERE type = 'video' GROUP BY category
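The SQL example on slide 13 can be tried against an in-memory SQLite database. This is a minimal sketch: the `Items` table layout and the sample rows are assumptions made for illustration, the string value is passed as a bound parameter (a bare `video` would be read as a column name), and `category` is added to the select list so each count is labelled.

```python
import sqlite3


def count_by_category(conn, item_type):
    """Count items of one type per category, following the slide's query."""
    query = (
        "SELECT category, COUNT(*) FROM Items "
        "WHERE type = ? GROUP BY category ORDER BY category"
    )
    return conn.execute(query, (item_type,)).fetchall()


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (title TEXT, type TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [
        ("Clip A", "video", "comedy"),
        ("Clip B", "video", "comedy"),
        ("Clip C", "video", "drama"),
        ("Song D", "audio", "comedy"),
    ],
)

video_counts = count_by_category(conn, "video")
print(video_counts)
```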
  • 14. Data Mining: What Kind of Data?
      • Flat files
          • most common data source
          • can be text (or HTML) or binary
          • may contain transactions, statistical data, measurements, etc.
      • Transactional databases
          • set of records, each with a transaction id, a time stamp, and a set of items
          • may have an associated “description” file for the items
          • typical source of data used in market basket analysis
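To illustrate the transactional record layout from slide 14 (transaction id, time stamp, set of items) and its use in market basket analysis, here is a small sketch that counts how often each pair of items appears together. The transactions and the `min_support` threshold are made up for the example; a real analysis would typically use a dedicated algorithm such as Apriori rather than enumerating every pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: (transaction id, time stamp, set of items)
TRANSACTIONS = [
    (1, "2024-01-05T09:12", {"bread", "milk"}),
    (2, "2024-01-05T09:40", {"bread", "diapers", "beer", "eggs"}),
    (3, "2024-01-05T10:03", {"milk", "diapers", "beer", "cola"}),
    (4, "2024-01-05T10:27", {"bread", "milk", "diapers", "beer"}),
    (5, "2024-01-05T11:15", {"bread", "milk", "diapers", "cola"}),
]


def pair_support(transactions):
    """Count, for every item pair, the number of transactions containing both."""
    counts = Counter()
    for _tid, _timestamp, items in transactions:
        # Sorting gives each pair one canonical (alphabetical) form.
        counts.update(combinations(sorted(items), 2))
    return counts


def frequent_pairs(transactions, min_support):
    """Keep only the pairs that occur in at least `min_support` transactions."""
    return {
        pair: n
        for pair, n in pair_support(transactions).items()
        if n >= min_support
    }


for pair, n in sorted(frequent_pairs(TRANSACTIONS, 3).items()):
    print(pair, n)
```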
  • 15. Data Mining: What Kind of Data?
      • Other types of databases
          • legacy databases
          • multimedia databases (usually very high-dimensional)
          • spatial databases (containing geographical information, such as maps or satellite imaging data)
          • time-series and temporal data (time-dependent information such as stock market data; usually very dynamic)
      • World Wide Web
          • basically a large, heterogeneous, distributed database
          • need for new or additional tools and techniques: information retrieval, filtering, and extraction; agents to assist in browsing and filtering; Web content, usage, and structure (linkage) mining tools
      • The “social Web”
          • user-generated metadata, social networks, shared resources, etc.
  • 16. What Can Data Mining Do?
      • Many data mining tasks
          • often inter-related
          • often need to try different techniques/algorithms for each task
          • each task may require different types of knowledge discovery
      • What are some data mining tasks?
          • Classification
          • Prediction
          • Clustering
          • Affinity grouping / association discovery
          • Sequence analysis
          • Characterization
          • Discrimination
  • 17. Some Applications of Data Mining
      • Business data analysis and decision support
          • Marketing focalization
          • Recognizing specific market segments that respond to particular characteristics
          • Return on mailing campaign (target marketing)
          • Customer profiling
          • Segmentation of customers for marketing strategies and/or product offerings
          • Customer behavior understanding
          • Customer retention and loyalty
          • Mass customization / personalization
  • 18. Some Applications of Data Mining
      • Business data analysis and decision support (cont.)
          • Market analysis and management
          • Provide summary information for decision-making
          • Market basket analysis, cross-selling, market segmentation
          • Resource planning
          • Risk analysis and management
          • "What if" analysis
          • Forecasting
          • Pricing analysis, competitive analysis
          • Time-series analysis (e.g., stock market)
  • 19. Some Applications of Data Mining
      • Fraud detection
          • Detecting telephone fraud
              • Telephone call model: destination of the call, duration, time of day or week
              • Analyze patterns that deviate from an expected norm
              • British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud scheme
          • Detection of credit-card fraud
          • Detecting suspicious money transactions (money laundering)
      • Text mining
          • Message filtering (e-mail, newsgroups, etc.)
          • Newspaper article analysis
          • Text and document categorization
      • Web mining
          • Mining patterns from the content, usage, and structure of Web resources
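The telephone-fraud item on slide 19 speaks of analyzing patterns that deviate from an expected norm. One simple way to express that idea, offered here only as an illustrative sketch and not as the method British Telecom used, is a z-score test on a single field of the call model (call duration). The history values and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev

# Hypothetical past call durations (minutes) for one subscriber.
HISTORY = [3, 4, 5, 4, 3, 5, 4, 4]


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (needs at least two data points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation observed so far: anything different is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


for duration in (5, 45):
    status = "suspicious" if is_anomalous(HISTORY, duration) else "normal"
    print(f"{duration} min call: {status}")
```

A production system would model several fields jointly (destination, duration, time of day or week), but the principle of scoring new events against a learned norm is the same.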
  • 20. Types of Web Mining
      • Web Content Mining
      • Web Structure Mining
      • Web Usage Mining
  • 21. Types of Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining
      • Applications (typical of Web content mining)
          • document clustering or categorization
          • topic identification / tracking
          • concept discovery
          • focused crawling
          • content-based personalization
          • intelligent search tools
  • 22. Types of Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining
      • Applications (typical of Web usage mining)
          • user and customer behavior modeling
          • Web site optimization
          • e-customer relationship management
          • Web marketing
          • targeted advertising
          • recommender systems
  • 23. Types of Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining
      • Applications (typical of Web structure mining)
          • document retrieval and ranking (e.g., Google)
          • discovery of “hubs” and “authorities”
          • discovery of Web communities
          • social network analysis
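The “hubs” and “authorities” mentioned on slide 23 are usually computed with the HITS algorithm: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The sketch below is a minimal pure-Python version on a made-up link graph; the page names and the iteration count are assumptions for illustration.

```python
from math import sqrt

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "portal": ["news", "shop", "blog"],
    "directory": ["news", "shop"],
    "blog": ["news"],
    "news": [],
    "shop": [],
}


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub_scores, authority_scores) after repeated HITS updates."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hubs = {page: 1.0 for page in pages}
    auths = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auths = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auths[target] += hubs[source]
        auths = normalize(auths)
        # Hub update: sum the authority scores of pages linked to.
        hubs = normalize(
            {page: sum(auths[t] for t in links.get(page, [])) for page in pages}
        )
    return hubs, auths


hubs, auths = hits(LINKS)
print("top hub:", max(hubs, key=hubs.get))
print("top authority:", max(auths, key=auths.get))
```

Here "news" receives the most links from well-connected pages and "portal" points to the most authoritative ones, so they come out as the top authority and top hub respectively.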
  • 24. The Knowledge Discovery Process - The KDD Process
      • Next: we first focus on understanding the data and on data preparation/transformation