SlideShare a Scribd company logo
1 of 14
BITS Pilani
Pilani|Dubai|Goa|Hyderabad M1: Introduction to Data Mining
Data Mining
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
1.1 Data Mining Defined
Source Courtesy: Some of the contents of this PPT are sourced from materials provided by publishers of prescribed books
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Why Data Mining?
• The Explosive Growth of Data: from terabytes to petabytes
• Data collection and data availability
• Automated data collection tools, database systems, Web, computerized
society
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: Remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”—Data mining—Automated analysis of
massive data sets
3
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Why Data Mining
A search engine (e.g., Google) receives hundreds of millions of queries every day. Each query can be
viewed as a transaction where the user describes her or his information need.
What novel and useful knowledge can a search engine learn from such a huge collection of queries
collected from users over time? Some patterns found in user search queries can disclose invaluable
knowledge that cannot be obtained by reading individual data items alone.
For example, Google's Flu Trends uses specific search terms as indicators of flu activity. It found a
close relationship between the number of people who search for flu-related information and the
number of people who actually have flu symptoms. A pattern emerges when all of the search
queries related to flu are aggregated. Using aggregated Google search data, Flu Trends can estimate
flu activity up to two weeks faster than traditional systems can.This example shows how data
mining can turn a large collection of data into knowledge that can help meet a current global
challenge.
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Evolution of Database Technology
• 1960s:
• Data collection, database creation, IMS and network DBMS
• 1970s:
• Relational data model, relational DBMS implementation
• 1980s:
• RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
• Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
• Data mining, data warehousing, multimedia databases, and Web databases
• 2000s
• Stream data management and mining
• Data mining and its applications
• Web technology (XML, data integration) and global information systems
5
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
What Is Data Mining?
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful)
patterns or knowledge from huge amount of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern
analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything “data mining”?
• Simple search and query processing
• (Deductive) expert systems
6
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
What is (not) Data Mining?
 What is Data Mining?
– Certain names are more
prevalent in certain US
locations (O’Brien, O’Rurke,
O’Reilly… in Boston area)
– Group together similar
documents returned by search
engine according to their
context (e.g. Amazon
rainforest, Amazon.com,)
 What is not Data
Mining?
– Look up phone
number in phone
directory
– Query a Web
search engine for
information about
“Amazon”
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Origins of Data Mining
• Draws ideas from machine learning/AI, pattern recognition,
statistics, and database systems
• Traditional Techniques
may be unsuitable due to
• Enormity of data
• High dimensionality
of data
• Heterogeneous,
distributed nature
of data
Machine Learning/
Pattern
Recognition
Statistics/
AI
Data Mining
Database
systems
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Data Mining in Business Intelligence
9
Increasing potential
to support
business decisions End User
Business
Analyst
Data
Analyst
DBA
Decision
Making
Data Presentation
Visualization Techniques
Data Mining
Information Discovery
Data Exploration
Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources
Paper, Files, Web documents, Scientific experiments, Database Systems
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Data Mining/KDD Process
10
Input Data Data
Mining
Data Pre-
Processing
Post-
Processing
Data integration
Normalization
Feature selection
Dimension reduction
Pattern discovery
Association & correlation
Classification
Clustering
Outlier analysis
… … … …
Pattern evaluation
Pattern selection
Pattern interpretation
Pattern visualization
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Multi-Dimensional View of Data Mining
• Data to be mined
• Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse,
transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media,
graphs & social and information networks
• Knowledge to be mined (or: Data mining functions)
• Characterization, discrimination, association, classification, clustering, trend/deviation, outlier
analysis, etc.
• Descriptive vs. predictive data mining
• Multiple/integrated functions and mining at multiple levels
• Techniques utilized
• Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition,
visualization, high-performance, etc.
• Applications adapted
• Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text
mining, Web mining, etc.
11
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Data Mining & Machine Learning
According to Tom M. Mitchell, Chair of Machine Learning at Carnegie Mellon
University and author of the book Machine Learning (McGraw-Hill),
A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
the experience E.
We now have a set of objects to define machine learning:
Task (T), Experience (E), and Performance (P)
With a computer running a set of tasks, the experience should be leading to performance
increases (to satisfy the definition)
Many data mining tasks are executed successfully with help of machine learning
Machine Learning: Hands-on for Developers and Technical Professionals by Jason Bell John Wiley & Sons
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Data Mining on Diverse kinds of Data
Besides relational database data (from operational or analytical systems), there are many other kinds of data
that have diverse forms and structures and different semantic meanings.
Examples of data can be :
time-related or sequence data (e.g., historical records, stock exchange data, and time-series and biological
sequence data),
data streams (e.g., video surveillance and sensor data, which are continuously transmitted),
spatial data (e.g., maps),
engineering design data (e.g., the design of buildings, system components, or integrated circuits),
hypertext and multimedia data (including text, image, video, and audio data),
graph and networked data (e.g., social and information networks), and
the Web (a widely distributed information repository).
Diversity of data brings in new challenges such as handling special structures (e.g., sequences, trees, graphs,
and networks) and specific semantics (such as ordering, image, audio and video contents, and connectivity)
Data Mining
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Author(s), Title, Edition, Publishing House
T1 Tan P. N., Steinbach M & Kumar V. “Introduction to Data Mining” Pearson
Education
T2 Data Mining: Concepts and Techniques, Third Edition by Jiawei Han, Micheline
Kamber and Jian Pei Morgan Kaufmann Publishers
Prescribed Text Books

More Related Content

What's hot

9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like YouSalford Systems
 
Computational intelligence for big data analytics bda 2013
Computational intelligence for big data analytics   bda 2013Computational intelligence for big data analytics   bda 2013
Computational intelligence for big data analytics bda 2013oj08
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryHoang Nguyen
 
Knowledge Discovery in Databases
Knowledge Discovery in DatabasesKnowledge Discovery in Databases
Knowledge Discovery in DatabasesDiwas Kandel
 
All types of mining and trends indata mining
All types of mining and trends indata miningAll types of mining and trends indata mining
All types of mining and trends indata miningRupal Kharya
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introductionbutest
 
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...IJERDJOURNAL
 
Data Mining : Concepts and Techniques
Data Mining : Concepts and TechniquesData Mining : Concepts and Techniques
Data Mining : Concepts and TechniquesDeepaR42
 
data mining
data miningdata mining
data mininguoitc
 
Data mining-2
Data mining-2Data mining-2
Data mining-2Nit Hik
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 

What's hot (20)

9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
 
Computational intelligence for big data analytics bda 2013
Computational intelligence for big data analytics   bda 2013Computational intelligence for big data analytics   bda 2013
Computational intelligence for big data analytics bda 2013
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Knowledge Discovery in Databases
Knowledge Discovery in DatabasesKnowledge Discovery in Databases
Knowledge Discovery in Databases
 
Data mining and knowledge Discovery
Data mining and knowledge DiscoveryData mining and knowledge Discovery
Data mining and knowledge Discovery
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
All types of mining and trends indata mining
All types of mining and trends indata miningAll types of mining and trends indata mining
All types of mining and trends indata mining
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
Niso library law
Niso library lawNiso library law
Niso library law
 
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
 
Data Mining
Data MiningData Mining
Data Mining
 
Overview of Big Data
Overview of Big DataOverview of Big Data
Overview of Big Data
 
Data Mining : Concepts and Techniques
Data Mining : Concepts and TechniquesData Mining : Concepts and Techniques
Data Mining : Concepts and Techniques
 
data mining
data miningdata mining
data mining
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
Lecture 01 Data Mining
Lecture 01 Data MiningLecture 01 Data Mining
Lecture 01 Data Mining
 
Data mining-2
Data mining-2Data mining-2
Data mining-2
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
BAS 250 Lecture 1
BAS 250 Lecture 1BAS 250 Lecture 1
BAS 250 Lecture 1
 
Dwdm
DwdmDwdm
Dwdm
 

Similar to Introduction to Data Mining Techniques

Introduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptIntroduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptSangrangBargayary3
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesKathirvel Ayyaswamy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.pptadmsoyadm4
 
Data sharing in the age of the Social Machine
Data sharing in the age of the Social MachineData sharing in the age of the Social Machine
Data sharing in the age of the Social MachineUlrik Lyngs
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Data Mining and Knowledge Discovery in Business Databases
Data Mining and Knowledge Discovery in Business DatabasesData Mining and Knowledge Discovery in Business Databases
Data Mining and Knowledge Discovery in Business Databasesbutest
 
Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...
Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...
Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...Alejandro Arizpe, MBA, MSc IT, PMP
 
Data Mining introduction and basic concepts
Data Mining introduction and basic conceptsData Mining introduction and basic concepts
Data Mining introduction and basic conceptsPritiRishi
 
UNIT2-Data Mining.pdf
UNIT2-Data Mining.pdfUNIT2-Data Mining.pdf
UNIT2-Data Mining.pdfNancykumari47
 
A Lifecycle Approach to Information Privacy
A Lifecycle Approach to Information PrivacyA Lifecycle Approach to Information Privacy
A Lifecycle Approach to Information PrivacyMicah Altman
 

Similar to Introduction to Data Mining Techniques (20)

Introduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptIntroduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .ppt
 
Data Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
 
Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
data mining
data miningdata mining
data mining
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
Data sharing in the age of the Social Machine
Data sharing in the age of the Social MachineData sharing in the age of the Social Machine
Data sharing in the age of the Social Machine
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Data Mining and Knowledge Discovery in Business Databases
Data Mining and Knowledge Discovery in Business DatabasesData Mining and Knowledge Discovery in Business Databases
Data Mining and Knowledge Discovery in Business Databases
 
10 problems 06
10 problems 0610 problems 06
10 problems 06
 
Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...
Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...
Alejandro Arizpe - Artificial Intelligence, Machine Learning, and Databases i...
 
Cs501 dm intro
Cs501 dm introCs501 dm intro
Cs501 dm intro
 
Dm unit i r16
Dm unit i   r16Dm unit i   r16
Dm unit i r16
 
Data Mining introduction and basic concepts
Data Mining introduction and basic conceptsData Mining introduction and basic concepts
Data Mining introduction and basic concepts
 
UNIT2-Data Mining.pdf
UNIT2-Data Mining.pdfUNIT2-Data Mining.pdf
UNIT2-Data Mining.pdf
 
A Lifecycle Approach to Information Privacy
A Lifecycle Approach to Information PrivacyA Lifecycle Approach to Information Privacy
A Lifecycle Approach to Information Privacy
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 

Introduction to Data Mining Techniques

  • 1. BITS Pilani Pilani|Dubai|Goa|Hyderabad M1: Introduction to Data Mining Data Mining
  • 2. BITS Pilani Pilani|Dubai|Goa|Hyderabad 1.1 Data Mining Defined Source Courtesy: Some of the contents of this PPT are sourced from materials provided by publishers of prescribed books
  • 3. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Why Data Mining? • The Explosive Growth of Data: from terabytes to petabytes • Data collection and data availability • Automated data collection tools, database systems, Web, computerized society • Major sources of abundant data • Business: Web, e-commerce, transactions, stocks, … • Science: Remote sensing, bioinformatics, scientific simulation, … • Society and everyone: news, digital cameras, YouTube • We are drowning in data, but starving for knowledge! • “Necessity is the mother of invention”—Data mining—Automated analysis of massive data sets 3
  • 4. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Why Data Mining A search engine (e.g., Google) receives hundreds of millions of queries every day. Each query can be viewed as a transaction where the user describes her or his information need. What novel and useful knowledge can a search engine learn from such a huge collection of queries collected from users over time? Some patterns found in user search queries can disclose invaluable knowledge that cannot be obtained by reading individual data items alone. For example, Google's Flu Trends uses specific search terms as indicators of flu activity. It found a close relationship between the number of people who search for flu-related information and the number of people who actually have flu symptoms. A pattern emerges when all of the search queries related to flu are aggregated. Using aggregated Google search data, Flu Trends can estimate flu activity up to two weeks faster than traditional systems can.This example shows how data mining can turn a large collection of data into knowledge that can help meet a current global challenge.
  • 5. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Evolution of Database Technology • 1960s: • Data collection, database creation, IMS and network DBMS • 1970s: • Relational data model, relational DBMS implementation • 1980s: • RDBMS, advanced data models (extended-relational, OO, deductive, etc.) • Application-oriented DBMS (spatial, scientific, engineering, etc.) • 1990s: • Data mining, data warehousing, multimedia databases, and Web databases • 2000s • Stream data management and mining • Data mining and its applications • Web technology (XML, data integration) and global information systems 5
  • 6. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 What Is Data Mining? • Data mining (knowledge discovery from data) • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data • Data mining: a misnomer? • Alternative names • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc. • Watch out: Is everything “data mining”? • Simple search and query processing • (Deductive) expert systems 6
  • 7. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 What is (not) Data Mining?  What is Data Mining? – Certain names are more prevalent in certain US locations (O’Brien, O’Rurke, O’Reilly… in Boston area) – Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com,)  What is not Data Mining? – Look up phone number in phone directory – Query a Web search engine for information about “Amazon”
  • 8. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Origins of Data Mining • Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems • Traditional Techniques may be unsuitable due to • Enormity of data • High dimensionality of data • Heterogeneous, distributed nature of data Machine Learning/ Pattern Recognition Statistics/ AI Data Mining Database systems
  • 9. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Data Mining in Business Intelligence 9 Increasing potential to support business decisions End User Business Analyst Data Analyst DBA Decision Making Data Presentation Visualization Techniques Data Mining Information Discovery Data Exploration Statistical Summary, Querying, and Reporting Data Preprocessing/Integration, Data Warehouses Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems
  • 10. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Data Mining/KDD Process 10 Input Data Data Mining Data Pre- Processing Post- Processing Data integration Normalization Feature selection Dimension reduction Pattern discovery Association & correlation Classification Clustering Outlier analysis … … … … Pattern evaluation Pattern selection Pattern interpretation Pattern visualization
  • 11. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Multi-Dimensional View of Data Mining • Data to be mined • Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks • Knowledge to be mined (or: Data mining functions) • Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc. • Descriptive vs. predictive data mining • Multiple/integrated functions and mining at multiple levels • Techniques utilized • Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc. • Applications adapted • Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc. 11
  • 12. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Data Mining & Machine Learning According to Tom M. Mitchell, Chair of Machine Learning at Carnegie Mellon University and author of the book Machine Learning (McGraw-Hill), A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with the experience E. We now have a set of objects to define machine learning: Task (T), Experience (E), and Performance (P) With a computer running a set of tasks, the experience should be leading to performance increases (to satisfy the definition) Many data mining tasks are executed successfully with help of machine learning Machine Learning: Hands-on for Developers and Technical Professionals by Jason Bell John Wiley & Sons
  • 13. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Data Mining on Diverse kinds of Data Besides relational database data (from operational or analytical systems), there are many other kinds of data that have diverse forms and structures and different semantic meanings. Examples of data can be : time-related or sequence data (e.g., historical records, stock exchange data, and time-series and biological sequence data), data streams (e.g., video surveillance and sensor data, which are continuously transmitted), spatial data (e.g., maps), engineering design data (e.g., the design of buildings, system components, or integrated circuits), hypertext and multimedia data (including text, image, video, and audio data), graph and networked data (e.g., social and information networks), and the Web (a widely distributed information repository). Diversity of data brings in new challenges such as handling special structures (e.g., sequences, trees, graphs, and networks) and specific semantics (such as ordering, image, audio and video contents, and connectivity)
  • 14. Data Mining BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 Author(s), Title, Edition, Publishing House T1 Tan P. N., Steinbach M & Kumar V. “Introduction to Data Mining” Pearson Education T2 Data Mining: Concepts and Techniques, Third Edition by Jiawei Han, Micheline Kamber and Jian Pei Morgan Kaufmann Publishers Prescribed Text Books