Data Mining
Techniques
UNIT-I
Introduction to Data Mining
Systems
• What is Data?
• What is Database?
• What is Database Management System?
Introduction to Data Mining
Systems
• Why Data Mining?
• Data Collection and Data Availability
• Major sources of abundant data
Data Mining
• What is Data Mining?
• Data mining (knowledge discovery from data)
– Extraction of interesting (non-trivial, implicit, previously unknown, and
potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
• Is everything “data mining”? No. The following are not data mining:
– Simple search and query processing
– (Deductive) expert systems
Examples
• Examples of Data Mining
1. Marketing
2. Banking
3. Government
4. Health Care
5. Education
6. Retail Industry
7. Logistics and supply chain
Steps involved in Data Mining Process
Large-scale Data is Everywhere!
• There has been enormous data growth in both commercial and scientific
databases due to advances in data generation and collection technologies
• Examples (figure labels): cyber security, e-commerce, traffic patterns,
social networking (Twitter), sensor networks, computational simulations
Why Data Mining? Commercial
Viewpoint
• Lots of data is being collected
and warehoused
– Web data
• Yahoo has petabytes of web data
• Facebook has billions of active users
– Purchases at department/grocery stores, e-commerce
• Amazon handles millions of visits/day
– Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
– Provide better, customized services for an edge (e.g., in Customer
Relationship Management)
Great Opportunities to Solve Society’s Major Problems
• Improving health care and reducing costs
• Finding alternative/green energy sources
• Predicting the impact of climate change
• Reducing hunger and poverty by increasing agricultural production
Evolution of Data Mining
Evolution of Database Technology:
Summary
• 1960s: data collection, database creation, network DBMS
• 1970s: relational data model, relational DBMS
• 1980s: RDBMS, advanced data models, application-oriented DBMS
• 2000s: stream data management and mining, data mining and its applications
Basic Concepts
• Classification
• Clustering
• Supervised Learning
• Unsupervised Learning
Knowledge Discovery Process
Steps in the process of Knowledge
Discovery(KDD Process)
• Data Cleaning
• Data Integration
• Data Selection
• Data Transformation
• Data Mining
• Pattern Evaluation
• Knowledge Presentation
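The seven steps above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the two sources `store_a` and `store_b`, the record layout, and the choice of simple frequency counting as the "mining" step are all hypothetical and not taken from the slides.

```python
from collections import Counter

# Hypothetical raw records from two sources (assumed for illustration).
store_a = [{"cust": 1, "item": "Computer "}, {"cust": 2, "item": None}]
store_b = [{"cust": 3, "item": "computer"}, {"cust": 4, "item": "printer"}]

def clean(records):
    # Data cleaning: drop records with missing values.
    return [r for r in records if r["item"] is not None]

def integrate(*sources):
    # Data integration: combine multiple sources into one collection.
    return [r for src in sources for r in src]

def select(records):
    # Data selection: keep only the attribute relevant to the task.
    return [r["item"] for r in records]

def transform(items):
    # Data transformation: normalize values into a consistent form.
    return [i.strip().lower() for i in items]

def mine(items):
    # Data mining: here, simply count how often each item occurs.
    return Counter(items)

def evaluate(patterns, min_count=2):
    # Pattern evaluation: keep only patterns that meet a threshold.
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    # Knowledge presentation: render the result for the user.
    return [f"{p}: {c} purchases" for p, c in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(data)))))
print(report)
```

In practice the steps are iterative rather than strictly sequential; the linear chain here is a simplification.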
Kinds of Data
• What kinds of Data can be mined?
• Database Data
• Data Warehouses
• Transactional Data
• Other Kinds of Data
Database Data
• Database management system (DBMS)
• Relational database: tables, attributes, tuples
• Entity-Relationship (ER) model
• Database queries
• Mining relational database
Data Warehouses
• What is a data warehouse?
• What is a data cube?
• OLAP (Online Analytical Processing)
operations: drill-down, roll-up
Typical framework of a data
warehouse for AllElectronics
A multidimensional data cube, commonly used for data warehousing: (a)
showing summarized data for AllElectronics and (b) showing summarized
data resulting from drill-down and roll-up operations on the cube.
Transactional Data
Other Kinds of Data
• Time related or sequence data
• Data streams
• Spatial data
• Engineering design data
• Hypertext and Multimedia data
• Graph and Networked data
Kinds of Patterns (Data Mining
Functionalities)
• Data mining tasks: descriptive and predictive
• Data mining functionalities include:
– Characterization and discrimination
– Mining frequent patterns, associations, and
correlations
– Classification and regression
– Clustering analysis
– Outlier analysis
• Are all patterns interesting?
Class/Concept Description: Characterization
and Discrimination
• E.g., in the AllElectronics store, classes of items for sale include computers and
printers, and concepts of customers include big spenders and budget
spenders
• Data characterization
• Methods for data summarization and characterization: simple data
summaries based on statistical measures and plots, data cube-based OLAP
operations, attribute-oriented induction techniques
• Output of data characterization and an example of data characterization
• Data discrimination
• Output of data discrimination and an example of data discrimination
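Mining Frequent Patterns,
Associations and Correlations
• Frequent patterns: frequent itemsets, frequent subsequences, frequent
substructures
• Association analysis:
• E.g., association rule: buys(X, "computer") => buys(X, "software")
[support = 1%, confidence = 50%]
– buys is the predicate; X is a variable representing a customer
– Confidence (certainty): 50% of customers who buy a computer also buy software
– Support: 1% of all transactions under analysis contain both computer and software
• Single-dimensional association rule: involves a single predicate (e.g., buys)
• Multidimensional association rule: involves more than one predicate, e.g.,
age(X, "20..29") ^ income(X, "40K..49K") => buys(X, "laptop")
[support = 2%, confidence = 60%]
• An association rule should satisfy both a minimum support threshold and a
minimum confidence threshold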
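To make support and confidence concrete, the sketch below computes both for a rule of the form buys(X, "computer") => buys(X, "software"). The four transactions and the threshold values are assumed purely for illustration, so the resulting numbers do not match the 1% and 50% quoted above.

```python
# Hypothetical transactions (assumed data, not from the slides).
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Among transactions containing the antecedent, the fraction that
    # also contain the consequent. Assumes the antecedent occurs at least once.
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)

# A rule is kept only if it meets both minimum thresholds (assumed values).
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6
is_strong = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, is_strong)
```

Here two of the four transactions contain both items (support 0.5), and two of the three transactions containing a computer also contain software (confidence about 0.67).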
Classification and regression for
predictive analysis
• What is classification and its example?
• Training data and test data
• Derived models can be represented as
1. Classification rules(If-then-rules)
2. Decision tree
3. Mathematical formulae
4. Neural networks
• Regression analysis
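As a minimal sketch of classification with training and test data, the code below learns one if-then rule per attribute value (predict the majority class seen in training) and then measures accuracy on held-out test tuples. The attribute `age`, the class `buys_computer`, and all tuples are hypothetical; real classifiers such as decision trees or neural networks are considerably more involved.

```python
from collections import Counter, defaultdict

# Hypothetical labeled tuples: (age_group, buys_computer). Assumed data.
training_data = [
    ("youth", "no"), ("youth", "no"), ("youth", "yes"),
    ("middle_aged", "yes"), ("middle_aged", "yes"),
    ("senior", "yes"), ("senior", "no"), ("senior", "yes"),
]
test_data = [("youth", "no"), ("middle_aged", "yes"), ("senior", "no")]

def learn_rules(data):
    # For each attribute value, predict the majority class seen in training.
    counts = defaultdict(Counter)
    for value, label in data:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def accuracy(rules, data):
    # Fraction of tuples whose predicted class matches the true class.
    correct = sum(1 for value, label in data if rules.get(value) == label)
    return correct / len(data)

rules = learn_rules(training_data)
for value, label in sorted(rules.items()):
    print(f"IF age = {value} THEN buys_computer = {label}")
test_accuracy = accuracy(rules, test_data)
print(test_accuracy)
```

The model is built only from `training_data`; evaluating on separate `test_data` gives an estimate of how it behaves on tuples it has not seen.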
Cluster analysis and outlier analysis
Are all patterns interesting?
• support(X => Y) = P(X ∪ Y)
• confidence(X => Y) = P(Y | X)
• Accuracy
• Coverage
• Unexpected vs. expected
Data mining Technologies
Statistics
• Statistics is the collection, analysis, interpretation or
explanation, and presentation of data.
• Statistical model
• Statistical description
• Inferential statistics or predictive statistics
• Statistical hypothesis test
Machine Learning
• What is machine learning?
• Classic problems in machine learning are:
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Active learning
Database System, Data warehouses
& Information retrieval
• Database systems research
• Data warehouse
• Information retrieval
• Language model
• Topic model
Data Mining Applications
• Business Intelligence
• Web Search Engines
Issues in Data Mining
• Mining Methodology
• User Interaction
• Efficiency and Scalability
• Diversity of database types
• Data Mining and society
Mining Methodology
• Mining various and new kinds of knowledge
• Mining knowledge in multidimensional space
• Data Mining-an interdisciplinary effort
• Boosting the power of discovery in a networked
environment
• Handling uncertainty, noise or incompleteness of data
• Pattern evaluation and pattern-or constraint-guided
mining
User Interaction
• Interactive mining
• Incorporation of background knowledge
• Ad hoc data mining and data mining query
languages
• Presentation and visualization of data mining
results
Efficiency and Scalability
• Efficiency and scalability of data mining
algorithms
• Parallel, distributed and incremental mining
algorithms
• Cloud computing and cluster computing
Diversity of database types
• Handling complex types of data
• Mining dynamic, networked and global data
repositories
Data Mining and Society
• Social impacts of data mining
• Privacy-preserving data mining
• Invisible data mining
Summary
• Data mining: discovering interesting patterns and knowledge from
massive amounts of data
• A natural evolution of database technology, in great demand, with
wide applications
• A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
• Mining can be performed on a variety of data
• Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis, etc.
• Data mining technologies and applications
• Major issues in data mining
DATA WAREHOUSE: BASIC
CONCEPTS
• What is a data warehouse?
• Subject-oriented, integrated, time-variant,
nonvolatile
• How are organizations using the information from
data warehouses?
- Knowledge workers
• Query-driven approach (traditional database
approach)
• Update-driven approach (data warehousing approach)
Difference between operational database
systems and data warehouse
• What are OLTP and OLAP?
- Online transaction processing (OLTP)
- Online analytical processing (OLAP)
• Major features/differences between OLTP and OLAP
systems
- User and system orientation
- Data contents
- Database design
- View
- Access patterns
Why have a separate Data
Warehouse?
• DBMS
• Data Warehouse
• Different functions and different data
- Missing data
- Data consolidation
- Data quality
Data Warehousing: A Multitiered
Architecture
• Bottom tier: data warehouse server
- Data sources
- Gateways
• Middle tier: OLAP server
- ROLAP (relational OLAP) server
- MOLAP (multidimensional OLAP) server
• Top tier: front-end tools
A three-tier data warehousing
architecture
Data Warehouse Models
• Enterprise warehouse
• Data Mart
• Virtual warehouse
• Types of data mart
- Independent data mart
- Dependent data mart
• Data warehouse development
- Top-down and bottom-up approaches to
data warehouse development
A recommended approach for
data warehouse development
Data Warehouse Models
• A high-level corporate data model is defined
within a short period
• Enterprise and Department Data Marts
• Distributed Data Marts
• Multitier Data Warehouse
Extraction, Transformation and
Loading
• Data Extraction
• Data Cleaning
• Data Transformation
• Load
• Refresh
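The five ETL functions listed above could look roughly like this in miniature. Everything here is assumed for illustration: the two source formats, the field names `day`, `product`, and `units`, and the plain Python list standing in for the warehouse. Real ETL tooling handles far more (error logging, indexing, partitioning).

```python
# Hypothetical source rows in two different formats (assumed for illustration).
source_csv_rows = ["2024-01-05,TV,2", "2024-01-06,tv,", "2024-01-06,Phone,1"]
source_dict_rows = [{"day": "2024-01-07", "product": "PHONE", "units": 3}]

def extract():
    # Data extraction: gather rows from multiple, heterogeneous sources.
    fields = ("day", "product", "units")
    rows = [dict(zip(fields, line.split(","))) for line in source_csv_rows]
    return rows + source_dict_rows

def clean(rows):
    # Data cleaning: detect and drop rows whose measure is missing.
    return [r for r in rows if r["units"] not in ("", None)]

def transform(rows):
    # Data transformation: convert source formats to the warehouse format.
    return [(r["day"], r["product"].lower(), int(r["units"])) for r in rows]

def load(warehouse, rows):
    # Load: sort the rows and append them to the warehouse table.
    warehouse.extend(sorted(rows))

def refresh(warehouse, rows):
    # Refresh: propagate only the rows not already in the warehouse.
    load(warehouse, [r for r in rows if r not in warehouse])

warehouse = []
load(warehouse, transform(clean(extract())))

# A new row arrives at a source; refresh propagates it to the warehouse.
source_dict_rows.append({"day": "2024-01-08", "product": "TV", "units": 1})
refresh(warehouse, transform(clean(extract())))
print(warehouse)
```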
Metadata Repository
• Description of the data warehouse structure
• Operational metadata
- Data lineage
- Currency of data
- Monitoring information
• Algorithms used for summarization
• Mapping from the operational environment to data
warehouse
• Data related to system performance
• Business metadata
Data warehouse modeling: Data
Cube and OLAP
• What is a data cube?
• Facts
• Fact table
• Lattice of cuboids
• Base cuboid
• Apex cuboid
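The lattice of cuboids can be enumerated directly: each subset of the dimensions corresponds to one cuboid (one group-by). The sketch below assumes three hypothetical dimensions and ignores concept hierarchies, in which case n dimensions give 2^n cuboids.

```python
from itertools import combinations

# Hypothetical dimensions of a sales cube (assumed names).
dimensions = ("time", "item", "location")

def lattice_of_cuboids(dims):
    # Every subset of the dimensions defines one cuboid, listed from the
    # most detailed (all dimensions) down to the most summarized (none).
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

cuboids = lattice_of_cuboids(dimensions)
base_cuboid = cuboids[0]    # all dimensions: lowest level of summarization
apex_cuboid = cuboids[-1]   # no dimensions: a single grand total
print(len(cuboids), base_cuboid, apex_cuboid)
```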
2-D, 3-D, 4-D Data Cubes
3-D Data Cube
4-D Data Cube
Schemas for multidimensional
data models
• Star schema
• Snowflake schema
• Fact constellation schema
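A star schema can be sketched with Python's built-in `sqlite3` module: one central fact table holding the measure and foreign keys, surrounded by one table per dimension. The table and column names (`fact_sales`, `dim_time`, `dim_item`, `dollars_sold`) and the rows are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);
    -- Fact table: a foreign key to each dimension plus the numeric measure.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        dollars_sold REAL
    );
    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_item VALUES (1, 'computer'), (2, 'printer');
    INSERT INTO fact_sales VALUES (1, 1, 1000.0), (1, 2, 200.0), (2, 1, 1500.0);
""")

# Join the fact table to a dimension table and aggregate the measure.
rows = conn.execute("""
    SELECT t.quarter, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_key = t.time_key
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
print(rows)
```

A snowflake schema would further normalize the dimension tables into additional tables, and a fact constellation would have several fact tables sharing dimension tables.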
Star Schema
Snowflake Schema
Fact Constellation
Schema Hierarchy vs. Set-Grouping
Hierarchy
• Data warehouse vs. data mart
• Dimensions: the role of concept hierarchies
- A concept hierarchy maps a set of low-level concepts to higher-level,
more general concepts
Set-Grouping Hierarchy
• Discretizing or grouping values for a given
dimension or attribute
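A set-grouping hierarchy can be sketched as a mapping from numeric values to named ranges. The range boundaries and group names below are assumed for illustration only.

```python
# Hypothetical price ranges (assumed boundaries, for illustration only).
PRICE_GROUPS = [
    (0, 100, "inexpensive"),
    (100, 500, "moderately_priced"),
    (500, float("inf"), "expensive"),
]

def price_group(price):
    # Map a numeric value to the named half-open range [low, high) containing it.
    for low, high, name in PRICE_GROUPS:
        if low <= price < high:
            return name
    raise ValueError(f"price out of range: {price}")

prices = [40, 100, 499.99, 1200]
groups = [price_group(p) for p in prices]
print(groups)
```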
Measures: Their Categorization and
Computation
• Distributive
• Algebraic
• Holistic
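The three categories differ in how a measure can be computed from partitions of the data. The sketch below, on assumed sample values, shows a distributive measure (sum), an algebraic measure (average, derived from sum and count), and a holistic measure (median), for which combining per-partition results does not in general give the right answer.

```python
from statistics import median

# Hypothetical sales values split across two partitions (assumed data).
partition_a = [10, 20, 30]
partition_b = [40, 100]
all_values = partition_a + partition_b

# Distributive: the total is obtained from the per-partition totals alone.
total = sum([sum(partition_a), sum(partition_b)])

# Algebraic: the average is derived from a fixed number of distributive
# measures, here sum and count.
average = (sum(partition_a) + sum(partition_b)) / (len(partition_a) + len(partition_b))

# Holistic: the median needs the full data; combining per-partition
# medians gives a different value for this data.
true_median = median(all_values)
median_of_medians = median([median(partition_a), median(partition_b)])
print(total, average, true_median, median_of_medians)
```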
Typical OLAP operations
• Roll-up (drill-up)
• Drill-down (reverse of roll-up)
• Slice and dice
• Pivot
• Other operations: drill-across, drill-through
• OLAP systems vs. statistical databases
- Starnet query model for querying
multidimensional databases: radial lines,
footprints
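The main OLAP operations can be sketched on a tiny cube stored as a dictionary. The cube cells, the dimension order (quarter, city, item), and the city-to-country hierarchy are all assumed for illustration; an OLAP server would implement these operations very differently.

```python
from collections import defaultdict

# Hypothetical cube cells: (quarter, city, item) -> units sold (assumed data).
cube = {
    ("Q1", "Chicago", "computer"): 5,
    ("Q1", "New York", "computer"): 2,
    ("Q1", "Toronto", "computer"): 3,
    ("Q2", "Chicago", "phone"): 7,
}
# Assumed concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Chicago": "USA", "New York": "USA", "Toronto": "Canada"}

def roll_up_location(cells):
    # Roll-up: climb the location hierarchy from city to country, summing units.
    result = defaultdict(int)
    for (quarter, city, item), units in cells.items():
        result[(quarter, CITY_TO_COUNTRY[city], item)] += units
    return dict(result)

def slice_quarter(cells, quarter):
    # Slice: fix one dimension to a single value, leaving a 2-D sub-cube.
    return {(c, i): u for (q, c, i), u in cells.items() if q == quarter}

def dice(cells, quarters, cities):
    # Dice: select on two or more dimensions at once.
    return {k: u for k, u in cells.items() if k[0] in quarters and k[1] in cities}

def pivot(table):
    # Pivot: rotate the axes of a 2-D view by swapping the two key positions.
    return {(b, a): u for (a, b), u in table.items()}

rolled = roll_up_location(cube)
q1 = slice_quarter(cube, "Q1")
diced = dice(cube, {"Q1"}, {"Chicago", "Toronto"})
pivoted = pivot(q1)
print(rolled)
```

Drill-down is the reverse of roll-up: going from `rolled` back to the city-level `cube` requires the more detailed data, since the country totals alone cannot be split.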
Slice, Dice, Pivot
Starnet Query Model
END OF THE UNIT-1

Data mining techniques unit 1

  • 2. The Introduction to Data mining Systems • What is Data? • What is Database? • What is Database Management System?
  • 3. The Introduction to Data mining Systems • Why Data Mining? • Data Collection and Data Availability • Major sources of abundant data
  • 4. Data Mining • What is Data Mining? • Data mining (knowledge discovery from data) – Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data • Alternative names – Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc. • Is everything “data mining”? – Simple search and query processing – (Deductive) expert systems
  • 5. Examples • Examples of Data Mining 1. Marketing 2. Banking 3. Government 4. Health Care 5. Education 6. Retail Industry 7. Logistics and supply chain
  • 6. Steps involved in Data Mining Process
  • 7. Large-scale Data is Everywhere! • There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies • Cyber Security • E-Commerce • Traffic Patterns • Social Networking: Twitter • Sensor Networks • Computational Simulations
  • 8. Why Data Mining? Commercial Viewpoint • Lots of data is being collected and warehoused – Web data • Yahoo has petabytes of web data • Facebook has billions of active users – Purchases at department/grocery stores, e-commerce • Amazon handles millions of visits/day – Bank/credit card transactions • Computers have become cheaper and more powerful • Competitive pressure is strong – Provide better, customized services for an edge (e.g. in Customer Relationship Management)
  • 9. Great Opportunities to Solve Society’s Major Problems Improving health care and reducing costs Finding alternative/ green energy sources Predicting the impact of climate change Reducing hunger and poverty by increasing agriculture production
  • 11. Evolution of Database Technology: Summary • 1960s: data collection, database creation, network DBMS • 1970s: relational data model, relational DBMS • 1980s: RDBMS, advanced data models, application-oriented DBMS • 2000s: stream data management and mining, data mining and its applications
  • 12. Basic Concepts • Classification • Clustering • Supervised Learning • Unsupervised Learning
  • 14. Steps in the Process of Knowledge Discovery (KDD Process) • Data Cleaning • Data Integration • Data Selection • Data Transformation • Data Mining • Pattern Evaluation • Knowledge Presentation
  • 15. Kinds of Data • What kinds of Data can be mined? • Database Data • Data Warehouses • Transactional Data • Other Kinds of Data
  • 16. Database Data • Database Management System (DBMS) • Relational database: tables, attributes, tuples • Entity-Relationship (ER) model • Database queries • Mining relational databases
  • 18. Data Warehouses • What is data warehouse? • What is data cube? • OLAP (Online Analytical Processing) operations: drill down, roll up
  • 19. Typical framework of a data warehouse for AllElectronics
  • 20. A multidimensional data cube, commonly used for data warehousing: (a) showing summarized data for AllElectronics and (b) showing summarized data resulting from drill-down and roll-up operations on the cube.
  • 22. Other Kinds of Data • Time related or sequence data • Data streams • Spatial data • Engineering design data • Hypertext and Multimedia data • Graph and Networked data
  • 23. Kinds of Patterns (Data Mining Functionalities) • Data mining tasks: descriptive and predictive • Data mining functionalities include: • Characterization and discrimination • Mining frequent patterns, associations and correlations • Classification and regression • Clustering analysis • Outlier analysis • Are all patterns interesting?
  • 24. Class/Concept Description: Characterization and Discrimination • E.g., in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders • Data Characterization • Methods for data summarization and characterization: simple data summaries based on statistical measures and plots, data cube-based OLAP operations, attribute-oriented induction techniques • Output of Data Characterization and Example for Data Characterization • Data Discrimination • Output of Data Discrimination and Example for Data Discrimination
  • 25. Mining Frequent Patterns, Associations and Correlations • Frequent patterns: frequent itemsets, frequent subsequences, frequent substructures • Association analysis: • E.g., association rule: buys(X, ”computer”) => buys(X, ”software”) [support = 1%, confidence = 50%], where buys is the predicate; confidence is the certainty of the rule and support is the fraction of the transactions under analysis that satisfy it • Single-dimensional association rule • Multidimensional association rule: age(X, ”20..29”) ^ income(X, ”40..49K”) => buys(X, ”laptop”) [support = 2%, confidence = 60%] • An association rule should satisfy both a minimum support threshold and a minimum confidence threshold
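The support and confidence figures attached to the rules above can be computed directly from a transaction list. The following is a minimal sketch, assuming transactions are represented as Python sets; the four toy transactions and the function names `support` and `confidence` are illustrative, not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Toy data (hypothetical): each set is one customer transaction.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
]

# Rule: buys(X, "computer") => buys(X, "software")
rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})

# A rule is kept only if it clears both thresholds (values chosen arbitrarily).
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6
is_strong = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
```

Here two of the four transactions contain both items, and two of the three computer purchases also include software, so the rule passes both illustrative thresholds.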
  • 26. Classification and regression for predictive analysis • What is classification and its example? • Training data and test data • Derived models can be presented as: 1. Classification rules (if-then rules) 2. Decision trees 3. Mathematical formulae 4. Neural networks • Regression analysis
  • 28. Cluster analysis and outlier analysis
  • 29. Are all patterns interesting? • support(X => Y) = P(X ∪ Y) • confidence(X => Y) = P(Y | X) • Accuracy • Coverage • Unexpected vs. expected
  • 31. Statistics • Statistics studies the collection, analysis, interpretation or explanation, and presentation of data. • Statistical model • Statistical description • Inferential statistics or predictive statistics • Statistical hypothesis test
  • 32. Machine Learning • What is machine learning? • Classic problems in machine learning are: • Supervised learning • Unsupervised learning • Semi-supervised learning • Active learning
  • 35. Database System, Data warehouses & Information retrieval • Database systems research • Data warehouse • Information retrieval • Language model • Topic model
  • 36. Data Mining Applications • Business Intelligence • Web Search Engines
  • 37. Issues in Data Mining • Mining Methodology • User Interaction • Efficiency and Scalability • Diversity of database types • Data Mining and society
  • 38. Mining Methodology • Mining various and new kinds of knowledge • Mining knowledge in multidimensional space • Data Mining-an interdisciplinary effort • Boosting the power of discovery in a networked environment • Handling uncertainty, noise or incompleteness of data • Pattern evaluation and pattern-or constraint-guided mining
  • 39. User Interaction • Interactive mining • Incorporation of background knowledge • Ad hoc data mining and data mining query languages • Presentation and visualization of data mining results
  • 40. Efficiency and Scalability • Efficiency and scalability of data mining algorithms • Parallel, distributed and incremental mining algorithms • Cloud computing and cluster computing
  • 41. Diversity of database types • Handling complex types of data • Mining dynamic, networked and global data repositories
  • 42. Data Mining and Society • Social impacts of data mining • Privacy-preserving data mining • Invisible data mining
  • 43. Summary • Data mining: discovering interesting patterns and knowledge from massive amounts of data • A natural evolution of database technology, in great demand, with wide applications • A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation • Mining can be performed on a variety of data • Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc. • Data mining technologies and applications • Major issues in data mining
  • 44. DATA WAREHOUSE: BASIC CONCEPTS • What is a data warehouse? • Subject-oriented, integrated, time-variant, nonvolatile • How are organizations using the information from data warehouses? - Knowledge workers • Query-driven approach (traditional database approach) • Update-driven approach (data warehousing approach)
  • 45. Difference between operational database systems and data warehouses • What are OLTP and OLAP? - Online transaction processing (OLTP) - Online analytical processing (OLAP) • Major features/differences between OLTP and OLAP systems - User and system orientation - Data contents - Database design - View - Access patterns
  • 46. Why have a separate Data Warehouse? • DBMS • Data Warehouse • Different functions and different data -Missing data -Data consolidation -Data Quality
  • 47. Data warehousing: A multitiered architecture • Bottom tier: data warehouse server - Data sources - Gateways • Middle tier: OLAP server - ROLAP (Relational OLAP) server - MOLAP (Multidimensional OLAP) server • Top tier: front-end tools
  • 48. A three-tier data warehousing architecture
  • 49. Data Warehouse Models • Enterprise warehouse • Data mart • Virtual warehouse • Types of data mart - Independent data mart - Dependent data mart • Data warehouse development - Top-down and bottom-up approaches to data warehouse development
  • 50. A recommended approach for data warehouse development
  • 51. Data Warehouse Models • High-level corporate data model is defined within short period • Enterprise and Department Data Marts • Distributed Data Marts • Multitier Data Warehouse
  • 52. Extraction, Transformation and Loading • Data Extraction • Data Cleaning • Data Transformation • Load • Refresh
  • 53. Metadata Repository • Description of the data warehouse structure • Operational metadata -Data lineage -Currency of data -Monitoring Information • Algorithms used for summarization • Mapping from the operational environment to data warehouse • Data related to system performance • Business metadata
  • 54. Data warehouse modeling: Data Cube and OLAP • What is data cube? • Facts • Fact table • Lattice of cuboids • Base cuboid • Apex cuboid
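The lattice of cuboids named above can be enumerated by taking every subset of the cube's dimensions. This is a minimal sketch, assuming three hypothetical dimensions (`time`, `item`, `location`) and no concept hierarchies; with hierarchies on each dimension the number of cuboids grows beyond the 2^n shown here.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Return every group-by combination, from the base cuboid down to the apex."""
    dims = list(dimensions)
    return [
        combo
        for k in range(len(dims), -1, -1)  # n-D first, 0-D last
        for combo in combinations(dims, k)
    ]


# Hypothetical dimensions for illustration.
cuboids = cuboid_lattice(["time", "item", "location"])

base_cuboid = cuboids[0]    # groups by all dimensions (least summarized)
apex_cuboid = cuboids[-1]   # groups by none (a single grand total)
```

For n dimensions without hierarchies the lattice has 2^n cuboids, so three dimensions give eight group-by combinations.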
  • 59. Schemas for multidimensional data models • Star schema • Snowflake schema • Fact constellation schema
  • 63. Schema Hierarchy vs. Set-Grouping Hierarchy • Data warehouse vs. data mart • Dimensions: the role of concept hierarchies - a concept hierarchy maps a set of low-level concepts to higher-level, more general concepts
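A star schema places one central fact table in the middle and links it by keys to one table per dimension. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; every table name, column name and data row is a hypothetical illustration rather than something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one per dimension, each with a surrogate key.
    CREATE TABLE dim_item (
        item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE dim_location (
        location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Fact table: foreign keys to each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        item_key INTEGER REFERENCES dim_item(item_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        dollars_sold REAL,
        units_sold INTEGER);

    INSERT INTO dim_item VALUES (1, 'computer', 'BrandA'), (2, 'printer', 'BrandB');
    INSERT INTO dim_location VALUES (1, 'Chicago', 'USA'), (2, 'Toronto', 'Canada');
    INSERT INTO fact_sales VALUES (1, 1, 1000.0, 2), (2, 1, 200.0, 1), (1, 2, 500.0, 1);
    """
)

# Join the fact table to one dimension and aggregate a measure by country.
sales_by_country = conn.execute(
    """
    SELECT l.country, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_location AS l ON f.location_key = l.location_key
    GROUP BY l.country
    ORDER BY l.country
    """
).fetchall()
conn.close()
```

A snowflake schema would further normalize a dimension, for example by moving `country` out of `dim_location` into its own table, at the cost of an extra join.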
  • 64. Set grouping Hierarchy • Discretizing or grouping values for a given dimension or attributes
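A set-grouping hierarchy can be sketched as a function that maps raw attribute values into named ranges. The example below assumes a `price` dimension; the cut points and the group labels are hypothetical choices made for illustration.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds and labels for the "price" dimension.
UPPER_BOUNDS = [100, 500]
LABELS = ["inexpensive", "moderately_priced", "expensive"]


def price_group(price):
    """Map a raw price to its group: (0, 100], (100, 500], or above 500."""
    return LABELS[bisect_left(UPPER_BOUNDS, price)]


def roll_up_by_price_group(prices):
    """Count how many items fall into each price group."""
    counts = {label: 0 for label in LABELS}
    for p in prices:
        counts[price_group(p)] += 1
    return counts


groups = roll_up_by_price_group([20, 100, 250, 500, 899])
```

Unlike a schema hierarchy, which follows the order of attributes in the database schema, this grouping is defined by the analyst's choice of ranges.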
  • 65. Measures: Their Categorization and Computation • Distributive • Algebraic • Holistic
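The three categories differ in how a measure can be computed when the data is split into partitions. This is a minimal sketch with made-up numbers: a sum is combined from partial sums (distributive), an average is derived from a fixed number of distributive values (algebraic), and a median needs the full data set (holistic).

```python
from statistics import median

# Hypothetical sales values stored in three partitions.
partitions = [[4, 8, 6], [10, 2], [7]]

# Distributive: the total is just the sum of the per-partition sums.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

# Algebraic: the average is computed from two distributive measures,
# sum and count.
partial_counts = [len(p) for p in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: the median cannot be derived from per-partition summaries
# of bounded size; it needs all the values.
all_values = [v for p in partitions for v in p]
true_median = median(all_values)
median_of_medians = median([median(p) for p in partitions])
```

On this data the median of the partition medians differs from the true median, which is why holistic measures are expensive to maintain in a data cube.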
  • 66. Typical OLAP operations • Roll-up (drill-up) • Drill-down (reverse of roll-up) • Slice and dice • Pivot • Other operations: drill-across, drill-through • OLAP systems vs. statistical databases - Starnet query model for querying multidimensional databases: radial lines, footprint
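Roll-up, slice and dice can be sketched on a small fact list in plain Python. The rows, the city-to-country hierarchy and the function names below are hypothetical; a real OLAP server would run these operations against precomputed cuboids rather than raw rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, quarter, item, sales).
facts = [
    ("Chicago", "Q1", "computer", 100),
    ("New York", "Q1", "computer", 70),
    ("Chicago", "Q2", "computer", 150),
    ("Toronto", "Q1", "computer", 60),
    ("Vancouver", "Q1", "computer", 40),
    ("Toronto", "Q1", "printer", 20),
]

# Concept hierarchy for the location dimension: city -> country.
city_to_country = {
    "Chicago": "USA", "New York": "USA",
    "Toronto": "Canada", "Vancouver": "Canada",
}


def roll_up_location(rows):
    """Roll up location from city to country, summing sales."""
    totals = defaultdict(int)
    for city, quarter, item, sales in rows:
        totals[(city_to_country[city], quarter, item)] += sales
    return dict(totals)


def slice_quarter(rows, quarter):
    """Slice: fix one dimension (time) to a single value."""
    return [r for r in rows if r[1] == quarter]


def dice(rows, cities, items):
    """Dice: select a sub-cube by restricting two dimensions."""
    return [r for r in rows if r[0] in cities and r[2] in items]


by_country = roll_up_location(facts)
q1_rows = slice_quarter(facts, "Q1")
canada_computers = dice(facts, {"Toronto", "Vancouver"}, {"computer"})
```

Drill-down is the reverse of `roll_up_location`: it returns from the country totals to the city-level rows they were computed from.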
  • 69. END OF THE UNIT-1