Introduction to Data Mining
What is Data Mining?
- Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
- The process of automatically discovering useful information in large data repositories.
Simple Examples of Data Mining
- Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).
Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
- Traditional techniques may be unsuitable due to:
  - Enormity of data
  - High dimensionality of data
  - Heterogeneous, distributed nature of data
Data Mining Tasks
- Prediction methods: use some variables to predict unknown or future values of other variables.
- Description methods: find human-interpretable patterns that describe the data.
Data Mining Tasks
- Classification [Predictive]
- Clustering [Descriptive]
- Association Rule Discovery [Descriptive]
- Sequential Pattern Discovery [Descriptive]
- Regression [Predictive]
- Deviation Detection [Predictive]
Classification: Definition
Classification is used for discrete target variables.
Ex: Predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
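As a rough illustration of the purchase example, the sketch below uses a 1-nearest-neighbour classifier, one of many possible classification techniques. The feature choice (pages viewed, minutes on site), the training rows, and the names `TRAINING` and `predict_purchase` are all hypothetical and are not taken from the slides.

```python
import math

# Hypothetical training data: ((pages_viewed, minutes_on_site), made_purchase)
TRAINING = [
    ((2, 1.0), False),
    ((3, 2.0), False),
    ((10, 12.0), True),
    ((12, 15.0), True),
]

def predict_purchase(training, visitor):
    """Return the label of the training row closest to `visitor` (1-NN)."""
    _, label = min(training, key=lambda row: math.dist(row[0], visitor))
    return label

print(predict_purchase(TRAINING, (11, 13.0)))  # a visitor resembling past buyers
print(predict_purchase(TRAINING, (1, 0.5)))    # a visitor resembling non-buyers
```

The target takes only two values (purchase or no purchase), which is what makes this a classification problem rather than a regression problem.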
Clustering: Definition
Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
Ex: Finding areas of the ocean that have a significant impact on the earth's climate.
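One common way to form such groups is k-means, sketched below for one-dimensional data. The slides do not name a specific algorithm, so k-means is an assumption here, and the readings, the starting centroids, and the name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny 1-D k-means: assign each point to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical measurements that fall into two obvious groups.
READINGS = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, groups = kmeans_1d(READINGS, centroids=[0.0, 10.0])
print(centers)
print(groups)
```

Points in the same group end up closer to their own center than to the other one, which is the similarity property the definition describes.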
Association Rule Discovery: Definition
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
Contd…
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
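Rules like these are usually scored by support (how often all the items occur together) and confidence (how often the right-hand side occurs when the left-hand side does). The sketch below computes both for the two rules above. The transaction list is a hypothetical example, since the slide text does not include the underlying records, and the helper names are illustrative.

```python
# Hypothetical market-basket records (not from the slides).
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions):
    """Fraction of transactions containing both sides of the rule."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also have the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          "support:", support(lhs, rhs, TRANSACTIONS),
          "confidence:", confidence(lhs, rhs, TRANSACTIONS))
```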
Sequential Pattern Discovery: Definition
Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
(A B) (C) ---> (D E)
Contd…
Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
(A B) (C) (D E)
Timing constraints: <= xg (maximum gap), > ng (minimum gap), <= ws (window size), <= ms (maximum span)
Sequential Pattern Discovery: Example
In telecommunications alarm logs:
(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
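To make the timing constraints concrete, the sketch below checks whether an ordered pattern of event sets occurs in one object's timeline, subject to a minimum gap, a maximum gap, and a maximum span. It is a simplified illustration: the window-size constraint is left out (each pattern element must appear within a single timestamped entry), and the alarm log, its timestamps, and the function name `occurs` are hypothetical.

```python
def occurs(timeline, pattern, min_gap=0, max_gap=float("inf"), max_span=float("inf")):
    """Check whether `pattern` (a list of event sets) occurs in order in `timeline`.

    `timeline` is a list of (time, set_of_events) sorted by time.
    The gap between consecutive matched elements must be > min_gap and <= max_gap;
    the time from the first to the last matched element must be <= max_span.
    """
    def search(element, start, first_time, prev_time):
        if element == len(pattern):
            return True
        for pos in range(start, len(timeline)):
            time, events = timeline[pos]
            if prev_time is not None:
                gap = time - prev_time
                if gap <= min_gap:
                    continue
                if gap > max_gap:
                    break  # later entries are even further away
            if first_time is not None and time - first_time > max_span:
                break
            if pattern[element] <= events:
                begin = time if first_time is None else first_time
                if search(element + 1, pos + 1, begin, time):
                    return True
        return False

    return search(0, 0, None, None)

# Hypothetical alarm log for one device: (minute, alarms raised at that minute).
ALARM_LOG = [
    (0, {"Inverter_Problem", "Excessive_Line_Current"}),
    (5, {"Rectifier_Alarm"}),
    (8, {"Fire_Alarm"}),
]
PATTERN = [{"Inverter_Problem", "Excessive_Line_Current"}, {"Rectifier_Alarm"}, {"Fire_Alarm"}]

print(occurs(ALARM_LOG, PATTERN, max_gap=5))   # gaps of 5 and 3 minutes are allowed
print(occurs(ALARM_LOG, PATTERN, max_gap=4))   # the 5-minute gap is too long
print(occurs(ALARM_LOG, PATTERN, max_span=7))  # the whole pattern spans 8 minutes
```

Counting how many timelines contain the pattern under the chosen constraints is the usual first step before a rule such as the one above is proposed.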
Regression
Predict the value of a continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in statistics and in the neural network field.
Regression: Examples
- Predicting sales amounts of a new product based on advertising expenditure.
- Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
- Time series prediction of stock market indices.
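The first example can be sketched with ordinary least squares for a single predictor, which is the simplest linear model of dependency. The advertising and sales figures below are made up for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure vs. sales (arbitrary units).
AD_SPEND = [1, 2, 3, 4]
SALES = [3, 5, 7, 9]

slope, intercept = fit_line(AD_SPEND, SALES)
predicted = slope * 5 + intercept  # predicted sales for an expenditure of 5
print(slope, intercept, predicted)
```

Unlike classification, the predicted quantity here is continuous, so the output is a number rather than a class label.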
Deviation/Anomaly Detection
Detect significant deviations from normal behavior.
Applications:
- Credit card fraud detection
- Network intrusion detection
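A very simple way to flag deviations is a z-score test: values that lie many standard deviations from the mean are reported as anomalies. This is only one of many possible approaches and is not prescribed by the slides; the transaction amounts, the threshold of 2.5, and the name `find_anomalies` are assumptions for illustration.

```python
import statistics

def find_anomalies(values, threshold=2.5):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical card transaction amounts with one unusually large charge.
AMOUNTS = [20, 22, 19, 21, 20, 23, 18, 21, 20, 22, 500]
print(find_anomalies(AMOUNTS))
```

Because a single extreme value also inflates the mean and standard deviation, robust alternatives such as median-based scores are often preferred on small samples.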
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net
