Data Mining Input: Concepts, Instances, and Attributes
Input takes the following forms:
Intelligible in that it can be understood
Operational in that it can be applied to actual examples
Instances: The data present consists of various instances of the class. E.g. the table below consists of 2 instances
Attributes: Each instance of the class has various attributes. E.g. the table below consists of two attributes {Name, Age}
Classification learning
Learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples
Also called supervised learning
E.g. Classification rules for the weather forecasting problem
      If outlook = sunny and humidity = high then play = no
      If outlook = rainy and windy = true then play = no
      If outlook = overcast then play = yes
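The three rules above can be written directly as code. This is a minimal sketch, not a learning scheme: the function name `classify_play` and the choice to return `None` when no rule fires are assumptions made for illustration.

```python
def classify_play(outlook, humidity, windy):
    """Apply the three weather rules in order.

    Returns "yes" or "no", or None when none of the three rules covers
    the example (an assumption: the slide does not say what happens then).
    """
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    return None


if __name__ == "__main__":
    # Unseen examples, made up for illustration.
    unseen = [
        ("sunny", "high", False),
        ("rainy", "normal", True),
        ("overcast", "high", True),
        ("sunny", "normal", False),
    ]
    for outlook, humidity, windy in unseen:
        print(outlook, humidity, windy, "->", classify_play(outlook, humidity, windy))
```

A classification learner would be expected to produce rules of this kind from classified examples; here they are simply hard-coded.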
Numeric prediction
Same as classification learning, but the outcome to be predicted is not a discrete class but a numeric quantity
Clustering
Groups of examples that belong together are sought and clubbed together in a cluster
E.g. based on a bank's data, the following relation between debt and income was seen:
Association learning
Any association among features is sought, not just ones that predict a particular class value
It predicts any attribute, not just the class
It can predict more than one attribute value at a time
E.g. from the following supermarket data it can be concluded: If milk and bread are bought, customers also buy butter
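One common way to judge a rule such as "milk and bread -> butter" is by its support and confidence. The sketch below computes both on a small, made-up list of baskets; the data and the helper names `support` and `confidence` are assumptions for illustration and do not come from the slides.

```python
# Hypothetical supermarket transactions (not the data from the slide).
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread", "butter", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also
    contain consequent. Returns 0.0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


if __name__ == "__main__":
    rule_if, rule_then = {"milk", "bread"}, {"butter"}
    print("support:", support(rule_if | rule_then, baskets))
    print("confidence:", confidence(rule_if, rule_then, baskets))
```

The consequent can be any attribute, or several at once, which is what separates association learning from classification.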


WEKA: Data Mining Input Concepts Instances And Attributes

  • 19. Flat file: Each dataset is represented as a matrix of instances versus attributes, which in database terms is a single relation, or a flat file
  • 22. Independence can be achieved by de-normalization
  • 23. In database terms, take two relations and join them together to make one, a process of flattening that is technically called de-normalization
  • 25. We are trying to find the ‘Sister of’ relationship. Each row of the tree is mapped to an instance, but we can't make sense of this with respect to our requirement or concept. Therefore ...
  • 26. We de-normalize these tables into a single table, in which the ‘Sister of’ relationship can be seen clearly
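The flattening step can be sketched as a join of a person table with itself, giving one instance per ordered pair of people. Everything in this sketch is hypothetical: the names, the `people` layout, and the `denormalize` helper are made up, and the rule used for the class (second person is female and both share the same parents) is an assumption about what ‘Sister of’ means here.

```python
from itertools import permutations

# Hypothetical normalized relation: one row per person.
people = {
    "Ann": {"gender": "female", "parents": ("Eve", "Fred")},
    "Bob": {"gender": "male", "parents": ("Eve", "Fred")},
    "Cara": {"gender": "female", "parents": ("Gina", "Hal")},
    "Dan": {"gender": "male", "parents": ("Gina", "Hal")},
}


def denormalize(people):
    """Join the person relation with itself: one flat row per ordered pair,
    carrying both people's attributes plus the sister_of class."""
    rows = []
    for first, second in permutations(people, 2):
        a, b = people[first], people[second]
        is_sister = b["gender"] == "female" and a["parents"] == b["parents"]
        rows.append({
            "first": first,
            "first_gender": a["gender"],
            "first_parents": a["parents"],
            "second": second,
            "second_gender": b["gender"],
            "second_parents": b["parents"],
            "sister_of": "yes" if is_sister else "no",
        })
    return rows


if __name__ == "__main__":
    for row in denormalize(people):
        if row["sister_of"] == "yes":
            print(row["second"], "is a sister of", row["first"])
```

With n people the flat table has n * (n - 1) rows, which illustrates why de-normalized tables grow quickly.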
  • 27. Problems with de-normalization:
      • If relationships among a large number of items are required, the tables will be huge
      • It produces irregularities in the data that are completely spurious
      • Relations might not be finite (use: inductive logic programming)
    Overlay data: Sometimes data relevant to the problem at hand needs to be collected from outside the organization. This is called overlay data.
  • 28. Data Integration
    Integration of system-wide databases is difficult because different departments will use or have:
      • Different styles of record keeping
      • Different conventions
      • Different degrees of data aggregation
      • Different types of errors
      • Different time periods
      • Different primary keys
    These issues are addressed by building company-wide databases, a process called data warehousing
  • 29. Data Cleaning
      • Data cleaning is the careful checking of data
      • It helps in resolving many architectural issues with different databases
      • Data cleaning usually requires good domain knowledge
  • 30. Attribute-Relation File Format (ARFF)
    Definition: An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes
    Conventions used in ARFF:
      ARFF Header
        • Lines beginning with % are comments
        • To declare the relation: @relation <name of relation>
        • To declare an attribute: @attribute <attribute> <data type>
      ARFF Data Section
        • To start the actual data: @data, followed by row-wise comma-separated data
  • 31. Data types for ARFF:
      • Numeric values can be real or integer numbers
      • Nominal values are defined by providing a <nominal-specification> listing the possible values: {nm-value1, nm-value2, ...}, e.g. {yes, no}
      • Values containing spaces must be quoted
      • String attributes allow us to create attributes containing arbitrary textual values
      • The date type is used as: @attribute <name> date [<date-format>]
      • The default date format is the ISO-8601 combined date and time format: "yyyy-MM-dd'T'HH:mm:ss"
      • Missing values are represented by ?
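The header and data conventions above can be tried out with a small writer. This is a minimal sketch under assumptions: `to_arff` and `quote` are hypothetical helpers (not part of WEKA), the weather rows are made up, and only numeric, string and nominal attributes are handled, with `None` standing in for a missing value.

```python
def quote(value):
    """Quote a value if it contains a space, as ARFF requires."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def to_arff(relation, attributes, rows):
    """Build ARFF text.

    attributes: list of (name, kind); kind is "numeric", "string", or a
    list of nominal values. rows: sequences of values, None = missing.
    """
    lines = ["% generated by a hypothetical helper", f"@relation {quote(relation)}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            spec = "{" + ",".join(quote(v) for v in kind) + "}"
        else:
            spec = kind
        lines.append(f"@attribute {quote(name)} {spec}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else quote(v) for v in row))
    return "\n".join(lines)


if __name__ == "__main__":
    text = to_arff(
        "weather",
        [
            ("outlook", ["sunny", "overcast", "rainy"]),
            ("temperature", "numeric"),
            ("play", ["yes", "no"]),
        ],
        [("sunny", 85, "no"), ("overcast", None, "yes")],
    )
    print(text)
```

Date attributes and escaping of quote characters inside values are left out to keep the sketch short.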
  • 32. Sparse ARFF files
      • Sparse ARFF files are very similar to ARFF files, but data with value 0 are not explicitly represented
      • Same header as ARFF but a different data section
      • Instead of representing each value in order, like this:
          @data
          0, X, 0, Y, "class A"
        the non-zero attributes are explicitly identified by attribute number (starting from zero) and their value stated, like this:
          @data
          {1 X, 3 Y, 4 "class A"}
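The conversion from a dense data row to its sparse form can be sketched in a few lines. The helper names `to_sparse` and `quote` are hypothetical, and the sketch assumes that only the numeric value 0 is dropped.

```python
def quote(value):
    """Quote a value if it contains a space."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def to_sparse(row):
    """Turn one dense ARFF data row into sparse form: keep only non-zero
    values, each prefixed by its attribute index (starting from zero)."""
    entries = [f"{index} {quote(value)}" for index, value in enumerate(row) if value != 0]
    return "{" + ", ".join(entries) + "}"


if __name__ == "__main__":
    dense = [0, "X", 0, "Y", "class A"]
    print(to_sparse(dense))  # prints {1 X, 3 Y, 4 "class A"}
```

A row consisting only of zeros becomes `{}` under this sketch.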
  • 33. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net