DATA FOR DATA MINING
Types of variables
Variables can be divided into two main types:
  • Categorical attributes: nominal, binary and ordinal variables
  • Continuous attributes: integer, interval-scaled and ratio-scaled variables
An optional third label can also be used:
  • Ignore attributes: variables which are of no significance
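One way to make this classification concrete is to tag each attribute of a dataset with its type before mining. The sketch below is only an illustration: the `AttributeType` enum, the `attributes_of` helper and the column names are hypothetical and do not come from the slides.

```python
from enum import Enum


class AttributeType(Enum):
    CATEGORICAL = "categorical"  # nominal, binary or ordinal
    CONTINUOUS = "continuous"    # integer, interval-scaled or ratio-scaled
    IGNORE = "ignore"            # of no significance for mining


# Hypothetical schema for an illustrative dataset (assumed column names).
schema = {
    "patient_id": AttributeType.IGNORE,
    "blood_group": AttributeType.CATEGORICAL,
    "smoker": AttributeType.CATEGORICAL,
    "age": AttributeType.CONTINUOUS,
    "weight_kg": AttributeType.CONTINUOUS,
}


def attributes_of(schema, kind):
    """Return the attribute names tagged with the given type, in schema order."""
    return [name for name, attr_type in schema.items() if attr_type is kind]


continuous = attributes_of(schema, AttributeType.CONTINUOUS)
ignored = attributes_of(schema, AttributeType.IGNORE)
```

Keeping the tags in one place lets later steps (cleaning, imputation) choose a treatment per attribute type rather than per column.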
Data Cleaning
Erroneous values can be divided into:
  • Noisy values: valid for the dataset, but incorrectly recorded
  • Invalid values: can be easily detected and removed or corrected
Noise detection:
  • Peaks in the dataset
  • Some values outside the normal range; such values could even be genuine (called outliers)
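The two checks above can be sketched in a few lines. The slides do not say how the "normal range" is defined, so this example assumes two things: invalid values are those outside a known legal range, and candidate outliers are found with the common interquartile-range (Tukey) rule. The function names and sample readings are hypothetical.

```python
from statistics import quantiles


def find_invalid(values, low, high):
    """Return values outside the legal range [low, high] (assumed known)."""
    return [v for v in values if not low <= v <= high]


def flag_outliers(values, k=1.5):
    """Return values outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR.

    Flagged values are only candidates: they may be noise or genuine outliers.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious peak.
readings = [10, 12, 11, 13, 12, 11, 95]
suspects = flag_outliers(readings)

# Hypothetical ages; negative or implausibly large ages are invalid.
bad_ages = find_invalid([34, -2, 51, 240], low=0, high=130)
```

Flagged values are returned rather than deleted, because a value outside the normal range could still be genuine and may deserve a manual check.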
Missing Values
Reasons for occurrence:
  • Equipment malfunction
  • Additional fields were added later
  • Non-availability of information
Strategies to deal with missing values:
  • Discard instances
  • Replace by the most frequent or average value
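Both strategies can be sketched as follows, assuming each instance is a dictionary and a missing value is stored as `None`. The sketch also assumes that the most frequent value is used for categorical attributes and the average for continuous ones. The function names and sample rows are hypothetical.

```python
from statistics import mean, mode


def discard_incomplete(rows):
    """Strategy 1: drop every instance that has at least one missing value."""
    return [row for row in rows if None not in row.values()]


def impute(rows, column, categorical):
    """Strategy 2: fill missing values in one column.

    Uses the most frequent value for a categorical attribute and the
    average for a continuous one. Returns new rows; the input is unchanged.
    """
    present = [row[column] for row in rows if row[column] is not None]
    fill = mode(present) if categorical else mean(present)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical instances with missing entries marked as None.
rows = [
    {"colour": "red", "size": 2.0},
    {"colour": None, "size": 4.0},
    {"colour": "red", "size": None},
    {"colour": "blue", "size": 6.0},
]

complete_only = discard_incomplete(rows)
filled = impute(impute(rows, "colour", categorical=True), "size", categorical=False)
```

Discarding is simplest but shrinks the dataset, while replacement keeps every instance at the cost of introducing estimated values.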
Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support.
Visit us at www.dataminingtools.net
