Data in Data mining

Published in: Technology

  1. 1. DATA FOR DATA MINING
  2. 2. Types of variables
     Variables can be divided into two main types:
       • Categorical attributes: nominal, binary and ordinal variables
       • Continuous attributes: integer, interval-scaled and ratio-scaled variables
     A variable can optionally be marked as an ignore attribute when it is of no significance.
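
One way to make the variable types above explicit is to tag each column of a dataset with its type before mining. The following is a minimal sketch; the `Kind` enum, the column names and the helper functions are all hypothetical and only illustrate the idea.

```python
from enum import Enum


class Kind(Enum):
    CATEGORICAL = "categorical"  # nominal, binary, ordinal
    CONTINUOUS = "continuous"    # integer, interval-scaled, ratio-scaled
    IGNORE = "ignore"            # of no significance for mining


# Hypothetical schema for a made-up dataset.
schema = {
    "patient_id": Kind.IGNORE,
    "blood_group": Kind.CATEGORICAL,
    "smoker": Kind.CATEGORICAL,
    "age": Kind.CONTINUOUS,
    "temperature_c": Kind.CONTINUOUS,
}


def columns_of(schema, kind):
    """Return the column names tagged with the given kind, in schema order."""
    return [name for name, k in schema.items() if k is kind]


def drop_ignored(row, schema):
    """Return a copy of the row without the columns tagged as IGNORE."""
    return {name: value for name, value in row.items()
            if schema[name] is not Kind.IGNORE}


row = {"patient_id": 17, "blood_group": "A", "smoker": False,
       "age": 42, "temperature_c": 36.8}
print(columns_of(schema, Kind.CONTINUOUS))
print(drop_ignored(row, schema))
```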
  3. 3. Data Cleaning
     Erroneous values can be divided into:
       • Noisy values: valid for the dataset, but incorrectly recorded
       • Invalid values: can be easily detected and removed or corrected
     Noise detection:
       • Peaks in the dataset
       • Values outside the normal range; such values could even be genuine (these are called outliers)
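
The slide does not say how the "normal range" is determined. As one possible sketch, the range can be derived from the interquartile range (Tukey's fences), which is a common convention and an assumption here, not something the slides prescribe. The function names and the sample readings are made up for illustration.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (low, high) bounds derived from the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_out_of_range(values, k=1.5):
    """Return the values that fall outside the IQR fences, in input order."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0, 10.1]  # made-up sample data
suspects = flag_out_of_range(readings)
print(suspects)
```

Flagged values are only candidates for review. As the slide notes, a value outside the normal range may be a genuine outlier rather than noise, so it should not be removed automatically.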
  4. 4. Missing Values
     Reasons for occurrence:
       • Equipment malfunction
       • Additional fields were added later
       • Non-availability of information
     Strategies to deal with missing values:
       • Discard instances
       • Replace by the most frequent or average value
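
The two strategies above can be sketched in plain Python. This is an illustration under stated assumptions: missing values are represented as `None`, instances are dictionaries, the most frequent value is used for categorical columns and the average for continuous ones. The function names and the sample rows are hypothetical.

```python
from statistics import mean, mode

MISSING = None  # assumption: missing values are stored as None


def discard_incomplete(rows):
    """Strategy 1: drop every instance that has at least one missing value."""
    return [row for row in rows if MISSING not in row.values()]


def fill_missing(rows, column, categorical):
    """Strategy 2: replace missing values in one column.

    Uses the most frequent value for a categorical column and the
    average for a continuous one. Returns new rows; the input is unchanged.
    """
    known = [row[column] for row in rows if row[column] is not MISSING]
    fill = mode(known) if categorical else mean(known)
    return [
        {**row, column: fill} if row[column] is MISSING else row
        for row in rows
    ]


rows = [  # made-up sample data
    {"colour": "red", "weight": 2.0},
    {"colour": None, "weight": 4.0},
    {"colour": "red", "weight": None},
    {"colour": "blue", "weight": 6.0},
]

complete = discard_incomplete(rows)
filled = fill_missing(fill_missing(rows, "colour", True), "weight", False)
print(complete)
print(filled)
```

Discarding is simplest but loses data, which matters when many instances have gaps. Replacement keeps every instance at the cost of introducing estimated values.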