
# XL-MINER: Data Utilities

## by DataminingTools Inc on Feb 03, 2010


## XL-MINER: Data Utilities Presentation Transcript

• Introduction to XLMiner™ Data Utilities
XLMiner and Microsoft Office are registered trademarks of their respective owners.
• Brief description of the features of XLMiner:
Data Utilities
XLMiner provides the user with the following Data Utilities:
Sample from Worksheet/Database.
• Simple Random sample.
• Stratified Sampling.
Missing Data handling.
Bin Continuous Data.
Transform Categorical Data.
http://dataminingtools.net
• Sample data from Worksheet
When huge amounts of data are involved, statisticians prefer to work with a sample that represents the entire database. However, a truly representative sample is difficult to obtain.
The entire dataset we want information about is called the population. A sample is the part of the population that we actually examine to draw conclusions.
A good sample should be a true representation of the data: as far as possible, the cases chosen for the sample should resemble the cases that are not chosen. A poor sample design can produce misleading conclusions. Various methods and techniques have been developed to ensure a representative sample.
XLMiner provides sampling facilities.
• Sample data from Worksheet
In XLMiner, sampling can be done in two ways:
Simple Random Sampling:
A random sample of x records is chosen from the data such that every record in the data has an equal chance of being chosen.
Stratified Sampling:
The data is divided into strata of similar items. Then each stratum is sampled using the simple random approach and the results are then combined to give a final sample.
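The two approaches can be sketched in plain Python. This is a minimal illustration using the standard `random` module, not XLMiner's own implementation; the function names, the `segment` field, the sampling fraction and the sample data are all hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(records, n, rng):
    """Draw n distinct records; every record is equally likely to be picked."""
    return rng.sample(records, n)


def stratified_sample(records, key, fraction, rng):
    """Split records into strata by key, sample each stratum, then combine."""
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    sample = []
    for stratum in strata.values():
        # Assumption: the same fraction is drawn from every stratum,
        # with at least one record per stratum.
        n = max(1, round(len(stratum) * fraction))
        sample.extend(simple_random_sample(stratum, n, rng))
    return sample


# Hypothetical data: 80 records in segment "A" and 20 in segment "B".
records = [{"id": i, "segment": "A" if i < 80 else "B"} for i in range(100)]
rng = random.Random(12345)

srs = simple_random_sample(records, 10, rng)
strat = stratified_sample(records, lambda r: r["segment"], 0.1, rng)
```

A simple random sample of 10 records may over- or under-represent the small segment by chance, whereas the stratified sample always contains records from both segments.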
• Sample data from Worksheet- Simple Random Sampling
Select the variables to be present in the sample.
Here, “Simple Random Sampling” is selected.
We can specify the seed value (the value used to initialize the random selection), or the wizard will supply a default.
Set the size of the sampled set.
If sampling with replacement is selected, duplicate copies of records may appear in the sample.
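The effect of the seed and of sampling with replacement can be illustrated with Python's standard `random` module. This is only a sketch of the general idea, not of XLMiner's wizard; the record IDs and the seed value 12345 are made up for the example.

```python
import random

records = list(range(1, 21))  # hypothetical record IDs 1..20

# The same seed reproduces the same sample.
first = random.Random(12345).sample(records, 5)
second = random.Random(12345).sample(records, 5)

# Without replacement: each record appears at most once.
without_replacement = random.Random(12345).sample(records, 5)

# With replacement: the same record may be drawn more than once,
# so the sample can even be larger than the data set.
with_replacement = random.Random(12345).choices(records, k=30)
```

Drawing 30 records from only 20 with replacement necessarily produces duplicates, which is the behaviour the option enables.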
• Sample data from Worksheet- Simple Random Sampling output
• Sample data from Worksheet- Simple Random Sampling output with replacement
Duplicate copies of records exist in the sample.
• Sample data from Worksheet- Stratified Sample( proportionate )
• Sample data from Worksheet- Stratified Sample( proportionate – output )
As selected, the percentage of records in each stratum of the sample set is the same as in the input set.
• Sample data from Worksheet- Stratified Sample(specify number)
• Sample data from Worksheet- Stratified Sample(specify number)
All strata have the size specified by the user (here, 10 records each).
• Sample data from Worksheet- Stratified Sample( size of smallest stratum)
• Sample data from Worksheet- Stratified Sample( size of smallest stratum-output)
All strata have a size equal to that of the smallest stratum.
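The three stratified options differ only in how many records are taken from each stratum. A rough Python sketch of that allocation step follows; the function name, the mode names and the stratum labels are hypothetical, and the rounding and capping rules are assumptions rather than XLMiner's documented behaviour.

```python
from collections import Counter


def stratum_sample_sizes(labels, mode, sample_size=None, per_stratum=None):
    """Return how many records to draw from each stratum.

    mode is a hypothetical name for one of the three options:
    "proportionate", "fixed" (user-specified number) or "smallest".
    """
    counts = Counter(labels)
    if mode == "proportionate":
        total = len(labels)
        # Each stratum keeps the share it has in the input data.
        return {s: round(sample_size * c / total) for s, c in counts.items()}
    if mode == "fixed":
        # Assumption: the user-specified number is capped at the stratum size.
        return {s: min(per_stratum, c) for s, c in counts.items()}
    if mode == "smallest":
        smallest = min(counts.values())
        return {s: smallest for s in counts}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical stratum labels: 60 "low", 30 "mid", 15 "high".
labels = ["low"] * 60 + ["mid"] * 30 + ["high"] * 15

proportionate = stratum_sample_sizes(labels, "proportionate", sample_size=35)
fixed = stratum_sample_sizes(labels, "fixed", per_stratum=10)
smallest = stratum_sample_sizes(labels, "smallest")
```

Note that with rounding, proportionate sizes do not always add up exactly to the requested sample size; a real tool has to decide how to distribute the remainder.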
• Missing Data Handling
This utility lets the user process the data before any mining method is applied to it. It detects missing values in the data and handles them the way the user wants.

XLMiner considers a cell to be missing data if it is empty or contains an invalid formula. XLMiner can also be told to treat a cell as missing data if it contains a certain value specified by the user.
The user can specify how XLMiner should correct these missing values, and a treatment can be assigned to every variable. Records with missing data can either be deleted entirely or have their missing values replaced. XLMiner provides options for the replacement, e.g. the mean, median, mode, or a value specified by the user. The available options depend on the type of variable.
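The treatments described above can be sketched in Python with the standard `statistics` module. This is an illustrative outline only; the function names, the treatment names, the `-999` marker and the sample rows are hypothetical and are not part of XLMiner.

```python
from statistics import mean, median, mode


def is_missing(value, missing_marker=None):
    """Treat empty cells (None or "") and a user-specified marker as missing."""
    return value is None or value == "" or value == missing_marker


def handle_missing(rows, column, treatment, user_value=None, missing_marker=None):
    """Return a new list of rows with missing values in `column` treated."""
    if treatment == "delete":
        # Drop every record whose value in this column is missing.
        return [r for r in rows if not is_missing(r[column], missing_marker)]

    present = [r[column] for r in rows if not is_missing(r[column], missing_marker)]
    replacement = {
        "mean": lambda: mean(present),
        "median": lambda: median(present),
        "mode": lambda: mode(present),
        "value": lambda: user_value,
    }[treatment]()
    return [
        {**r, column: replacement} if is_missing(r[column], missing_marker) else r
        for r in rows
    ]


rows = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 40, "city": ""},
    {"age": -999, "city": "Delhi"},
]

filled = handle_missing(rows, "age", "mean", missing_marker=-999)
cities = handle_missing(rows, "city", "mode")
dropped = handle_missing(rows, "age", "delete", missing_marker=-999)
```

The mean and median only make sense for numeric columns, while the mode also works for text, which mirrors the point that the available options depend on the type of variable.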
• Missing Data Handling
• Missing Data Handling
Data Set
Select the action to handle the missing data in individual columns and click on “Apply this option to selected variable”
• Missing Data Handling-Output
Changed records are highlighted.
• Transform Categorical Data
Sometimes data sets contain variables that take non-numeric values, which makes it difficult to apply standard procedures. XLMiner therefore provides a tool to transform non-numeric data into numeric data.
There are two ways to transform categorical data:
Creating Dummies:
Suppose the variable has 4 distinct values: A, B, C and D. Then 3 new columns, VAL1, VAL2 and VAL3, are created, each holding either 1 or 0. If a row contains the value A, VAL1 is 1 and the others are 0; likewise VAL2 marks B and VAL3 marks C. If all three are 0, the row has the value D.
Create Category Scores:
If the non-numeric variable holds 4 distinct values as above, each value (ordered alphabetically) is numbered from 1 to 4, and a new column is created that contains the number corresponding to each record's value.
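Both transformations are easy to express in a few lines of Python. The sketch below follows the description above (k-1 dummy columns, alphabetical scores starting at 1); the function names and the sample values are hypothetical, and XLMiner's actual output layout may differ.

```python
def create_dummies(values):
    """Encode k categories as k-1 0/1 columns; the last category is all zeros."""
    categories = sorted(set(values))
    encoded = categories[:-1]  # e.g. A, B, C; D is implied by all zeros
    return [[1 if v == c else 0 for c in encoded] for v in values]


def create_category_scores(values):
    """Number the categories 1..k in alphabetical order."""
    scores = {c: i for i, c in enumerate(sorted(set(values)), start=1)}
    return [scores[v] for v in values]


values = ["A", "C", "D", "B", "A"]
dummies = create_dummies(values)  # columns VAL1, VAL2, VAL3
scores = create_category_scores(values)
```

Dummies avoid implying an order between categories, while category scores are more compact but suggest that D is "greater than" A, which only makes sense for ordered categories.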
• Transform Categorical Data- Dummies
Select the variable that contains non-numeric Data and needs to be transformed
• Transform Categorical Data-Category Scores
• Transform Categorical Data-Category Scores(output)
• Thank you
For more visit:
http://dataminingtools.net
• Visit more self help tutorials
Pick a tutorial of your choice and browse through it at your own pace.
The tutorials section is free, self-guiding and will not involve any additional support.
Visit us at www.dataminingtools.net