XL-MINER:Partition

Published in: Technology


Transcript

  • 1. Introduction to
    XLMiner™:
    PARTITION DATA
    XLMiner and Microsoft Office are registered trademarks of the respective owners.
  • 2. Introduction to Partition Data
    The data sets used in mining are generally enormous. One way to make such data easier to mine is to partition it. Partitioning means dividing the data set into multiple partitions that are mutually exclusive, i.e. they do not overlap: no data record is common to two partitions.
    Partitioning data generally results in 3 sets of data:
    Training data set: This partition is used to build the mining model.
    Validation data set: This partition is used to check whether the model built from the training set is accurate. It consists of records whose result (the value of the variable to be determined) is already known, so that the results obtained by applying the model can be compared with the actual results.
    Test data set: This partition is used to estimate how the model would perform when it encounters real-world data.
    http://dataminingtools.net
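The idea of mutually exclusive random partitions can be sketched in a few lines of Python. This is only an illustrative sketch, not XLMiner's own procedure: the function name `partition`, the `ratios` argument and the seed value are assumptions made for the example.

```python
import random


def partition(records, ratios=(0.5, 0.3, 0.2), seed=12345):
    """Randomly split records into training, validation and test lists.

    ratios are fractions (training, validation, test) that must sum to 1.
    The seed is an arbitrary value chosen so the split is repeatable.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Shuffle a copy so that every record is placed at random.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_valid = round(n * ratios[1])

    # Slicing one shuffled list guarantees the partitions cannot overlap.
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the test set takes the remainder
    return train, valid, test


train, valid, test = partition(range(100), ratios=(0.5, 0.3, 0.2))
print(len(train), len(valid), len(test))
```

Because the three lists are slices of a single shuffled list, each record lands in exactly one partition.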
  • 3. Types of Partitions
    XLMiner allows us to create 2 kinds of partitions:
    Standard Partition: Creates up to 3 partitions (training, validation and, optionally, test) based on the partition ratios provided. Data records are selected at random, and every record has an equal chance of being selected.
    • Automatic: When this option is selected, the wizard sets the partitioning ratio to 60 (training) : 40 (validation) by default, and no test set is created. These values cannot be altered.
    • Specify percentages: Unlike Automatic, this option lets the user specify the ratio of the partitions in terms of percentages.
    • Equal partitions: Selecting this option sets a partitioning ratio of 33.3 (training) : 33.3 (validation) : 33.3 (test).
    Partition with oversampling: This method of partitioning is used when the percentage of successes in the output variable is very low in the data set, but we want to train the model on data that contains a particular percentage of successes.
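The three standard-partition options can be summarised as a small lookup, sketched below in Python. The option names and the function `resolve_ratios` are hypothetical and only mirror the percentages stated on the slide; they are not part of XLMiner.

```python
# Fixed percentages (training, validation, test) taken from the slide text.
PRESETS = {
    "automatic": (60.0, 40.0, 0.0),            # no test set is created
    "equal": (100 / 3, 100 / 3, 100 / 3),      # about 33.3% each
}


def resolve_ratios(option, percentages=None):
    """Return (training, validation, test) percentages for an option."""
    if option == "specify":
        # User-supplied percentages must be three non-negative values summing to 100.
        if percentages is None or len(percentages) != 3:
            raise ValueError("provide three percentages")
        if any(p < 0 for p in percentages) or abs(sum(percentages) - 100) > 1e-6:
            raise ValueError("percentages must be non-negative and sum to 100")
        return tuple(float(p) for p in percentages)
    if option in PRESETS:
        # Preset options use fixed values that cannot be altered.
        if percentages is not None:
            raise ValueError("percentages are fixed for this option")
        return PRESETS[option]
    raise ValueError("unknown option: " + repr(option))


print(resolve_ratios("automatic"))
print(resolve_ratios("specify", (50, 30, 20)))
```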
  • 6. Data Set used for Partition
  • 7. Standard Partition (Automatic)-Step 1
  • 8. Standard Partition (Automatic)-Output
    Training Set, Validation Set
  • 9. Standard Partition (Specify)-Step 1
    Selecting “Specify percentages” allows us to set the partitioning ratios as needed. Here we have set a ratio of 50 (training) : 30 (validation) : 20 (test).
  • 10. Standard Partition (Equal)-Step 1
    Selecting “Equal” sets the partitioning ratio at 33.3% for each partition, creating 3 equal-sized partitions.
  • 11. Oversampled Partition – Data Set
    To oversample a data set, it must contain at least one variable that takes exactly 2 distinct values. Only such a variable can be used to define the success class (the class that is oversampled).
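The general idea of oversampling a rare success class when building a training set can be sketched as follows. This is an assumption-laden illustration and not XLMiner's actual algorithm: the function `oversampled_training_set`, its parameters and the toy data are all made up for the example.

```python
import random


def oversampled_training_set(records, is_success, n_train,
                             success_fraction=0.5, seed=0):
    """Draw a training set in which successes make up success_fraction.

    is_success is a function returning True for records of the success class.
    Records are drawn without replacement from each class separately.
    """
    rng = random.Random(seed)
    successes = [r for r in records if is_success(r)]
    failures = [r for r in records if not is_success(r)]

    n_succ = round(n_train * success_fraction)
    n_fail = n_train - n_succ
    if n_succ > len(successes) or n_fail > len(failures):
        raise ValueError("not enough records of one class for this training set")

    # Sample each class on its own so the requested mix is met exactly.
    train = rng.sample(successes, n_succ) + rng.sample(failures, n_fail)
    rng.shuffle(train)
    return train


# Toy data: 100 records of the form (id, label), only 10% of them successes.
records = [(i, 1 if i < 10 else 0) for i in range(100)]
training = oversampled_training_set(records, lambda r: r[1] == 1,
                                    n_train=16, success_fraction=0.5)
print(sum(label for _, label in training), "successes out of", len(training))
```

The binary requirement stated on the slide shows up here as the `is_success` test: every record must fall into exactly one of two classes.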
  • 12. Oversampled Partition – Step 1
  • 13. Oversampled Partition – Output
    The records in the training data set
  • 14. Oversampled Partition – Output
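    Rows in validation set = 27, rows in test set = 12. The test rows appear to be 30% of the validation rows available before the test set was taken out (27 + 12 = 39, and 30% of 39 ≈ 12), not 30% of the remaining 27.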
  • 15. Thank you
    For more visit:
    http://dataminingtools.net
  • 16. Visit more self help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net