Excel Datamining Addin Beginner


Data Mining with Excel Add-In: Beginners Edition

Published in: Technology

  1. 1.
  2. 2. What is DATA MINING Data mining (or Knowledge Discovery) refers to the process of analyzing a given data set from different perspectives and scenarios in order to discover patterns in it.
  3. 3. The add-in in Excel <ul><li>After installing the add-in, you will see a new tab, “DATA MINING”, on the Excel ribbon. Click it to expand the tab. </li></ul><ul><li>The ribbon contains four important sections: </li></ul><ul><li>Data preparation. </li></ul><ul><li>Data modeling. </li></ul><ul><li>Accuracy and Validation. </li></ul><ul><li>Connection. </li></ul><ul><li>We will see in brief how to use these options. </li></ul>
  4. 4. Who can use this add-in <ul><li>This add-in combines the powerful mining engine of SQL Server Analysis Services with the intuitive, user-friendly interface of Microsoft Excel. </li></ul><ul><li>Anyone with a basic knowledge of Excel can use the add-in; no prior experience in data mining is necessary. </li></ul><ul><li>Data mining can be performed in a few clicks: the add-in employs advanced mining algorithms and removes the difficult task of configuring SQL Server. </li></ul><ul><li>Those with past experience in data mining can use the add-in to perform very complex and accurate data mining with ease. </li></ul>
  5. 5. Data Preparation <ul><li>As the name suggests, this block deals with preparing the data for mining by converting it to the proper format. Data preparation is the most important part of the data mining process, because accurate reports are only possible when the data is structured properly. This is done with the three tools provided for this purpose, two of which are described here: </li></ul><ul><li>Explore Data : This tool helps us create a histogram for any column in the table. </li></ul><ul><li>Clean Data : Using this tool, we can specify the maximum and minimum values that we accept in a particular column. </li></ul>
  6. 6. Data Preparation- Explore Data <ul><li>Description : </li></ul><ul><li>This tool takes a given column from the table and plots its histogram. The histogram gives us insight into the distribution of the data and how often each set of values occurs, so that we can see which discrete value or group of values dominates our data set. </li></ul><ul><li>How to use : </li></ul><ul><li>Choose a column and produce its histogram. </li></ul><ul><li>For example, </li></ul><ul><li>In the next slide, we have used the tool to explore the Income column of the data set. We can see that most of the customers have an income in the range 30000 to 50000 and very few have an income in the range 150000 to 170000, so we can market our product accordingly. </li></ul><ul><li>If required, we can add this data as a column in our table. </li></ul>
  7. 7. Data Preparation- Explore Data
  8. 8. Data Preparation- Explore Data Here, we can see that most of the customers are between 35 and 40 years old, with 40 being the most common age.
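The bucket counts behind a histogram like the one Explore Data draws can be sketched in a few lines of Python. This is only an illustration outside Excel: the function name `histogram`, the bucket width and the income values are made up for the example and are not part of the add-in.

```python
from collections import Counter


def histogram(values, bucket_width):
    """Count how many values fall into each bucket of the given width.

    Keys of the result are the lower bound of each bucket.
    """
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical incomes, not taken from the slides' data set.
incomes = [30000, 35000, 42000, 48000, 51000, 90000, 155000]

for start, count in histogram(incomes, 20000).items():
    # One '#' per customer in the bucket.
    print(f"{start:>6} to {start + 19999:>6}: {'#' * count}")
```

The tallest bar marks the value range that dominates the column, which is the kind of reading the slides take from the Income and Age histograms.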
  9. 9. Clean Data- Outliers <ul><ul><li>Outliers : </li></ul></ul><ul><li>This tool helps to identify outliers: rare values in the table that lie above or below a given limit. Such values may be exceptions that make the table data inconsistent. After detecting outliers, we may choose to change their values to the average or to null. </li></ul>
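The idea of bounding a column and replacing whatever falls outside the bounds can be sketched as follows. This is a minimal illustration, not the add-in's implementation: the function `clean_outliers` and its parameters are hypothetical, and it assumes that "average" means the mean of the values that lie inside the limits.

```python
def clean_outliers(values, minimum, maximum, replace_with="mean"):
    """Replace values outside [minimum, maximum] with the mean or with None.

    Assumption: the mean is computed over the in-range values only.
    """
    def in_range(v):
        return v is not None and minimum <= v <= maximum

    inliers = [v for v in values if in_range(v)]
    if replace_with == "mean" and inliers:
        replacement = sum(inliers) / len(inliers)
    else:
        replacement = None  # "null" in the slides' wording
    return [v if in_range(v) else replacement for v in values]


# Hypothetical Age column with two impossible entries.
ages = [30, 40, 50, 400, -3]
print(clean_outliers(ages, 0, 120))          # outliers become the mean, 40.0
print(clean_outliers(ages, 0, 120, "null"))  # outliers become None
```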
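  10. 10. Data Modeling <ul><li>The actual work of data mining is done on the prepared data using these tools. Internally, the tools mine the data using the powerful mining algorithms of SQL Server Analysis Services. </li></ul>
      <table>
      <tr><th>Sr. no.</th><th>Tool name</th><th>Mining algorithm used</th></tr>
      <tr><td>1</td><td>Classify</td><td>Microsoft Decision Trees</td></tr>
      <tr><td>2</td><td>Estimate</td><td>Microsoft Decision Trees</td></tr>
      <tr><td>3</td><td>Cluster</td><td>Microsoft Clustering</td></tr>
      <tr><td>4</td><td>Associate</td><td>Microsoft Association Rules</td></tr>
      <tr><td>5</td><td>Forecast</td><td>Microsoft Time Series</td></tr>
      </table>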
  11. 11. Classify <ul><li>The Classify tool helps us build a classification model that shows how the individual values of one column are affected by values of other columns. </li></ul>
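To make "how the values of one column are affected by values of other columns" concrete, here is a deliberately tiny sketch. It is not the Microsoft Decision Trees algorithm that the Classify tool uses. It performs a single split on one input column and reports the most common target value in each group. The column names `commute` and `buyer` and the rows are invented for the example.

```python
from collections import Counter, defaultdict


def one_level_split(rows, input_col, target_col):
    """For each value of input_col, return the most common target_col value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[input_col]][row[target_col]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in groups.items()}


# Hypothetical customer rows.
customers = [
    {"commute": "short", "buyer": "yes"},
    {"commute": "short", "buyer": "yes"},
    {"commute": "short", "buyer": "no"},
    {"commute": "long", "buyer": "no"},
    {"commute": "long", "buyer": "no"},
]

print(one_level_split(customers, "commute", "buyer"))
```

A real decision tree repeats this kind of split over several columns and picks, at each step, the column that separates the target values best.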
  12. 12. Data Modeling - Associate <ul><li>This tool creates an association model that analyzes the data to detect items that appear together in transactions. </li></ul>
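Detecting items that appear together can be illustrated by counting item pairs across transactions. The sketch below is a simplified stand-in, not the Microsoft Association Rules algorithm. The function `frequent_pairs`, the `min_support` threshold and the shopping baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions) is >= min_support."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

for pair, support in frequent_pairs(baskets, 0.5).items():
    print(pair, support)
```

Here bread and milk share three of the four baskets, so the pair has a support of 0.75 and is reported, while bread and eggs share only one basket and are dropped.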
  13. 13. Accuracy and Validation <ul><li>This part contains tools for testing and validating our mining models. It is important to know how well the models we develop work with real-world data; checking their accuracy is how we validate them. </li></ul>
  14. 14. Accuracy and Validation-Accuracy Chart <ul><li>This tool helps us apply a previously developed mining model to a set of real-world data so that we can see how well it performs. </li></ul>
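The simplest way to see how well a model performs on real-world data is to compare its predictions with the known outcomes. The Accuracy Chart tool presents its result as a chart. The sketch below only computes a single number, the fraction of correct predictions, and everything in it (the `accuracy` function, the stand-in `model` and the test rows) is hypothetical.

```python
def accuracy(model, rows, target_col):
    """Fraction of rows for which model(row) equals the known target value."""
    if not rows:
        raise ValueError("need at least one row to measure accuracy")
    correct = sum(1 for row in rows if model(row) == row[target_col])
    return correct / len(rows)


def model(row):
    """Stand-in for a previously built mining model (an assumption)."""
    return "yes" if row["commute"] == "short" else "no"


# Hypothetical rows with known outcomes that the model has not seen before.
test_rows = [
    {"commute": "short", "buyer": "yes"},
    {"commute": "short", "buyer": "no"},
    {"commute": "long", "buyer": "no"},
    {"commute": "long", "buyer": "no"},
]

print(accuracy(model, test_rows, "buyer"))
```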
  15. 15. Model Usage <ul><li>Browse: </li></ul><ul><li>Used to browse previously created data mining models. </li></ul><ul><li>Query: </li></ul><ul><li>The Query Model tool lets you use existing mining models to make predictions on the data in an Excel table by means of a prediction query. </li></ul>
  16. 16. Connection <ul><li>Default Local host : Used to configure the connection from Excel to SQL Server Analysis Services. </li></ul><ul><li>Trace : Used to view the log of all the data sent to SQL Server for analysis during mining model creation. </li></ul>
  17. 17. Visit more self help tutorials <ul><li>Pick a tutorial of your choice and browse through it at your own pace. </li></ul><ul><li>The tutorials section is free, self-guiding and will not involve any additional support. </li></ul><ul><li>Visit us at www.dataminingtools.net </li></ul>