DMDW Lesson 08 - Further Data Mining Algorithms

  1. STUDY<br />AND TAKE OFF.<br />Author I: Dipl.-Inf. (FH) Johannes Hoppe<br />Author II: M.Sc. Johannes Hofmeister<br />Author III: Prof. Dr. Dieter Homeister<br />Date: 13.05.2011<br />
  2. Further Data Mining Algorithms<br />
  3. 01<br />Data Mining Algorithms - Regression Analysis<br />
  4. DM Algorithms - Regression Analysis<br />Regression Analysis<br />Also known as function approximation<br />Comprises techniques for modeling and analyzing several variables<br />Models the relationship between one or more variables you are trying to predict (dependent variables) and the variables used to predict them (independent variables)<br />
  5. DM Algorithms - Regression Analysis<br />Built into SSAS (SQL Server Analysis Services):<br />Microsoft Linear Regression Algorithm<br />Microsoft Logistic Regression Algorithm<br />Microsoft Time Series Algorithm<br />http://msdn.microsoft.com/en-us/library/ms170993(SQL.90).aspx<br />
  6. DM Algorithms - Regression / Linear Regression<br />Linear Regression<br />Analyzes the relationship between two continuous columns<br />The relationship is expressed as an equation<br />The equation describes a line (linear equation):<br /> f(x) = m*x + b<br />Error = vertical distance of a data point from the regression line<br />http://msdn.microsoft.com/en-us/library/ms174824(SQL.90).aspx<br />
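A minimal sketch of fitting f(x) = m*x + b with ordinary least squares, the usual criterion of minimizing the sum of squared vertical errors. It is plain Python for illustration only and not the SSAS implementation (MSDN describes the Microsoft Linear Regression algorithm as a variation of the Microsoft Decision Trees algorithm). The helper names `fit_line` and `residuals` and the advertising/sales numbers are hypothetical.

```python
# Illustrative helpers (hypothetical names), not an SSAS API.
def fit_line(xs, ys):
    """Fit f(x) = m*x + b by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope m = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    m = sxy / sxx
    # The least-squares line passes through the point (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b


def residuals(xs, ys, m, b):
    """Error of each point: its vertical distance from the regression line."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical data: advertising spend (x) and sales (y)
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

m, b = fit_line(advertising, sales)
print(f"f(x) = {m:.2f}*x + {b:.2f}")
print("errors:", [round(e, 2) for e in residuals(advertising, sales, m, b)])
```

Least squares squares the errors, so large deviations weigh more than small ones; the residuals of such a fit always sum to (numerically) zero.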
  7. DM Algorithms - Regression / Linear Regression<br />Example<br />
  8. DM Algorithms - Regression / Linear Regression<br />Explanation<br />The diagram shows the relationship between sales and advertising, along with the regression equation. The goal is to predict sales from the amount spent on advertising. The graph shows a strongly linear relationship between sales and advertising. A key measure of the strength of the relationship is the R-square, which measures the share of the overall variation in the data that is explained by the model. This regression analysis results in an R-square of 70%, which implies that 70% of the variation in sales can be explained by the variation in advertising.<br />[Source: Olivia Parr Rud et al., Data Mining Cookbook]<br />
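The R-square described in the explanation can be computed as 1 minus the ratio of unexplained (residual) variation to total variation. The sketch below assumes a simple least-squares line and uses a small made-up data set, not the cookbook's advertising data; for these numbers the line explains 60% of the variation. The helper names `fit_line` and `r_squared` are hypothetical.

```python
# Illustrative helpers (hypothetical names).
def fit_line(xs, ys):
    """Ordinary least-squares fit of f(x) = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def r_squared(xs, ys, m, b):
    """Fraction of the overall variation in y that the line explains."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)           # overall variation
    ss_residual = sum((y - (m * x + b)) ** 2                 # variation left unexplained
                      for x, y in zip(xs, ys))
    return 1.0 - ss_residual / ss_total


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(f"R-square = {r_squared(xs, ys, m, b):.0%}")  # fraction printed as a percentage
```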
  9. DM Algorithms - Regression / Logistic Regression<br />Logistic regression<br />The modeled (dependent) value is a probability between 0 and 1<br />The function describes the probability of a given event<br />Instead of a straight line, logistic regression analysis produces an S-shaped curve bounded by a minimum (0) and a maximum (1)<br />Note: the algorithm described on Wikipedia differs from the one on MSDN; the Microsoft Logistic Regression algorithm is a variation of the Microsoft Neural Network algorithm<br />http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx<br />
  10. DM Algorithms - Regression / Logistic Regression<br />Logistic regression<br />
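To illustrate the S-shaped curve, here is a minimal sketch of standard textbook logistic regression with one input: P(event | x) = sigmoid(w*x + c), fitted by batch gradient descent on the log-loss. It is not the Microsoft Logistic Regression algorithm; the helper names, the data, the learning rate and the number of epochs are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """S-shaped curve with minimum 0 and maximum 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative helper (hypothetical name), not an SSAS API.
def fit_logistic(xs, ys, learning_rate=0.5, epochs=20000):
    """Fit P(y = 1 | x) = sigmoid(w*x + c) by batch gradient descent on the average log-loss."""
    w, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error per sample: predicted probability minus the 0/1 label
        errors = [sigmoid(w * x + c) - y for x, y in zip(xs, ys)]
        # Gradients of the average log-loss with respect to w and c
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_c = sum(errors) / n
        w -= learning_rate * grad_w
        c -= learning_rate * grad_c
    return w, c


# Hypothetical data: minutes of a product demo watched (x), bought (1) or not (0)
minutes = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

w, c = fit_logistic(minutes, bought)
for x in (1.0, 2.25, 4.0):
    print(f"P(buy | x = {x}) = {sigmoid(w * x + c):.2f}")
```

The two classes overlap slightly (x = 2.0 and x = 2.5), which keeps the fitted weights finite; with perfectly separated classes, plain gradient descent would keep steepening the curve.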
  11. DM Algorithms - Regression / Time-Series<br />MS Time-Series Algorithm<br />Trend analysis<br />Optimized for analyzing continuous values<br />e.g. product sales over time<br />Train → Predict<br />Cross-predictions are possible (cool!): a model trained on related series can use the behavior of one series to predict another<br />http://msdn.microsoft.com/en-us/library/ms174923(SQL.90).aspx<br />
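Cross-prediction can be illustrated in a very simplified form: if one series tends to follow another with a delay, earlier values of the leading series help predict the following one. The sketch below regresses one hypothetical series on the lagged values of another with plain least squares. It only shows the idea and is not how the Microsoft Time Series algorithm performs cross-prediction; the helper names, the product series and the one-month lag are invented.

```python
# Illustrative helpers (hypothetical names), not an SSAS API.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_cross_predictor(leader, follower, lag=1):
    """Model follower[t] as m * leader[t - lag] + b."""
    return fit_line(leader[:-lag], follower[lag:])


# Hypothetical monthly unit sales of two related products
bikes = [100, 120, 90, 150, 130, 170, 160]
helmets = [55, 52, 62, 47, 77, 67, 87]

m, b = fit_cross_predictor(bikes, helmets, lag=1)
# Next month's helmet sales, predicted from this month's bike sales
next_helmets = m * bikes[-1] + b
print(f"helmets[t] = {m:.2f} * bikes[t-1] + {b:.2f}; forecast: {next_helmets:.1f}")
```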
  12. DM Algorithms - Regression / Time-Series<br />MS Time-Series Algorithm<br />
  13. DM Algorithms - Regression / Time-Series<br />Combination of 2 algorithms whose predictions are blended<br />ARTxp:<br />Autoregressive Tree method<br />Developed by Microsoft Research<br />Based on the Microsoft Decision Trees algorithm<br />For short-term predictions<br />ARIMA:<br />Autoregressive Integrated Moving Average<br />Developed by Box and Jenkins<br />For long-term predictions<br />http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx<br />http://msdn.microsoft.com/en-us/library/bb677216.aspx<br />
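To make the "AR" and "I" parts of ARIMA concrete, the following is a toy ARIMA(1,1,0)-style forecast: the series is differenced once (the "integrated" part), an order-1 autoregressive model with an intercept is fitted to the differences by least squares, and the forecast differences are added back onto the last observed value. It leaves out the moving-average part and is not the SSAS ARIMA or ARTxp implementation; the helper names and the sales figures are made up.

```python
# Illustrative helpers (hypothetical names), not an SSAS API.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def forecast_arima_110(series, steps):
    """Toy ARIMA(1,1,0)-style forecast with an intercept (no moving-average part)."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    # "I" (integrated): work on first differences d[t] = y[t] - y[t-1]
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    # "AR" (autoregressive, order 1): d[t] = phi * d[t-1] + c
    phi, c = fit_line(diffs[:-1], diffs[1:])
    forecasts = []
    value, diff = series[-1], diffs[-1]
    for _ in range(steps):
        diff = phi * diff + c      # forecast the next difference
        value = value + diff       # undo the differencing
        forecasts.append(value)
    return phi, c, forecasts


# Hypothetical monthly sales figures
sales = [10.0, 14.0, 17.0, 19.5, 21.75, 23.875]
phi, c, forecasts = forecast_arima_110(sales, steps=2)
print(f"phi = {phi}, c = {c}, forecasts = {forecasts}")
```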
  14. 02<br />Data Mining Algorithms - Neural Networks<br />
  15. DM Algorithms - Neural Networks<br />
  16. DM Algorithms - Neural Networks<br />Neural Networks (NN or ANN)<br />More precise term: artificial neural networks (ANN), as opposed to biological NN<br />Sometimes called neuronal networks<br />By the way: http://code.google.com/p/clustered-neuronal-network/wiki/ProjektInfos<br />
  18. DM Algorithms - Neural Networks<br />Definition<br />A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use.<br />It resembles the brain in two respects:<br />Knowledge is acquired by the network through a learning process.<br />Interneuron connection strengths known as synaptic weights are used to store the knowledge.<br />[Source: Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan.]<br />
  19. DM Algorithms - Neural Networks<br />Most NNs are composed of several layers of neurons<br />Most connections run in one direction, from the input layer toward the output layer<br />Commonly used: backpropagation networks<br />A single neuron has several inputs, each with its own weight, and one output<br />In the basic form, the output is activated if the weighted sum of the inputs (sum of input * weight) exceeds a given threshold<br />Learning: a target value is supplied at an additional training input, together with a signal that puts the network into training mode<br />
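The threshold neuron and target-driven learning described on this slide can be sketched as a single perceptron. As an assumption, the learning step uses the classic perceptron rule (change each weight by learning_rate * error * input); multi-layer backpropagation networks use a different, gradient-based update. The helper names and the logical-AND training set are illustrative only.

```python
# Illustrative helpers (hypothetical names).
def neuron_output(inputs, weights, threshold):
    """Fire (1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train_perceptron(samples, n_inputs, learning_rate=0.1, epochs=20):
    """Training mode: compare the output with each sample's target value and adjust."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is +1, 0 or -1; weights only change when the output is wrong
            error = target - neuron_output(inputs, weights, threshold)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            # Lower the threshold if the neuron should have fired, raise it otherwise
            threshold -= learning_rate * error
    return weights, threshold


# Training set for logical AND: (inputs, target value)
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_perceptron(and_samples, n_inputs=2)
for inputs, target in and_samples:
    print(inputs, "->", neuron_output(inputs, weights, threshold), "(target:", target, ")")
```

A single threshold neuron can only learn linearly separable functions such as AND; functions like XOR need several layers, which is where backpropagation comes in.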
  20. THANK YOU<br />FOR YOUR ATTENTION<br />