
  1. A Presentation on Data Mining, by Vishal Sethia
  2. Outline <ul><li>Introduction</li></ul><ul><li>Steps involved in Data Mining</li></ul><ul><li>Process Flow</li></ul><ul><li>Data Mining Algorithms</li></ul><ul><li>Applications / Real-World Example</li></ul><ul><li>Advantages / Disadvantages</li></ul><ul><li>Future of Data Mining</li></ul><ul><li>Questions?</li></ul>
  3. Introduction <ul><li>What is data mining?</li></ul><ul><ul><ul><li>The practice of automatically searching large stores of data for patterns.</li></ul></ul></ul><ul><ul><ul><li>In a simple analogy, it is finding the proverbial “needle in the haystack.”</li></ul></ul></ul><ul><li>Often used interchangeably with knowledge discovery in databases (KDD), although strictly speaking data mining is the pattern-extraction step within the broader KDD process.</li></ul><ul><li>It is frequently described as &quot;the process of extracting valid, authentic, and actionable information from large databases.&quot;</li></ul><ul><li>A popular misconception about data mining:</li></ul><ul><ul><ul><li>that systems can autonomously dig out all valuable knowledge without human intervention.</li></ul></ul></ul>
  4. Steps in Data Mining <ul><li>Defining the Problem</li></ul><ul><li>Preparing Data</li></ul><ul><li>Exploring Data</li></ul><ul><li>Building Models</li></ul><ul><li>Exploring and Validating Models</li></ul><ul><li>Deploying and Updating Models</li></ul>
  5. Process Flow
  6. Defining the Problem <ul><li>Analyzing business requirements</li></ul><ul><li>Defining the scope of the problem</li></ul><ul><li>Defining the metrics by which the model will be evaluated</li></ul><ul><li>Defining the final objective for the data mining project</li></ul>
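The slide leaves the choice of evaluation metrics open. For a yes/no prediction target, one common choice is accuracy, precision, and recall. That choice is an assumption and is not specified by the slide. A minimal sketch computing all three from actual and predicted labels (the function name `evaluation_metrics` is hypothetical):

```python
def evaluation_metrics(actual: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Return accuracy, precision and recall for a binary (yes/no) prediction task."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)          # predicted yes, actually yes
    fp = sum((not a) and p for a, p in pairs)    # predicted yes, actually no
    fn = sum(a and (not p) for a, p in pairs)    # predicted no, actually yes
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when there are no "yes" predictions or cases.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Usage with made-up labels:
metrics = evaluation_metrics(
    actual=[True, False, True, True, False],
    predicted=[True, True, False, True, False],
)
print(metrics)
```

Fixing these metrics during problem definition, before any model is built, lets every candidate model later be compared on the same terms.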
  7. Prepare and Explore Data <ul><li>Data can be scattered across a company and stored in different formats.</li></ul><ul><li>It may contain inconsistencies such as flawed or missing entries.</li></ul><ul><ul><ul><li>For example, a record may show a customer buying a product before that customer was even born.</li></ul></ul></ul><ul><li>The data must be understood well enough to make appropriate decisions when models are created.</li></ul>
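The two kinds of inconsistency on this slide, missing entries and impossible values such as a purchase dated before the customer's birth, can be flagged with simple rule checks. This is a sketch only. The field names (`customer_id`, `birth_date`, `purchase_date`) and the records are hypothetical.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"customer_id": 1, "birth_date": date(1980, 5, 1), "purchase_date": date(2007, 3, 12)},
    {"customer_id": 2, "birth_date": date(1995, 8, 20), "purchase_date": date(1990, 1, 5)},
    {"customer_id": 3, "birth_date": None, "purchase_date": date(2006, 11, 30)},
]

def find_problems(record: dict) -> list[str]:
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Missing entries: any field whose value is None.
    for field, value in record.items():
        if value is None:
            problems.append(f"missing {field}")
    birth, purchase = record.get("birth_date"), record.get("purchase_date")
    # Flawed entries: a purchase dated before the customer was born.
    if birth is not None and purchase is not None and purchase < birth:
        problems.append("purchase_date is before birth_date")
    return problems

for rec in records:
    issues = find_problems(rec)
    if issues:
        print(rec["customer_id"], issues)
```

Flagging rather than silently dropping such records leaves the decision about how to treat them to the analyst. That choice is part of understanding the data.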
  8. Building and Validating Models <ul><li>Knowledge gained in the Exploring Data step helps define and create a mining model.</li></ul><ul><li>A model typically contains input columns, an identifying column, and a predictable column.</li></ul><ul><li>Patterns are found by passing the original data through a mathematical algorithm.</li></ul><ul><li>The model must be validated before it is put into production; several tests are run for this purpose.</li></ul>
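The column roles and the validation step can be illustrated together. The sketch below keeps the identifying column out of learning, fits a rule to the input columns on a training set, and checks it on held-out rows. A decision stump (a one-split decision tree) is used purely as a simple stand-in algorithm. The column names, rows, and "every fourth ID" holdout rule are all hypothetical.

```python
# Column roles, following the slide: identifying, input, and predictable columns.
ID_COLUMN = "customer_id"          # identifies each case; never used for learning
INPUT_COLUMNS = ["age", "income"]  # inputs the algorithm may use
PREDICT_COLUMN = "bought"          # the value the model should predict

# Made-up rows for illustration only.
ROWS = [
    {"customer_id": 1, "age": 25, "income": 30, "bought": False},
    {"customer_id": 2, "age": 32, "income": 45, "bought": False},
    {"customer_id": 3, "age": 41, "income": 70, "bought": True},
    {"customer_id": 4, "age": 29, "income": 38, "bought": False},
    {"customer_id": 5, "age": 50, "income": 85, "bought": True},
    {"customer_id": 6, "age": 45, "income": 60, "bought": True},
    {"customer_id": 7, "age": 23, "income": 25, "bought": False},
    {"customer_id": 8, "age": 43, "income": 65, "bought": True},
]

def accuracy(rows, column, threshold):
    """Fraction of rows where the rule 'column >= threshold' matches the predictable column."""
    return sum((r[column] >= threshold) == r[PREDICT_COLUMN] for r in rows) / len(rows)

def train_stump(rows):
    """Pick the (input column, threshold) pair whose rule best fits the training rows."""
    best = (None, None, -1.0)  # (column, threshold, training accuracy)
    for column in INPUT_COLUMNS:
        for threshold in sorted({r[column] for r in rows}):
            score = accuracy(rows, column, threshold)
            if score > best[2]:
                best = (column, threshold, score)
    return best[0], best[1]

# Hold out every fourth customer (by ID) for validation; train on the rest.
test = [r for r in ROWS if r[ID_COLUMN] % 4 == 0]
train = [r for r in ROWS if r[ID_COLUMN] % 4 != 0]

column, threshold = train_stump(train)
print(f"rule: {column} >= {threshold}")
print("training accuracy:", accuracy(train, column, threshold))
print("holdout accuracy:", accuracy(test, column, threshold))
```

Scoring the model on rows it never saw during training is what makes the check a validation rather than a restatement of the fit.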
  9. Data Mining Algorithms <ul><li>Linear Regression Algorithm</li></ul><ul><li>Decision Trees Algorithm</li></ul><ul><li>Clustering Algorithm</li></ul><ul><li>Naive Bayes Algorithm</li></ul><ul><li>Association Algorithm</li></ul><ul><li>Sequence Clustering Algorithm</li></ul><ul><li>Time Series Algorithm</li></ul><ul><li>Neural Network Algorithm (SSAS)</li></ul><ul><li>Logistic Regression Algorithm</li></ul>
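As an illustration of the first algorithm in the list, a single-input linear regression can be fitted with the closed-form ordinary least squares formulas. This is a plain-Python sketch of the textbook method. It does not show how SSAS or any other product implements the algorithm, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (intercept a, slope b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept: the fitted line passes through the means
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the sample points lie exactly on y = 1 + 2x, the fit recovers intercept 1.0 and slope 2.0.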
  10. Applications <ul><li>Financial Data Mining</li></ul><ul><li>Text Mining</li></ul><ul><li>Data Mining in Healthcare</li></ul><ul><li>Scientific Data Mining</li></ul><ul><li>Data Mining in the Oil and Gas Industry</li></ul>
  11. Real-World Example <ul><li>Consider a bank that lends to customers and has a dataset of customers who took out a particular loan product, some of whom went &quot;BAD&quot;. The goal is to predict which future borrowers are likely to go BAD.</li></ul><ul><li>The information used comes from standard credit reports, available from all the major credit bureaus, and includes variables such as:</li></ul><ul><ul><ul><li>Number of credit reports requested for this person in the last 6 months</li></ul></ul></ul><ul><ul><ul><li>Number of credit cards with balances greater than 80% of available credit</li></ul></ul></ul><ul><ul><ul><li>Number of new credit accounts opened in the last 12 months</li></ul></ul></ul><ul><ul><ul><li>How long ago the oldest account was opened</li></ul></ul></ul><ul><ul><ul><li>How long ago the newest account was opened</li></ul></ul></ul>
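Before any algorithm can be applied, the five credit-report variables listed above have to be computed for each customer. The sketch below does this from a hypothetical record layout (`Account`, `CreditReport`), which is not any credit bureau's actual format. It approximates "last N months" by whole calendar months and assumes no dates fall after the `as_of` date. The resulting features could then feed a classifier from the algorithm list, such as decision trees or logistic regression.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record layout, for illustration only.
@dataclass
class Account:
    opened: date
    is_credit_card: bool
    balance: float
    credit_limit: float

@dataclass
class CreditReport:
    inquiry_dates: list[date]  # dates on which this person's credit report was requested
    accounts: list[Account]

def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day of the month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def extract_features(report: CreditReport, as_of: date) -> dict[str, Optional[int]]:
    """Compute the five credit-report variables; assumes no dates after `as_of`."""
    ages = [months_between(a.opened, as_of) for a in report.accounts]
    return {
        "reports_requested_last_6_months": sum(
            1 for d in report.inquiry_dates if months_between(d, as_of) < 6
        ),
        "cards_over_80pct_of_limit": sum(
            1
            for a in report.accounts
            if a.is_credit_card and a.credit_limit > 0 and a.balance > 0.8 * a.credit_limit
        ),
        "accounts_opened_last_12_months": sum(1 for age in ages if age < 12),
        # None when the report lists no accounts at all.
        "months_since_oldest_account": max(ages, default=None),
        "months_since_newest_account": min(ages, default=None),
    }

report = CreditReport(
    inquiry_dates=[date(2008, 1, 15), date(2007, 3, 2)],
    accounts=[
        Account(opened=date(2000, 6, 1), is_credit_card=True, balance=4500.0, credit_limit=5000.0),
        Account(opened=date(2007, 11, 20), is_credit_card=True, balance=200.0, credit_limit=1000.0),
        Account(opened=date(2005, 2, 10), is_credit_card=False, balance=12000.0, credit_limit=15000.0),
    ],
)
features = extract_features(report, as_of=date(2008, 3, 1))
print(features)
```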
  12. Advantages <ul><li>Data mining can generate new business opportunities through:</li></ul><ul><ul><ul><li>Automated prediction of trends and behaviors:</li></ul></ul></ul><ul><ul><ul><ul><ul><li>Data mining automates the process of finding predictive information in large databases. For example, data on past promotional mailings can be used to identify the targets most likely to maximize return on investment in future mailings.</li></ul></ul></ul></ul></ul><ul><ul><ul><li>Automated discovery of previously unknown patterns:</li></ul></ul></ul><ul><ul><ul><ul><ul><li>Data mining tools sweep through databases and identify previously hidden patterns. One example of pattern discovery is analyzing retail sales data to identify seemingly unrelated products that are often purchased together.</li></ul></ul></ul></ul></ul>
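Finding products that are often purchased together starts with counting how frequently each pair of products appears in the same basket. The sketch below does a brute-force count of pair support, the basic quantity that association algorithms such as Apriori build on. It is not a full association-rule miner. The baskets and the `min_support` threshold are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return product pairs that appear together in at least `min_support` of all baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted at most once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

baskets = [
    ["bread", "milk", "diapers"],
    ["diapers", "beer", "bread"],
    ["milk", "diapers", "beer"],
    ["bread", "milk"],
]
result = frequent_pairs(baskets, min_support=0.5)
print(result)
```

With these four baskets, the pairs that reach 50% support are bread/diapers, bread/milk, diapers/milk, and beer/diapers.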
  13. Disadvantages <ul><li>Privacy Concerns</li></ul><ul><ul><ul><li>What if every telephone call you make, every credit card purchase, every flight you take, every visit to the doctor, every warranty card you send in, every employment application you fill out, every school record you have, your credit record, and every web page you visit were all collected together? A lot would be known about you!</li></ul></ul></ul><ul><li>Data Readiness for Analysis</li></ul><ul><ul><ul><li>Data mining requires a consolidated, &quot;de-duplicated,&quot; and cleaned data store to draw from. 70 to 85 percent of the work in building data mining models goes into cleaning and preparing the data before a specific analysis.</li></ul></ul></ul>
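De-duplication is part of the preparation work described above. The simplest form builds a normalized matching key for each record and keeps the first record seen for each key. This sketch uses only exact matching on a case-folded, whitespace-collapsed key. Real customer data often needs fuzzy matching, which the sketch does not attempt. The field names `name` and `email` are hypothetical.

```python
def normalize(record):
    """Build a matching key: case-folded, whitespace-collapsed name plus trimmed email."""
    name = " ".join(record["name"].split()).casefold()
    email = record["email"].strip().casefold()
    return (name, email)

def deduplicate(records):
    """Keep the first record for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

customers = [
    {"name": "Jane  Doe", "email": "JANE@example.com"},
    {"name": "jane doe", "email": "jane@example.com "},
    {"name": "John Smith", "email": "john@example.com"},
]
result = deduplicate(customers)
print(result)
```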
  14. Future of Data Mining <ul><li>Intelligent agents could be turned loose on medical research data or on subatomic particle data.</li></ul><ul><li>Computers may reveal new treatments for diseases or new insights into the nature of the universe.</li></ul>
  16. Questions?