
  1. 1. Data Mining <ul><li>By JEMINI ISLAM</li></ul>
  2. 2. Data Mining <ul><li>Outline:</li></ul><ul><li>What is data mining?</li></ul><ul><li>Why use data mining?</li></ul><ul><li>How does data mining work?</li></ul><ul><li>The process of data mining</li></ul><ul><li>Tools of data mining</li></ul>
  3. 3. What is data mining? <ul><li>Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.</li></ul><ul><li>Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.</li></ul>
  4. 4. Cont. <ul><li>Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. It is a fairly recent topic in computing, but it applies many older computational techniques from statistics, machine learning, and pattern recognition.</li></ul>
  5. 5. Example of data mining: <ul><li>A simple example of data mining is its use in a retail sales department. If a store tracks a customer's purchases and notices that the customer buys a lot of silk shirts, the data mining system will make a correlation between that customer and silk shirts. The sales department can then look at that information and begin direct-mail marketing of silk shirts to that customer, or it may instead try to get the customer to buy a wider range of products.</li></ul>
  6. 6. Example (cont.) <ul><li>Another widely used (though hypothetical) example is that of a very large North American supermarket chain. Through intensive analysis of the transactions and the goods bought over a period of time, analysts found that beer and diapers were often bought together.</li></ul>
  7. 7. Example (cont.) <ul><li>The grocery chain could use this newly discovered information in various ways to increase revenue. For example, it could move the beer display closer to the diaper display, and it could place the high-profit diapers next to the high-profit beers.</li></ul>
  8. 8. Why use data mining? <ul><li>Data is one of the most valuable assets of any corporation, but only if we know how to reveal the valuable knowledge hidden in the raw data. Data mining allows us to extract diamonds of knowledge from historical data and to predict useful outcomes from it.</li></ul>
  9. 9. Cont. <ul><li>Data mining can:</li></ul><ul><li>optimize business decisions,</li></ul><ul><li>increase the value of each customer and communication, and</li></ul><ul><li>improve customer satisfaction with your services.</li></ul>
  10. 10. How does data mining work? <ul><li>Data mining provides the link between separate transaction systems and analytical systems in large-scale information technology. It uses various software tools to analyze relationships and patterns in the data. Generally, the following four types of relationships are sought:</li></ul>
  11. 11. Classification <ul><li>Classification is the task of finding a function that maps each record into one of several discrete classes. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by offering daily specials.</li></ul>
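One way to picture "a function that maps records into discrete classes" is a nearest-centroid classifier. This is only a minimal sketch: the customer records, the field meanings, and the class names below are invented for illustration and do not come from the slides.

```python
from math import dist

# Hypothetical training records: (visits per month, average spend) -> class label.
training = [
    ((1.0, 10.0), "occasional"),
    ((2.0, 12.0), "occasional"),
    ((8.0, 30.0), "regular"),
    ((9.0, 28.0), "regular"),
]

def fit_centroids(records):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(features, centroids):
    """Map a record to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))

centroids = fit_centroids(training)
label = classify((7.0, 25.0), centroids)
```

Here `classify` is the learned function: any new record is assigned to exactly one of the known classes.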
  12. 12. Clustering <ul><li>Clustering is the task of identifying groups of records that are similar to one another but different from the rest of the data. For example, data can be mined to identify market segments or consumer affinities.</li></ul>
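As a rough illustration of grouping similar records, the sketch below runs a basic k-means loop (Lloyd's algorithm) on one-dimensional data. The spending figures and the starting centers are made up, and k-means is just one of many clustering methods; the slides do not name a specific one.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on one-dimensional data with given starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 85, 90]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
```

The two resulting groups could be read as two market segments, low spenders and high spenders.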
  13. 13. Association <ul><li>Data can be mined to identify associations. The beer-diaper example is an example of associative mining.</li></ul>
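The beer-diaper idea can be sketched with the two standard association measures, support and confidence. The baskets below are invented purely for illustration.

```python
# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk", "beer"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

pair_support = support({"beer", "diapers"}, baskets)
rule_conf = confidence({"diapers"}, {"beer"}, baskets)
```

In this toy data, beer and diapers appear together in three of five baskets, and every basket with diapers also contains beer.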
  14. 14. Sequential Patterns <ul><li>Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.</li></ul>
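A minimal sketch of the backpack example, assuming each customer's purchases are stored in time order. The purchase histories are hypothetical, and the likelihood is estimated as a simple proportion rather than by any particular sequence-mining algorithm.

```python
# Hypothetical purchase histories, each listed in time order.
histories = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["hiking shoes", "sleeping bag", "backpack"],
    ["sleeping bag", "hiking shoes", "tent"],
    ["backpack", "sleeping bag"],
    ["hiking shoes", "water bottle"],
]

def follows(history, antecedents, target):
    """True if target is bought after every antecedent item has been bought."""
    seen = set()
    for item in history:
        if item == target and antecedents <= seen:
            return True
        seen.add(item)
    return False

def sequence_confidence(histories, antecedents, target):
    """Among histories containing all antecedents, the share where target follows them."""
    matching = [h for h in histories if antecedents <= set(h)]
    if not matching:
        return 0.0
    return sum(follows(h, antecedents, target) for h in matching) / len(matching)

likelihood = sequence_confidence(histories, {"sleeping bag", "hiking shoes"}, "backpack")
```

Unlike plain association, the order matters here: a backpack bought before the sleeping bag does not count as following it.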
  15. 15. The process of data mining <ul><li>The process of data mining consists of three stages:</li></ul><ul><li>(1) initial exploration,</li></ul><ul><li>(2) model building or pattern identification, with validation or verification, and</li></ul><ul><li>(3) deployment (i.e., the application of the model to new data in order to generate predictions).</li></ul>
  16. 16. Stage 1: Exploration <ul><li>This stage usually starts with data preparation, which may involve cleaning data, transforming data, selecting subsets of records, and, for data sets with large numbers of variables ("fields"), performing some preliminary feature selection to bring the number of variables down to a manageable range.</li></ul>
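The preparation steps named above (cleaning, feature selection, transformation) might look like this in a very small form. The records and field names are hypothetical, and dropping constant fields is only one simple kind of feature selection.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000, "store_id": 7},
    {"age": None, "income": 61000, "store_id": 7},
    {"age": 45, "income": 58000, "store_id": 7},
    {"age": 29, "income": 47000, "store_id": 7},
]

# Cleaning: drop records with missing values.
clean = [r for r in raw if all(v is not None for v in r.values())]

# Feature selection: drop fields that never vary, since they carry no information.
fields = [f for f in clean[0] if len({r[f] for r in clean}) > 1]

# Transformation: rescale each remaining field to the range [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

columns = {f: min_max([r[f] for r in clean]) for f in fields}
```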
  17. 17. Stage 2: Model building and validation <ul><li>This stage involves considering various models and choosing the best one based on its predictive performance (i.e., how well it explains the variability in question and whether it produces stable results across samples).</li></ul>
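Choosing among candidate models can be sketched as fitting each one on a training sample and comparing errors on a held-out sample. The two candidates (a constant mean and a least-squares line) and the data are assumptions made for this example.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Ordinary least squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared prediction error of model on the given sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, split into a training sample and a held-out validation sample.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5, 6], [10.1, 11.9]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: mse(fit(train_x, train_y), valid_x, valid_y) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
```

Scoring on data the models did not see during fitting is what checks for "stable results across samples".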
  18. 18. Stage 3: Deployment <ul><li>This final stage involves taking the model selected as best in the previous stage and applying it to new data in order to generate predictions or estimates of the expected outcome.</li></ul>
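Deployment can be as simple as storing the chosen model's parameters and reusing them on records that arrive later. The sketch below assumes a linear model whose intercept and slope are hypothetical values; real deployments vary widely.

```python
import json

# Hypothetical parameters of a model chosen during validation: y = intercept + slope * x.
saved = json.dumps({"intercept": 0.5, "slope": 2.0})

def load_model(serialized):
    """Rebuild a prediction function from stored parameters."""
    params = json.loads(serialized)
    return lambda x: params["intercept"] + params["slope"] * x

model = load_model(saved)
new_data = [10, 20, 30]
predictions = [model(x) for x in new_data]
```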
  19. 19. Tools of Data Mining <ul><li>Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.</li></ul>
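To show what "learning through training" means at the smallest scale, the sketch below trains a single sigmoid neuron with gradient descent. It is only a building block, not a full network, and the training task (logical OR), learning rate, and epoch count are arbitrary choices for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_neuron(samples, epochs=2000, rate=0.5):
    """Train one sigmoid neuron with stochastic gradient descent on log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target  # gradient of log loss w.r.t. the pre-activation
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias

# Logical OR as a tiny training set.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(samples)

def predict(inputs):
    """Round the neuron's output to a 0/1 class."""
    return round(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias))
```

A real network stacks many such units in layers, which is what lets it model relationships a single unit cannot.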
  20. 20. Cont. <ul><li>Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.</li></ul>
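The three processes named above (combination, mutation, selection) can be seen in a toy genetic algorithm that tries to maximize the number of 1 bits in a bit string. The problem, population size, and rates are all assumptions chosen to keep the sketch short.

```python
import random

def evolve(length=20, population_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Maximize the number of 1 bits in a bit string (the OneMax toy problem)."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit list is its count of ones
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # selection: keep the fitter half
        children = [population[0]]                     # elitism: best survives unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)             # combination: one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history

best, history = evolve()
```

Because the best individual is always carried over, the best fitness recorded in `history` never decreases from one generation to the next.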
  21. 21. Cont. <ul><li>Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.</li></ul>
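The link between a tree and its rules can be sketched with a small nested structure. The tree below is hand-written and hypothetical (in practice it would be learned from data), and the field names and thresholds are invented.

```python
# A hypothetical, hand-written tree; a real one would be learned from data.
tree = {
    "field": "income",
    "threshold": 50000,
    "below": {"label": "decline"},
    "at_or_above": {
        "field": "years_employed",
        "threshold": 2,
        "below": {"label": "review"},
        "at_or_above": {"label": "approve"},
    },
}

def classify(record, node):
    """Follow one decision per level until a leaf label is reached."""
    while "label" not in node:
        branch = "below" if record[node["field"]] < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["label"]

def rules(node, conditions=()):
    """List the if-then rule for each root-to-leaf path."""
    if "label" in node:
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {node['label']}"]
    field, threshold = node["field"], node["threshold"]
    return (rules(node["below"], conditions + (f"{field} < {threshold}",))
            + rules(node["at_or_above"], conditions + (f"{field} >= {threshold}",)))

decision = classify({"income": 62000, "years_employed": 5}, tree)
```

Each root-to-leaf path becomes one rule, so this three-leaf tree yields three classification rules.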
  22. 22. Tools of Data Mining (cont.) <ul><li>Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.</li></ul>
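A minimal sketch of the k-nearest neighbor idea, assuming Euclidean distance as the similarity measure and a majority vote as the "combination of the classes". The historical records are invented.

```python
from collections import Counter
from math import dist

# Hypothetical historical records: (features, known class).
historical = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

def knn_classify(features, historical, k=3):
    """Majority vote among the k historical records closest to features."""
    nearest = sorted(historical, key=lambda rec: dist(features, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

predicted = knn_classify((2.0, 2.0), historical, k=3)
```

With k = 1 the method simply copies the class of the single most similar record; larger k smooths out noisy neighbors.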
  23. 23. Cont. <ul><li>Rule induction: The extraction of useful if-then rules from data based on statistical significance.</li></ul>
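As a simplified stand-in for a significance test, the sketch below keeps only single-condition if-then rules that are seen often enough and hold often enough. The records, the thresholds, and the use of count and confidence in place of a formal statistical test are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical records: attributes of a day -> whether sales were high or low.
records = [
    ({"weather": "sunny", "day": "weekend"}, "high"),
    ({"weather": "sunny", "day": "weekday"}, "high"),
    ({"weather": "rainy", "day": "weekend"}, "low"),
    ({"weather": "rainy", "day": "weekday"}, "low"),
    ({"weather": "sunny", "day": "weekend"}, "high"),
    ({"weather": "rainy", "day": "weekend"}, "high"),
]

def induce_rules(records, min_count=3, min_confidence=0.8):
    """Return (field, value, outcome, confidence) rules that clear both thresholds."""
    outcomes = defaultdict(Counter)
    for attributes, outcome in records:
        for field, value in attributes.items():
            outcomes[(field, value)][outcome] += 1
    found = []
    for (field, value), counts in outcomes.items():
        outcome, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_count and hits / total >= min_confidence:
            found.append((field, value, outcome, hits / total))
    return found

induced = induce_rules(records)
```

On this toy data only one rule survives: IF weather is sunny THEN sales are high.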
  24. 24. Tools of Data Mining (cont.) <ul><li>Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.</li></ul>
  25. 25. Conclusion: <ul><li>Data mining is becoming increasingly popular as a business information management tool, where it is expected to reveal knowledge structures that can guide decisions under conditions of limited certainty. Today, more and more companies acknowledge the value of this new opportunity and use data mining tools and solutions that help optimize their operations and increase the customer's bottom line.</li></ul>