Data Mining


  • Have you ever seen one of those posters that at first glance looks like a jumble of colored dots? Stare at it, and a three-dimensional picture will jump out from the pointillistic background. Now, think of those dots as the bits of information about your customers contained in your company's databases. If you look at the dots of information in the right light and at the right angle, they will reveal patterns that yield insight into customer behavior.
  • Data mining enables you to discover hidden patterns and relationships in large amounts of data. Data mining solves a common paradox: the more customer data you have, the more difficult and time-consuming it is to effectively analyze and draw meaning from it. What should be a gold mine often lies unexplored, due to a lack of personnel, time or expertise. Data mining uses powerful analytic technologies to quickly and thoroughly explore mountains of data, isolating the valuable, usable information — the business intelligence — that you need. Why use data mining? When you have a reliable guide to the future of your business, you have the power to make the right decisions today. Data mining empowers you to change the future of your business, by delivering accurate predictions. For example, data mining tells you which prospects are likely to become profitable customers and which are most likely to respond to your offer. With this view of the future, you increase your ROI by making your offer to only those prospects likely to respond and become valuable customers. Your decisions are based on sound business intelligence, not on instinct or gut reactions. And those decisions deliver consistent results that keep you ahead of the competition.
  • Look at DSS data mining: Atler, pp. 205-214
  • The CRISP-DM project has developed an industry- and tool-neutral data mining process model. Starting from the embryonic knowledge discovery processes used in industry today and responding directly to user requirements, this project defined and validated a data mining process that is applicable in diverse industry sectors. This will make large data mining projects faster, cheaper, more reliable and more manageable. Even small-scale data mining investigations will benefit from using CRISP-DM. The current version of CRISP-DM is Version 1.0, which was developed by the Consortium Partners from the initial release and discussion paper made available at the conclusion of the project.
The life cycle of a data mining project consists of six phases. The sequence of the phases is not strict. Moving back and forth between different phases is always required. The outcome of each phase determines which phase, or which particular task of a phase, has to be performed next. The arrows indicate the most important and frequent dependencies between phases. The outer circle in the figure symbolizes the cyclic nature of data mining itself. A data mining process continues after a solution has been deployed. The lessons learned during the process can trigger new, often more focused business questions. Subsequent data mining processes will benefit from the experiences of previous ones. Below follows a brief outline of the phases:
Business Understanding: This initial phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
Data Understanding: The data understanding phase starts with an initial data collection and proceeds with activities to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.
Data Preparation: The data preparation phase covers all activities to construct the final dataset (data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.
Modeling: In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often needed.
Evaluation: At this stage in the project you have built a model (or models) that appears to have high quality from a data analysis perspective. Before proceeding to final deployment of the model, it is important to evaluate the model more thoroughly, and to review the steps executed to construct it, to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.
Deployment: Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use it. Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases it will be the customer, not the data analyst, who will carry out the deployment steps. However, even if the analyst will not carry out the deployment effort, it is important for the customer to understand up front what actions will need to be carried out in order to actually make use of the created models.
CRISP-DM was conceived in late 1996 by three “veterans” of the young and immature data mining market: DaimlerChrysler, NCR and SPSS.
  • SPSS software is used in the Market Research class; this example is an actual data mining project done by the UAH mkt/mgt dept. SUBHT: subscribes to the Huntsville Times. The questions asked: how often do you read the newspaper, and do you subscribe? OLAP: how many people read the newspaper daily, subscribe to it, and read the sports and comics? The “5” is erroneous data.
  • WebMiner is led by CEO Jesus Mena, a noted authority on web data mining and author of Data Mining Your Website. His patent-pending web mining ASP is the only solution that delivers data-driven, real-time, online product recommendations matched to the demographic, psychographic, and behavioral characteristics of website visitors. WebMiner ASP offers not only personalization solutions, but also the marketing insight to fully implement profit-focused recommendations. Jesus Mena, data mining expert, author and lecturer: In his days at the Internal Revenue Service, Jesus Mena became an expert in data mining. With the IRS's blessing, he started consulting on the side and even published a book, "Data Mining Your Website," for dot-com companies. While still at the IRS, Mena raised $2 million in venture capital and in 2000 started a company called WebMiner, which helped businesses make sense of customer information. Mena has since sold WebMiner and turned his attention back to the federal government. His goal: teach agencies how to use data mining to improve homeland security. (Survival Guide: Perspectives from the field, 01/26/04; Vol. 18 No. 20)
  • Quadstone: going further than OLAP, the user has the opportunity to deploy models at the customer level at the click of a mouse and then perform customer-level “write-back”. Ex. Instead of asking "how many times was the ATM machine used last week?" we can ask "which customers used the ATM machine last week?" Webgroove: One of the many strengths of Webgroove® Relate is its ability to analyze click-stream data. Converting worthless data into valuable information used for defining a viewer's interests is a complicated task. It is also a critical task for accurate behavioral profiling.
  • Oracle Home Page
  • Data Mining

    1. 1. Data Mining Getting to Know You!
    2. 2. Data Mining <ul><li>Data mining allows for the discovery of hidden patterns and relationships in large amounts of data. </li></ul><ul><li>Data mining uses powerful analytic technologies to quickly and thoroughly explore mountains of data, isolating the valuable, usable information (the business intelligence). </li></ul><ul><li>Ex. Data mining tells you which prospects are likely to become profitable customers and which are most likely to respond to your offer. ROI is increased by making offers to only those prospects likely to respond and become valuable customers. </li></ul><ul><li>Spyware is any technology that aids in gathering information about a person or organization without their knowledge. On the Internet (where it is sometimes called a spybot or tracking software), spyware is programming that is put in someone's computer to secretly gather information about the user and relay it to advertisers or other interested parties. This is often done via adware applications. Free spyware scan </li></ul><ul><li>SPSS edu </li></ul>
    3. 3. What Business Problems does Data Mining Solve? <ul><li>You can use data mining to solve almost any business problem that involves data, including: </li></ul><ul><li>Increasing business unit and overall profitability </li></ul><ul><li>Understanding customer desires and needs </li></ul><ul><li>Identifying profitable customers and acquiring new ones </li></ul><ul><li>Retaining customers and increasing loyalty </li></ul><ul><li>Increasing ROI and reducing costs on promotions </li></ul><ul><li>Cross-selling and up-selling </li></ul><ul><li>Detecting fraud, waste and abuse </li></ul><ul><li>Determining credit risks </li></ul><ul><li>Increasing Web site profitability </li></ul><ul><li>Increasing store traffic and optimizing layouts for increased sales </li></ul><ul><li>Monitoring business performance SPSS </li></ul>
    4. 4. Data Mining Definitions and Uses <ul><li>Data mining refers to a wide range of techniques that look at underlying patterns or associations among elements within large data sets. These patterns are then used to form rules or guidelines for use in a wide range of marketing decisions. Ex. Insightful Miner demo* </li></ul><ul><li>Data mining tools can improve marketing management decisions such as: </li></ul><ul><ul><li>inventory planning/management </li></ul></ul><ul><ul><li>space utilization </li></ul></ul><ul><ul><li>promotion management </li></ul></ul><ul><ul><li>segmentation and target marketing </li></ul></ul><ul><ul><li>improving sales force performance </li></ul></ul><ul><ul><li>customer relationship management (CRM) </li></ul></ul><ul><ul><li>and many others </li></ul></ul>
    5. 5. CRISP-DM: Cross Industry Standard Process for Data Mining, Project Overview. Figure: Phases of the CRISP-DM Process Model
    6. 6. What Can be Done with Data Mining? <ul><li>Through various algorithms, data mining software sorts through thousands of data points, organizes them, then summarizes complex relationships for the user. </li></ul><ul><li>Data mining software typically follows one of five different analytical approaches: </li></ul><ul><ul><li>Forecasting via Trend Analysis, Reporting and OLAP </li></ul></ul><ul><ul><li>Associations </li></ul></ul><ul><ul><li>Classifications </li></ul></ul><ul><ul><li>Sequential patterns </li></ul></ul><ul><ul><li>Clustering </li></ul></ul>
    7. 7. Reporting and Online Analytical Processing (OLAP) <ul><li>Reporting (a.k.a. summary methods or baby stats) is one of the most basic, but extremely useful, techniques for data analysis. </li></ul><ul><ul><li>Provides simple views of the data such as counts, sums, percentages, and averages. </li></ul></ul><ul><ul><li>Sample query: How many units did we sell last month? </li></ul></ul><ul><li>OLAP (think multi-dimensional cross-tabulation) is useful because it provides “cubes” of “reports” that can break down one variable by another. </li></ul><ul><ul><li>Differs from traditional cross-tabs because it is interactive and you can “drill down” through the live reports to get more specific views of each cube (cell). </li></ul></ul><ul><ul><li>See SPSS example in class. </li></ul></ul>
    8. 8. Traditional Cross-tab vs. OLAP
       Days per week * SUBHT Cross-tabulation (Count)
                        SUBHT
       Days per week    no    yes   Total
       daily            56    114     170
       2-3 times        90    116     206
       once             56     42      98
       Sunday           41    132     173
       5                 1      1       2
       Total           244    405     649
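As a rough illustration of the cross-tab on this slide, the sketch below rebuilds the Days per week by SUBHT counts from respondent-level records, prints the table with its totals, and then reports one row percentage after dropping the erroneous "5" code. The record layout and the function names (`expand`, `crosstab`, `row_percent`) are assumptions made for this example, as is reading the slide's counts row by row (daily, 2-3 times, once, Sunday, 5); this is not the original SPSS procedure.

```python
from collections import Counter

# Cell counts as read from the slide, assuming they run row by row.
# The "5" row is the erroneous code mentioned in the notes for this example.
SLIDE_COUNTS = {
    ("daily", "no"): 56, ("daily", "yes"): 114,
    ("2-3 times", "no"): 90, ("2-3 times", "yes"): 116,
    ("once", "no"): 56, ("once", "yes"): 42,
    ("Sunday", "no"): 41, ("Sunday", "yes"): 132,
    ("5", "no"): 1, ("5", "yes"): 1,
}


def expand(counts):
    """Turn cell counts back into one (days_per_week, subht) record per respondent."""
    return [cell for cell, n in counts.items() for _ in range(n)]


def crosstab(records):
    """Count records per (row, column) cell, plus row and column totals."""
    cells = Counter(records)
    row_totals = Counter(row for row, _ in records)
    col_totals = Counter(col for _, col in records)
    return cells, row_totals, col_totals


def row_percent(cells, row_totals, row, col):
    """Percentage of one row's respondents that fall in the given column."""
    return 100.0 * cells[(row, col)] / row_totals[row]


records = expand(SLIDE_COUNTS)
cells, row_totals, col_totals = crosstab(records)

# Print the table in the same layout as the slide.
for row in ["daily", "2-3 times", "once", "Sunday", "5"]:
    print(f"{row:<10}{cells[(row, 'no')]:>5}{cells[(row, 'yes')]:>5}{row_totals[row]:>6}")
print(f"{'Total':<10}{col_totals['no']:>5}{col_totals['yes']:>5}{len(records):>6}")

# Data cleaning: drop the erroneous "5" code before reporting percentages.
clean = [r for r in records if r[0] != "5"]
clean_cells, clean_rows, _ = crosstab(clean)
daily_subscriber_pct = row_percent(clean_cells, clean_rows, "daily", "yes")
print(f"Daily readers who subscribe: {daily_subscriber_pct:.1f}%")
```

An interactive OLAP tool would let the user add a third variable (for example, which sections are read) and drill into any cell; here that would mean extending each record with another field and grouping on it.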
    9. 9. Associations <ul><li>The basic premise of associations is to find all relationships such that the presence of one set of items in a transaction implies the presence of other items, while controlling for extraneous factors. </li></ul><ul><li>Ex. 75% of consumers who buy beer also buy corn chips; 26% of consumers who buy beer and corn chips also buy salsa. </li></ul><ul><li>Simple data analyses are enough to discover these relationships. </li></ul><ul><li>Data mining software can simultaneously control for other variables that affect these relationships (e.g., sale items, competitor actions) to get true effects. </li></ul>
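The "X% of consumers who buy A also buy B" figures on this slide correspond to what association-rule mining usually calls confidence. The sketch below computes support and confidence over a handful of hypothetical baskets; the data and the function names are invented for illustration, and it does not attempt the "controlling for extraneous factors" step the slide mentions.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    return sum(consequent <= t for t in with_antecedent) / len(with_antecedent)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"beer", "corn chips", "salsa"},
    {"beer", "corn chips"},
    {"beer", "corn chips"},
    {"beer"},
    {"milk", "bread"},
    {"corn chips", "salsa"},
]

beer_support = support(baskets, {"beer"})
beer_to_chips = confidence(baskets, {"beer"}, {"corn chips"})
beer_chips_to_salsa = confidence(baskets, {"beer", "corn chips"}, {"salsa"})

print(f"support(beer) = {beer_support:.2f}")
print(f"confidence(beer -> corn chips) = {beer_to_chips:.2f}")
print(f"confidence(beer + corn chips -> salsa) = {beer_chips_to_salsa:.2f}")
```

Support matters alongside confidence because a rule can look strong while applying to only a tiny share of transactions.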
    10. 10. Classification <ul><li>Classification or profile generation uses data to develop profiles of different groups. </li></ul><ul><li>Can be used for segmenting and targeting, market evaluation, product management, etc. </li></ul><ul><li>Typically uses historical data to form rules that define groups. Those rules are then applied to new data to find similar groups. </li></ul><ul><li>Ex. Based on past results, a “hot prospect” is a person who has an advanced degree, earns $150K or more, has made three online purchases over the last month, and has purchased computer-related equipment within the past year. Find a person who fits that profile and you have a good prospect. </li></ul>
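To make the "rules applied to new data" step concrete, here is a minimal sketch that encodes the slide's "hot prospect" profile as a single rule and applies it to new records. In practice the rule would be learned from historical data by a classification algorithm; here it is hard-coded from the slide's example. The `Prospect` fields, the sample records, and the reading of "three online purchases" as "at least three" are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    name: str
    has_advanced_degree: bool
    income_usd: int
    online_purchases_last_month: int
    bought_computer_equipment_last_year: bool


def is_hot_prospect(p: Prospect) -> bool:
    """Apply the slide's profile; 'three purchases' is read as 'at least three' (assumption)."""
    return (
        p.has_advanced_degree
        and p.income_usd >= 150_000
        and p.online_purchases_last_month >= 3
        and p.bought_computer_equipment_last_year
    )


# Hypothetical new records to score against the rule.
new_prospects = [
    Prospect("A", True, 180_000, 4, True),
    Prospect("B", True, 120_000, 5, True),    # income below the threshold
    Prospect("C", False, 200_000, 3, True),   # no advanced degree
    Prospect("D", True, 150_000, 3, True),
]

hot = [p.name for p in new_prospects if is_hot_prospect(p)]
print(hot)
```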
    11. 11. Sequential Patterns <ul><li>This technique looks at purchases, or events, occurring in a sequence over time and tries to uncover patterns of behavior. </li></ul><ul><li>Very useful for tracking effects of promotional activities; also for work force allocation, inventory management, and pricing/valuation. </li></ul><ul><li>Ex. Through data mining, a company notices that 85% of customers who buy a new DVD player return within three months to purchase speakers. </li></ul><ul><li>Ex. A company with a brand new potato chip is trying to decide what time of the year to launch it. It might look for events, trends, etc. </li></ul>
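The DVD player example can be expressed as a simple "follow-up rate": of the customers who bought a first item, what share bought a second item within a time window afterwards. The sketch below assumes purchases arrive as (customer, item, date) tuples and treats "three months" as 90 days; both are assumptions, and the data is hypothetical. Real sequential-pattern mining searches for such sequences automatically instead of testing one pair chosen in advance.

```python
from datetime import date, timedelta


def follow_up_rate(purchases, first_item, next_item, window_days=90):
    """Share of first_item buyers who buy next_item within window_days afterwards."""
    first_dates = {}
    for customer, item, day in purchases:
        if item == first_item:
            # Keep each customer's earliest purchase of the first item.
            first_dates[customer] = min(day, first_dates.get(customer, day))
    if not first_dates:
        return 0.0
    window = timedelta(days=window_days)
    followers = {
        customer
        for customer, item, day in purchases
        if item == next_item
        and customer in first_dates
        # The follow-up must come strictly after the first purchase, inside the window.
        and first_dates[customer] < day <= first_dates[customer] + window
    }
    return len(followers) / len(first_dates)


# Hypothetical purchase log, invented for illustration only.
purchases = [
    ("c1", "DVD player", date(2024, 1, 10)),
    ("c1", "speakers", date(2024, 2, 15)),
    ("c2", "DVD player", date(2024, 1, 20)),
    ("c2", "speakers", date(2024, 3, 1)),
    ("c3", "DVD player", date(2024, 2, 1)),
    ("c3", "speakers", date(2024, 8, 1)),   # too late to count
    ("c4", "DVD player", date(2024, 2, 5)),
    ("c5", "speakers", date(2024, 1, 5)),   # never bought a DVD player
]

rate = follow_up_rate(purchases, "DVD player", "speakers")
print(f"{rate:.0%} of DVD player buyers bought speakers within 90 days")
```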
    12. 12. Clustering <ul><li>Clustering will segment a database into subsets or clusters, creating a set of groups which have the maximum similarity within them and the maximum difference between them. </li></ul><ul><li>Allows for consideration of multiple variables simultaneously. </li></ul><ul><li>Great for building consumer segments, perceptual mapping, brand image assessment, etc. </li></ul><ul><li>Ex. A company gathers consumer opinions on 25 product attributes, then develops clusters based on those attributes </li></ul><ul><li>Web Miner * - data mining consultancy utilizing cluster analysis with pre-mined demographic information </li></ul>
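The slide does not name a clustering algorithm; k-means is one common choice and is used here purely as an illustration. The sketch groups hypothetical consumers by their ratings on two attributes (the slide's example uses 25, which works the same way with longer tuples). The data, the choice of k, and the naive "first k points" initialization are all assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # naive start: the first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical ratings (attribute 1, attribute 2) for six consumers.
ratings = [
    (1.0, 1.2), (8.0, 8.5), (1.5, 0.8),
    (0.9, 1.1), (8.2, 7.9), (7.8, 8.1),
]

assignment, centroids = kmeans(ratings, k=2)
print(assignment)
print(centroids)
```

Points in the same cluster sit close together and far from the other cluster's centroid, which is the "maximum similarity within, maximum difference between" idea described on the slide.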
    13. 13. Data Mining Tools/ Web Mining <ul><li>Web Groove – solutions turn visitors into valued customers using adaptive personalization technology and click-stream analysis to increase ROI (how it works) </li></ul><ul><li>SPSS – Statistical Software utilizes cross-tabulation, OLAP cube, Crystal Reports etc. </li></ul><ul><li>Microsoft Excel – Spreadsheet </li></ul><ul><li>Oracle – data warehouse </li></ul><ul><li>Quadstone – uses a combination of analytical models including cross tabulation, OLAP, decision trees, and scorecards (demo available) </li></ul><ul><li>Web Miner * – cluster analysis utilizing pre-mined demographic information (demo) </li></ul><ul><li>DataDistilleries – Dutch data mining company </li></ul><ul><li>CRISP-DM – Cross Industry Standard Process for Data Mining Project </li></ul><ul><li>OpenTracker </li></ul><ul><li>Cognos </li></ul>
    14. 14. Data Mining Leads to Business Intelligence <ul><li>Using Web-enabled business intelligence technology and applications, businesses are learning more about their best customers, their supply chains and product life cycles - and fast! </li></ul><ul><li>Business intelligence delivers targeted, results-driven decisions and execution that can lead to competitive advantage. </li></ul><ul><li>Business intelligence and data warehousing infrastructure can empower employees! </li></ul>