Syllabus: Management 954, Advanced Topics in Information Systems
Fall 2003, CBA206, Tue 5:30-8:20 p.m.
Professor David L. Olson, CBA 256, (402) 472-4521, FAX (402) 472-5855

Graduate Seminar on Data Warehousing and Mining

The widespread use of enterprise resource planning systems and other electronic means of generating data within business organizations has led to the development of many new information technology tools to support business. One of the most successful such applications is large-scale data management, to include data warehouses and other products. These data management systems make it possible for business management to manipulate data much more effectively, to include analysis in the form of data mining. Data mining (also called "knowledge discovery") technology is used increasingly by many companies to search large databases with the intent of discovering previously unknown, valid, and actionable information that is then used to make crucial business decisions. Understanding this information generation system and the tools available for analysis is fundamental for business students in the 21st century.

Course Description: This course provides students exposure to important developments in large-scale database systems and the analysis of this data to aid business decision making. Initial classes describe major data storage tools useful in data mining, to include data warehousing, data marts, and on-line analytic processing (OLAP). The course then discusses fundamental concepts relating to data mining, data mining applications in a number of business decision making contexts, and data mining software and procedure. Data mining techniques will be reviewed, with the opportunity for specific students to explore research opportunities in greater depth.

Course objectives:
  • Describe data warehousing and related products that support data mining
  • Describe functions of data marts and OLAP products
  • Discuss typical data mining procedure
  • Define data mining and identify typical benefits
  • Describe the primary analytic methods used in data mining

Text Material: The instructor has extensive materials on each topic. Students will be expected to find new articles related to refereed publications on the topic.

Justification: Data mining can be generally viewed as a statistical analysis of large quantities of data. This statistical analysis can be accomplished by traditional regression and classification techniques, as well as by artificial intelligence related techniques such as neural network technology. Additionally, artificial intelligence related techniques such as genetic algorithms, fuzzy sets and mathematical programming can enhance either the traditional or neural network approaches. Data mining is made possible by the successes in large-scale database technology, to include data warehousing.

There have been many successful applications of data mining, covering nearly all fields of academic endeavor. Data mining has been extremely successful in business, often tied to data gathering technology such as bar coding, enabling retail organizations to capture data from ordering inventory through receipt of material at loading docks, through cash register recording of sales, and automatic replenishment of stock with automatic pricing based upon knowledge of availability across vendors and demand by geographic location. Data mining has been widely used by banking firms in soliciting credit card customers, by insurance and telecommunication companies to detect fraud, and by manufacturing firms in quality control. Fingerhut was very successful in applying data mining through market segmentation of catalog customers, enabling them to efficiently target market specialty products to small segments of the public most likely to buy specific products. Banks and other business organizations utilize data mining as a way to implement customer relationship management, the identification of the value of each type of customer, enabling them to make intelligent marketing decisions to retain profitable customers and shed unprofitable business.

Exposure of Ph.D. students to data warehousing and data mining would give them access to one of the most important developing areas in information technology.

Course Content: As this is a graduate level course, each student will be required to present two papers in class during the semester, on papers mutually acceptable to student and instructor. There will also be a semester project for presentation in Week 15.

GRADING:
Exam 1: 100 points
Papers: 20 points (2 papers, 10 points each)
Semester project:
  Proposal: 10 points, due October 7th
  Presentation: 20 points
  Paper: 70 points
Final exam: 100 points

Papers: Twice during the semester, each student will select and present a refereed, published article on data mining (any aspect of data mining). The presentations should be about 10 minutes. At the same time, submit a written review of the data mining issues and aspects discussed in the paper. These reviews are to be no more than 3 pages.

Semester Project: Students will be responsible for a paper in which they discuss some research aspect of data mining, ideally related to the student’s program. The paper is to be literate and documented, relating what has been done in this field and what sorts of interesting research questions remain.
  Proposal should be concise, stating the research question and expected sources
  Presentation in the last three weeks of the semester
  Paper due last day of class

Aug 26  Overview of business data mining
Sep 2   Data mining processes
Sep 9   Database support to data mining
Sep 16  Business data mining applications
Sep 23  Market Basket Analysis
Sep 30  Overview of data mining techniques
Oct 7   Memory based reasoning
Oct 14  Instructor gone from country - read heavily
Oct 21  NO CLASS - FALL BREAK
Oct 28  Clustering analysis
Nov 4   Decision tree algorithms
Nov 11  Best fit methods
Nov 18  Best fit methods
Nov 25  NO CLASS - Instructor working out of town again
Dec 2   Visualization/Text Mining
Dec 9   Student Presentations
Dec 16  Final Exam, 8:15-10:15 p.m. (we may arrange earlier in the day)

Week 1: Initial Description of Data Mining in Business
Introduction; What is Needed to Do Data Mining; Data Warehousing; On-Line Analytic Processing; Data Mining; Focused Marketing; Business Data Mining; Retailing; Banking; Credit Card Management; Insurance; Telecommunications; Telemarketing; Human Resource Management; Data Mining Tools; Appendix: Evolution of the Concept of Data Mining

Week 2: Data Mining Processes
Data Selection; Data Preprocessing; Data Transformation; Data Mining; Data Interpretation; Illustrative Example; Example Data Mining Process; Knowledge Discovery Process

Week 3: Database Support to Data Mining
Data Warehousing; Data Marts; On-Line Analytic Processing; Data Warehouse Implementation; Meta Data; System Demonstrations; Data Warehouse; Data Mart; OLAP; Data Quality; Software Products; Real Examples; Wal-Mart’s Data Warehouse System; Summers Rubber Company Data Storage Design

Week 4: Business Data Mining Applications
Data Mining Techniques; Mailstream Optimization at Fingerhut; Lift; Customer Relationship Management; Credit Scoring; Bankruptcy Prediction; Investment Risk Analysis; Data Mining Applications in Insurance; Comparisons of Data Mining Methods; Caveats

Week 5: Market-Basket Analysis
Definitions; Demonstration; Market Basket Limitations; Market-Basket Analysis Software

Week 6: Overview of Data Mining Techniques
Data Mining Models; Machine Learning and Data Mining; Data Mining Perspectives; Data Mining Functions; Demonstration Data Sets; Loan Application Data; Job Application Data; Insurance Fraud Data; Expenditure Data; Appendix: Enterprise Miner Demonstration on Expenditure Data Set; Data Partitioning; Regression Modeling; Decision Tree Modeling; Neural Network Modeling

Week 7: Memory Based Reasoning
Matching; Job Applicant Data; Loan Application Data; Insurance Fraud Data; Distance Minimization; Job Applicant Data; Loan Application Data; Insurance Fraud Data; Applications of Memory Based Reasoning; Census Classification; Telecommunication Fraud; Application of Methods to Larger Data Sets; Job Applicant Data; Loan Application Data; Insurance Fraud Data; Software Products

Week 8: Clustering Algorithms
Description of Cluster Analysis; Job Applicant Data; Loan Application Data; Insurance Fraud Data; Varying the Number of Clusters; Applications of Cluster Analysis; Monitoring Credit Card Accounts; Data Mining of Insurance Claims; Application of Methods to Larger Data Sets; Job Applicant Data; Loan Application Data; Insurance Fraud Data; Multiple Clusters; Software Products

Week 9: Decision Tree Algorithms
Description of Algorithm; Tree Structure; Machine Learning; Fuzzy Decision Trees; Decision Tree Applications; Inventory Prediction; Mining of Clinical Databases; Software Development Quality; Application of Methods to Larger Data Sets; Loan Application Data; Decision Tree Software Products; Summary; Appendix: Demonstration of See5 Decision Tree Analysis (1. Data Cleaning, 2. Data Mining Process)

Week 10-11: Best Fit Data Mining Algorithms
Regression Models; Classical Tests of the Regression Model; Multiple Regression; Logistic Regression; Neural Networks; Example Neural Network Application; Real Applications of Best Fit Models in Data Mining; Data Mining to Target Customers; Neural Network Models for Bankruptcy Prediction; Application of Methods to Larger Data Sets; Insurance Fraud Data; Job Applicant Data; Neural Network Products

Week 12: Web and Text Mining
Web Mining; Web Usage Analysis; Web Data Collection; Web Mining Examples; Web Mining Systems; Web Usage Mining Demonstration; Text Mining; Text Mining Products; Text Mining Demonstration

Extra Material

Chapter 11: Linear Programming-Based Methods
Linear Discriminant Analysis; Multiple Criteria Linear Programming Classification; Fuzzy Linear Programming Classification; Credit Card Portfolio Management: A Real Application; Linear Programming-based Software Support

Chapter 12: Genetic Algorithm Support to Data Mining
Demonstration of Genetic Algorithm; Production Quality Test Design; Genetic Algorithm Support to Data Mining; Japanese Credit Screening; Product Quality Testing Design; Medical Analysis