TOWSON UNIVERSITY
COSC757, SPRING 2001

PROJECT REPORT ON GETTING A LOAN APPROVAL

Instructor: Dr. Ramesh K. Karne

Prepared by: Bohui Qi, Jianping Du, Jin Guo, Lanlan Wang, Yi Yu, Ying Zhang
CONTENTS

1 Data Mining Tool
  1.1 How to Select a Data Mining Tool?
  1.2 Which Tool Do We Select?
2 Application Example Chosen
  2.1 Project Description
  2.2 Project Implementation
3 Preparing Data
  3.1 Select Appropriate Data for Mining
  3.2 Perform Data Preprocessing
  3.3 Perform Data Reduction and Projection
  3.4 Data List
4 Mining Experiment
  4.1 Conversion of Input Data
  4.2 Algorithm for Data Mining
  4.3 Procedures
5 Mining Results
6 Additional Input
  6.1 Data Generalization
  6.2 Model Built and Estimated with the Holdout Method
  6.3 Model Built and Estimated with Cross-Validation
7 Additional Mining Results
  7.1 Testing the Decision Tree with New Data
  7.2 Tree Pruning
8 Mining Technique Used in the Tool
9 Mining Technique Details and Information
10 Critical Evaluation of the Mining Technique Used
11 Visualization Technique Used in the Tool
12 Visualization Technique Details and Information
13 Setting Up the Environment for the Tool -- Weka
  13.1 Where to Download?
  13.2 How to Set Up?
  13.3 How to Use?
  13.4 System Environment
  13.5 Evaluation
  13.6 Attachment
14 Conclusions
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Appendix 5
Appendix 6
Appendix 7
Appendix 8 (project proposal)
Data Mining Project Report on Getting a Loan Approval

Bohui Qi, Jianping Du, Jin Guo, Lanlan Wang, Yi Yu, Ying Zhang
Department of Computer Science, Towson University

1. Data Mining Tool

1.1 How to Select a Data Mining Tool?

With the proliferation of data warehouses, data mining tools are flooding the market. Their objective is to discover the hidden gold in your data. Many traditional report and query tools and statistical analysis systems use the term "data mining" in their product descriptions. So what is a data mining tool?

The ultimate objective of data mining is knowledge discovery. Data mining methodology extracts hidden predictive information from large databases. With such a broad definition, however, an online analytical processing (OLAP) product or a statistical package could also qualify as a data mining tool. That is where the technology comes in: for true knowledge discovery, a data mining tool should unearth the hidden information automatically. By this definition, data mining is data-driven, not user-driven or verification-driven.

One way to identify a true data mining tool is to evaluate how it operates on the data: manually (top-down) or automatically (bottom-up)? In other words, who originates the query, the user or the software?

Two concerns drive the selection of an appropriate data mining tool -- your business objectives and your data structure. Both should guide you to the same tool. Consider the following questions when evaluating a set of potential tools:

• Is the data set heavily categorical?
• What platforms do your candidate tools support?
• Are the candidate tools ODBC-compliant?
• What data formats can the tools import?
No single tool is likely to provide the complete answer to your data mining project. Some tools integrate several technologies into a suite, for example a set of statistical analysis programs, a neural network, and a symbolic classifier.

1.2 Which Tool Do We Select?

We found many data mining tools on the web, but most of them are not free, and we are not familiar with them because little detailed information is available. Weka is a good tool for us because it is easy to use and can be downloaded for free, and we have a book/manual describing how to use it.

2. Application Example Chosen

With the fast development of computers and networks, we have entered the Age of Information. Various business, scientific, and governmental organizations around the world generate an enormous volume of data every day. In order to analyze and discover the hidden gold in such overwhelming amounts of data, scientists have developed automated computer systems for intelligent data analysis -- data mining. The ultimate objective of data mining is knowledge discovery: data mining methodology extracts hidden predictive information from a large database, and for true knowledge discovery a data mining tool should unearth hidden information automatically. Different data mining methods are best suited to different applications, and many data mining tools are now commercially available. It is therefore very important to choose the right data mining tool for the system of interest.

2.1 Project Description

The goal of our project is to display patterns in the amount of loan approved for different groups by age, income, credit history, and home ownership. The objective is to use data mining tools to identify which factors matter and how they affect the approval of a certain loan amount for an applicant. From a business point of view, we would like to derive rules that identify the critical factors and the extent to which they affect the approved credit line amount for credit card companies, especially new startups. In today's business world it is important, and commercially valuable, to attract more potential and valuable customers, enlarge market share, and minimize financial risk for credit card companies.

2.2 Project Implementation
For our project, we first organized the database and generated a data set with attributes such as age, income, credit history, and home ownership. All of these data were prepared in an Excel table and assessed for structure. Based on our objectives and the data structure, we evaluated several data mining tools and chose the one best suited to mining this application (the Weka data mining tool set), getting a loan approval. Finally, after observing and analyzing the new knowledge, we validated the findings. We concluded which role each interval in each category plays in the amount of loan approved, which interval in each category has the highest approved credit line amount, and which factors are most critical for the highest approved credit line. We discussed the results of the analysis with some experts to ensure that the findings are correct and appropriate for the business objectives. The following sections describe the details of our project.

3. Preparing Data

3.1 Select Appropriate Data for Mining

Because the available data set is large, it is more effective to choose meaningful data for mining. After several group discussions, we chose credit card application approval as our data mining topic.

3.2 Perform Data Preprocessing

For this project, we obtained the raw data from the website (ftp://ftp.ics.uci.edu/pub/machine-…databases/credit-screening) introduced by our instructor, Dr. Karne. It is a credit card application approval database from the UCI machine learning repository.

Data preprocessing is an important step in the data mining process. It is necessary to resolve several types of problems that frequently occur in large data sets, including noisy data, redundant data, and missing values. Preprocessing consists of data cleaning and missing value resolution. Database records often contain fields with bad or useless information, so we cleaned the data by discarding meaningless attributes and recoding some attribute values as clear numeric variables.

3.3 Perform Data Reduction and Projection

Determining useful features in the data set may further reduce the size of the selected data. Large databases often contain huge amounts of duplicated values that we are not interested in and that slow down the mining process, so we reduced the data that consist of long stretches of uninteresting records with no interesting patterns. Reducing these data is more desirable and efficient. Data projection determines the best means to represent the discovered information; we transformed some key attribute values to make them more reliable.

After the above steps, the data for our project -- mining credit card application approval -- were prepared. We selected four attributes for the class "application approval":

• Credit history
• Age
• Income
• House owner

Please refer to the data list in the appendix for details. For each application, a credit loan is granted at one of the following five recommended levels:

• $0
• $5,000
• $10,000
• $20,000
• $50,000

Using attribute relevance analysis on our 87 pretest instances, we calculated Gain(A) for each attribute as follows: Gain(History) = 0.5210, Gain(Age) = 0.2143, Gain(Income) = 0.1926, and Gain(House Owner) = 0.1085. We consider all of these attributes meaningful.

3.4 Data List

Please refer to the appendix.
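As a reference for how the Gain(A) values in Section 3.3 are obtained, the sketch below computes entropy-based information gain from class counts in plain Java. The class counts used in main are placeholders for illustration only, not our actual 87 pretest instances.

```java
import java.util.Map;

/** Information gain Gain(A) = Info(D) - Info_A(D), computed from class counts. */
public class InfoGain {

    /** Entropy of a class-count vector, in bits. */
    static double info(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double e = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            e -= p * (Math.log(p) / Math.log(2));
        }
        return e;
    }

    /**
     * @param totalCounts   class counts over the whole data set D
     * @param countsByValue class counts of each partition D_v induced by attribute A
     */
    static double gain(int[] totalCounts, Map<String, int[]> countsByValue) {
        int total = 0;
        for (int c : totalCounts) total += c;
        double expected = 0.0;                       // Info_A(D)
        for (int[] counts : countsByValue.values()) {
            int size = 0;
            for (int c : counts) size += c;
            expected += ((double) size / total) * info(counts);
        }
        return info(totalCounts) - expected;         // Gain(A)
    }

    public static void main(String[] args) {
        // Placeholder counts for illustration only (not the real 87 pretest instances);
        // classes ordered as {recommend1 .. recommend5}.
        int[] all = {20, 25, 30, 10, 2};
        Map<String, int[]> byHistory = Map.of(
                "good", new int[]{0, 2, 25, 8, 2},
                "none", new int[]{2, 15, 5, 2, 0},
                "bad",  new int[]{18, 8, 0, 0, 0});
        System.out.printf("Gain(History) = %.4f%n", gain(all, byHistory));
    }
}
```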
4. Mining Experiment

4.1 Conversion of Input Data

Before starting to mine the data, we had to convert the data file to ARFF format, since Weka only accepts data in that format. The data file we prepared is in Excel format, so we followed the directions for converting data stored in Excel to ARFF format and completed the conversion successfully.
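For readers who prefer a programmatic route, recent Weka releases also ship CSV and ARFF converters that can perform this conversion directly. The following is a minimal sketch, assuming the spreadsheet has been exported to a file named credit.csv and that a current weka.jar is on the classpath; it is an alternative to, not a record of, the manual steps we followed.

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

/** Convert an exported spreadsheet (CSV) into the ARFF format Weka expects. */
public class CsvToArff {
    public static void main(String[] args) throws Exception {
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("credit.csv"));   // exported from Excel
        Instances data = loader.getDataSet();

        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("credit.arff"));
        saver.writeBatch();
    }
}
```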
4.2 Algorithm for Data Mining

Before starting the experiment, we need to specify the knowledge we want to extract, because the kind of knowledge sought determines which mining function to choose. In our project, we want to learn what kind of credit line should be recommended to a new applicant by mining a set of classified real-world examples. This is a classification problem, so we decided to use a decision tree, one of the basic techniques for data classification, to represent the knowledge to be mined.

4.3 Procedures

Classification is a form of data analysis that can be used to extract models describing important data classes or to make future predictions. Through this mining experiment, we built a decision tree in order to obtain classification rules and use them to predict what credit line should be given to a new applicant.

Data classification is a two-step process. In the first step, a model is built by analyzing a set of training data with the classification algorithm. This is a learning step, because the learned model is actually a set of classification rules that are then used to categorize new data. The second step is to estimate the predictive accuracy of the model. If the accuracy is considered acceptable, the model can be used to classify future objects whose class is unknown. We took the following steps in the project.

4.3.1 Model Building

In order to build a model to classify the data, we selected a set of training data. Five attributes remained after data preparation. First, we chose the attribute named "recommended" as the class label attribute, since we want to learn the proper credit line to give a customer. Second, we created a training data set by randomly selecting 96 tuples based on customer age. After tuple selection, the training data were analyzed by the decision tree mining algorithm. The learned model is presented in the form of the decision tree shown in Appendix 4.

4.3.2 Increasing the Data Size

Generally speaking, a better classifier can be obtained as the training data grows, so we increased the total number of training samples to 811 and obtained the accuracy of the classification rules shown in Appendix 5.
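The same two-step procedure can also be driven through Weka's Java API rather than the Simple CLI. Below is a minimal sketch of the model-building step and its error on the training data, assuming a current Weka release in which J48 lives in the weka.classifiers.trees package (in the Weka 3.0/3.2 versions we used, the class is weka.classifiers.j48.J48); the file name credit.arff is an assumption.

```java
import java.io.BufferedReader;
import java.io.FileReader;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

/** Step 1: learn a decision tree from the training data; step 2 (accuracy estimation) follows later. */
public class BuildModel {
    public static void main(String[] args) throws Exception {
        Instances train = new Instances(new BufferedReader(new FileReader("credit.arff")));
        train.setClassIndex(train.numAttributes() - 1);  // "recommended" is the last attribute

        J48 tree = new J48();                            // C4.5-style decision tree learner
        tree.buildClassifier(train);
        System.out.println(tree);                        // textual tree, as in Appendix 5

        Evaluation eval = new Evaluation(train);         // resubstitution error on the training data
        eval.evaluateModel(tree, train);
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());       // confusion matrix
    }
}
```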
5. Mining Results

We observed our results and found several facts:

• The first part of Appendix 5 is a decision tree in textual form. There are seven levels in the tree. The first level is split on the history attribute, the second splits on income and house-owner respectively, and so on. The bottom level is split on the age attribute.
• Below the tree structure, there are 37 leaf nodes representing class distributions, and the size of the tree is 72, which is the total number of nodes in the tree. Each node denotes a test on an attribute, and each branch represents an outcome of the test.
• The last section shows that 798 instances were classified correctly and 13 were misclassified. The percentage of correct classifications on the test data set is 98%.
• The sum of the underlined numbers shown in the confusion matrix equals the number of correctly classified instances, and the sum of the remaining numbers is the total number of misclassified instances.

To make analysis easier, a diagram of the decision tree was drawn as Appendix 8 based on its textual form in Appendix 5. From the tree presentation, we noticed that it was really hard to analyze the result even though the accuracy of the mining result was very high. The tree is very deep (8 levels), it has 71 branches, and the test value interval between two branches on the same attribute was too small, which split the tree into too many parts. We realized that the values fed to the mining algorithm covered too wide a range; for example, applicant age ran from 18 to 80 and income from $20,000 to $120,000. The data needed to be generalized before mining.
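Section 6.1 below defines the groups we adopted. As an illustration of that generalization step, the following plain-Java sketch maps raw age, income, and credit line values onto the higher-level concepts; the helper names and the example record are ours, for illustration only.

```java
/** Map raw numeric values onto the higher-level concepts described in Section 6.1. */
public class Generalize {

    static String ageGroup(int age) {
        if (age < 20) return "age1";
        if (age <= 40) return "age2";
        if (age <= 60) return "age3";
        return "age4";
    }

    static String incomeGroup(int income) {
        if (income < 30000) return "income1";
        if (income <= 60000) return "income2";
        if (income <= 90000) return "income3";
        return "income4";
    }

    static String recommendLevel(int creditLine) {
        switch (creditLine) {
            case 0:     return "recommend1";
            case 5000:  return "recommend2";
            case 10000: return "recommend3";
            case 20000: return "recommend4";
            case 50000: return "recommend5";
            default:    throw new IllegalArgumentException("unexpected credit line: " + creditLine);
        }
    }

    public static void main(String[] args) {
        // e.g. the raw record 28,45600,none,yes,10000 becomes age2,income2,none,yes,recommend3
        System.out.println(ageGroup(28) + "," + incomeGroup(45600) + ",none,yes," + recommendLevel(10000));
    }
}
```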
6. Additional Input

In this step, we took two approaches to build a model and estimate its accuracy using the generalized data. First we chose the holdout method with its default split, so the input data were randomly partitioned into two independent sets: 66% of the data were allocated to the training set to derive the classifier, and the remaining 34% were used as test data on which the accuracy is estimated. The result of this method is shown in Appendix 6. This method is considered pessimistic, since only part of the initial data is used to build the model. So we used 10-fold cross-validation as the second method for our project. In this method, the algorithm partitions the data into 10 mutually exclusive folds of approximately equal size. Training and testing are performed 10 times; each time, subset Si is held out as test data and the remaining 9 subsets are used to train the classifier, so the accuracy estimate is the overall number of correct classifications from the 10 iterations divided by the total number of samples in the whole data set.

Procedures

6.1 Data Generalization

We decided to transform the data before further input. We set four groups for the age attribute (age1: <20; age2: 20-40; age3: 41-60; age4: >60), so the raw values were replaced by higher-level concepts. We also divided income into four sets (income1: <$30,000; income2: $30,000-$60,000; income3: $60,000-$90,000; income4: >$90,000), so the data were generalized from a low level to a high level. For the class label attribute, recommend1, recommend2, recommend3, recommend4, and recommend5 represent credit lines of $0, $5,000, $10,000, $20,000, and $50,000 respectively.

6.2 Model Built and Estimated with the Holdout Method

The result of this method is shown in Appendix 6. In this run there were 496 instances, and only the 2/3 selected as training data were used to build the model, so it is hard to tell whether all samples of a certain class were missing from the training set. Sometimes the samples used for the training or test set may not be representative, so we used another method to estimate the built model.

6.3 Model Built and Estimated with Cross-Validation

After the data had been transformed, we fed them into the mining algorithm again and obtained the result in Appendix 7. In this step, the model was built using all of the input data as training data, and the first set of measurements is derived from these data: 780 instances were classified correctly and 30 were misclassified, for an accuracy of 96%. Such a measurement is optimistic, since the classifier has been learned from the very same training data. Because we did not specify a separate test set (we typed only the statement java weka.classifiers.j48.J48 -t credit.arff), the algorithm automatically performed a ten-fold cross-validation to evaluate the model. The final section of Appendix 7 presents the result obtained with this method.
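The ten-fold cross-validation triggered by the command above can also be requested through Weka's Evaluation class. A minimal sketch, again assuming a current Weka release and the generalized credit.arff:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

/** Estimate the classifier's accuracy with stratified ten-fold cross-validation. */
public class CrossValidate {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("credit.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));  // 10 folds

        System.out.println(eval.toSummaryString());   // correctly/incorrectly classified instances
        System.out.println(eval.toMatrixString());    // confusion matrix
    }
}
```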
7. Additional Mining Results

• The tree in Appendix 7 has four levels. The first level is split on the history attribute, the second splits on income and house-owner respectively, the third level is divided on age and income, and the bottom level is split on the age and house-owner attributes.
• Below the tree structure, there are 60 leaf nodes representing class distributions, and the size of the tree is 80, which is the total number of nodes in the tree.
• The last section shows that 764 instances are classified correctly and 46 are misclassified. The percentage of correct classifications on the test data set is 94%.

We can see that the result shown in Appendix 7 is easier to analyze than Appendix 5, since the tree is shallower than the previous one. The accuracy is a little lower than for the first input, but ninety-four percent is still high enough to be considered acceptable, so the knowledge mined by the decision tree algorithm can be used to predict future data samples and provides a better understanding of the data contents.

7.1 Testing the Decision Tree with New Data

After the decision tree was built, we used another 15 new records, different from all the training and test data of our experiment, to test the accuracy of the classification rules. The test data are shown below:

16, 78560, none, no, 10000
43, 89630, none, yes, 10000
44, 88888, none, no, 5000
19, 100045, none, no, 5000
19, 112480, bad, yes, 5000
19, 426900, bad, no, 0
20, 22000, good, yes, 10000
30, 21000, good, no, 10000
32, 26580, none, yes, 5000
23, 28000, none, no, 5000
21, 29650, bad, yes, 5000
22, 28500, bad, no, 0
28, 45600, none, yes, 10000
36, 39520, none, no, 5000
52, 36540, good, yes, 0

Among these records, we found that 14 new samples fit the rules, but the instance (52, 36540, good, yes, 0) does not. The accuracy on the new data is 93%.
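We checked the fifteen records above against the tree by hand; in code, a trained J48 model can classify a new applicant directly. The sketch below is an illustration under stated assumptions: it presumes Weka 3.7 or later (where DenseInstance is available) and the generalized attribute values of Section 6.1.

```java
import java.io.BufferedReader;
import java.io.FileReader;

import weka.classifiers.trees.J48;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

/** Classify a previously unseen applicant with the trained decision tree. */
public class ClassifyNewApplicant {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("credit.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.buildClassifier(data);

        // New applicant: age group 2, income group 2, no credit history, owns a house.
        Instance applicant = new DenseInstance(data.numAttributes());
        applicant.setDataset(data);
        applicant.setValue(data.attribute("age"), "age2");
        applicant.setValue(data.attribute("income"), "income2");
        applicant.setValue(data.attribute("history"), "none");
        applicant.setValue(data.attribute("house_owner"), "yes");

        double index = tree.classifyInstance(applicant);
        System.out.println("Recommended level: " + data.classAttribute().value((int) index));
    }
}
```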
7.2 Tree Pruning

The last step of our experiment is tree pruning. We drew the tree of Appendix 7 as shown in Appendix 8. We found that some leaves representing different groups of a certain attribute belong to the same class, so we tied these branches together and built a simpler tree, shown in Appendix 10.

The knowledge learned from a decision tree can be extracted and presented as If-Then rules. To make analysis easier, we converted the tree to classification rules by tracing the path from the root node to each leaf node. Here we list only part of the rules extracted from Appendix 7:

IF history = "good" AND income = "income1" THEN recommended = "recommend3"
IF history = "good" AND income = "income2" AND (age = "age1" OR age = "age3" OR age = "age4") THEN recommended = "recommend3"
IF history = "good" AND income = "income2" AND age = "age2" AND house_owner = "yes" THEN recommended = "recommend4"
IF history = "good" AND income = "income2" AND age = "age2" AND house_owner = "no" THEN recommended = "recommend3"
IF history = "bad" AND house_owner = "yes" AND income = "income2" AND age = "age2" THEN recommended = "recommend2"
IF history = "bad" AND house_owner = "no" THEN recommended = "recommend1"
IF history = "none" AND house_owner = "yes" AND income = "income2" AND age = "age2" THEN recommended = "recommend3"
IF history = "none" AND house_owner = "yes" AND income = "income1" AND age = "age2" THEN recommended = "recommend2"
IF history = "none" AND house_owner = "yes" AND income = "income4" AND age = "age3" THEN recommended = "recommend4"
IF history = "none" AND house_owner = "no" AND income = "income1" THEN recommended = "recommend2"
IF history = "none" AND house_owner = "no" AND income = "income3" AND age = "age3" THEN recommended = "recommend2"

According to these classification rules, we are now able to determine an appropriate credit line for individual credit card applicants.
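To show how such rules could be applied operationally, the following plain-Java sketch encodes the subset of rules listed above; any combination not covered by these rules falls through to null, which is our own simplification rather than part of the mined model.

```java
/** A hand-coded subset of the If-Then rules extracted from the pruned decision tree. */
public class CreditLineRules {

    /** Returns the recommended level, or null when none of the encoded rules applies. */
    static String recommend(String history, String income, String age, boolean houseOwner) {
        if (history.equals("good")) {
            if (income.equals("income1")) return "recommend3";
            if (income.equals("income2")) {
                if (age.equals("age2")) return houseOwner ? "recommend4" : "recommend3";
                return "recommend3";                       // age1, age3, age4
            }
        }
        if (history.equals("bad")) {
            if (!houseOwner) return "recommend1";
            if (income.equals("income2") && age.equals("age2")) return "recommend2";
        }
        if (history.equals("none")) {
            if (!houseOwner && income.equals("income1")) return "recommend2";
            if (houseOwner && income.equals("income2") && age.equals("age2")) return "recommend3";
            if (houseOwner && income.equals("income1") && age.equals("age2")) return "recommend2";
            if (houseOwner && income.equals("income4") && age.equals("age3")) return "recommend4";
            if (!houseOwner && income.equals("income3") && age.equals("age3")) return "recommend2";
        }
        return null;   // not covered by the rules listed in this report
    }

    public static void main(String[] args) {
        // Good history, income2, age2, owns a house -> recommend4 ($20,000)
        System.out.println(recommend("good", "income2", "age2", true));
    }
}
```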
8. Mining Technique Used in the Tool

From the Weka 3.0 software suite, we chose J48 as our mining tool. The decision tree is the main mining technique used in the J48 algorithm. In order to improve the classifier accuracy, we used both the holdout and 10-fold cross-validation methods.

9. Mining Technique Details and Information

A decision tree divides the data into groups based on the values of the variables. The main methodology is to use a hierarchy of if-then statements to classify the data; this structure has the form of a tree. Following this procedure, one eventually reaches a conclusion about which class the considered object should be assigned to. There has been a surge of interest in decision tree-based products, primarily because they are faster than neural networks for many business problems and easier for users to understand. However, this method can only be applied to classification tasks, and it does not work directly with continuous data, such as age or sales, which must be grouped into ranges. This limits the applicability of the decision tree method in many fields. The way a range is selected can inadvertently hide patterns: for instance, if age is broken into a 25 to 34-year-old group, a significant break at 30 may be concealed. One way to avoid this problem is to assign values to groups in a fuzzy way, so that different instances of the same value may be assigned to different groups.

To estimate classifier accuracy, holdout and k-fold cross-validation are two common methods. For holdout, two independent sets of data, a training set and a test set, are generated. The training set uses 2/3 of the data, while the other 1/3 is allocated to the test set; the classifier is then derived from the training set and its accuracy is estimated on the test set. For 10-fold cross-validation, 10 equal-size subsets S1, S2, ..., S10 are generated by random partitioning. Subset Si is used for testing and the remaining 9 subsets are used to train the classifier. After performing training and testing 10 times, the accuracy estimate is the overall number of correct classifications from the 10 iterations divided by the total number of samples in the initial data.
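A sketch of the holdout estimate described above, under the same current-Weka assumption as the earlier sketches: randomize the data, train on roughly two thirds, and evaluate on the remaining third.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

/** Holdout estimate: derive the classifier from ~2/3 of the data, test it on the remaining ~1/3. */
public class HoldoutEstimate {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("credit.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1));

        int trainSize = (int) Math.round(data.numInstances() * 0.66);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

        J48 tree = new J48();
        tree.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString());
    }
}
```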
10. Critical Evaluation of the Mining Technique Used

In order to solve business problems, data mining tools seek to address two key business requirements:

• Description -- discovering patterns, associations, and clusters of information.
• Prediction -- using those patterns to predict future trends and behaviors.

Different data mining tools can help a business in different ways, so it is very important to differentiate among these tools and their technologies. Our project addresses which factors matter and how they affect the approval of a certain loan amount for a person. The decision tree is a good mining technique for this project, for the following reasons:

• In the data preparation phase of the project, we performed data description, data cleaning, data selection, and data transformation. This is crucial for the development of our model and important for selecting the data mining tool. After considering the goals of the project and the data warehouse to be used, we decided that the decision tree technique is the better tool for our project.
• The decision tree technique provides a classification model. In this project, we separated the intervals in the age, credit history, income, and home ownership groups and classified them into the different loan amounts provided. Although some patterns might be hidden by the breakdown of the continuous attributes, such as age and income, the other two groups, credit history and home ownership, fall naturally into distinct categories, and we paid special attention to setting the intervals of the two continuous attributes. Therefore, we believe the decision tree is a good mining tool for this project.
• The problem here is not a very complex system: the data set is relatively small, the level of interaction is low, only a few variables are present, and their non-linearity is low too. Therefore, a decision tree can give us a pretty good picture of the patterns.
• The decision tree provides a good user interface that facilitates model building and pattern recognition. After applying the decision tree analysis, the results are relatively easy to visualize, so it is easy and reliable for us to build a model and explore the patterns generated by the data mining tool.
• Data preparation and access for the decision tree are easy. The database is small and the data are all in intervals, so the decision tree performs well, with high speed and accuracy.
• The model generated by the decision tree is relatively easy to understand and interpret. In addition, it interfaces with many tools that can further help the knowledge discovery process.
11. Visualization Technique Used in the Tool

We chose Microsoft Excel as an existing visualization for the segmentations of this project, and for the decision tree we used Witten's data mining machine learning tool.

12. Visualization Technique Details and Information

The J48 pruned tree gives us a clear decision tree result; we used Microsoft Word to draw the graph. Since the multi-dimensional data of this project are mapped into 2-D space, an overview of the entire segmentation is logically a 2-D map, and the major search topics are plotted in the decision tree. The visualization developed for the decision tree is shown in Figure 1, Figure 2, and Figure 3.

Figure 4 shows the number of instances that belong to each type of the class. Recommend3 ($10,000) is the most frequent, followed in decreasing order by recommend2, recommend1, recommend4, and recommend5; there are only a few instances of recommend5 ($50,000).

Figure 5 shows the number of final decisions (leaf nodes) decided by the house owner attribute. Most of them are on the "no" (not a house owner) branch; from the decision tree we know that an applicant with a bad credit history and no house goes directly to recommend1 ($0).

Figure 6 shows the number of final decisions (leaf nodes) decided by age level; each group is very similar to the others.

Figure 7 shows the number of final decisions (leaf nodes) decided by income level. Most of them are in income1 (<$30,000), because low-salary applicants are only given recommend1 ($0) or recommend2 ($5,000). Provided the salary is high enough, income is not a key factor in the recommendation, although income3 sometimes helps raise the recommendation to a higher level.
[Figure 4: The number of instances that belong to each type of the class]
[Figure 5: The number of final decisions (leaf nodes) decided by house owner level]
[Figure 6: The number of final decisions (leaf nodes) decided by age level]
[Figure 7: The number of final decisions (leaf nodes) decided by income levels]
13. Setting Up the Environment for the Tool -- Weka

13.1 Where to Download?

Go to the web site http://www.cs.waikato.ac.nz/ml/weka/ to download the Weka software. After you access this web page, you can see the different versions available for download. For example, you may select the stable GUI version, which includes visualization tools and many other improvements (weka-3-2.jar, 3,669,565 bytes, screenshots). This version requires Swing, so if you do not have the Java 1.3 JDK installed on your computer, we suggest you download another version, the stable book version (weka-3-0-4.jar, 1,576,597 bytes); that version requires unzipping the jar file. If you are in a Windows environment, it is easier to download the self-extracting executable for installing the GUI version of Weka under Windows, weka-3-2.exe (3,874,492 bytes). The authors also produce a joint version combining the Weka package and the Java JDK, weka-3-2jre.exe (11,496,646 bytes, includes the Java Runtime Environment). For our project, we selected weka-3-2.exe. Downloading the executable takes about 27 minutes with a 56K modem; in the Towson University Computer Science Lab it takes only about 4 minutes.

13.2 How to Set Up?

Suppose we have downloaded the executable file to the directory "C:\WINDOWS\Desktop\temp". Now we are ready to set up the software.

• Double-click the executable file. A pop-up window appears asking "This will install WEKA, do you want to continue". Press the "Yes" button.
• Another page will prompt you to close all other applications you are running; click the "Next" button to continue.
• A window appears asking you to read the license for using Weka; the license text shown is dated June 1991. Press the "Yes" button to go on.
• Now you need to select an installation directory. We selected "C:\WINDOWS\Desktop\temp".
• After selecting the directory, Windows will remind you that WEKA will be added to the "start menu group". Click the "Next" button to continue.
• Now setup is ready to install WEKA on your computer; click the "Install" button.
• It takes about 10 seconds to finish the installation. Click "Finish" to complete this step.
• Open the installation folder to view it; there are 18 files in total.

13.3 How to Use?

After finishing the WEKA installation, the next important thing is to master how to use the package. We describe it step by step:

• Enter your file directory. In our case, it is "C:\WINDOWS\Desktop\temp".
• Double-click the file named "weka.jar". This jar file contains 628 classes, and because it contains a "manifest.mf" file, you can simply double-click it to run it, just as if it were an .exe file.
• You then get a GUI window with three buttons on it.
• Click "Simple CLI" first to open another window. It has two parts: one shows the help description, and the small part at the bottom is for entering commands. In the command text field, type: "java weka.classifiers.j48.J48 -t C:\WINDOWS\Desktop\temp\data\weather.arff". Press ENTER and you get your decision tree output. The attachment in Section 13.6 is the output of the command "java weka.classifiers.j48.J48 -t c:\student5\Weka-3-2\data\weather.arff", which works in the Towson University Computer Science Lab (note that the directory differs from the one above). For output analysis, please refer to Section 5.
• If you click the "Explore" button, you get more features for data mining analysis. Click the "open file" button to locate the file to mine; for example, we select a data file from our directory, "C:\WINDOWS\Desktop\temp\data\weather.arff". Then you can select which class you want to mine and view the first line. There are 6 buttons you can click. If you want to use the "Classify" method to analyze the data, just click it, then select the "Test options"; suppose we use the default "Cross-validation". Now click the "Start" button to get the output you want. You can also use other methods, such as "Cluster" and "Associate", to get different output. If you want a visualized result, just click the "Visualize" button and a visual picture is created.
• Using "Explore" is more convenient than the "Simple CLI" because it has more functions, but it is more difficult to learn and analyze. For the beginning stage of study, we suggest using the "Simple CLI".
For details on how to use WEKA, please visit: http://www.cs.waikato.ac.nz/~ml/weka/Experiments.pdf.

13.4 System Environment

The WEKA package requires at least 7.6 MB of hard disk space, and you need to have a Java Virtual Machine installed on your computer.

13.5 Evaluation

Through using this software, we find that WEKA provides a lot of functionality for the user. For example, its implemented schemes for classification include:

• decision tree inducers
• rule learners
• naive Bayes
• decision tables

For more detail on WEKA's functions, please visit the website http://www.cs.waikato.ac.nz/ml/weka/. It is easy to use for beginners and free to download.

13.6 Attachment

Welcome to the WEKA SimpleCLI

Enter commands in the text field at the bottom of the window.
Use the up and down arrows to move through previous commands.

> help
Command must be one of:
    java <classname> <args>
    break
    kill
    cls
    exit
    help <command>

> java weka.classifiers.j48.J48 -t c:\student5\Weka-3-2\data\weather.arff

J48 pruned tree
-------------------

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves : 5

Size of the tree : 8

=== Error on training data ===

Correctly Classified Instances      14      100 %
Incorrectly Classified Instances     0        0 %
Kappa statistic                      1
Mean absolute error                  0
Root mean squared error              0
Relative absolute error              0 %
Root relative squared error          0 %
Total Number of Instances           14

=== Confusion Matrix ===

 a b   <-- classified as
 9 0 | a = yes
 0 5 | b = no

=== Stratified cross-validation ===

Correctly Classified Instances       9      64.2857 %
Incorrectly Classified Instances     5      35.7143 %
Kappa statistic                      0.186
Mean absolute error                  0.3036
Root mean squared error              0.4813
Relative absolute error             63.75 %
Root relative squared error         97.5542 %
Total Number of Instances           14

=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 3 2 | b = no

14. Conclusions

This project shows that we can use a data mining machine learning tool to discover useful knowledge, such as credit line granting rules for credit card applicants. Data mining can address the question of how best to use historical data to discover general regularities and improve the decision-making process. During the implementation of this project, we applied all of the knowledge included in our project proposal. The project was very interesting, although it was hard work to finish. It was a real team effort: each of our group members understands the project and contributed to it. We thank Dr. Karne for giving us this opportunity to practice and for many valuable ideas and directions.
  • 22. APPENDIX 1 @relation credit @attribute age real @attribute income real @attribute history {good, none, bad} @attribute house_owner {yes, no} @attribute recommended {20000, 10000, 5000, 0} @data 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 19,79465,bad,yes,0 19,88240,bad,no,0 18,96300,good,yes,10000 19,99860,good,no,10000 19,95680,none,yes,10000 19,100045,none,no,5000 19,112480,bad,yes,5000 19,426900,bad,no,0 20,22000,good,yes,10000 30,21000,good,no,10000 32,26580,none,yes,5000 23,28000,none,no,5000 COSC757 Team Project Paper Page 22 of 80 Spring 2001
  • 23. 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,5000 38,99860,good,no,5000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 56,59530,none,no,10000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,5000 42,99860,good,no,20000 COSC757 Team Project Paper Page 23 of 80 Spring 2001
  • 24. 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 64,45600,none,yes,5000 61,59580,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 COSC757 Team Project Paper Page 24 of 80 Spring 2001
  • 25. APPENDIX 2 @relation credit @attribute age real @attribute income real @attribute history {good, none, bad} @attribute house_owner {yes, no} @attribute recommended {50000, 20000, 10000, 5000, 0} @data 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 19,79465,bad,yes,0 19,88240,bad,no,0 18,96300,good,yes,10000 19,99860,good,no,10000 19,95680,none,yes,10000 19,100045,none,no,5000 19,112480,bad,yes,5000 19,426900,bad,no,0 20,22000,good,yes,10000 30,21000,good,no,10000 32,26580,none,yes,5000 23,28000,none,no,5000 COSC757 Team Project Paper Page 25 of 80 Spring 2001
  • 26. 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 56,59530,none,no,10000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 COSC757 Team Project Paper Page 26 of 80 Spring 2001
  • 27. 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 64,45600,none,yes,5000 61,59580,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 COSC757 Team Project Paper Page 27 of 80 Spring 2001
  • 28. 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 19,79465,bad,yes,0 19,88240,bad,no,0 18,96300,good,yes,10000 19,99860,good,no,10000 19,95680,none,yes,10000 19,100045,none,no,5000 19,112480,bad,yes,5000 19,426900,bad,no,0 20,22000,good,yes,10000 30,21000,good,no,10000 32,26580,none,yes,5000 23,28000,none,no,5000 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 COSC757 Team Project Paper Page 28 of 80 Spring 2001
  • 29. 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 56,59530,none,no,10000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 64,45600,none,yes,5000 61,59580,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 COSC757 Team Project Paper Page 29 of 80 Spring 2001
  • 30. 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 56,962450,good,yes,50000 61,864500,good,no,20000 20,362450,good,no,5000 23,356280,good,no,5000 26,356280,good,no,5000 29,289645,good,no,5000 36,295631,good,no,5000 37,423560,none,no,5000 32,365698,none,no,5000 23,295632,none,no,5000 23,469250,bad,yes,0 22,569840,bad,no,0 29,362350,bad,no,0 42,236589,bad,no,0 68,256398,bad,no,0 45,689532,bad,yes,5000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 35,89630,none,yes,10000 COSC757 Team Project Paper Page 30 of 80 Spring 2001
  • 31. 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 COSC757 Team Project Paper Page 31 of 80 Spring 2001
  • 32. 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 19,79465,bad,yes,0 19,88240,bad,no,0 18,96300,good,yes,10000 19,99860,good,no,10000 19,95680,none,yes,10000 19,100045,none,no,5000 19,112480,bad,yes,5000 19,426900,bad,no,0 20,22000,good,yes,10000 30,21000,good,no,10000 32,26580,none,yes,5000 23,28000,none,no,5000 COSC757 Team Project Paper Page 32 of 80 Spring 2001
  • 33. 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 56,59530,none,no,10000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 COSC757 Team Project Paper Page 33 of 80 Spring 2001
  • 34. 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 64,45600,none,yes,5000 61,59580,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 COSC757 Team Project Paper Page 34 of 80 Spring 2001
  • 35. 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 19,79465,bad,yes,0 19,88240,bad,no,0 18,96300,good,yes,10000 19,99860,good,no,10000 19,95680,none,yes,10000 19,100045,none,no,5000 19,112480,bad,yes,5000 19,426900,bad,no,0 20,22000,good,yes,10000 30,21000,good,no,10000 32,26580,none,yes,5000 23,28000,none,no,5000 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 COSC757 Team Project Paper Page 35 of 80 Spring 2001
  • 36. 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 56,59530,none,no,10000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 64,45600,none,yes,5000 61,59580,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 COSC757 Team Project Paper Page 36 of 80 Spring 2001
  • 37. 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 56,962450,good,yes,50000 61,864500,good,no,20000 20,362450,good,no,5000 23,356280,good,no,5000 26,356280,good,no,5000 29,289645,good,no,5000 36,295631,good,no,5000 37,423560,none,no,5000 32,365698,none,no,5000 23,295632,none,no,5000 23,469250,bad,yes,0 22,569840,bad,no,0 29,362350,bad,no,0 42,236589,bad,no,0 68,256398,bad,no,0 45,689532,bad,yes,5000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 35,89630,none,yes,10000 COSC757 Team Project Paper Page 37 of 80 Spring 2001
  • 38. 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 COSC757 Team Project Paper Page 38 of 80 Spring 2001
  • 39. 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 64,28500,bad,no,0 62,52600,good,yes,10000 66,38620,good,no,10000 42,99860,good,no,20000 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 COSC757 Team Project Paper Page 39 of 80 Spring 2001
  • 40. 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 43,89630,none,yes,10000 32,26580,none,yes,5000 23,28000,none,no,5000 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 61,59580,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 COSC757 Team Project Paper Page 40 of 80 Spring 2001
  • 41. 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 56,962450,good,yes,50000 61,864500,good,no,20000 20,362450,good,no,5000 23,356280,good,no,5000 26,356280,good,no,5000 29,289645,good,no,5000 36,295631,good,no,5000 37,423560,none,no,5000 32,365698,none,no,5000 23,295632,none,no,5000 23,469250,bad,yes,0 22,569840,bad,no,0 29,362350,bad,no,0 42,236589,bad,no,0 68,256398,bad,no,0 45,689532,bad,yes,5000 54,59300,bad,yes,5000 57,54280,bad,no,0 42,68420,good,yes,20000 41,70510,good,no,20000 43,89630,none,yes,10000 44,88888,none,no,5000 46,79465,bad,yes,5000 60,88240,bad,no,0 43,96300,good,yes,50000 42,99860,good,no,20000 44,326000,none,yes,20000 52,100045,none,no,10000 53,242560,bad,yes,5000 58,426900,bad,no,0 69,22000,good,yes,10000 70,28620,good,no,10000 76,29630,none,yes,5000 72,28000,none,no,5000 80,29650,bad,yes,0 35,89630,none,yes,10000 COSC757 Team Project Paper Page 41 of 80 Spring 2001
  • 42. 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 59,29650,bad,yes,0 59,28500,bad,no,0 44,32600,good,yes,10000 59,38620,good,no,10000 55,45600,none,yes,10000 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 18,70510,good,no,10000 17,89630,none,yes,10000 16,78560,none,no,10000 43,89630,none,yes,10000 44,88888,none,no,5000 19,100045,none,no,5000 19,112480,bad,yes,5000 19,426900,bad,no,0 20,22000,good,yes,10000 COSC757 Team Project Paper Page 42 of 80 Spring 2001
  • 43. 30,21000,good,no,10000 32,26580,none,yes,5000 23,28000,none,no,5000 21,29650,bad,yes,5000 22,28500,bad,no,0 25,38240,good,yes,20000 26,38620,good,no,10000 28,45600,none,yes,10000 36,39520,none,no,5000 39,59300,bad,yes,5000 33,64280,bad,no,0 29,68420,good,yes,20000 34,70510,good,no,20000 35,89630,none,yes,10000 32,78560,none,no,5000 33,79465,bad,yes,5000 36,88240,bad,no,0 39,96300,good,yes,50000 38,99860,good,no,50000 38,95680,none,yes,20000 39,100045,none,no,10000 36,112480,bad,yes,10000 25,426900,bad,no,0 42,28260,good,yes,10000 43,27560,good,no,10000 44,29000,none,yes,5000 58,28000,none,no,5000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 COSC757 Team Project Paper Page 43 of 80 Spring 2001
  • 44. 63,326900,bad,no,0 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 18,68420,good,yes,10000 62,59300,bad,yes,0 65,54280,bad,no,0 63,68420,good,yes,20000 69,70510,good,no,10000 64,89630,none,yes,10000 69,78560,none,no,5000 71,79465,bad,yes,5000 76,88240,bad,no,0 61,96300,good,yes,20000 62,423060,good,no,20000 63,95680,none,yes,10000 64,100045,none,no,5000 64,112480,bad,yes,5000 63,326900,bad,no,0 18,22000,good,yes,10000 19,21000,good,no,10000 18,18000,none,yes,10000 19,28000,none,no,5000 19,29650,bad,yes,0 18,28500,bad,no,0 17,32600,good,yes,10000 18,38620,good,no,10000 19,45600,none,yes,10000 19,39520,none,no,5000 19,59300,bad,yes,0 18,54280,bad,no,0 COSC757 Team Project Paper Page 44 of 80 Spring 2001
  • 45. 18,68420,good,yes,10000 63,326900,bad,no,0 56,962450,good,yes,50000 61,864500,good,no,20000 20,362450,good,no,5000 23,356280,good,no,5000 26,356280,good,no,5000 29,289645,good,no,5000 36,295631,good,no,5000 37,423560,none,no,5000 32,365698,none,no,5000 23,295632,none,no,5000 23,469250,bad,yes,0 22,569840,bad,no,0 29,362350,bad,no,0 42,236589,bad,no,0 20,6790, none, no,10000 23,15689, none, yes,20000 45,36500, bad, no,10000 62,63530, bad, no,10000 36,85640, good, yes,5000 52,36540, good, yes, 0 46,63520, none, yes,0 COSC757 Team Project Paper Page 45 of 80 Spring 2001
APPENDIX 3

@relation credit
@attribute age {age1, age2, age3, age4}
@attribute income {income1, income2, income3, income4}
@attribute history {good, none, bad}
@attribute house_owner {yes, no}
@attribute recommend {recommend1, recommend2, recommend3, recommend4, recommend5}
@data
age1,income1,good,yes,recommend3
age1,income1,good,no,recommend3
age1,income1,none,yes,recommend3
age1,income1,none,no,recommend2
age1,income1,bad,yes,recommend1
age1,income1,bad,no,recommend1
age1,?,good,yes,recommend3
age1,income2,?,no,recommend3
age1,income2,none,yes,recommend3
age1,income2,none,no,recommend2
age1,income2,bad,yes,recommend1
age1,income2,bad,no,recommend1
age1,income3,?,yes,recommend3
age1,income3,good,no,recommend3
age1,income3,none,yes,recommend3
age1,income3,none,no,recommend3
age1,income3,bad,yes,recommend1
age1,income3,bad,no,recommend1
age1,income4,good,yes,recommend3
age1,income4,good,no,recommend3
?,income4,none,yes,recommend3
age1,income4,none,no,recommend2
age1,income4,bad,yes,recommend2
age1,income4,bad,no,recommend1
age2,income1,good,yes,recommend3
age2,income1,good,no,recommend3
  • 47. age2,income1,none,yes,recommend2 age2,income1,none,no, recommend2 age2,income1,bad,yes, recommend2 age2,income1,bad,no, recommend1 age2,income2,good,yes,recommend4 age2,income2,good,no, recommend3 age2,income2,none,yes,recommend3 age2,income2,none,no, recommend2 age2,income2,bad,yes, recommend2 age2,income3,bad,no, recommend1 age2,income3,good,yes,recommend4 age2,income3,good,no, recommend4 age2,income3,none,yes,recommend3 age2,income3,none,no, recommend2 age2,income3,bad,yes, recommend2 age2,income3,bad,no, recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,? age2,income4,none,yes,recommend4 age2,income4,none,no, recommend3 age2,income4,bad,yes, recommend3 age2,income4,bad,no, recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no, recommend3 age3,income1,none,yes,recommend2 age3,income1,none,no, recommend2 age3,income1,bad,yes, recommend1 age3,income1,bad,no, recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no, recommend3 age3,income2,none,yes,recommend3 age3,income2,none,no, recommend3 age3,income2,bad,yes, recommend2 age3,income2,bad,no, recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no, recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no, recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 COSC757 Team Project Paper Page 47 of 80 Spring 2001
  • 48. age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 age4,income1,bad,yes,recommend1 age4,income1,bad,no,recommend1 age4,income2,good,yes,recommend3 age4,income2,good,no,recommend3 age4,income2,none,yes,recommend2 age4,income2,none,no,recommend2 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 age4,income4,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age1,income1,good,yes,recommend3 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 COSC757 Team Project Paper Page 48 of 80 Spring 2001
  • 49. age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age1,income3,bad,yes,recommend1 age1,income3,bad,no,recommend1 age1,income4,good,yes,recommend3 age1,income4,good,no,recommend3 age1,income4,none,yes,recommend3 age1,income4,none,no,recommend2 age1,income4,bad,yes,recommend2 age1,income4,bad,no,recommend1 age2,income1,good,yes,recommend3 age2,income1,good,no,recommend3 age2,income1,none,yes,recommend2 age2,income1,none,no,recommend2 age2,income1,bad,yes,recommend2 age2,income1,bad,no,recommend1 age2,income2,good,yes,recommend4 age2,income2,good,no,recommend3 age2,income2,none,yes,recommend3 age2,income2,none,no,recommend2 age2,income2,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income3,good,yes,recommend4 age2,income3,good,no,recommend4 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age2,income4,bad,no,recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no,recommend3 COSC757 Team Project Paper Page 49 of 80 Spring 2001
  • 50. age3,income1,none,yes,recommend2 age3,income1,none,no,recommend2 age3,income1,bad,yes,recommend1 age3,income1,bad,no,recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no,recommend3 age3,income2,none,yes,recommend3 age3,income2,none,no,recommend3 age3,income2,bad,yes,recommend2 age3,income2,bad,no,recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no,recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 age4,income1,bad,yes,recommend1 age4,income1,bad,no,recommend1 age4,income2,good,yes,recommend3 age4,income2,good,no,recommend3 age4,income2,none,yes,recommend2 age4,income2,none,no,recommend2 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 COSC757 Team Project Paper Page 50 of 80 Spring 2001
  • 51. age4,income4,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age3,income4,good,yes,recommend5 age4,income4,good,no,recommend4 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,bad,yes,recommend1 age2,income4,bad,no, recommend1 age2,income4,bad,no, recommend1 age3,income4,bad,no, recommend1 age4,income4,bad,no, recommend1 age3,income4,bad,yes,recommend2 age3,income2,bad,yes,recommend2 age3,income2,bad,no, recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no, recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no, recommend2 age3,income3,bad,yes, recommend2 age3,income3,bad,no, recommend1 age3,income4,good,yes,recommend5 age3,income4,good,no, recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no, recommend3 age3,income4,bad,yes, recommend2 age3,income4,bad,no, recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no, recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no, recommend2 COSC757 Team Project Paper Page 51 of 80 Spring 2001
  • 52. age4,income1,bad,yes, recommend1 age2,income3,none,yes,recommend3 age2,income3,none,no, recommend2 age2,income3,bad,yes, recommend2 age2,income3,bad,no, recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no, recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no, recommend3 age2,income4,bad,yes, recommend3 age2,income4,bad,no, recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no, recommend3 age3,income1,none,yes,recommend2 age3,income1,none,no, recommend2 age3,income1,bad,yes, recommend1 age3,income1,bad,no, recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no, recommend3 age3,income2,none,yes,recommend3 age1,income1,good,yes,recommend3 age1,income1,good,no, recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no, recommend2 age1,income1,bad,yes, recommend1 age1,income1,bad,no, recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no, recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no, recommend2 age1,income2,bad,yes, recommend1 age1,income2,bad,no, recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no, recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no, recommend3 age3,income3,none,yes,recommend3 age3,income3,none,no, recommend2 age3,income3,bad,yes, recommend2 age3,income3,bad,no, recommend1 COSC757 Team Project Paper Page 52 of 80 Spring 2001
  • 53. age3,income4,good,yes,recommend5 age3,income4,good,no, recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no, recommend3 age3,income4,bad,yes, recommend2 age3,income4,bad,no, recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no, recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no, recommend2 age4,income1,bad,yes, recommend1 age4,income1,bad,no, recommend1 age4,income2,good,yes,recommend3 age4,income2,good,no, recommend3 age1,income1,good,yes,recommend3 age1,income1,good,no, recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no, recommend2 age1,income1,bad,yes, recommend1 age1,income1,bad,no, recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age1,income3,bad,yes,recommend1 age1,income3,bad,no,recommend1 age1,income4,good,yes,recommend3 age1,income4,good,no,recommend3 age1,income4,none,yes,recommend3 age1,income4,none,no,recommend2 age1,income4,bad,yes,recommend2 age1,income4,bad,no,recommend1 age2,income1,good,yes,recommend3 age2,income1,good,no,recommend3 COSC757 Team Project Paper Page 53 of 80 Spring 2001
  • 54. age2,income1,none,yes,recommend2 age2,income1,none,no,recommend2 age2,income1,bad,yes,recommend2 age2,income1,bad,no,recommend1 age2,income2,good,yes,recommend4 age2,income2,good,no,recommend3 age2,income2,none,yes,recommend3 age2,income2,none,no,recommend2 age2,income2,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income3,good,yes,recommend4 age2,income3,good,no,recommend4 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age2,income4,bad,no,recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no,recommend3 age3,income1,none,yes,recommend2 age3,income1,none,no,recommend2 age3,income1,bad,yes,recommend1 age3,income1,bad,no,recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no,recommend3 age3,income2,none,yes,recommend3 age3,income2,none,no,recommend3 age3,income2,bad,yes,recommend2 age3,income2,bad,no,recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no,recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 COSC757 Team Project Paper Page 54 of 80 Spring 2001
  • 55. age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 age4,income1,bad,yes,recommend1 age4,income1,bad,no,recommend1 age4,income2,good,yes,recommend3 age4,income2,good,no,recommend3 age4,income2,none,yes,recommend2 age4,income2,none,no,recommend2 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age1,income1,good,yes,recommend3 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 COSC757 Team Project Paper Page 55 of 80 Spring 2001
  • 56. age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age1,income3,bad,yes,recommend1 age1,income3,bad,no,recommend1 age1,income4,good,yes,recommend3 age1,income4,good,no,recommend3 age1,income4,none,yes,recommend3 age1,income4,none,no,recommend2 age1,income4,bad,yes,recommend2 age1,income4,bad,no,recommend1 age2,income1,good,yes,recommend3 age2,income1,good,no,recommend3 age2,income1,none,yes,recommend2 age2,income1,none,no,recommend2 age2,income1,bad,yes,recommend2 age2,income1,bad,no,recommend1 age2,income2,good,yes,recommend4 age2,income2,good,no,recommend3 age2,income2,none,yes,recommend3 age2,income2,none,no,recommend2 age2,income2,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income3,good,yes,recommend4 age2,income3,good,no,recommend4 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age2,income4,bad,no,recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no,recommend3 COSC757 Team Project Paper Page 56 of 80 Spring 2001
  • 57. age3,income1,none,yes,recommend2 age3,income1,none,no,recommend2 age3,income1,bad,yes,recommend1 age3,income1,bad,no,recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no,recommend3 age3,income2,none,yes,recommend3 age3,income2,none,no,recommend3 age3,income2,bad,yes,recommend2 age3,income2,bad,no,recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no,recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 age4,income1,bad,yes,recommend1 age4,income1,bad,no,recommend1 age4,income2,good,yes,recommend3 age4,income2,good,no,recommend3 age4,income2,none,yes,recommend2 age4,income2,none,no,recommend2 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 COSC757 Team Project Paper Page 57 of 80 Spring 2001
  • 58. age4,income4,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age3,income4,good,yes,recommend5 age4,income4,good,no,recommend4 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,bad,yes,recommend1 age2,income4,bad,no,recommend1 age2,income4,bad,no,recommend1 age3,income4,bad,no,recommend1 age4,income4,bad,no,recommend1 age3,income4,bad,yes,recommend2 age3,income2,bad,yes,recommend2 age3,income2,bad,no,recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no,recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 COSC757 Team Project Paper Page 58 of 80 Spring 2001
  • 59. age4,income1,bad,yes,recommend1 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income3,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age2,income4,bad,no,recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no,recommend3 age3,income1,none,yes,recommend2 age3,income1,none,no,recommend2 age3,income1,bad,yes,recommend1 age3,income1,bad,no,recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no,recommend3 age3,income2,none,yes,recommend3 age1,income1,good,yes,recommend3 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 COSC757 Team Project Paper Page 59 of 80 Spring 2001
  • 60. age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 age4,income1,bad,yes,recommend1 age4,income1,bad,no,recommend1 age4,income2,good,yes,recommend3 age4,income2,good,no,recommend3 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 COSC757 Team Project Paper Page 60 of 80 Spring 2001
  • 61. age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age3,income3,none,yes,recommend3 age2,income1,none,yes,recommend2 age2,income1,none,no,recommend2 age2,income1,bad,yes,recommend2 age2,income1,bad,no,recommend1 age2,income2,good,yes,recommend4 age2,income2,good,no,recommend3 age2,income2,none,yes,recommend3 age2,income2,none,no,recommend2 age2,income2,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income3,good,yes,recommend4 age2,income3,good,no,recommend4 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age4,income2,none,no,recommend2 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 COSC757 Team Project Paper Page 61 of 80 Spring 2001
  • 62. age4,income4,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age3,income4,good,yes,recommend5 age4,income4,good,no,recommend4 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,bad,yes,recommend1 age2,income4,bad,no,recommend1 age2,income4,bad,no,recommend1 age3,income4,bad,no,recommend1 age4,income4,bad,no,recommend1 age3,income4,bad,yes,recommend2 age3,income2,bad,yes,recommend2 age3,income2,bad,no,recommend1 age3,income3,good,yes,recommend4 age3,income3,good,no,recommend4 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age3,income3,bad,yes,recommend2 age3,income3,bad,no,recommend1 age3,income4,good,yes,recommend5 age3,income4,good,no,recommend4 age3,income4,none,yes,recommend4 age3,income4,none,no,recommend3 age3,income4,bad,yes,recommend2 age3,income4,bad,no,recommend1 age4,income1,good,yes,recommend3 age4,income1,good,no,recommend3 age4,income1,none,yes,recommend2 age4,income1,none,no,recommend2 COSC757 Team Project Paper Page 62 of 80 Spring 2001
  • 63. age4,income1,bad,yes,recommend1 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age2,income4,bad,no,recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no,recommend3 age3,income1,none,yes,recommend2 age3,income1,none,no,recommend2 age3,income1,bad,yes,recommend1 age3,income1,bad,no,recommend1 age3,income2,good,yes,recommend3 age3,income2,good,no,recommend3 age3,income2,none,yes,recommend3 age1,income1,good,yes,recommend3 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age1,income3,good,no,recommend3 age1,income3,none,yes,recommend3 age1,income3,none,no,recommend3 age3,income3,none,yes,recommend3 age3,income3,none,no,recommend2 age1,income4,none,no,recommend2 age1,income4,bad,yes,recommend2 COSC757 Team Project Paper Page 63 of 80 Spring 2001
  • 64. age1,income4,bad,no,recommend1 age2,income1,good,yes,recommend3 age2,income1,good,no,recommend3 age2,income1,none,yes,recommend2 age2,income1,none,no,recommend2 age2,income1,bad,yes,recommend2 age2,income1,bad,no,recommend1 age2,income2,good,yes,recommend4 age2,income2,good,no,recommend3 age2,income2,none,yes,recommend3 age2,income2,none,no,recommend2 age2,income2,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income3,good,yes,recommend4 age2,income3,good,no,recommend4 age2,income3,none,yes,recommend3 age2,income3,none,no,recommend2 age2,income3,bad,yes,recommend2 age2,income3,bad,no,recommend1 age2,income4,good,yes,recommend5 age2,income4,good,no,recommend5 age2,income4,none,yes,recommend4 age2,income4,none,no,recommend3 age2,income4,bad,yes,recommend3 age2,income4,bad,no,recommend1 age3,income1,good,yes,recommend3 age3,income1,good,no,recommend3 age3,income1,none,yes,recommend2 age3,income1,none,no,recommend2 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 age4,income4,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 COSC757 Team Project Paper Page 64 of 80 Spring 2001
  • 65. age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age1,income1,good,yes,recommend3 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age4,income2,bad,yes,recommend1 age4,income2,bad,no,recommend1 age4,income3,good,yes,recommend4 age4,income3,good,no,recommend3 age4,income3,none,yes,recommend3 age4,income3,none,no,recommend2 age4,income3,bad,yes,recommend2 age4,income3,bad,no,recommend1 age4,income4,good,yes,recommend4 age4,income4,good,no,recommend4 age4,income4,none,yes,recommend3 age4,income4,none,no,recommend2 age4,income4,bad,yes,recommend2 age4,income4,bad,no,recommend1 age1,income1,good,yes,recommend3 age1,income1,good,no,recommend3 age1,income1,none,yes,recommend3 age1,income1,none,no,recommend2 age1,income1,bad,yes,recommend1 age1,income1,bad,no,recommend1 age1,income2,good,yes,recommend3 age1,income2,good,no,recommend3 age1,income2,none,yes,recommend3 age1,income2,none,no,recommend2 COSC757 Team Project Paper Page 65 of 80 Spring 2001
  • 66. age1,income2,bad,yes,recommend1 age1,income2,bad,no,recommend1 age1,income3,good,yes,recommend3 age4,income4,bad,no,recommend1 age3,income4,good,yes,recommend5 age4,income4,good,no,recommend4 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,good,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,none,no,recommend2 age2,income4,bad,yes,recommend1 age2,income4,bad,no,recommend1 age2,income4,bad,no,recommend1 age3,income4,bad,no,recommend1 age2,income1, none, no,recommend3 age2,income1, none, yes,recommend4 age3,income2, bad, no,recommend3 age4,income3, bad, no,recommend3 age2,income3, good, yes,recommend2 age3,income2, good, yes, recommend1 age3,income3, none, yes,recommend1 COSC757 Team Project Paper Page 66 of 80 Spring 2001
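The generalized records above were produced by binning the raw records of Appendix 2 (age, annual income, credit history, house ownership, approved amount) into the categorical values age1-age4, income1-income4 and recommend1-recommend5. The Java sketch below illustrates one way such a conversion could be written; the bin boundaries are taken from the proposal in Appendix 8, while the exact boundary handling and the mapping of loan amounts to recommend levels are assumptions made for illustration, not code taken from the report.

// Sketch only: bins one raw record into the ARFF categories used above.
// Boundaries follow Appendix 8; the amount-to-recommend mapping is assumed.
public class CreditDiscretizer {

    static String ageBin(int age) {
        if (age < 20) return "age1";
        if (age <= 40) return "age2";
        if (age <= 60) return "age3";
        return "age4";
    }

    static String incomeBin(int income) {
        if (income < 30000) return "income1";
        if (income <= 60000) return "income2";
        if (income <= 90000) return "income3";
        return "income4";
    }

    static String recommendBin(int amount) {
        switch (amount) {            // assumed mapping: $0 -> recommend1 ... $50K -> recommend5
            case 0:     return "recommend1";
            case 5000:  return "recommend2";
            case 10000: return "recommend3";
            case 20000: return "recommend4";
            default:    return "recommend5";
        }
    }

    // Turns a raw record such as "63,95680,none,yes,10000"
    // into a generalized ARFF row such as "age4,income4,none,yes,recommend3".
    static String toArffRow(String rawCsvRecord) {
        String[] f = rawCsvRecord.trim().split("\\s*,\\s*");
        return ageBin(Integer.parseInt(f[0])) + ","
             + incomeBin(Integer.parseInt(f[1])) + ","
             + f[2] + "," + f[3] + ","
             + recommendBin(Integer.parseInt(f[4]));
    }

    public static void main(String[] args) {
        System.out.println(toArffRow("63,95680,none,yes,10000"));
    }
}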
APPENDIX 4

Decision rules:

J48 pruned tree
------------------

history = good
|   income <= 59580: 10000 (16.0/1.0)
|   income > 59580
|   |   age <= 23: 10000 (4.0)
|   |   age > 23
|   |   |   income <= 79465: 20000 (6.0/1.0)
|   |   |   income > 79465
|   |   |   |   age <= 39: 50000 (2.0)
|   |   |   |   age > 39: 20000 (4.0/1.0)
history = none
|   house_owner = yes
|   |   income <= 64280
|   |   |   age <= 30: 10000 (3.0)
|   |   |   age > 30: 5000 (5.0/1.0)
|   |   income > 64280: 10000 (8.0/2.0)
|   house_owner = no: 5000 (16.0/4.0)
history = bad
|   house_owner = yes
|   |   income <= 95680
|   |   |   age <= 20: 0 (3.0)
|   |   |   age > 20
|   |   |   |   age <= 56: 5000 (5.0)
|   |   |   |   age > 56: 0 (4.0/1.0)
|   |   income > 95680: 5000 (4.0/1.0)
|   house_owner = no: 0 (16.0)

Number of Leaves : 14
Size of the tree : 26

=== Error on training data ===

Correctly Classified Instances        84        87.5    %
Incorrectly Classified Instances      12        12.5    %
Mean absolute error                    0.0777
Root mean squared error                0.1971
Total Number of Instances             96

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
  2  1  0  0  0 |  a = 50000
  0  8  3  0  0 |  b = 20000
  0  1 28  6  0 |  c = 10000
  0  0  0 24  1 |  d = 5000
  0  0  0  0 22 |  e = 0

=== Stratified cross-validation ===

Correctly Classified Instances        61        63.5417 %
Incorrectly Classified Instances      35        36.4583 %
Mean absolute error                    0.1507
Root mean squared error                0.3352
Total Number of Instances             96

=== Confusion Matrix ===

  a  b  c  d  e   <-- classified as
  0  3  0  0  0 |  a = 50000
  3  5  3  0  0 |  b = 20000
  0  6 21  8  0 |  c = 10000
  0  0  5 17  3 |  d = 5000
  0  0  0  4 18 |  e = 0
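Output in the format shown above (the pruned tree, the error on the training data, and the ten-fold stratified cross-validation) can be generated programmatically from Weka. The sketch below is illustrative only: it assumes a recent Weka 3 release, so the package names (for example weka.classifiers.trees.J48) and the file name credit.arff may not match those of the 2001-era distribution and data files actually used in this project.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class LoanTreeDemo {
    public static void main(String[] args) throws Exception {
        // Load the ARFF data; the last attribute is the class (loan amount).
        Instances data = new Instances(new BufferedReader(new FileReader("credit.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Build the C4.5-style pruned tree and print it.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Resubstitution error ("Error on training data").
        Evaluation trainEval = new Evaluation(data);
        trainEval.evaluateModel(tree, data);
        System.out.println(trainEval.toSummaryString("=== Error on training data ===", false));
        System.out.println(trainEval.toMatrixString());

        // Ten-fold stratified cross-validation on a fresh classifier.
        Evaluation cvEval = new Evaluation(data);
        cvEval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(cvEval.toSummaryString("=== Stratified cross-validation ===", false));
        System.out.println(cvEval.toMatrixString());
    }
}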
APPENDIX 5

J48 pruned tree
------------------

history = good
|   income <= 70510
|   |   income <= 59580
|   |   |   income <= 32600: 10000 (82.0)
|   |   |   income > 32600
|   |   |   |   income <= 38240: 20000 (7.0/1.0)
|   |   |   |   income > 38240: 10000 (35.0)
|   |   income > 59580
|   |   |   age <= 23: 10000 (20.0)
|   |   |   age > 23
|   |   |   |   age <= 66: 20000 (33.0)
|   |   |   |   age > 66: 10000 (7.0)
|   income > 70510
|   |   age <= 37
|   |   |   age <= 19: 10000 (8.0)
|   |   |   age > 19: 5000 (21.0)
|   |   age > 37
|   |   |   age <= 58
|   |   |   |   house_owner = yes: 50000 (22.0)
|   |   |   |   house_owner = no
|   |   |   |   |   age <= 39: 50000 (9.0)
|   |   |   |   |   age > 39: 20000 (10.0)
|   |   |   age > 58: 20000 (18.0)
history = none
|   house_owner = yes
|   |   income <= 89630
|   |   |   income <= 36540
|   |   |   |   income <= 22000: 10000 (11.0/1.0)
|   |   |   |   income > 22000: 5000 (24.0)
|   |   |   income > 36540
|   |   |   |   age <= 59: 10000 (55.0/1.0)
|   |   |   |   age > 59
|   |   |   |   |   income <= 64280: 5000 (4.0)
|   |   |   |   |   income > 64280: 10000 (7.0)
|   |   income > 89630
|   |   |   age <= 53
|   |   |   |   age <= 28: 10000 (4.0)
|   |   |   |   age > 28: 20000 (19.0)
|   |   |   age > 53: 10000 (7.0)
|   house_owner = no
|   |   income <= 45600: 5000 (52.0/1.0)
|   |   income > 45600
|   |   |   age <= 58
|   |   |   |   age <= 46
|   |   |   |   |   age <= 17: 10000 (9.0)
|   |   |   |   |   age > 17
|   |   |   |   |   |   age <= 38: 5000 (26.0)
|   |   |   |   |   |   age > 38
|   |   |   |   |   |   |   age <= 41: 10000 (9.0)
|   |   |   |   |   |   |   age > 41: 5000 (11.0)
|   |   |   |   age > 46: 10000 (14.0)
|   |   |   age > 58: 5000 (19.0)
history = bad
|   house_owner = yes
|   |   income <= 68420
|   |   |   age <= 56
|   |   |   |   age <= 20: 0 (21.0)
|   |   |   |   age > 20: 5000 (19.0)
|   |   |   age > 56: 0 (23.0)
|   |   income > 68420
|   |   |   age <= 39
|   |   |   |   age <= 34
|   |   |   |   |   age <= 28
|   |   |   |   |   |   age <= 21
|   |   |   |   |   |   |   income <= 95680: 0 (4.0)
|   |   |   |   |   |   |   income > 95680: 5000 (5.0)
|   |   |   |   |   |   age > 21: 0 (4.0)
|   |   |   |   |   age > 28: 5000 (9.0)
|   |   |   |   age > 34: 10000 (9.0)
|   |   |   age > 39: 5000 (37.0)
|   house_owner = no: 0 (137.0/2.0)

Number of Leaves : 37
Size of the tree : 72

=== Error on training data ===

Correctly Classified Instances       805        99.2602 %
Incorrectly Classified Instances       6         0.7398 %
Mean absolute error                    0.0056
Root mean squared error                0.053
Total Number of Instances            811

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
  31   0   0   0   0 |   a = 50000
   0  86   1   0   0 |   b = 20000
   0   0 275   1   2 |   c = 10000
   0   0   0 226   0 |   d = 5000
   0   1   1   0 187 |   e = 0

=== Stratified cross-validation ===

Correctly Classified Instances       798        98.397  %
Incorrectly Classified Instances      13         1.603  %
Mean absolute error                    0.0092
Root mean squared error                0.08
Total Number of Instances            811

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
  31   0   0   0   0 |   a = 50000
   0  86   1   0   0 |   b = 20000
   0   0 272   4   2 |   c = 10000
   0   0   4 222   0 |   d = 5000
   0   1   1   0 187 |   e = 0
APPENDIX 6

J48 pruned tree
------------------

history = good
|   income = income1: recommend3 (43.26)
|   income = income2: recommend3 (33.53/5.0)
|   income = income3
|   |   age = age1: recommend3 (13.22)
|   |   age = age2: recommend4 (8.0/2.0)
|   |   age = age3: recommend4 (7.0)
|   |   age = age4
|   |   |   house_owner = yes: recommend4 (4.0)
|   |   |   house_owner = no: recommend3 (4.0)
|   income = income4
|   |   age = age1: recommend3 (4.31)
|   |   age = age2
|   |   |   house_owner = yes: recommend5 (4.0)
|   |   |   house_owner = no: recommend2 (20.0/5.0)
|   |   age = age3
|   |   |   house_owner = yes: recommend5 (7.0)
|   |   |   house_owner = no: recommend4 (4.0)
|   |   age = age4: recommend4 (11.0)
history = none
|   house_owner = yes
|   |   income = income1
|   |   |   age = age1: recommend3 (8.0)
|   |   |   age = age2: recommend2 (6.0/1.0)
|   |   |   age = age3: recommend2 (5.0)
|   |   |   age = age4: recommend2 (6.0)
|   |   income = income2: recommend3 (18.0/2.0)
|   |   income = income3: recommend3 (23.0/1.0)
|   |   income = income4
|   |   |   age = age1: recommend3 (3.0)
|   |   |   age = age2: recommend4 (5.0)
|   |   |   age = age3: recommend4 (4.0)
|   |   |   age = age4: recommend3 (4.0)
|   house_owner = no
|   |   age = age1
|   |   |   income = income1: recommend2 (8.0)
|   |   |   income = income2: recommend2 (7.34/0.34)
|   |   |   income = income3: recommend3 (5.0)
|   |   |   income = income4: recommend2 (4.0)
|   |   age = age2: recommend2 (29.0/6.0)
|   |   age = age3
|   |   |   income = income1: recommend2 (5.0)
|   |   |   income = income2: recommend3 (2.0)
|   |   |   income = income3: recommend2 (6.0)
|   |   |   income = income4: recommend3 (6.0)
|   |   age = age4: recommend2 (16.0)
history = bad
|   house_owner = yes
|   |   income = income1
|   |   |   age = age1: recommend1 (8.0)
|   |   |   age = age2: recommend2 (5.0)
|   |   |   age = age3: recommend1 (4.0)
|   |   |   age = age4: recommend1 (4.0)
|   |   income = income2
|   |   |   age = age1: recommend1 (7.0)
|   |   |   age = age2: recommend2 (3.0)
|   |   |   age = age3: recommend2 (4.0)
|   |   |   age = age4: recommend1 (4.0)
|   |   income = income3: recommend2 (15.0/1.0)
|   |   income = income4
|   |   |   age = age1: recommend2 (4.0)
|   |   |   age = age2: recommend3 (8.0/3.0)
|   |   |   age = age3: recommend2 (8.0)
|   |   |   age = age4: recommend2 (4.0)
|   house_owner = no: recommend1 (84.33/2.33)

Number of Leaves : 47
Size of the tree : 66

=== Error on training data ===

Correctly Classified Instances       467        94.1532 %
Incorrectly Classified Instances      29         5.8468 %
Mean absolute error                    0.0379
Root mean squared error                0.1368
Total Number of Instances            496

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
 109   1   5   0   0 |   a = recommend1
   0 142   2   1   0 |   b = recommend2
   3   6 164   0   0 |   c = recommend3
   0   1   4  41   0 |   d = recommend4
   0   5   0   1  11 |   e = recommend5

=== Error on test data ===

Correctly Classified Instances       247        94.2748 %
Incorrectly Classified Instances      15         5.7252 %
Mean absolute error                    0.0377
Root mean squared error                0.1355
Total Number of Instances            262
Ignored Class Unknown Instances        1

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
  58   3   1   0   0 |   a = recommend1
   0  67   1   0   0 |   b = recommend2
   0   4  82   1   0 |   c = recommend3
   0   0   2  31   0 |   d = recommend4
   0   3   0   0   9 |   e = recommend5
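Unlike Appendices 4 and 5, the run above also reports an error on separate test data, i.e. a holdout evaluation. Assuming the same Weka 3 API as in the earlier sketch, and the hypothetical file names train.arff and test.arff, such an evaluation would look roughly as follows.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class HoldoutDemo {
    public static void main(String[] args) throws Exception {
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
        Instances test  = new Instances(new BufferedReader(new FileReader("test.arff")));
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        J48 tree = new J48();
        tree.buildClassifier(train);            // learn only from the training split

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);         // measure error on the unseen test split
        System.out.println(eval.toSummaryString("=== Error on test data ===", false));
        System.out.println(eval.toMatrixString());
    }
}

The "Ignored Class Unknown Instances" line above counts test records whose class value is missing; Weka leaves them out of the evaluation statistics.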
APPENDIX 7

J48 pruned tree
------------------

history = good
|   income = income1: recommend3 (65.24)
|   income = income2
|   |   age = age1: recommend3 (18.55)
|   |   age = age2
|   |   |   house_owner = yes: recommend4 (6.0)
|   |   |   house_owner = no: recommend3 (6.0)
|   |   age = age3: recommend3 (15.0/1.0)
|   |   age = age4: recommend3 (12.0)
|   income = income3
|   |   age = age1: recommend3 (19.56)
|   |   age = age2: recommend4 (14.0/2.0)
|   |   age = age3: recommend4 (14.0)
|   |   age = age4
|   |   |   house_owner = yes: recommend4 (8.0)
|   |   |   house_owner = no: recommend3 (7.0)
|   income = income4
|   |   age = age1: recommend3 (8.31)
|   |   age = age2
|   |   |   house_owner = yes: recommend5 (8.0)
|   |   |   house_owner = no: recommend2 (28.0/8.0)
|   |   age = age3
|   |   |   house_owner = yes: recommend5 (13.0)
|   |   |   house_owner = no: recommend4 (10.0)
|   |   age = age4: recommend4 (17.0)
history = none
|   house_owner = yes
|   |   income = income1
|   |   |   age = age1: recommend3 (10.0)
|   |   |   age = age2: recommend2 (7.0/1.0)
|   |   |   age = age3: recommend2 (8.0)
|   |   |   age = age4: recommend2 (10.0)
|   |   income = income2
|   |   |   age = age1: recommend3 (11.0)
|   |   |   age = age2: recommend3 (6.0)
|   |   |   age = age3: recommend3 (7.0)
|   |   |   age = age4: recommend2 (4.0)
|   |   income = income3: recommend3 (38.34/1.0)
|   |   income = income4
|   |   |   age = age1: recommend3 (3.1)
|   |   |   age = age2: recommend4 (9.31/0.31)
|   |   |   age = age3: recommend4 (10.34/0.34)
|   |   |   age = age4: recommend3 (7.24)
|   house_owner = no
|   |   income = income1: recommend2 (35.0/1.0)
|   |   income = income2
|   |   |   age = age1: recommend2 (11.34/0.34)
|   |   |   age = age2: recommend2 (6.0)
|   |   |   age = age3: recommend3 (4.0)
|   |   |   age = age4: recommend2 (5.0)
|   |   income = income3
|   |   |   age = age1: recommend3 (9.0)
|   |   |   age = age2: recommend2 (9.0)
|   |   |   age = age3: recommend2 (11.0)
|   |   |   age = age4: recommend2 (7.0)
|   |   income = income4
|   |   |   age = age1: recommend2 (5.0)
|   |   |   age = age2: recommend2 (21.0/9.0)
|   |   |   age = age3: recommend3 (10.0)
|   |   |   age = age4: recommend2 (7.0)
history = bad
|   house_owner = yes
|   |   income = income1
|   |   |   age = age1: recommend1 (10.0)
|   |   |   age = age2: recommend2 (6.0)
|   |   |   age = age3: recommend1 (7.0)
|   |   |   age = age4: recommend1 (9.0)
|   |   income = income2
|   |   |   age = age1: recommend1 (11.0)
|   |   |   age = age2: recommend2 (6.0)
|   |   |   age = age3: recommend2 (7.0)
|   |   |   age = age4: recommend1 (7.0)
|   |   income = income3
|   |   |   age = age1: recommend1 (4.33/0.33)
|   |   |   age = age2: recommend2 (9.0)
|   |   |   age = age3: recommend2 (10.0)
|   |   |   age = age4: recommend2 (7.0)
|   |   income = income4
|   |   |   age = age1: recommend2 (5.0)
|   |   |   age = age2: recommend3 (13.0/4.0)
|   |   |   age = age3: recommend2 (13.0)
|   |   |   age = age4: recommend2 (7.0)
|   house_owner = no: recommend1 (137.33/2.33)

Number of Leaves : 60
Size of the tree : 84

=== Error on training data ===

Correctly Classified Instances       780        96.2963 %
Incorrectly Classified Instances      30         3.7037 %
Mean absolute error                    0.0224
Root mean squared error                0.1042
Total Number of Instances            810
Ignored Class Unknown Instances        1

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
 183   0   6   0   0 |   a = recommend1
   0 225   0   1   0 |   b = recommend2
   2  10 265   1   0 |   c = recommend3
   0   1   0  86   0 |   d = recommend4
   0   8   0   1  21 |   e = recommend5

=== Stratified cross-validation ===

Correctly Classified Instances       764        94.321  %
Incorrectly Classified Instances      46         5.679  %
Mean absolute error                    0.0321
Root mean squared error                0.1376
Total Number of Instances            810
Ignored Class Unknown Instances        1

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
 179   4   6   0   0 |   a = recommend1
   0 221   4   1   0 |   b = recommend2
   2  14 261   1   0 |   c = recommend3
   0   1   4  82   0 |   d = recommend4
   0   8   0   1  21 |   e = recommend5
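All of the trees listed in these appendices are J48 pruned trees. If a larger or smaller tree is wanted, J48's pruning behaviour can be adjusted before the classifier is built. The options below exist in the Weka 3 API; the values shown are arbitrary examples for illustration, not settings used in this project.

import weka.classifiers.trees.J48;

public class PruningOptionsDemo {
    public static void main(String[] args) {
        J48 tree = new J48();
        tree.setUnpruned(false);          // keep pruning enabled (the J48 default)
        tree.setConfidenceFactor(0.15f);  // lower confidence factor => heavier pruning
        tree.setMinNumObj(5);             // require at least 5 instances per leaf
        System.out.println(java.util.Arrays.toString(tree.getOptions()));
        // tree.buildClassifier(data) would then be called as in the earlier sketches.
    }
}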
APPENDIX 8

Data Mining Project Proposal for Getting a Loan Approval

Bohui Qi, Yi Yu, Jianping Du, Lanlan Wang, Ying Zhang, Jin Guo
Department of Computer Science
Towson University

Project Objective
The objective of this project is to use data mining tools to identify which factors affect the approval of a loan application and to what extent they determine the amount approved. The specific goals are:
• Understanding how a machine-learning package works and what it does.
• Learning how to choose the data mining method best suited to a given application.
• Understanding that data mining is, at its core, knowledge discovery: it extracts hidden predictive information from large databases.
• Learning how to prepare data for the mining tool and translate the input data into the tool's required format.
• Observing and analyzing the new knowledge mined from the application.
• Learning the mining technique used in the tool.

Project Description
Using data mining tools, we investigate which factors affect whether, and for what amount, a person's loan application is approved. We first organize the database, then evaluate several data mining tools and choose the one best suited to this application. Finally, after observing and analyzing the new knowledge, we validate the findings.
Project Implementation

Objective Identification
The goal of this project is to identify patterns in the amount of loan approved across groups defined by age, income, credit history, and home ownership. We will identify the critical factors and determine to what extent they affect the amount of the loan approved.

Data Selection, Preparation, and Auditing
We will generate a set of data with the above attributes. The data sets will be prepared in an Excel spreadsheet and in the ARFF format. The attributes are discretized as follows:
• Age: <20, 20-40, 40-60, >60 years old.
• Annual income: <$30,000, $30,000-60,000, $60,000-90,000, >$90,000.
• Credit history: good, bad, or none.
• House owner: yes or no.
• Loan approved: $0K, $5K, $10K, $20K, or $50K.
We will evaluate the nature and structure of the database in order to determine the appropriate tools.

Tools Selection
Based on the objectives and the data structure, we will select an appropriate data mining tool. For this project, we will use the Weka data mining tool set.

Solution Formation
The format of the solution is determined by the data audit, the business objective, and the mining tool. In this project, the report will express the amount of loan approved as a function of the intervals in each of the four categories.

Expected Output
From the analysis, we expect association rules such as the following:
• If a person's credit history is bad and he/she does not own a house, the application will be denied.
• If a person has no prior credit history, a loan of $5K will be granted for a first-time application.
• If a person's credit history is bad but he/she owns a house, and either the annual income is more than $90K, or the annual income is more than $60K and the age is between 40 and 60, a loan of $5K will be approved.
• Different loan amounts will be approved depending on annual income, age, and the other attributes.
We also expect to obtain classification rules and decision trees.

Model Construction
We will use a training set and a test set of data for the mining experiments. Based on the test results, we will construct and evaluate a model. This stage will produce classification rules, decision trees, clustering sub-groups, scores, and evaluation data/error rates. We will conclude how the intervals in each category affect the amount of loan approved, which interval in each category has the highest approved loan amount, and which factor is most critical for the highest loan amount.

Findings Validation and Delivery
We will discuss the results of the analysis with experts to ensure that the findings are correct and appropriate for the business objectives. A final report will then be delivered, documenting the entire data mining process: data preparation, tools used, mining techniques used in the tools and their details, test results, visualization techniques used and their details, source code, and rules.

Operating System Environment
Windows 95, 98, and Windows 2000.

Project Schedule
We will hold regular weekly team meetings. Our schedule is as follows:
02/01/01 --- 02/15/01  Search for information and choose the project topic
02/16/01 --- 02/25/01  Choose data mining tools, prepare data and the proposal
02/26/01 --- 03/20/01  Fully understand the tool and implement the application
03/21/01 --- 04/01/01  Test the output data
04/02/01 --- 04/14/01  Analyze the results
04/15/01 --- 04/30/01  Prepare the project report
05/01/01 --- 05/07/01  Prepare presentation materials