BI Apps Data Mining- SQL Server Analysis Services 2008

This is not the full copy of the document. The original document's pages had to be reduced to make it easier to upload.


  1. DATA MINING WITH MICROSOFT SQL SERVER ANALYSIS SERVICES By SUNNY OKORO
  2. Contents: Introduction to SSAS Architecture; Entity Relationship Diagram (Description); Decision Tree Analysis (Business Case); Neural Network Analysis (Business Case); Logistic Regression Analysis (Business Case); Reference
  3. Introduction to SSAS Architecture
     Microsoft SQL Server Analysis Services (SSAS) is one of the components that make up the Microsoft Business Intelligence suite, which also includes Microsoft SQL Server Reporting Services (SSRS) and Microsoft SQL Server Integration Services (SSIS). SSAS projects can be designed, deployed and browsed using Microsoft Business Intelligence Development Studio (BIDS). SSAS can also be integrated with other Microsoft applications such as Excel and Visio to create mining-related projects. For this project, BIDS is used for design, deployment and browsing. Microsoft Excel is used to demonstrate the last mining exercise.
     Applications: 1. Microsoft SQL Server 2008 R2 2. Microsoft Business Intelligence Development Studio 3. Microsoft Excel 4. Microsoft Analysis Server 5. Microsoft Data Mining for Excel (add-on)
     Datasets: 1. Adventure Works Data Warehouse
     Data Mining: 1. Cube 2. Dimensions 3. Mining Structure
     Designing Microsoft SSAS Project
  4. Figure 1 Microsoft BIDS
     Step 1: Click the SQL Server Business Intelligence Development Studio icon to open BIDS.
     Step 2: Click File, then New, and select Project, as illustrated in Figure 1, to open the New Project dialog box illustrated in Figure 2 below.
     Step 3: Select Analysis Services Project and enter the file name along with its folder path. Click OK to return to BIDS.
     Figure 2
  5. Figure 3
     Data Source: contains the data source location. Make sure all services relating to the application or database are started before connecting to a particular database or application.
     Data Source View: contains a graphical representation, or ERD, of the data from the data source.
     Cubes: three-dimensional view of data.
     Dimensions:
     Mining Structure: mining models, such as a decision tree, created upon an existing cube or database to perform data mining.
     Step 4: Click Data Source to add the data source connection, then click Next to enter the credentials needed by SSAS to access the data source, as illustrated in Figure 4.
     Step 5: Click Data Source View to add a new data source view containing the objects, such as tables, that will be used for mining, as illustrated in Figure 5.
     Step 6: Click Cube to create a new cube based on existing tables from the data source using the Cube Wizard, which creates new dimensions. The design process is captured in Figure 6. The cube is created to make data mining processing faster than retrieving the data sets from the database.
  6. Step 7: Once the cube has been created, it needs to be processed, as illustrated in Figure 7.
     Step 8: For this mining project, I created the dimensions relating to Product, Customer, Geography, Sales Territory, Time and Currency and applied those dimensions to my cube, which I created later.
     Step 9: To create dimensions, click Dimension to open the Dimension Wizard, as illustrated in Figure 8.
     Step 10: To create a mining structure, click Mining Structure to open the Mining Structure Wizard, as illustrated in Figure 10.
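
The idea that a cube speeds up mining by aggregating data ahead of time can be sketched in plain Python. This is not SSAS code, only a minimal illustration under stated assumptions: the fact rows are made up, "Hypothetical Product B" is a placeholder, and the helper names `build_cube` and `slice_cube` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, territory, fiscal_year, sales_amount).
# The values are illustrative only and are not taken from Adventure Works.
FACT_ROWS = [
    ("Road-150 Red, 44", "Canada", 2003, 100.0),
    ("Road-150 Red, 44", "Australia", 2003, 100.0),
    ("Road-150 Red, 44", "Canada", 2004, 250.0),
    ("Hypothetical Product B", "Canada", 2003, 75.0),
]


def build_cube(rows):
    """Pre-aggregate the measure once, keyed by the three dimension values."""
    cube = defaultdict(float)
    for product, territory, year, amount in rows:
        cube[(product, territory, year)] += amount
    return dict(cube)


def slice_cube(cube, product=None, territory=None, year=None):
    """Sum the cells matching the given dimension values (None means all)."""
    total = 0.0
    for (p, t, y), amount in cube.items():
        if product is not None and p != product:
            continue
        if territory is not None and t != territory:
            continue
        if year is not None and y != year:
            continue
        total += amount
    return total


cube = build_cube(FACT_ROWS)
print(slice_cube(cube, territory="Canada"))          # all Canadian sales
print(slice_cube(cube, product="Road-150 Red, 44"))  # one product, all cells
```

Later queries read the small pre-aggregated dictionary rather than rescanning every fact row, which is roughly the benefit the cube gives the mining models.
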
  7. Figure 4: SSAS Data Source
     Figure 5: SSAS Data Source View
  8. Cube Design
     Figure 6: Cube Design Process
  9. Figure 7: Cube Processing
     Dimension Creation
  10. Figure 8: Dimension Process
  11. DATA MINING ACTIVITIES
  12. Entity Relationship Diagram
      Figure 9: Data Source View
      Description
      The data warehouse schema of Adventure Works Outdoor Company. For the mining exercise, only the Sales fact table and the DimProduct, DimCustomer, DimSalesTerritory, DimGeography and DimTime dimension tables are used.
  13. Decision Tree Analysis
      Business Case
      Managers from various sales regions at Adventure Works Outdoor Company want to use a decision tree to view the total amount spent, as recorded in the sales data warehouse, based on customer demographics: gender, marital status, education and occupation. Demographic data are collected each time customers register their profile online. Other information collected during registration includes yearly income and number of children. The goal of this mining activity is to determine how much each demographic group spends, based on the sales data in the data warehouse, to help decision makers determine which promotions to create for each group.
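
The question behind this business case, how spending divides across demographic groups, can be sketched outside SSAS in a few lines of Python. The customer records, field names and the `spend_share` helper below are all hypothetical and serve only to show the calculation. The decision tree models in the slides go further by also splitting on ranges of the total amount.

```python
from collections import defaultdict

# Hypothetical customer records: demographic attributes plus total amount spent.
CUSTOMERS = [
    {"gender": "M", "marital_status": "Married", "total_amount": 600.0},
    {"gender": "F", "marital_status": "Single", "total_amount": 300.0},
    {"gender": "M", "marital_status": "Single", "total_amount": 100.0},
    {"gender": "F", "marital_status": "Married", "total_amount": 1000.0},
]


def spend_share(records, attribute):
    """Return each attribute value's percentage share of the total spend."""
    totals = defaultdict(float)
    for record in records:
        totals[record[attribute]] += record["total_amount"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing spent, so no shares to report
    return {value: 100.0 * amount / grand_total for value, amount in totals.items()}


print(spend_share(CUSTOMERS, "gender"))
print(spend_share(CUSTOMERS, "marital_status"))
```
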
  14. Figure 10: SSAS Architecture
      Figure 11: SSAS Data Mining Wizard - Definition Method
  15. Figure 12: SSAS Data Mining Structure - Mining Model
  16. Figure 13: SSAS Data Mining Wizard - Cube Dimension
  17. Figure 14: SSAS Mining Wizard - Case Key
  18. Figure 15: SSAS Data Mining Wizard - Attributes and Measure Selection
  19. Figure 16: SSAS Data Mining Wizard - Input and Prediction Column Usage
  20. Figure 17: Data Mining Wizard - Content and Data Type
      Figure 18: Data Mining Wizard - Testing Set Design
  21. Figure 19: Data Mining Model Processing - Dim Customer1.dmn
      Figure 20: Data Mining Structure - Dim Customer1.dmn
  22. Figure 21: SSAS Data Mining Structure - Dim Customer1.dmn Display
      Figure 22: SSAS Data Mining Model - Dim Customer1.dmn
  23. DECISION TREE ANALYSIS
      EDUCATIONAL LEVEL
      Tree ALL (Figure 23)
      Branch 1: Total Amount >= 10285.300 (Figure 24)
  24. Branch 2: Total Amount < 1471.900 (Figure 25)
      Branch 2-A: Total Amount >= 296.760 (Figure 26)
  25. Branch 2-B: Total Amount < 296.760 (Figure 27)
      Branch 3: Total Amount >= 1471.900 and < 10285.300 (Figure 28)
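
The Educational Level branches listed above amount to a set of nested threshold tests on Total Amount. As a minimal sketch, the split points below are copied from the slides, while the function name `education_tree_branch` is hypothetical and the code is not generated by SSAS.

```python
# Split points read from the Educational Level tree in the slides.
UPPER_SPLIT = 10285.300
LOWER_SPLIT = 1471.900
BRANCH_2_SPLIT = 296.760


def education_tree_branch(total_amount):
    """Return the label of the branch a Total Amount value falls into."""
    if total_amount >= UPPER_SPLIT:
        return "Branch 1"
    if total_amount >= LOWER_SPLIT:
        return "Branch 3"
    if total_amount >= BRANCH_2_SPLIT:
        return "Branch 2-A"
    return "Branch 2-B"


for amount in (12000.0, 5000.0, 800.0, 100.0):
    print(amount, education_tree_branch(amount))
```
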
  26. GENDER LEVEL
      Tree ALL (Figure 29)
      MARITAL STATUS
      Tree ALL
  27. Figure 30
      Branch 1: Total Amount >= 10285.300 (Figure 31)
  28. Branch 2: Total Amount < 1471.900 (Figure 32)
  29. Branch 2-A: Total Amount >= 590.560 (Figure 33)
      Branch 2-B: Total Amount < 590.560 (Figure 34)
  30. Branch 3: Total Amount >= 1471.900 and < 10285.300 (Figure 35)
      OCCUPATIONAL LEVEL
      Tree ALL
  31. Figure 36
  32. Branch 1: Total Amount >= 4409.700 (Figure 37)
      Branch 1-A: Total Amount < 7494.390
  33. Figure 38
      Branch 1-B: Total Amount >= 7494.390 (Figure 39)
      Branch 2: Total Amount >= 1471.900 and < 2940.800
  34. Figure 40
      Branch 2-A: Total Amount >= 2353.240 and < 2647.020 (Figure 41)
  35. Branch 2-B: Total Amount < 2353.240 or >= 2647.020 (Figure 42)
      Branch 3: Total Amount >= 2940.800 and < 4409.700 (Figure 43)
  36. Branch 3-A: Total Amount < 3381.470 or >= 4115.920 (Figure 44)
  37. Branch 3-A-1: Total Amount >= 3381.470 and < 4262.810 (Figure 45)
      Branch 3-A-2: Total Amount < 3381.470 or >= 4262.810 (Figure 46)
  38. Branch 3-B: Total Amount >= 3381.470 and < 4115.920 (Figure 47)
      Branch 4: Total Amount < 1471.900 (Figure 48)
  39. Branch 4-A: Total Amount >= 737.450 (Figure 49)
      Branch 4-B: Total Amount < 737.450 (Figure 50)
  40. ANALYSIS
      The mining models for the various decision trees revealed an interesting picture of the demographics of the customers in the data warehouse and their spending behavior. On the gender level, male customers outspend female customers by a small margin, 50% to 49%, as illustrated in Figure 7 of the Decision Tree Analysis document. Based on marital status, married customers outspend single customers 56% to 43% overall and in every branch of the decision tree models, with the exception of Branch 2-A, where the margin remained close at 50% to 49%, as illustrated in Figure 11 of the Decision Tree Analysis document. On the occupational level, professional and skilled manual positions were the largest groups in the population, with 2,835 (30%) and 2,344 (24%) customers. However, a breakdown of the decision tree models revealed different dynamics when the population is sliced into different nodes: the lead once held by professional and skilled manual positions decreases slightly or disappears, as illustrated in Branch 3 and its corresponding nodes.
  41. The same holds true for mining based on educational levels. Bachelor's degree holders and customers with partial college experience were the largest groups, with 29% and 27% of the population.
      Neural Network Analysis
      Business Case
      Managers at Adventure Works Outdoor Company want to gain a better understanding of the salary range of each occupation based on the educational levels collected from customers: partial college, bachelor's degree, graduate degree and high school diploma. With the information gained from the mining activity, they would be able to determine which credits to offer to a customer based on their educational and occupational background.
  42. Figure 51: Data Mining Wizard - Microsoft Neural Network
      Figure 52: SSAS Data Mining Wizard - Microsoft Neural Network Cube Dimension Selection
  43. Figure 53: SSAS Data Mining Wizard - Microsoft Neural Network Attribute and Measure Selection
      Figure 54: SSAS Data Mining Wizard - Microsoft Neural Network Column Usage Selection
  44. Figure 55: SSAS Data Mining Wizard - Microsoft Neural Network Test Set Creation
      The percentage of data held out for testing has to be set, because SSAS threw numerous errors when the percentage was above 50%. This is done to achieve a good result with the mining model.
      Figure 56: Data Mining Model Processing - Dim Customer4.dmn
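
Holding out a percentage of the cases for testing can be sketched as a simple shuffled split. The sketch below is plain Python rather than SSAS: the function name `holdout_split`, its parameters and the illustrative 30% default are assumptions, and the 50% ceiling mirrors the author's observation above rather than any documented SSAS limit.

```python
import random


def holdout_split(cases, test_percent=30, seed=0, max_test_percent=50):
    """Shuffle the cases and split them into (training, testing) lists."""
    if not 0 <= test_percent <= max_test_percent:
        raise ValueError(f"test_percent must be between 0 and {max_test_percent}")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = len(shuffled) * test_percent // 100
    return shuffled[test_size:], shuffled[:test_size]


training, testing = holdout_split(range(100), test_percent=30)
print(len(training), len(testing))
```

Keeping the test cases out of training is what lets the accuracy of the model be checked on data it has not seen.
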
  45. Figure 57: Dim Customer4.dmn Mining Model
      The Gender and Marital Status attributes have been set to Ignore to make the model easier to read and understand. In this section I compare the income levels of customers based on their educational levels: Bachelor's, Graduate, and High School Diploma or Degree.
      Salary Range of Occupations Based on Educational Levels of Customers
      Overview
      Figure 58: Overview of the Model
  46. Bachelor's Degree Salary Range of Occupations
      Figure 59: Bachelor Degree Salary Range - Model 1
      Salary Range: 10,000.000 ($10,000) - 35,541.537 ($35,541.54)
      Salary Range: 35,541.537 ($35,541.54) - 57,321.817 ($57,321.82)
      Figure 60: Bachelor Degree Salary Range - Model 2
      Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
      Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
  47. Figure 61: Bachelor Degree Salary Range - Model 3
      Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
      Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
      Graduate Degree Salary Range of Occupations
      Figure 62: Graduate Degree Salary Range - Model 1
      Salary Range: 10,000.000 ($10,000) - 35,726.250 ($35,726.25)
      Salary Range: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
  48. Figure 63: Graduate Degree Salary Range - Model 2
      Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
      Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
      Figure 64: Graduate Degree Salary Range - Model 3
      Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
      Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
      High School Diploma Salary Range of Occupations
  49. Figure 65: High School Diploma Salary Range - Model 1
      Salary Range: 10,000.000 ($10,000) - 35,726.250 ($35,726.25)
      Salary Range: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
      Figure 66: High School Diploma Salary Range - Model 2
      Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
      Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
      Figure 67: High School Diploma Salary Range - Model 3
  50. Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
      Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
      Partial College Salary Range of Occupations
      Figure 68: Partial College Salary Range of Occupation - Model 1
      Salary Range: 10,000.000 ($10,000) - 35,726.250 ($35,726.25)
      Salary Range: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
      Figure 69: Partial College Salary Range of Occupation - Model 2
      Salary Range Value 1: 35,726.250 ($35,726.25) - 57,637.887 ($57,637.89)
      Salary Range Value 2: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
  51. Figure 70: Partial College Salary Range of Occupation - Model 3
      Salary Range Value 1: 57,637.887 ($57,637.89) - 79,549.525 ($79,549.53)
      Salary Range Value 2: 79,549.525 ($79,549.53) - 155,096.614 ($155,096.61)
      Analysis
      The income level of the occupations varies based on educational background and career. Clerical and manual labor positions, for example, are the careers with an average salary range between $10,000 and $35,000 for customers with bachelor's degrees, graduate degrees, high school diplomas and partial college experience, as illustrated in data mining model 1 of each educational background. Only skilled manual careers have an average income between $10,000 and $35,000 for customers with high school diplomas. A closer examination of the mining models for each educational level indicates discrepancies between occupations, depending on the population used to create that specific mining model. For example, the average salary for management positions in model 2 for bachelor's degree holders is between $57,637.89 and $79,549.53, but in model 3 the average salary range is between $79,549.53 and $155,096.61. Based on the mining evidence, the state of each mining model would change based on the population of customer records added to the data warehouse.
  52. The mining model partially satisfies the business case, considering that a customer with a college degree or college experience tends to earn more money. However, additional criteria such as payment history can be used to qualify or disqualify customers from receiving a special coupon.
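
The salary ranges reported for the neural network models behave like bucket boundaries on yearly income. As a rough sketch, the boundary values below are the ones that recur across most of the models in the slides (Bachelor Model 1 reports slightly different cut points), and the function name `salary_range` is hypothetical.

```python
from bisect import bisect_right

# Range boundaries as reported for most of the models in the slides.
BOUNDARIES = [10000.000, 35726.250, 57637.887, 79549.525, 155096.614]


def salary_range(yearly_income):
    """Return the (low, high) range containing yearly_income, or None if outside."""
    if yearly_income < BOUNDARIES[0] or yearly_income > BOUNDARIES[-1]:
        return None
    # Clamp so the top boundary itself falls into the last range.
    index = min(bisect_right(BOUNDARIES, yearly_income), len(BOUNDARIES) - 1)
    return BOUNDARIES[index - 1], BOUNDARIES[index]


print(salary_range(30000))
print(salary_range(60000))
```
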
  53. Logistic Regression Analysis
      Business Case
      Managers at Adventure Works Outdoor Company want to gain an understanding of the total amount spent by customers on a particular product across the Sales Territory Countries, which include France, the United Kingdom, Canada, Germany, the United States of America and Australia, by comparing sales from different fiscal years (2002-2005).
  54. Figure 71: SSAS Data Mining Wizard - Regression Analysis
      Figure 72: SSAS Data Mining Wizard - Regression Analysis Cube Dimension Selection
  55. Figure 73: SSAS Data Mining Wizard - Regression Analysis Case Key Selection
      Figure 74: SSAS Data Mining Wizard - Regression Analysis Column Usage Selection
  56. Figure 75: SSAS Data Mining Wizard - Regression Analysis Data Type Setup
      Figure 76: SSAS Data Mining Wizard - Regression Analysis Testing Setup
  57. Figure 77: Sales2.dmn Mining Model
      EXCEL AND DATA MINING
      Figure 78: Excel Application
      To use Excel as a data mining application, install the Microsoft SQL Server 2008 Data Mining Add-ins.
      Step 1: Click the Project icon to set up the configuration, which opens the Analysis Services Connection Wizard displayed in Figure 55. Make sure to start the services relating to SQL Server and SSAS.
      Step 2: Click New to enter the credentials needed to access SSAS in the Connect to Analysis Services dialog displayed in Figure 56.
      Step 3: Click Manage Models and select the applicable structures and models, as in Figure 57. Process the model.
      Step 4: Click Browse, select the model and click Next.
  58. Step 5: Select Attribute Filter to filter the outputs and copy the data to Excel, as illustrated in Figure 58.
      Figure 79: Excel SSAS Connection Configuration
  59. Figure 80: SSAS Models
  60. Figure 81: SSAS Model Browse
  61. Figure 82
  62. Snapshot of UK Sales2 - UK (2002-2003) Fiscal Year
  63. Snapshot of UK Sales2 - UK (2003-2004) Fiscal Year
  64. Snapshot of UK Sales2 - UK (2004-2005) Fiscal Year
  65. Snapshot of USA Sales2 - US (2002-2003) Fiscal Year
  66. Snapshot of USA Sales2 - US (2003-2004) Fiscal Year
  67. Snapshot of USA Sales2 - US (2004-2005) Fiscal Year
      Each graph bar contains numeric values associated with the fiscal year of each product.
  68. Analysis
      The mining model satisfies the business case because each product's sales are broken down by sales territory across the fiscal years from 2002 to 2005. For example, sales of the Road-150 Red, 44 product were at $100 in both the Canadian and Australian sales territories. Having these mining models allows managers throughout the various sales territories to compare sales prices based on fiscal year.
  69. Reference
      Cameron, S. (2009). Microsoft SQL Server 2008 Analysis Services Step by Step. Retrieved from http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/microsoft-sql-server/9780735626201?bookview=overview
      Ben-Gan, I. (2008). Microsoft SQL Server 2008 T-SQL Fundamentals. Redmond, WA: Microsoft Press.
      Nielsen, P., Parui, U., & White, M. (2009). Microsoft SQL Server 2008 Bible. Indianapolis, IN: Wiley Publishing, Inc.
      Fouché, P. (2010). Pro SQL Server 2008 Analysis Services. Retrieved from http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/microsoft-sql-server/9781430219958?bookview=overview
      Langit, L., Goff, K., Mauri, D., Malik, S., & Welch, J. (2008). Smart Business Intelligence Solutions with Microsoft SQL Server 2008. Retrieved from http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/microsoft-sql-server/9780735625808
      Vitt, E., Luckevich, M., & Misner, S. (2008). Business Intelligence. Retrieved from http://proquestcombo.safaribooksonline.com.ezproxy.umuc.edu/book/databases/business-intelligence/9780735626607
