DESIGN AND IMPLEMENTATION OF DATA ANALYSIS COMPONENTS

DESIGN AND IMPLEMENTATION OF DATA ANALYSIS COMPONENTS

A Thesis Presented to The Graduate Faculty of The University of Akron
In Partial Fulfillment of the Requirements for the Degree Master of Science

Grace C. Shiao

May, 2006
Thesis Approved / Accepted:

Advisor: Dr. Chien-Chung Chan
Committee Member: Dr. Xuan-Hien Dang
Committee Member: Dr. Zhong-Hui Duan
Department Chair: Dr. Wolfgang Pelz
Dean of the College: Dr. Ronald F. Levant
Dean of the Graduate School: Dr. George R. Newkome
ABSTRACT

This thesis describes the design and implementation of data analysis components. Many features of modern database systems facilitate the decision-making process. In recent years, Online Analytical Processing (OLAP) and data mining have been used in a growing range of applications. OLAP allows users to analyze data from a wide variety of viewpoints. Data mining is the process of selecting, exploring, and modeling large amounts of data to discover previously unknown patterns for business advantage. Microsoft® SQL Server™ 2000 Analysis Services provides a rich set of tools to create and maintain OLAP and data mining objects. To use these tools, however, users must fully understand the underlying architectures and specialized technological terms that are unrelated to the data analysis itself. These development complexities prevent data analysts from using the tools effectively. In this work, we developed several components that can serve as the foundation of analytical applications. Using these components in software applications hides the technical complexities and provides tools to build OLAP and mining models and to access information from them. Developers can also reuse these components without coding from scratch. This reusability enhances an application’s reliability and reduces development cost and time.
DEDICATION

Dedicated to my late parents, Mr. and Mrs. K. C. Chang,
Who taught me the value of Education
And opened my eyes to the Power of Knowledge.
ACKNOWLEDGEMENTS

First of all, I want to thank my advisor, Dr. Chien-Chung Chan, for his guidance and support throughout my graduate research. His feedback helped to strengthen my research skills and contributed greatly to this thesis. I want to thank my thesis committee members, Dr. Xuan-Hien Dang and Dr. Zhong-Hui Duan, for their guidance and encouragement. In addition, I want to thank the faculty members of the Department of Computer Science for building the foundation of my computer knowledge. I also want to thank my late parents and wish they would have been able to see this finished manuscript. I appreciate both of them for their love, support and encouragement in my life. I thank my husband S. Y. for his love and support through these years, and my daughter Ming-Hao and my son Ming-Jay for their love, humor, and understanding. Lastly, I thank the Mighty God for all His grace and blessing in my life.
TABLE OF CONTENTS

LIST OF TABLES ..... ix
LIST OF FIGURES ..... x

CHAPTER

I. INTRODUCTION ..... 1
   1.1 What is Online Analytical Processing (OLAP)? ..... 2
   1.2 Data Mining ..... 3
   1.3 Statement of the Problem ..... 3
   1.4 Motivations and Contributions ..... 3
   1.5 Organization of the Thesis ..... 5

II. MICROSOFT SQL SERVER 2000 ANALYSIS SERVICES ..... 7
   2.1 Overview ..... 7
   2.2 Architecture ..... 7
      2.2.1 Server Architecture ..... 7
      2.2.2 Client Architecture ..... 9
   2.3 OLAP Cube ..... 9
   2.4 Analysis Manager ..... 11
      2.4.1 Creating the Basic Cube Model ..... 12
      2.4.2 Browsing a Cube ..... 23
      2.4.3 Building the Data Mining Models ..... 24

III. DESIGN OF DATA ANALYSIS COMPONENTS ..... 32
   3.1 Component-Based Development ..... 33
   3.2 What Is a Component? ..... 33
   3.3 The cubeBuilder Component ..... 34
   3.4 The cubeBrowser Component ..... 37
      3.4.1 Browsing OLAP Objects ..... 38
         3.4.1.1 Retrieving Information of Cube Schema ..... 39
         3.4.1.2 Analytical Querying of Cube Data ..... 41
   3.5 The DMBuilder Component ..... 43
   3.6 Conclusions ..... 47

IV. CASE STUDIES AND RESULTS ..... 48
   4.1 A Case Study of the Heart Disease Datasets ..... 48
      4.1.1 Heart Disease Sample File ..... 49
      4.1.2 Software Implementation ..... 49
   4.2 Implementation of the cubeBuilder Component ..... 50
      4.2.1 Creating a New Cube ..... 51
      4.2.2 The Fact Table and Measures Selections ..... 52
      4.2.3 Adding Dimensions to the Cube ..... 52
      4.2.4 Processing and Building the New Cube ..... 53
      4.2.5 The Results ..... 54
   4.3 Implementation of the cubeBrowser Component ..... 56
      4.3.1 Connection to the Analysis Server ..... 56
      4.3.2 Retrieving the Cardio Cube Data ..... 57
      4.3.3 Displaying the Cardio Cube Data ..... 59
      4.3.4 Drill-down and Drill-up Capacities ..... 60
   4.4 Implementation of the DMBuilder Component ..... 62

V. DISCUSSIONS AND FUTURE WORKS ..... 67
   5.1 Contributions and Evaluations ..... 67
   5.2 Future Works ..... 70

BIBLIOGRAPHY ..... 71

APPENDICES ..... 73
   APPENDIX A. DATASET USED FOR CASE STUDIES ..... 74
   APPENDIX B. APPLICATION INTERFACE OF OLAP CUBE BUILDER ..... 76
   APPENDIX C. SOURCE CODE OF CUBEBUILDER ..... 77
   APPENDIX D. SOURCE CODE OF CUBEBROWSER ..... 84
   APPENDIX E. SOURCE CODE OF DMBUILDER ..... 88
LIST OF TABLES

2.1 Storage options supported by Analysis Services ..... 19
2.2 Summary of cube process options ..... 22
3.1 Values of the connection string ..... 41
3.2 Listing of properties required for OLAP mining model objects ..... 46
LIST OF FIGURES

2.1 Analysis Services architecture ..... 8
2.2 The star and snowflake schemas ..... 10
2.3 Screenshot of the Analysis Manager ..... 11
2.4 Screenshot of the database dialog box of Cube Wizard ..... 13
2.5 Screenshot of the Provider for the Data Link dialog box ..... 13
2.6 Screenshot of the Connection tab of the Data Link dialog box ..... 14
2.7 Screenshot of the "Select a fact table" dialog box with a selected fact table ..... 15
2.8 Screenshot of the "Defining measures" dialog box ..... 15
2.9 Screenshot of the Dimension Wizard ..... 16
2.10 Screenshot of the "Select Dimension Table" dialog box ..... 17
2.11 Screenshot of the "Select levels" dialog box ..... 17
2.12 Screenshot of the "Dimension Finish" dialog box ..... 18
2.13 Screenshot of the "Storage Design Wizard" for selecting storage options ..... 19
2.14 Screenshot of the "Set aggregation options" dialog box ..... 20
2.15 Screenshot of the "Process" window ..... 21
2.16 Screenshot of the "Process a cube" dialog box ..... 22
2.17 Screenshot of the "Cube Browser" and sample results ..... 23
2.18 Screenshot of the "Select source type" dialog box ..... 25
2.19 Screenshot of the "Select source cube" window ..... 26
2.20 Screenshot of selecting the mining model technique ..... 26
2.21 Screenshot of the "Select case" dialog box for specifying a case of analysis ..... 27
2.22 Screenshot of the "Select predicted entity" window ..... 28
2.23 Screenshot of the "Select training data" window ..... 29
2.24 Screenshot of "Saving the data model" of the Mining Model Wizard ..... 30
2.25 Screenshot of the "Model execution diagnostics" window ..... 30
2.26 Screenshot of the content details of a created mining model ..... 31
3.1 Architecture of the cubeBuilder component ..... 35
3.2 Relationship of cubeBrowser to the Analysis Server ..... 38
3.3 The basic workflow of browsing OLAP cube data using cubeBrowser ..... 40
3.4 The architecture and logic relations of DMBuilder with DSO ..... 44
3.5 Flow logic of the DMBuilder component ..... 45
4.1 Relationship of the heart disease test data ..... 49
4.2 Screenshot of the cardio cube builder interface ..... 50
4.3 Screenshot of the "Data Source/Cube" section ..... 51
4.4 Screenshot of sample entries for the "Data Source/Cube" and "Specify Fact/Measures" sections ..... 51
4.5 Screenshot of sample entries of the "Specify Fact/Measure" section ..... 52
4.6 Screenshot of the "Add Dimensions to Cube" section ..... 53
4.7 Screenshot of sample entries for cube dimension ..... 53
4.8 Screenshot of the "Process/Build Cube" section ..... 54
4.9 Screenshot of the cardio test database object before building the new cardio cube ..... 55
4.10 Screenshot of the cardio test database object after building the sample "cube1" ..... 55
4.11 Screenshot of the web form BrowseCube.aspx ..... 56
4.12 Screenshot of the listing of available cubes ..... 57
4.13 Screenshot of specifying cube entry and measures ..... 57
4.14 Screenshot of selections of measures and the pre-defined view options ..... 58
4.15 Screenshot of selections of location for the Pain-Type option ..... 59
4.16 Screenshot of selections of pain-type for the Patient option ..... 59
4.17 Results of cube data for the Pain-Type option with test country ..... 59
4.18 Results of cube data for the angina chest pains per patient test city ..... 60
4.19 Screenshot of drill-down to the test center level of the Patient option ..... 61
4.20 Screenshot of drill-up to the country level of the Patient option ..... 61
4.21 Screenshot of the main DMBuilder interface ..... 62
4.22 Screenshot of the "Server/Database" section ..... 63
4.23 Screenshot of the mining model setup ..... 63
4.24 Screenshot of setting the mining model role ..... 64
4.25 Screenshot of setting properties and algorithm for the mining model ..... 64
4.26 Screenshot of setting the attributes of the analytical column ..... 65
4.27 Screenshot of the cardio mining model using the Microsoft Decision Trees algorithm ..... 66
B.1 Screenshot of the OLAP cube builder interface for power users ..... 76
CHAPTER I

INTRODUCTION

Data are not only valuable assets but also strategic resources in today’s competitive environment. Organizations around the world are accumulating vast and growing amounts of data in different database formats. Companies need to understand the effectiveness of their marketing efforts and to manage the large volumes of data created each day. These challenges require a well-defined database system that can bring together disparate data of different dimensionality and granularity. Making the data meaningful is no small task, especially given the different aspects of data analysis. Companies need quality analysis of operational information to understand their business strengths and weaknesses. Business analysis focuses on the effective use of data and information to drive positive business actions. With good and accurate data analysis, business decision makers can make well-informed decisions for the future of their organizations. Business Intelligence (BI) tools allow a company to automate its analysis, strategy, and forecasting functions to make better business decisions. Online Analytical Processing (OLAP) and data mining models are the key features of BI tools: they help companies extract data from operational systems, summarize data into working totals, find hidden patterns in data for future analysis and prediction, and present the results intuitively to end users [1, 2].
1.1 What is Online Analytical Processing (OLAP)?

The standard definition of OLAP provided by the OLAP Council [2] is: “A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user”.

According to the OLAP Council’s definition, OLAP lets users complete the following tasks [2]:

• Calculations and modeling applied across dimensions, through hierarchies and/or across members
• Trend analysis over sequential time periods
• Slicing subsets for on-screen viewing
• Drill-down to deeper levels of consolidation
• Reach-through to underlying detail data
• Rotation to new dimensional comparisons in the viewing area.

Therefore, OLAP performs multidimensional analysis of enterprise data and provides the capabilities for complex calculations, trend analysis and very sophisticated data modeling. In addition, OLAP enables end-users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making. An OLAP structure created from the operational data is called an OLAP cube [1, 2]. OLAP cubes are data processing units consisting of the facts and the dimensions from the database. They provide multidimensional views and analytical querying capacities. Therefore, OLAP technology can provide fast answers to complex queries on operational data for decision-making management.
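Two of the operations listed above, slicing and drill-down, can be illustrated with a small in-memory sketch. The fact rows and dimension members below are invented for illustration and are not part of the thesis implementation:

```python
from collections import defaultdict

# Toy fact table: each row carries dimension members and one measure.
facts = [
    {"country": "USA",    "city": "Akron",     "year": 2005, "sales": 120},
    {"country": "USA",    "city": "Cleveland", "year": 2005, "sales": 80},
    {"country": "USA",    "city": "Akron",     "year": 2006, "sales": 150},
    {"country": "Canada", "city": "Toronto",   "year": 2005, "sales": 60},
]

def rollup(rows, dims):
    """Aggregate the 'sales' measure over the given dimension levels."""
    totals = defaultdict(int)
    for r in rows:
        key = tuple(r[d] for d in dims)
        totals[key] += r["sales"]
    return dict(totals)

# Slice: fix one dimension member (year = 2005), then view by country.
year_2005 = [r for r in facts if r["year"] == 2005]
print(rollup(year_2005, ["country"]))

# Drill-down: move from the country level to the finer city level.
print(rollup(year_2005, ["country", "city"]))
```

Analysis Services precomputes and stores aggregates of this kind when it processes a cube, which is what makes such operations fast at query time.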
1.2 Data Mining

Data mining is defined as the automated extraction of hidden predictive information from database systems [3, 4]. Generally, it is the process of analyzing data from different perspectives and discovering patterns and regularities in sets of data. Specifically, the hidden patterns and the correlations discovered in the data can provide strategic business advantages for decision-making in organizations.

1.3 Statement of the Problem

Microsoft® Analysis Services, shipped with SQL Server™ 2000, is an OLAP database engine that builds multidimensional cubes [1, 5]. It also provides application programs to browse cube data and tools that support data mining algorithms for discovering trends in data and predicting future results. Analysis Services is heavily wizard-oriented in building and managing data cubes and data mining models. Although many features are also available through the predefined editors, the wizard-intensive process still requires users to fully understand the cube structure and its associated objects during the definition process. The complexity of cube development makes it difficult for end-users with little technical experience to gain access to these analysis tools.

1.4 Motivations and Contributions

In reality, most decision-makers within an enterprise want to be able to use the insights gained from their data for more tactical decision-making purposes. However, they are not generally interested in spending time building cubes or mining models to
answer their business issues. Analysis Services provides extensive wizards and editors for developing OLAP cubes and mining models. It has been designed to be flexible for all levels of users, but users have difficulty learning to use these features effectively and creating useful models for decision making. The best solution is to design a specific front-end interface that meets the user’s requirements, offers the ability to cross-analyze data even with a single click, and masks the underlying complexities of the applications from the users. Analysis applications contain sensitive and confidential information that should be protected against unauthorized access and made available only to the appropriate decision makers. Analysis Services automatically creates an OLAP Administrators group in the operating system. A member of the OLAP Administrators group has complete access to the analysis objects. A user who is not a member of the OLAP Administrators group has read or write access to the extent permitted by dimension-level or cell-level security but performs no administrative tasks. However, the active user must be a member of the OLAP Administrators group to use Analysis Manager; therefore, a non-Administrator user cannot explore the cube information through Analysis Manager. One goal of this thesis is to construct a client-application interface using Multidimensional Expressions (MDX) and ActiveX® Data Objects/Multidimensional (ADO MD) to query OLAP data and resolve this conflict [1, 6].

The main contributions of this thesis are as follows:

• Development of a component, cubeBuilder, that lets software developers design application interfaces which build OLAP cube models to meet users’ analytical requirements
• Development of a component, DMBuilder, for developers to design a specific user interface that creates data mining models, letting users uncover previously unknown patterns

• Development of a component, cubeBrowser, for developers to design a client interface that lets users outside the OLAP Administrators group browse cube data.

In addition, these data analysis components not only help software developers build specific applications without coding from scratch, but also hide the complexities of development from less technically oriented users.

1.5 Organization of the Thesis

This thesis covers the development of the data analysis components cubeBuilder, cubeBrowser and DMBuilder for OLAP and mining model solutions. It is organized as follows: Chapter II provides an overview of Microsoft SQL Server Analysis Services, including its fundamental operations and architecture for OLAP and data mining. The step-by-step processes used to create an OLAP cube, browse existing cube data and create a data mining model with the Analysis Manager are also illustrated in Chapter II. Chapter III focuses on the design and structure of the analysis components for OLAP and mining model solutions. Chapter IV describes the implementation of these analysis components in desktop and web-based application interfaces for the OLAP cube and mining model system.
It also describes a case study with the heart disease dataset to demonstrate the application of the analysis components. Chapter V presents a summary of the work done in this thesis. It also compares the functionality of the analysis components with that of Analysis Manager for building OLAP cubes and mining models. The directions of future work and the conclusion of this thesis are also presented in Chapter V.
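As a preview of the MDX-based querying proposed in Section 1.4, the sketch below assembles a minimal MDX SELECT statement as a plain string. The cube and member names echo the heart-disease case study but are illustrative assumptions; a real client would submit the statement to the Analysis server through ADO MD rather than merely printing it:

```python
def build_mdx(measures, rows_set, cube, slicer=None):
    """Assemble a minimal MDX SELECT statement as a string.

    The cube, measure, and member names passed in are hypothetical
    placeholders, not names taken from a deployed server.
    """
    mdx = (
        f"SELECT {{{', '.join(measures)}}} ON COLUMNS, "
        f"{rows_set} ON ROWS FROM [{cube}]"
    )
    if slicer:
        mdx += f" WHERE ({slicer})"   # slicer axis: fixes one member
    return mdx

query = build_mdx(
    measures=["[Measures].[Patient Count]"],
    rows_set="[Pain-Type].Members",
    cube="Cardio",
    slicer="[Location].[Country].[USA]",
)
print(query)
```

Keeping query construction in one place like this is what allows a component such as cubeBrowser to hide MDX syntax from the end user.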
CHAPTER II

MICROSOFT SQL SERVER 2000 ANALYSIS SERVICES

2.1 Overview

Microsoft® SQL Server™ 2000 Analysis Services provides a fully functional OLAP environment, which includes both OLAP and data-mining functionality [5]. It is a suite of decision-support engines and tools. It can also function as an intermediate layer that converts relational warehouse data into a form, called a cube, that makes creating analytical reports fast and flexible.

2.2 Architecture

The architecture of Analysis Services can be divided into two portions: the server and the client, as shown in Figure 2.1. The server portion, including the engines, provides the functionality and power, while the client portion provides interfaces for front-end applications [5].

2.2.1 Server Architecture

The primary component of Analysis Services is the Analysis Server. The Analysis Server operates as a Microsoft Windows NT or Windows 2000 service and is
[Figure 2.1: Analysis Services architecture — the server side (Microsoft Management Console with Analysis Manager, Decision Support Objects, Analysis Server, data sources, cubes, mining models) and the client side (PivotTable Service, ADO MD, client applications)]

specifically designed to create and maintain multidimensional data structures [5, 6]. It also provides multidimensional data values to client queries and manages connections to the specified data sources and local access security. Figure 2.1 illustrates the Analysis Manager, a snap-in console in Analysis Services, which communicates with the server
through the Decision Support Objects (DSO) component. DSO is a set of programming interfaces that applications use to work with Analysis Services [7].

2.2.2 Client Architecture

The client side of Analysis Services primarily provides an access interface, the PivotTable Service, between the server and custom applications, as shown in Figure 2.1 [6, 7]. PivotTable Service communicates with the Analysis server and provides interfaces for client applications to access OLAP data and data mining data on the server [6, 7]. It provides the OLE DB interface through which users, custom programs, or client tools access data managed by Analysis Services.

2.3 OLAP Cube

The primary form of data representation within Analysis Services is the OLAP cube [5-8]. A cube is a logical construct: a multidimensional representation of both detailed and summary data. Cubes are designed according to the client’s analytical requirements. Each cube represents data values of different business entities, and each side of the cube presents a different aspect of the data. Cubes in Analysis Services are built using one of two types of database schemas: the star schema or the snowflake schema [9]. Both schemas consist of a fact table and dimension tables, and Analysis Services aggregates data from these tables to build cubes. As shown in Figure 2.2, the star schema consists of a fact table and several dimension tables. Each dimension table corresponds to a column in the fact table. The data in the dimension tables are used to form the analytical queries on the fact table.
However, in the snowflake schema, several dimension tables are joined before being linked to the fact table.

[Figure 2.2: The star and snowflake schemas — in a star schema each dimension table links directly to the fact table; in a snowflake schema a layer of joined dimension tables links to the fact table]
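The fact/dimension relationship behind both schemas can be sketched in a few lines. The tables below are hypothetical, not the thesis's actual schema: the fact table carries a foreign key and a measure, and aggregating by a dimension attribute amounts to joining each fact row to its dimension row, then grouping:

```python
# Minimal star schema: one fact table keyed into one dimension table.
dim_patient = {                     # dimension table: descriptive attributes
    1: {"city": "Akron",   "country": "USA"},
    2: {"city": "Toronto", "country": "Canada"},
}

fact_visits = [                     # fact table: foreign key + measure
    {"patient_key": 1, "chol": 230},
    {"patient_key": 1, "chol": 210},
    {"patient_key": 2, "chol": 190},
]

# Join each fact row to its dimension row, then group by an attribute.
# This join-and-group is what Analysis Services precomputes for a cube.
totals = {}
for row in fact_visits:
    country = dim_patient[row["patient_key"]]["country"]
    totals[country] = totals.get(country, 0) + row["chol"]
print(totals)   # total of the measure per country
```

In a snowflake schema the only difference is that the dimension lookup itself would traverse a further join (for example, city row to country row) before grouping.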
2.4 Analysis Manager

The Analysis Manager is a tool for Analysis Server administration in Microsoft SQL Server 2000 Analysis Services [5-9]. It is a snap-in application within the Microsoft Management Console (MMC), the common framework for hosting administrative tools. Figure 2.3 shows the hierarchical, tree-view representation of the server and all its components in the left pane of the console.

Figure 2.3 Screenshot of the Analysis Manager
The major functional features of the Analysis Manager are summarized as follows:

• Administering the Analysis server
• Creating databases and specifying data sources
• Creating and processing cubes
• Creating dimensions for the specified database
• Specifying storage options and optimizing performance
• Authorizing and managing cube security
• Browsing cube data, shared dimensions and other objects
• Creating data mining models from relational and multidimensional data
• Viewing the mining model.

2.4.1 Creating the Basic Cube Model

Analysis Services provides wizards and editors within the Analysis Manager to let the user create a cube easily [6, 8]. The step-by-step instructions for building a basic cube model in the Analysis Manager using the Cube Wizard are summarized as follows:

1. Creating an Analysis Server database

A database acts like a folder that holds cubes, data sources, shared dimensions, mining models and database roles, as illustrated in Figure 2.3. To create a new database on a server, after launching the Analysis Manager, right-click the server name and then select New Database from the pop-up menu [1, 2]. The Database dialog box appears for the user to enter a database name for the new cube model, as shown in Figure 2.4.
Figure 2.4 Screenshot of the database dialog box of Cube Wizard

2. Specifying the data source

After creating a new database, a data source needs to be specified for the cube. The data source contains the information about the data used in the cube [6, 7]. The purpose of adding a data source is to let the Analysis server establish connections to the source data. The Data Link dialog box, as illustrated in Figure 2.5, can be opened by right-clicking the Data Source folder and selecting New Data Source from the pop-up menu.

Figure 2.5 Screenshot of the Provider for the Data Link dialog box
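The values collected in the Data Link dialog amount to an OLE DB connection string. A minimal sketch of how such a string is assembled follows; the server and database names are placeholders, and MSOLAP is the OLE DB provider used for OLAP connections:

```python
def make_connection_string(server, database, provider="MSOLAP"):
    """Build an OLE DB-style connection string for an Analysis server.

    The key/value pairs mirror what the Data Link dialog collects;
    the server and database names here are placeholders.
    """
    parts = {
        "Provider": provider,          # OLE DB provider for OLAP Services
        "Data Source": server,         # Analysis server name
        "Initial Catalog": database,   # target Analysis Services database
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

print(make_connection_string("localhost", "CardioTest"))
```

Chapter III returns to these values when the cubeBrowser component builds its own connection to the server (Table 3.1 lists the values of the connection string).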
In the Data Link dialog box shown in Figure 2.6, the user specifies a provider, the server name, login information and a database name to connect to the Analysis server.

Figure 2.6 Screenshot of the Connection tab of the Data Link dialog box

3. Selecting the fact table and the measures

The Cube Wizard and the Cube Editor are the tools used in the Analysis Manager to create an OLAP cube [8]. A fact table contains the measure fields, which hold the numeric values for the analysis, and the key fields that are used to join to the dimension tables. The fact table should not contain any descriptive information or labels other than the measures and the index fields. Each cube must be based on exactly one fact table. As shown in Figure 2.7, the panel displays all the tables in the specified data source. After the user selects the fact table and clicks the “Next” button, the Wizard displays all of the available numeric data in the selected table, as shown in Figure 2.8.
    • Figure 2.7 Screenshot of the “Select a fact table” dialog box with a selected fact table After specifying the measures from the list and clicking the “Next” button, the Cube Wizard asks the user to select or create dimensions. Figure 2.8 Screenshot of the “Defining measures” dialog box 4. Adding dimensions and levels to the cube: Dimensions are the categories the user employs to analyze and summarize the data [6-8]. In other words, dimensions are the organized hierarchies that describe the data in the fact table. There are two types of dimensions that can be created for use in the cube. A dimension created for use in an individual cube is called a private dimension. A shared dimension is one that multiple cubes can use [8]. A cube must contain at least one dimension, and the dimension must exist in the database object where the cube will be created. 15
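The roles of the fact table and dimension tables described above can be illustrated with a small sketch. The tables, columns and values below are invented for illustration (they are not from the thesis's datasets); the point is that the fact table holds only keys and measures, while the dimension table supplies the descriptive categories used to summarize them:

```python
# Toy fact table: dimension keys plus one numeric measure per row.
fact_sales = [
    # (store_key, product_key, units_sold)
    (1, 10, 5), (1, 11, 3), (2, 10, 7), (2, 11, 2),
]
# Toy dimension table: maps the store key to a descriptive level (city).
dim_store = {1: "Akron", 2: "Cleveland"}

def total_by_city(facts, stores):
    """Summarize the measure 'units_sold' along the store dimension."""
    totals = {}
    for store_key, _product_key, units in facts:
        city = stores[store_key]
        totals[city] = totals.get(city, 0) + units
    return totals

print(total_by_city(fact_sales, dim_store))  # {'Akron': 8, 'Cleveland': 9}
```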
    • In the Analysis Manager, a new dimension can be created with either the Cube Editor or the Cube Wizard. If the editors are used to build the cube, then a dimension has to be created before it is added to the cube. However, if the Cube Wizard is used to create a cube, it will launch the Dimension Wizard to handle the task as part of the cube-creation process [8]. The step-by-step process of creating a new shared dimension with the Dimension Wizard is summarized as follows: a. Selecting the type of dimension schema in the “Choose how you want to create the dimension” screen, as shown in Figure 2.9. Figure 2.9 Screenshot of the Dimension Wizard b. Specifying the dimension table from the available table list in the “Select the dimension table” screen, as shown in Figure 2.10. c. Selecting the levels in the “Select the levels for your dimension” screen, as shown in Figure 2.11. 16
    • Figure 2.10 Screenshot of the “Select Dimension table” dialog box Figure 2.11 Screenshot of the “Select levels” dialog box d. Specifying the new dimension name and previewing the dimension data in the “Finish” dialog box of the Dimension Wizard, as illustrated in Figure 2.12. 17
    • Figure 2.12 Screenshot of the “Dimension Finish” dialog box 5. Setting the storage options and setting up the cube aggregations: The storage mode determines how the data is organized in the server [8, 9]. It affects the requirements of disk-storage space and the data-retrieval performance. There are three storage options supported by Analysis Services: multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and hybrid OLAP (HOLAP). The descriptions and storage locations of each mode are summarized in Table 2.1. The Storage Design Wizard is used to select the option for the cube in the Analysis Manager, as shown in Figure 2.13. 18
    • Table 2.1 Storage options supported by Analysis Services
ROLAP (relational OLAP): fact data stored in the relational database server; aggregated values stored in the relational database server. 1. Slow processing 2. Slow query response 3. Huge storage requirements 4. Suitable for large databases or legacy data.
MOLAP (multidimensional OLAP): fact data stored in the cube; aggregated values stored in the cube. 1. Requires data duplication 2. Pre-summarizes the data to improve performance in querying and displaying the data 3. High performance 4. Good for small to medium size data sets.
HOLAP (hybrid OLAP): a combination of ROLAP and MOLAP; fact data stored in the relational database server; aggregated values stored in the cube. 1. Does not create a copy of the data 2. Provides connectivity to a large number of relational databases 3. Good when storage space is limited but faster query responses are needed.
Figure 2.13 Screenshot of the “Storage Design Wizard” for selecting storage options 19
    • After deciding the storage option, the next step is to specify the aggregation options in the Set Aggregation Options dialog, as illustrated in Figure 2.14 [8, 9]. This option allows the user to set the level of aggregation for the cube to boost the performance of queries. Aggregations are pre-calculated summaries of data that improve query response time. The higher the level of the cube’s aggregation, the faster queries will execute, but more disk space will be needed and more time will be required to process the cube. In Analysis Services, there are three aggregation options: • Estimated storage reaches: specifying the maximum storage size in either megabytes (MB) or gigabytes (GB) • Performance gain reaches: specifying the percentage of performance gain for queries • Until I click stop: manually controlling the balance. Figure 2.14 Screenshot of the “Set aggregation options” dialog box 20
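The value of aggregations can be made concrete with a minimal sketch: summaries are computed once at processing time, so a later query becomes a lookup instead of a scan over the fact rows. This is the trade-off the wizard exposes; more precomputed combinations mean faster queries but more storage and processing time. All data and names below are invented:

```python
# Minimal sketch of pre-calculated aggregations: compute group totals
# once, then answer queries by dictionary lookup. Invented sample data.
facts = [
    {"year": 2004, "region": "East", "amount": 100},
    {"year": 2004, "region": "West", "amount": 150},
    {"year": 2005, "region": "East", "amount": 120},
]

def precompute(facts, dims):
    """Aggregate the 'amount' measure over the given dimension columns."""
    agg = {}
    for row in facts:
        key = tuple(row[d] for d in dims)
        agg[key] = agg.get(key, 0) + row["amount"]
    return agg

by_year = precompute(facts, ["year"])
print(by_year[(2004,)])  # 250, answered without rescanning the fact rows
```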
    • 6. Processing the cube: Processing the cube is required before attempting to browse the cube data, especially after designing its storage options and aggregations, because the aggregations need to be calculated before the user can view the cube data [8, 9]. The major activities involved in cube processing are described in a “Process” window, as shown in Figure 2.15, and summarized as follows: a. Reading the dimension tables to populate the levels from the actual data b. Reading the fact table c. Calculating specified aggregations d. Storing the results in the cube. Figure 2.15 Screenshot of the “Process” window 21
    • In the Analysis Manager, there are three options for processing a cube, depending on the circumstances of the data structures. These options, summarized in Table 2.2, can be selected in the “Process a Cube” dialog box, as shown in Figure 2.16 [9].
Table 2.2 Summary of cube process options
Full process: Modifying the structure of the cube
Incremental update: Adding new data to the cube
Refresh data: Clearing out and replacing a cube’s source data
Figure 2.16 Screenshot of the “Process a cube” dialog box 22
    • 2.4.2 Browsing a Cube In the Analysis Manager, the Cube Browser is the primary means of viewing cube data [5-9]. There are two ways to open the Cube Browser and load cube data into it: a. Right-click the cube name in the Analysis Manager Tree pane and select “Browse Data” from the pop-up menu b. Click “Browse Sample Data” in the last step of the Cube Wizard. The Cube Browser not only lets users view the multidimensional data in a flattened two-dimensional grid format, as shown in Figure 2.17, but also makes it possible to drill up or drill down different dimensions of the data. However, the Cube Browser cannot be used to view unprocessed cube data [6]. Figure 2.17 Screenshot of the “Cube Browser” and sample results 23
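The drill-down behavior of the Cube Browser can be mimicked conceptually on a nested hierarchy: expanding a member shows the totals of its children. The hierarchy and numbers below are invented for illustration and do not correspond to any cube in the thesis:

```python
# Conceptual sketch of drilling down one dimension of a cube.
hierarchy = {
    "All": {
        "2004": {"Q1": 40, "Q2": 60},
        "2005": {"Q1": 55, "Q2": 45},
    }
}

def drill_down(tree, path):
    """Return the child-level totals under the member named by 'path'."""
    node = tree
    for member in path:
        node = node[member]
    return {child: (sum(v.values()) if isinstance(v, dict) else v)
            for child, v in node.items()}

print(drill_down(hierarchy, ["All"]))          # {'2004': 100, '2005': 100}
print(drill_down(hierarchy, ["All", "2004"]))  # {'Q1': 40, 'Q2': 60}
```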
    • 2.4.3 Building the Data Mining Models Data mining is the process of extracting knowledge hidden in large volumes of data [10, 11]. It involves uncovering patterns, trends, and relationships from historical data and predicting outcomes of future situations. The primary mechanism for data mining is the data mining model, an abstract object that stores data mining information in a series of schema rowsets. The mining model serves as the blueprint for how data should be analyzed or processed. Once the model is processed, the information associated with the mining model not only represents what was learned from the data, but also allows users to discover business trends for future decision making [11]. Two data mining algorithms are built into Microsoft SQL Server 2000 Analysis Services: Microsoft Decision Trees and Microsoft Clustering [12, 13]. A. Decision Trees Algorithm: The Microsoft Decision Trees algorithm uses recursive partitioning to divide the data into a tree structure, and continually performs this search for predictive factors until there is no more data to partition [10-13]. Each node in the tree structure represents a predictive factor used to classify the data. This method focuses on providing information paths for rules and patterns within the data, and is useful in predicting exact outcomes for future problems [12, 13]. B. Microsoft Clustering Algorithm: The Microsoft Clustering algorithm is based on the Expectation-Maximization (EM) algorithm [11, 12]. It uses iterative refinement techniques to group records into neighborhoods (clusters) that exhibit similar, predictable characteristics [13]. These are 24
    • useful for uncovering relationships among data items in a large database with hundreds of evaluated attributes. The following steps describe the process of creating a mining model using the Mining Model Wizard in the Analysis Manager [13]: 1. Specifying the type of data: In the “Select source type” window, as shown in Figure 2.18, users can select either relational data or OLAP data to build the target mining model. Figure 2.18 Screenshot of the “Select source type” dialog box 2. Selecting the source cube: In the “Select source cube” window, as shown in Figure 2.19, users need to highlight the target cube in the available cube list [11, 13]. 25
    • Figure 2.19 Screenshot of “Select source cube” window 3. Specifying the data mining method: In the “Select data mining technique” window, as shown in Figure 2.20, users can select one of the two mining algorithms provided with the Analysis Services: Microsoft Decision Trees and Microsoft Clustering [9, 10]. Figure 2.20 Screenshot of selecting the mining model technique 26
    • 4. Identifying the case base or unit of analysis: In the “Select case” window, as shown in Figure 2.21, users need to specify the case base of the analysis for the modeling task. A case is the basic unit of analysis for the mining task. Figure 2.21 Screenshot of the “Select case” dialog box for specifying a case of analysis 5. Selecting the predicted entity: In this step users must provide the information for prediction used in the mining model [12], as shown in Figure 2.22. The predicted entity can be chosen as one of the following items: • A measure of the source table • A member property of the case dimension and level • Members of another dimension in the cube. This feature provides flexibility in the process of predictive analysis using OLAP data. 27
    • Figure 2.22 Screenshot of “Select predicted entity” window 6. Selecting a training data: The training data is used to process OLAP data mining model and to define the column structure of a data mining for the case set. As shown in Figure 2.23, the users should select at least one additional data item from the data training data [12, 13]. 28
    • Figure 2.23 Screenshot of the “Select training data” window 7. Naming the model and process the model: After user enters a model name and selects the “Save and process now” check box, as shown in Figure 2.24, the wizard will process the model and train the model with data based on the specified algorithm. Figure 2.25 displays the process of model execution [13]. When the process is complete, a message of “Processing completed successfully” appears in the bottom of dialog box. 29
    • Figure 2.24 Screenshot of the “Saving the data model” of the Mining Model Wizard Figure 2.25 Screenshot of the “Model execution diagnostics” window 30
    • After clicking the “Close” button, the OLAP Mining Model Editor is launched and the system displays the content details of the proposed mining model, as shown in Figure 2.26. Figure 2.26 Screenshot of the content details of a created mining model 31
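The recursive-partitioning idea behind the Decision Trees algorithm described in Section 2.4.3 can be illustrated with a toy splitter. This is not Microsoft's algorithm (which selects splits with probabilistic scoring); it simply splits on the given attributes in order until each group has a single outcome, and the attribute names and rows are invented:

```python
# Toy recursive partitioning: split rows attribute by attribute until
# every leaf contains a single value of the target column.
def partition(rows, attrs, target):
    outcomes = {r[target] for r in rows}
    if len(outcomes) == 1:
        return outcomes.pop()          # pure leaf: one predicted outcome
    if not attrs:
        return sorted(outcomes)        # mixed leaf: no attributes left
    attr, rest = attrs[0], attrs[1:]
    branches = {}
    for row in rows:
        branches.setdefault(row[attr], []).append(row)
    return {attr: {value: partition(subset, rest, target)
                   for value, subset in branches.items()}}

rows = [
    {"chest_pain": "yes", "age": "old",   "disease": "present"},
    {"chest_pain": "yes", "age": "young", "disease": "present"},
    {"chest_pain": "no",  "age": "old",   "disease": "present"},
    {"chest_pain": "no",  "age": "young", "disease": "absent"},
]
tree = partition(rows, ["chest_pain", "age"], "disease")
print(tree)
```

Every row with chest pain is classified immediately, while the remaining rows are split again on age; that further splitting of each branch is the recursion the text describes.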
    • CHAPTER III DESIGN OF DATA ANALYSIS COMPONENTS Microsoft SQL Server 2000 provides OLAP functionality to build and manage multidimensional models of data and applications for use in large enterprise systems [1, 2]. There are three programmatic interfaces in the Analysis Services for user applications: ActiveX Data Objects Multidimensional (ADO MD), OLE DB for Online Analytical Processing (OLE DB for OLAP) and Decision Support Objects (DSO) [10-14]. ADO MD is an extension to the ADO programming interface that can be used to access multidimensional schemas, to query cubes, and to retrieve the results [10]. It uses an underlying OLE DB provider, which is Microsoft's strategic low-level application program interface (API) for access to different data sources [11]. OLE DB for OLAP is a set of objects and interfaces that extends the ability of OLE DB to provide access to multidimensional data stores [12]. DSO is the administrative programming interface used to create and alter cubes, dimensions, and calculated members. It can also perform other functions that are available interactively through the Analysis Manager application [13, 14]. 32
    • Using these programmatic tools provides more control over OLAP and data mining operations. Developers can hide the complexities of the process of creating cubes and mining models from less technical users. ADO MD, the data abstraction tool, allows developers to create either a local or a remote front-end interface for exploring metadata, databases and analysis functions. In particular, it provides an analytical tool for end-users who do not have the OLAP administrator privileges needed to access cube data with the Analysis Manager. This chapter introduces the data analysis components developed for building and viewing the cube and the data mining model in the Microsoft SQL Server 2000 environment. 3.1 Component-Based Development Component-based development (CBD) is a software application methodology that allows developers to reuse existing components. The ideas of reuse and flexibility are the main characteristics of CBD [15, 16]. Developers no longer need to construct software applications from scratch; they only need to reuse existing pre-built components to meet application requirements. This code reuse reduces production costs and enhances the maintainability of the software system. Flexibility is another useful trait, which allows components to be easily replaced, modified and maintained. Using CBD, the process of software design is made more effective and flexible. 3.2 What Is a Component? A software component involves three essential parts: a service interface, an implementation, and deployment [17]. A service interface specifies the component. An 33
    • implementation implements the interface to make the component work. The deployment is the executable file that makes the component run to meet the requirements. Kirby McInnis [17] has given a single comprehensive definition of a component: “A component is a language-neutral, independently implemented package of software services, delivered in an encapsulated and replaceable container, accessed via one or more published interfaces. A component is not platform contained or application bound.” The reuse of existing components reduces development and maintenance costs. It also increases productivity, since there is no need to build new applications from scratch. 3.3 The cubeBuilder Component The component cubeBuilder has been developed on top of DSO to allow developers to create OLAP cubes programmatically without using the Analysis Manager [17, 18]. Figure 3.1 not only depicts the component’s architecture and its relation to the server, but also shows the workflow for creating a data cube with the component. The sequence of operations involved in building an OLAP cube is as follows: 1. Connecting to an Analysis server: The first step in the process of building an OLAP cube is to connect to an Analysis server. The server object clsServer of the DSO object model is the main entry point for accessing the Analysis server. 34
    • [Figure 3.1: diagram: a custom application uses the cubeBuilder component (connect to server, create database, add data source, create dimension, create cube, process cube), which works through Decision Support Objects (DSO) against the Analysis server, the relational database and the cube.] Figure 3.1 Architecture of the component cubeBuilder The cubeBuilder component provides a method called ConnectToServer, which uses the server object of DSO to connect to a computer where the Analysis server service is running [18]. 2. Creating a database object to contain dimensions and cubes: After connecting to the Analysis server, the database object is the first object that needs to be created in the process of building the OLAP cube. A database object is a container for related cubes and other objects. It consists of data sources, shared dimensions and database roles. It is also used to store the cubes, data mining models and 35
    • other related objects. The cubeBuilder component can either create a new database object or open an existing database in the server. 3. Adding a data source that contains the data: After setting up the database object, a link to a data source has to be added to the database before constructing an OLAP cube. The data source object of DSO specifies the data file to be used as the source database for the cube. The cubeBuilder component is able to handle the following tasks through the data source object of DSO: • Setting up the connection to the data source • Finding the specified data source • Adding a new data source to the specified database object • Setting the link to the specified data source. 4. Creating dimensions and their levels: A dimension is a structural attribute of an OLAP cube and is an organized hierarchy of categories that describe data in the fact table of the data warehouse system. These categories provide users the basis of data analysis. The cubeBuilder component uses the dimension object of DSO to create a shared dimension in the user-specified database object. The dimension object provides a specific implementation of the DSO dimension interface. Through the Dimension interface, the component cubeBuilder can perform the following tasks: • Creating a new dimension object in the database object • Creating a new level on the dimension and setting the associated level’s properties. 36
    • 5. Creating a cube and specifying dimensions and measures: The following steps illustrate how to add a cube to the user-specified database object by using the cubeBuilder component: • Adding the user-specified cube name to the collections of the database object • Specifying the data source of the cube • Specifying the fact table of the cube • Setting up the SourceTable and EstimatedRows properties of the cube through the AddFactTblToCube method • Specifying the measures from the fact table for analysis • Adding the database’s dimensions to the cube’s collections with the AddSharedDimToCube method. 6. Processing a cube to load its structure and data: After defining a cube and its measures in the database object, the cube can be processed. The cube can be fully processed by using the ProcessCube method of the cubeBuilder component to load the cube’s structure and data. 3.4 The cubeBrowser Component The component cubeBrowser can be used in software applications to access data from the multidimensional data sources in Microsoft Analysis Services. It is a layer on top of ADO MD that can be used to write OLAP applications that retrieve data from the OLAP cube. Figure 3.2 shows how the component cubeBrowser fits into the Analysis Services architecture. The PivotTable Service is the client side of Microsoft Analysis Services and implements OLE DB for OLAP, which 37
    • is a standard interface for returning OLAP data. OLE DB for OLAP is a high-performance COM interface that does not support OLE Automation. ADO MD is Microsoft’s extension to ADO for accessing and manipulating data cubes [16, 17]. [Figure 3.2: diagram: a user application calls cubeBrowser on top of ADO MD, which reaches the cube through OLE DB for OLAP and the PivotTable Service to the Analysis server's OLAP engine.] Figure 3.2 Relationship of cubeBrowser to the Analysis Server 3.4.1 Browsing OLAP objects The component cubeBrowser can be used in OLAP applications to allow end users to browse OLAP cubes, view the properties of the cubes and their underlying structures, and execute analytical queries for their business questions. The cube schema is one of the two options for accessing data from OLAP cubes; it covers an OLAP database containing all the cubes and their underlying structures. The other option consists of executing analytical queries and displaying the query results for business analysts [15, 16]. 38
    • The basic workflow of using the component cubeBrowser to browse cube objects is shown in Figure 3.3 and is summarized as follows: A. Retrieving the information of the cube schema: a. Setting up the connection string and connecting to the server. b. Displaying the results. B. Executing an analytical query: a. Setting up the direct connection to the Analysis server. b. Displaying the hierarchical structures of an OLAP database. c. Constructing the MDX queries and displaying the retrieved results. d. Illustrating the definition of a particular OLAP cube and its underlying dimensions. 3.4.1.1 Retrieving information of the cube schema The information of the cube schema includes the concept of an OLAP database containing all the cubes and their underlying structures. To get information about the cube schema, the first step is to set up a connection to the Analysis Services engine. The connection string consists of values for the provider, data source, initial catalog, and other user and system information. Table 3.1 lists the primary values needed to construct a connection string. The provider is the name of the OLE DB for OLAP provider used to connect to the OLAP engine. In Analysis Services, the value is MSOLAP2, the name of the Microsoft OLE DB Provider for OLAP Services 8.0 [19]. The data source is the hostname 39
    • of the server. The initial catalog is the particular database object in the specified server. [Figure 3.3: workflow diagram: the OLAP application uses cubeBrowser with the ADO MD Catalog object to set up a connection and list the hierarchical structure of a database object, and with the Cellset object to set up a connection, create an MDX query, process it, and display the cell values.] Figure 3.3. The basic workflow of browsing OLAP cube data using cubeBrowser 40
    • Table 3.1 Values of the connection string
Provider: the name of the OLE DB for OLAP provider used to connect to the OLAP engine; in Analysis Services, this value is MSOLAP2.
Data Source: the location of the server, expressed as a hostname.
Initial Catalog: the name of the OLAP database object to connect to.
User ID: the username used to connect to the server.
Password: the password used to connect to the server.
After constructing the connection string, the component cubeBrowser provides a method to connect an ADO MD Catalog object to the server and the database object specified in the connection string. The detailed hierarchical structure of a cube can be viewed by using the ViewCubeStrct method of the component cubeBrowser after specifying a particular cube. This method uses the CubeDef object of ADO MD to display the definition of a particular OLAP cube and its underlying dimensions [15, 16]. In summary, by using the component cubeBrowser in conjunction with the ADO MD objects in an OLAP application, the end user can retrieve complete information about the structure of any cube stored in Analysis Services [20, 21]. 41
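The assembly of such a connection string can be sketched directly from Table 3.1. The parameter names follow the table; the server and catalog values below are placeholders, and the real components pass the resulting string to ADO MD rather than printing it:

```python
# Sketch of building an OLE DB for OLAP connection string from the
# parameters in Table 3.1. Server and catalog names are placeholders.
def build_connection_string(provider, data_source, catalog,
                            user_id=None, password=None):
    parts = [f"Provider={provider}",
             f"Data Source={data_source}",
             f"Initial Catalog={catalog}"]
    if user_id:
        parts.append(f"User ID={user_id}")
    if password:
        parts.append(f"Password={password}")
    return ";".join(parts)

conn = build_connection_string("MSOLAP2", "localhost", "CardioDB")
print(conn)  # Provider=MSOLAP2;Data Source=localhost;Initial Catalog=CardioDB
```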
    • The query language for manipulating data through ADO MD is called Multidimensional Expressions (MDX). The MDX syntax supports the definition and manipulation of multidimensional objects and the data stored in the cubes of the Analysis server [22]. In addition to its query capabilities, MDX can be used to define cube structures and, in some cases, to change the data. It can also be used in conjunction with ADO MD to build client applications that access OLAP data for business analysts [16, 23]. The following steps are required to process an MDX query: a. Creating a new Cellset object: A Cellset object is used to store the results of a multidimensional MDX query in the ADO MD object model. The Cellset object is created based on an MDX query for the user’s analysis. b. Establishing the connection: To make a connection to the Analysis Services engine, it is necessary to specify the values of the provider, data source and initial catalog in the connection string of the Cellset object. c. Constructing an MDX query: The general syntax of an MDX statement is as follows:

SELECT <member selection> ON axis1,
       <member selection> ON axis2, ...
FROM <cube name>
WHERE <slicer>

The three clauses shown above describe the nature and scope of an MDX query. The axis clauses specify the data wanted and the format in which to display the results. The FROM clause defines the specific cube that contains the required data. The WHERE clause is used to specify the conditional selection for data slicing. The component 42
    • cubeBrowser provides a function to set up the MDX query based on the user’s specification and the analytical questions. d. Performing the query and populating the results: After constructing the required query, the component cubeBrowser provides a method to open a specific Cellset object. Once the Cellset object is open, the resulting data can be traversed along its positions and displayed cell by cell. 3.5 The DMBuilder Component In addition to providing programmatic access to OLAP cube resources, Decision Support Objects (DSO) can also be used to create and maintain data mining objects programmatically [10, 11]. The component DMBuilder, developed in this work, sits on top of DSO to give software developers direct programmatic access to the data mining functionality within Analysis Services. The component DMBuilder provides an object model for programming a varied set of objects, including servers, databases, mining structures and algorithms, as well as OLAP cube objects. It also allows developers to embed data mining functionality into applications to meet users’ mining requirements. The architecture and the logical relations of the component DMBuilder to DSO are depicted in Figure 3.4. 43
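Returning briefly to Section 3.4.1.2, the MDX SELECT skeleton given in step c can be assembled programmatically. The helper below is only a sketch of that string construction (the thesis's actual helper is a VB.Net method of cubeBrowser), and the measure, dimension and cube names are invented:

```python
# Sketch of assembling an MDX SELECT statement from its three clauses.
def build_mdx(columns, rows, cube, slicer=None):
    mdx = f"SELECT {columns} ON COLUMNS, {rows} ON ROWS FROM [{cube}]"
    if slicer:
        mdx += f" WHERE ({slicer})"                # optional slicer clause
    return mdx

query = build_mdx("{[Measures].[Chol]}", "{[Sex].Members}",
                  "CardioCube", "[Location].[Cleveland]")
print(query)
```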
    • [Figure 3.4: diagram: the user's data mining solutions call DMBuilder, which works through DSO against the Analysis server, the cube and the data mining model.] Figure 3.4 The architecture and logic relations of DMBuilder with DSO The following steps describe the basic operations involved in creating a data mining model programmatically using the developed DMBuilder component in conjunction with DSO [17] (Figure 3.5): 1. Connecting to the target Analysis server: The component DMBuilder can connect to the target Analysis server through the ConnectToServer function with the user-specified server name. 2. Selecting a target database object containing the OLAP cube data sources: After connecting to the target server, the database object that contains the target OLAP cube data sources can be selected and set up by using the component’s SelectDbObj function. 44
    • [Figure 3.5: flow diagram: DMBuilder connects to the server/database object, sets up the mining properties (data source name, source cube name, case dimension name), adds the mining algorithm and mining roles, and processes the data mining model through DSO on the Analysis server.] Figure 3.5 Flow Logic of the DMBuilder Component 3. Creating a new data mining model: A new data mining model object can be created by using the AddNewMiningModel function of the DMBuilder component with the user-specified mining model name and class type. When an OLAP mining model is created, the class type is set to sbclsOLAP. 4. Creating and assigning a mining model role: Using the “AddMiningModel” function of the DMBuilder component, the user-specified mining model role can be created and assigned to the new OLAP mining model object. 5. Setting the needed properties for the new mining model: There are several properties that need to be set up for the OLAP mining model objects. 45
    • Table 3.2 summarizes the properties required for the OLAP mining model object. These properties can be set up by using the “SetModelProperty” function of the component DMBuilder.
Table 3.2 Listing of properties required for OLAP mining model objects [17]
Case dimension: defines the case dimension.
Case level: defines the case level of the case dimension; it identifies the lowest level in the dimension.
Mining model algorithm: defines the data mining algorithm provider. In Analysis Services, there are two types: Microsoft Decision Trees and Microsoft Clustering.
Source cube: defines the OLAP cube used for training data.
Subclass type: defines the column option type. The value for an OLAP mining model object is set to sbclsOLAP.
6. Creating a new mining model column and setting its properties: A data mining column has several properties that are needed for the new mining model. The most important are data type, content type and usage. In data mining with the Analysis Services of SQL Server 2000, there are four types of column usage: input, predict, disabled and key. The component DMBuilder provides the “EnableColumnProperty” function to process this task and to send the column metadata to the server. 7. Training and processing the mining model object: Once all the necessary properties and definitions required for the target mining model are set up, the model must be trained before it can be used for analysis, so that useful information or patterns can be found in the data. This processing step is executed on the server, and the time needed depends on the amount of 46
    • data involved and on the complexity of the analytical category. Before training and processing, the model has only the defined metadata; after processing, the hidden patterns are stored in the model. The ProcessMiningModel function of the component DMBuilder handles this task using the ProcessFull option [3, 11]. 3.6 Conclusions The analysis components cubeBuilder, cubeBrowser and DMBuilder provide a set of functions for creating and managing OLAP solutions and data mining models in the Analysis server. Fully compatible with the .NET environment, these components let developers easily embed code into user-specific applications to build and process the target OLAP solutions and mining model systems. Using these data analysis components, SQL Server 2000 business intelligence can be integrated directly into user-friendly applications, and OLAP solutions can be created and managed programmatically to meet users’ needs and specifications for their daily business analysis and decision-making. 47
    • CHAPTER IV CASE STUDIES AND RESULTS The data analysis components developed in this thesis are applied to a case study of a heart disease database in the Microsoft SQL Server 2000 environment. The purpose of this case study is to provide user application interfaces, wrapped around the analysis components, for building the OLAP cube, browsing the cube data and creating the mining model with the cardio test dataset. The results and implementations of the case study illustrate the advantage of using these data analysis components for OLAP solutions and mining models. Each of the following sections describes the practical aspects of the developed analysis components. 4.1 A Case Study of the Heart Disease Datasets The heart disease datasets were collected from four different locations and are the results of heart disease diagnosis tests [24]. Each database has the same instance format, using only thirteen of a possible seventy-five attributes for analysis. Appendix A provides detailed descriptions of the heart disease datasets. 48
    • 4.1.1 Heart Disease Sample File The heart disease datasets were downloaded and saved as a Microsoft Access 2003 database. The database samples consist of four tables; the relationship of these sample tables is depicted in Figure 4.1. This constructed schema resembles the structure of a star schema. Figure 4.1 Relationship of the heart disease test data 4.1.2 Software Implementation The data analysis components are implemented in the Microsoft SQL Server 2000 environment using Visual Basic.Net (VB.Net) as the major programming language for both the OLAP solutions and the mining model objects. The Windows front-end applications implemented with the cubeBuilder and DMBuilder components are desktop stand-alone software applications and require the Analysis Services to reside on 49
    • the same system. The advantage of this approach is that runtime access security is not an issue when connecting to the Analysis server. In addition, an ASP.NET web-based application implemented with the cubeBrowser component is also developed, using VB.NET as the major source code, for end-users to browse the OLAP cube data. 4.2 Implementation of the cubeBuilder Component The interface of the cardio cube builder, cardioCube, shown in Figure 4.2, implements the component cubeBuilder in order to demonstrate the process of building the OLAP cube with the heart disease database. The detailed procedures for building the cardio test cube are described, step by step, in the following subsections. Figure 4.2 Screenshot of the cardio cube builder interface 50
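The star-schema arrangement described in section 4.1.1 can be sketched in a few lines: a central fact table holds foreign keys into the dimension tables plus the numeric measure columns. The following Python sketch uses made-up table and column names, not the actual Access tables from the case study.

```python
# Minimal star-schema sketch: a fact table of cardio test results joined to
# two dimension tables. All table and column names are illustrative only.
patients = {1: {"sex": "male", "age": 54}, 2: {"sex": "female", "age": 61}}
locations = {"cle": "Cleveland", "hun": "Budapest"}

# Each fact row carries dimension keys plus numeric measures (chol, thalach).
facts = [
    {"patient_id": 1, "loc_id": "cle", "chol": 233, "thalach": 150},
    {"patient_id": 2, "loc_id": "hun", "chol": 204, "thalach": 172},
]

def join_facts(facts, patients, locations):
    """Resolve each fact row's dimension keys into the full dimension records."""
    rows = []
    for f in facts:
        rows.append({**f,
                     **patients[f["patient_id"]],
                     "location": locations[f["loc_id"]]})
    return rows

rows = join_facts(facts, patients, locations)
```

Resolving the keys once per fact row is exactly what an OLAP engine does when it relates the fact table to its shared dimensions.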
    • 4.2.1 Creating a New Cube When the form is loaded, only the “Data Source/Cube” section is visible, where users specify the name of the data source and the name of the new cube (Figure 4.3). The data source name identifies the cardio data file used to build the cardio cube (Figure 4.4). The name of the new cube is saved in the database object for future reference. These specified names are added to the cardio database object in the Analysis server through the functions SetDataSource and AddCubetoDb of the component cubeBuilder. Figure 4.3 Screenshot of the “Data Source/Cube” section Figure 4.4 Screenshot of sample entries for both the “Data Source/Cube” and “Specify Fact/Measures” sections 51
    • 4.2.2 The Fact Table and Measures Selections After setting the data source and adding the new cube name “test1” to the target database object, the “Specify Fact/Measures” section becomes visible so that the user can specify the fact table and the measures used in building the cardio cube (Figure 4.4). The fact table holds the core data to be queried in the analysis; it contains a column for each measure as well as a column for each dimension key. The measures are a set of numeric data based on the column values of the fact table and are the key indicators of the user's primary analytical interest [6]. Figure 4.5 shows the details of the sample entries for the selection of the fact table and measures. Figure 4.5 Screenshot of sample entries of the “Specify Fact/Measure” section 4.2.3 Adding Dimensions to the Cube Dimensions are the categories of the data analysis. As shown in Figure 4.6, the “Add Dimensions to Cube” section is used to add dimensions to the cube. The pre-defined shared dimensions available in the cardio database object can be specified and added to the cube through the function 52
    • AddDimToCube of the component cubeBuilder after clicking the “Add Dimension” button. Figure 4.7 shows the sample entries of the dimension and its related key column. Figure 4.6 Screenshot of the “Add Dimensions to Cube” section Figure 4.7 Screenshot of sample entries for cube dimension 4.2.4 Processing and Building the New Cube After determining the measures, the dimensions and the fact table of the cube, the “Process/Build Cube” section of the form becomes visible, as shown in Figure 4.8. After clicking the “Build Cube” button, multidimensional OLAP (MOLAP) is chosen as the storage mode for the cardio cube [1, 6]. The storage format affects the disk-storage space requirements and the data-retrieval performance. The MOLAP mode is chosen because it stores the fact data and the aggregations on the Analysis server in a space-efficient, 53
    • highly indexed multidimensional form [1, 5]. In addition, MOLAP mode summarizes the transactions into multidimensional views ahead of time. Data retrieval from such a database is extremely fast, because all calculations are pre-generated when the cube is created. Figure 4.8 Screenshot of the “Process/Build Cube” section 4.2.5 The Results The detailed hierarchical database objects before and after the process of building the cardio cube are depicted in Figure 4.9 and Figure 4.10 respectively. The difference between the two figures is that the new sample cube has been added to the cube folder of the cardio database object. However, these figures do not show the cube data itself; the detailed cube data is accessed in the following section with the web-based application, Cardio Cube Browser, which is implemented with the cubeBrowser component developed in this thesis. 54
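The MOLAP behavior described in section 4.2.4, where measures are pre-aggregated at processing time so that later retrieval is a direct lookup, can be illustrated with a small sketch. The dimension members and cholesterol values below are made up, and the hypothetical `build_cube` step merely plays the role of cube processing.

```python
# MOLAP-style pre-aggregation sketch: every dimension-member combination is
# summarized once, up front, so later queries are dictionary lookups instead
# of scans over the fact table. Data and names are illustrative only.
facts = [
    ("Cleveland", "typical angina", 120),   # (location, pain type, chol)
    ("Cleveland", "typical angina", 150),
    ("Budapest",  "asymptomatic",   130),
]

def build_cube(facts):
    """Pre-compute the SUM of the chol measure for each (location, pain) pair."""
    cube = {}
    for loc, pain, chol in facts:
        cube[(loc, pain)] = cube.get((loc, pain), 0) + chol
    return cube

cube = build_cube(facts)                       # done once, at "processing" time
total = cube[("Cleveland", "typical angina")]  # O(1) retrieval afterwards
```

The trade-off is the one the text names: the pre-computed aggregates cost extra storage, in exchange for very fast reads.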
    • Figure 4.9 Screenshot of the cardio test database object before building the new cardio cube Figure 4.10 Screenshot of the cardio test database object after building the sample “cube1” 55
    • 4.3 Implementation of the cubeBrowser Component The ASP.NET web-based application, using VB.NET as the programming code, implements the cubeBrowser component for end-users to browse the cardio cube data. The application's user interface, developed in this work, is contained within a single web form, cubeBrowser.aspx, as shown in Figure 4.11 [20, 21]. This application provides the following functional features for the user in the process of retrieving cube data: A. Connecting to the Analysis server where the target cardio cube is located B. Retrieving the cardio cube data based on the user's specifications Figure 4.11 Screenshot of the web form cubeBrowser.aspx 4.3.1 Connection to the Analysis Server In querying the cardio cube, the first stage is to set up the connection to the Analysis server, which is the location of the target cardio cube. The analysis component cubeBrowser provides three functions, ConnectToServer, SetUpDatabase and 56
    • SetUpDataSource, which connect to the server, set up the database object, and set up the data sources in the Analysis server, respectively. When the page is requested by the user, the server processes the request and sends the page to the browser. In addition, the server also connects to the Analysis server and lists the available cubes of the cardio database object for users to view, as shown in Figure 4.12. Figure 4.12 Screenshot of the listing of available cubes 4.3.2 Retrieving the Cardio Cube Data Once a connection has been made to the OLAP data source, the multidimensional data of the cardio cube can be queried and manipulated through the MDX query language [22, 23]. The first step in creating the MDX query is to select the target cube from the dropdown list (Figure 4.13). After specifying the target cube, the user needs to select the measures whose data are in the cube, as shown in Figure 4.13. Figure 4.13 Screenshot of specifying cube entry and measures 57
    • As shown in Figure 4.14, there are two pre-defined queries the user can select, according to their analytical purpose, to view the cardio cube data: a. Pain type-location data This option displays the cardio cube data for the different chest pain types across the different geographical locations for the selected target measures (Figure 4.15). b. Patient-pain type data This option displays the cube data of the different patients with the selected chest pain type for the target measures. This option requires the user to select one of the chest pain types from the dropdown list, as shown in Figure 4.16. Figure 4.14 Screenshot of selections of measures and the pre-defined view options 58
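A pre-defined view option such as "pain type-location data" ultimately resolves to an MDX statement with the chosen measures on the columns axis and the dimension members on the rows axis. The helper below is a hypothetical sketch of how such a query string could be assembled; the cube, measure, and dimension names are illustrative, not the thesis's actual metadata.

```python
# Hypothetical sketch of assembling the MDX for a "measures by location"
# view. Cube, measure, and dimension names are illustrative only.
def pain_by_location_mdx(cube, measures):
    """Build an MDX SELECT: measures on COLUMNS, location members on ROWS."""
    cols = ", ".join(f"[Measures].[{m}]" for m in measures)
    return (f"SELECT {{{cols}}} ON COLUMNS, "
            f"[Location].[Country].Members ON ROWS "
            f"FROM [{cube}]")

mdx = pain_by_location_mdx("cube1", ["Chol", "Thalach"])
```

The resulting string would then be handed to the cellset-opening code, as Appendix D's ConnectToCube does with its oMdx argument.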
    • Figure 4.15 Screenshot of selections of location for Pain-Type option Figure 4.16 Screenshot of selections of pain-type for Patient option 4.3.3 Displaying the Cardio Cube Data After selecting the measures and specifying the view options, the user clicks the “Browse” button; the server then processes the request and displays the cube data in grid format, as shown in Figure 4.17 and Figure 4.18. Figure 4.17 Results of cube data for Pain-Type option with test country 59
    • Figure 4.18 Results of cube data for the angina chest pains per patient test city 4.3.4 Drill-down and Drill-up Capabilities OLAP tools organize the data in multiple dimensions and in hierarchies. Dimensions are usually associated with hierarchies, which organize data according to levels. Drilling down and drilling up are the two analytical techniques whereby the user navigates among various levels of data, ranging from the most summarized (up) to the most detailed (down) [20, 21]. For example, when viewing the cardio cube data of different cities, a drill-down operation in the patient test center dimension would display the individual test centers tc001 to tc004, as shown in Figure 4.19. A drill-up operation would go in the reverse direction to a higher level and display the data of the test countries, as shown in Figure 4.20. 60
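The drill-down and drill-up operations described above amount to re-aggregating the same facts at a different level of the dimension hierarchy. A minimal sketch, with a made-up country/city/test-center hierarchy and fabricated patient counts:

```python
# Drill-down/up sketch: the same facts summarized at different hierarchy
# levels (country -> city -> test center). All names and counts are made up.
facts = [
    {"country": "USA", "city": "Cleveland",  "center": "tc001", "n": 10},
    {"country": "USA", "city": "Cleveland",  "center": "tc002", "n": 5},
    {"country": "USA", "city": "Long Beach", "center": "tc003", "n": 7},
]

def rollup(facts, level):
    """Aggregate the measure 'n' at the requested hierarchy level."""
    out = {}
    for f in facts:
        out[f[level]] = out.get(f[level], 0) + f["n"]
    return out

drill_up = rollup(facts, "country")    # most summarized level
drill_down = rollup(facts, "center")   # most detailed level
```

Drilling down simply swaps the grouping key for a lower level of the hierarchy; the underlying facts never change.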
    • Figure 4.19 Screenshot of drill-down to the test center level of Patient option Figure 4.20 Screenshot of drill-up to the country’s level of Patient option 61
    • 4.4 Implementation of the DMBuilder Component Figure 4.21 depicts the application interface DMMBuilder, which implements the component DMBuilder. This interface is used to create a mining model, with the Microsoft Decision Trees algorithm as the rule-construction method, from the cardio cube data created in the previous section. The application interface is coded and designed as a simple window form, using VB.NET as the major programming language, in the MS SQL Server 2000 environment [12]. As shown in Figure 4.21, the “Mining Model Builder” form is divided into five groups. The following steps describe how to use this application form to build the mining model with the cardio cube data. Figure 4.21 Screenshot of the main interface DMMBuilder 62
    • Step 1: Setting up server and database information: The first step of creating the mining model is to provide the names of the Analysis server and database on which the user wants to perform the mining model task, as well as the mining model's name for storing the mining model attributes, as shown in Figure 4.22. After clicking the “OK” button, an empty mining model is created on the user-specified server and added to the user-specified database object. In addition, the “Mining Model Setup” section becomes available for the rest of the process, as shown in Figure 4.23. Figure 4.22 Screenshot of the “Server/Database” section Figure 4.23 Screenshot of Mining model setup 63
    • Step 2: Setting up the mining model role: The “Mining Model Role” screen collects the mining model role information in order to set up the security role for the new mining model, as shown in Figure 4.24. The method SetMiningRole of the component DMBuilder is used to perform this task. Figure 4.24 Screenshot of setting the mining model role Step 3: Setting up the properties of the mining model: In this step, the user needs not only to provide the data source name, source cube name, case dimension and general description of the model, but also to specify the mining algorithm to be used for the target mining model (Figure 4.25). The Microsoft Decision Trees algorithm is chosen as the method for prediction [10]. After clicking the “Add to DB” button, the attribute information is added into each related property of the target mining model. Figure 4.25 Screenshot of setting properties and algorithm for the mining model 64
    • Step 4: Adding analysis column attributes: The properties needed for the new data mining model column are set through the “Analysis Column Entry” form section, as shown in Figure 4.26. In this step, the user needs to identify the training case and the predictive outcome for the purpose of the analysis. Figure 4.26 Screenshot of setting the attributes of the analytical column Step 5: Saving and processing the data mining model: The new mining model object is saved in the Analysis server after clicking the “Save DMM” button (Figure 4.26). At this point, the new mining model is created but not yet processed. Although a new mining model does not need to be processed immediately, it cannot be viewed until processing is completed. After clicking the “Process DMM” button, the new mining model is fully processed, and the information about the patterns and rules discovered in the training data is stored as the mining model content. The actual data from the training dataset is not stored in the target server database. Figure 4.27 is a screenshot of the content detail after processing the cardio mining model. 65
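The text treats the Microsoft Decision Trees algorithm as a black box, and its internals are not part of this thesis. As a rough, hypothetical illustration of the general idea behind decision-tree induction (not Microsoft's actual algorithm), the sketch below picks the root-split attribute that minimizes the weighted entropy of the predicted outcome, using made-up training rows named after the cardio attributes:

```python
# Toy illustration of decision-tree induction: choose the attribute whose
# split best separates the predicted outcome. This is a stand-in for the
# Microsoft Decision Trees algorithm, not its actual implementation; the
# rows and attribute names below are fabricated.
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / len(labels)
        ent -= p * math.log2(p)
    return ent

def best_split(rows, attrs, target):
    """Return the attribute whose split yields the lowest weighted entropy."""
    def split_entropy(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(attrs, key=split_entropy)

rows = [
    {"exang": 1, "fbs": 1, "sick": 1},
    {"exang": 1, "fbs": 0, "sick": 1},
    {"exang": 0, "fbs": 1, "sick": 0},
    {"exang": 0, "fbs": 0, "sick": 0},
]
root = best_split(rows, ["exang", "fbs"], "sick")
```

In this fabricated sample, splitting on exang separates sick from healthy perfectly, so it would become the root of the tree; the real algorithm repeats such splits recursively over the training cases drawn from the cube.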
    • Figure 4.27 Screenshot of the cardio mining model using Microsoft Decision Trees Algorithm 66
    • CHAPTER V DISCUSSIONS & FUTURE WORKS This chapter summarizes the main contributions and conclusions of this thesis regarding the data analysis components for OLAP solutions in the Microsoft SQL Server 2000 system. Moreover, this chapter also addresses some future work based on the current work. 5.1 Contributions and Evaluations The main purpose of this thesis is to develop the data analysis components as a foundation for developers to build user-friendly interface applications for OLAP solutions. These analysis components can also be used to hide the complexity and the heavy technological terminology from non-technical users in the process of building the OLAP cubes and mining model systems. Our contributions are summarized as follows:
    A. A detailed review of the functionality of Analysis Manager in the process of building and viewing OLAP cubes as well as of building data mining models
    B. Development of the data analysis components for OLAP solutions 67
    • C. Development of the desktop stand-alone interfaces implemented with the components cubeBuilder and DMBuilder
    D. Application of the case study of the cardio disease dataset with a user-specific application that implements the data analysis components developed in the current work
    E. Development of the web-based interface application implemented with the component cubeBrowser; this web-based interface is also used to browse the cardio cube data created in the current work
    In addition to these contributions, detailed reviews of the functionality of Analysis Manager in the process of building and viewing OLAP cubes as well as of building data mining models are also included in this thesis [5, 7]. Both Microsoft Analysis Manager and the analysis components can perform the following tasks for OLAP solutions: A. Creating the database objects and specifying the data sources in the Analysis server B. Building and processing the OLAP cubes C. Creating and processing the data mining models D. Specifying the storage options and optimizing the query performance E. Browsing the cube data Although Analysis Manager provides wizards and editors to help users build and process the OLAP cubes and the mining models, the technical terms and the full understanding of the underlying structure that is required still become a barrier for users to use these tools efficiently. In addition, the analysis components can help the developers in 68
    • designing user-friendly interface applications that hide the technical complexities from the non-technical users. In addition, the analysis components offer the potential to assemble applications much more rapidly and efficiently. A key to developing applications quickly is the ability to reuse existing pre-built components to meet the user's application requirements [6]. Analysis Manager installs the PivotTable Service on the database server, which includes an OLE DB provider that allows connecting to the OLAP data sources. The PivotTable Service is an OLE DB provider for multidimensional data and data mining operations [7, 14]. It is the primary method of communication between a client application and a multidimensional data source or data mining model, and it is used to build client applications that interact with multidimensional data. It also provides methods for online and off-line data-mining analysis of multidimensional and relational data, and it offers connectivity to the multidimensional cubes and data-mining models managed by the Analysis Services. The major limitation of the PivotTable Service is that it must be installed on the client machine; otherwise, the client's PivotTable control is unable to communicate with the OLAP data sources. To overcome this limitation, the data analysis component implemented in the web-based OLAP browsing application interface can provide reusable business solutions and can disseminate information more effectively. The architecture presented here is designed to utilize several sophisticated technologies, including SQL 2000 Analysis Services, the cubeBrowser component, and ASP.NET, to the best of their capacities. 69
    • 5.2 Future Works This research developed the data analysis components for the OLAP solutions and the mining model systems and demonstrated their functionality with the cardio databases. However, an analysis component for viewing the data of the mining model has not been developed or implemented; the development of a component for visualizing the mining model is left as future work. The new release, Microsoft SQL Server 2005, enhances many Business Intelligence features and also builds complex business analytics with Analysis Services [25, 26]. In addition, ADOMD.NET uses the XML for Analysis protocol to communicate with the analytical data source [27]. More work will be needed to develop data analysis components that use the new features of Microsoft SQL Server 2005 and ADOMD.NET to provide a user-friendly interface for OLAP solutions. In addition to unloading the design burden from the developers, these analysis components benefit the end users, who can navigate a rich, complex data set with a higher degree of confidence in their analysis. 70
    • BIBLIOGRAPHY [1]. Mailvaganam, H. 2004. “Introduction to OLAP: Slice Dice and Drill”. Retrieved August 22, 2005 from http://www.dwreview.com/OLAP/Introduction_OLAP.html. [2]. The OLAP Council. OLAP and OLAP Server definitions. Retrieved August, 2005 from http://altaplana.com/olap/glossary.html. [3]. Thearling, K. 1995. “From Data Mining to Database Marketing”, Data Intelligence Group. [4]. Thearling, K. 2000. An Introduction to Data Mining: Discovering hidden value in your data warehouse. Retrieved August 18, 2005 from http://www.thearling.com/text/dmwhite/dmwhite.htm. [5]. Pearson, W. 2002. “Introduction to SQL server 2000 Analysis Services-Creating our first cube”. http://www.databasejournal.com/feature/mssql/article.php/1429671. [6]. OLAP Train and Jacobson, R. 2000. Microsoft SQL Server 2000 Analysis Services Step by Step. Microsoft Press. [7]. Garcia, L. 2003. “Understanding Microsoft SQL Server 2000 Analysis Services”. http://www.phptr.com/articles/article.asp. [8]. Bertucci, P. 2002. Microsoft SQL Server Analysis Services. Microsoft® SQL Server 2000 Unleashed, Second Edition. Chapter 42, 1347-1392. [9]. Soni, S.; Kurtz, W. 2001. “Analysis Services: optimizing cube performance using Microsoft SQL server 2000 Analysis Services”. Retrieved April, 2005 from http://msdn.microsoft.com/library/en-us/dnsql2k/html/olapunisys.asp. [10]. de Ville, B. 2001. “Data Mining in SQL server 2000”. SQL Server Magazine http://www.windowsitpro.com/SQLServer/Article/ArticleID/16175/16175.html. [11]. Charran, E. 2002. “Introduction to Data Mining with SQL server”. Retrieved August, 2005 from http://www.sql-server-performance.com/ec_data_mining.asp. 71
    • [12]. Rae, S. 2005. “Building intelligent .NET applications: Data-Mining predictions”. http://www.awprofessional.com/articles/article.asp. [13]. Data Mining: http://www.megaputer.com/dm/dm101.php3. [14]. Microsoft OLE DB Programmer's Reference: http://msdn.microsoft.com/library. [15]. Brust, A. J. 1999. “Put OLAP and ADO MD to Work”. VBPJ, November 1999 Issue. 94-97. [16]. Youness, S. 2000. “Using MDX and ADOMD to access Microsoft OLAP data”. http://www.topxml.com/conference/wrox/2000_vegas/text/sakhr_olap.pdf. [17]. Whitney, R. 2002. “Collaboration through DSO”. http://www.windowsitpro.com/SQLServer/Article/ArticleID/26564/26564.html. [18]. Rice, F. C. 2002. “Programming OLAP Databases from Microsoft Access Using DSO”. http://msdn.microsoft.com/library/default. [19]. Microsoft OLE DB Programmer's Reference: http://msdn.microsoft.com/library. [20]. Nolan, C. 1999. “Manipulate and Query OLAP Data Using ADOMD and MDX - Part I". Microsoft System Journal, August, 1999. [21]. Nolan, C. 1999. “Manipulate and Query OLAP Data Using ADOMD and MDX - Part II". Microsoft System Journal, September, 1999. [22]. Pearson, W. 2002. “MDX in Analysis Services”. Retrieved December, 2004 from http://www.databasejournal.com/features/mssql/article.php/1495511. [23]. Pearson, W. 2002. “MDX Essentials”. Retrieved December, 2004 from http://www.databasejournal.com/features/mssql/article.php/1550061. [24]. Heart Disease database. http://www.ics.uci.edu/~mlearn/MLSummary.html. [25]. Frawley, M. 2004. “Analysis Services Comparison: SQL 2000 vs. 2005”. Retrieved October, 2005 from http://www.devx.com/dbzone/Article/21539. [26]. Utley, C. 2005. “Solving Business Problems with SQL Server 2005 Analysis Services”. Retrieved January, 2006 from http://www.microsoft.com/technet/prodtechnol/sql/2005/solvngbp.mspx. [27]. Analysis Services Data Access Interfaces: ADOMD.NET Client Programming. Retrieved January, 2006 from http://msdn2.microsoft.com/en-us/ library/ms123483.aspx. 72
    • APPENDICES 73
    • APPENDIX A DATASET USED FOR CASE STUDIES The database used in this work was downloaded from the web site of the Repository of Machine Learning Databases [24]. The heart-disease directory contains four databases concerning heart disease diagnosis. The data was collected from the following locations: 1. Cleveland Clinic Foundation (Cleveland.data) 2. Hungarian Institute of Cardiology, Budapest (Hungarian.data) 3. V. A. Medical Center, Long Beach, CA (long-beach-va.data) 4. University Hospital, Zurich, Switzerland (Switzerland.data) Each database has the same instance format. While the databases have seventy-six raw attributes, all published experiments refer to a subset of fourteen of them. The authors of the databases have requested that any publications resulting from the use of the data include the names of the principal investigators responsible for the data collection. They are: A. Creators: 1. Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D. 2. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D. 3. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D. 4. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D. B. Donors: David W. Aha (aha@ics.uci.edu) Date: July, 1988. 74
    • C. Attributes
    a. age: age in years
    b. sex: gender (1 = male; 0 = female)
    c. cp: chest pain type (Value 1: typical angina; 2: atypical angina; 3: non-anginal pain; 4: asymptomatic)
    d. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
    e. chol: serum cholesterol in mg/dl
    f. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
    g. restecg: resting electrocardiographic results (Value 0: normal; 1: having ST-T wave abnormality; 2: showing probable or definite left ventricular hypertrophy by Estes' criteria)
    h. thalach: maximum heart rate achieved
    i. exang: exercise induced angina (1 = yes; 0 = no)
    j. oldpeak: ST depression induced by exercise relative to rest
    k. slope: the slope of the peak exercise ST segment
    l. ca: number of major vessels (0-3) colored by fluoroscopy
    m. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
    n. num (prediction attribute): diagnosis: 0 is healthy; 1, 2, 3, 4 are sick. 75
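A processed 14-attribute record can be mapped onto the field names listed above. The following sketch is illustrative only: the field order follows the list (using the standard UCI spellings), the sample values are fabricated, and the code is not part of the thesis.

```python
# Map one comma-separated 14-attribute record to named fields.
# Field order follows the attribute list above; the sample row is made up.
FIELDS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
          "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

def parse_record(line):
    """Return a dict mapping each attribute name to its numeric value."""
    values = [float(v) for v in line.split(",")]
    return dict(zip(FIELDS, values))

rec = parse_record("63,1,1,145,233,1,2,150,0,2.3,3,0,6,0")
```

A num value of 0 marks a healthy patient, and 1 through 4 mark increasing degrees of disease, which is why num serves as the prediction attribute of the mining model.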
    • APPENDIX B APPLICATION INTERFACE OF OLAP CUBE BUILDER Figure B.1 Screenshot of the OLAP cube builder interface for the power users 76
    • APPENDIX C SOURCE CODE OF CUBEBUILDER This section consists of the source code for the analysis component, cubeBuilder, which was written in the Visual Basic.NET programming language.
'Visual Basic.NET source code
Public Class CubeBuilder
    'Declarations
    Public DataServer As New DSO.Server()
    Public DataSource As DSO.DataSource
    Public DataProj As DSO.MDStore
    Public Provider As String
    Public DataPath As String
    Public SerName As String

    'Initializations
    Sub New()
    End Sub

    Sub New(ByVal inServ As String, ByVal inProv As String, ByVal inPath As String)
        SerName = inServ
        Provider = inProv
        DataPath = inPath
    End Sub

    'Class properties
    Public ReadOnly Property server()
        Get
            Return DataServer
        End Get
    End Property

    Public Property DSProvider() As String
        Get
            Return Provider
        End Get
        Set(ByVal Value As String)
            Provider = Value
        End Set
    End Property

    Public Property DataProject()
        Get
            Return DataProj
        End Get
        Set(ByVal Value)
            DataProj = Value
        End Set
    End Property

    'Connects to the specified server
    Public Sub ConnectToServer(ByVal servName As String, ByRef serv As DSO.Server)
        serv.Connect(servName)
    End Sub

    'Closes the connection to the server
    Public Sub CloseServerConnect(ByRef inServer As DSO.Server)
        inServer.CloseServer()
    End Sub

    'Checking the validation status of a server
    Public Function ServerValid(ByRef serv As DSO.Server) As Boolean
        If serv.IsValid Then
            Return True
        Else
            Return False
        End If
    End Function

    'Finding the target database object in the server
    Public Function FindDataProj(ByVal db As String, ByRef dServ As DSO.Server) As Boolean
        If dServ.MDStores.Find(db) Then
            Return True
        Else
            Return False
        End If
    End Function

    'Adding a new database object to the server
    Public Function AddNewDataProj(ByVal db As String, ByRef dServ As DSO.Server) As DSO.MDStore
        Return dServ.MDStores.AddNew(db)
    End Function

    'Setting the database object
    Public Function SetDataProj(ByVal db As String, ByRef dServ As DSO.Server) As DSO.MDStore
        Return dServ.MDStores.Item(db)
    End Function

    'Searching for the specified data source
    Public Function FindDataSource(ByVal ds As String, ByRef dDB As DSO.MDStore) As Boolean
        If dDB.DataSources.Find(ds) Then
            Return True
        Else
            Return False
        End If
    End Function

    'Adding a new data source
    Public Function AddNewDataSource(ByVal ds As String, ByRef dDB As DSO.MDStore) As DSO.DataSource
        Return dDB.DataSources.AddNew(ds)
    End Function

    'Setting the data source
    Public Function SetDataSource(ByVal ds As String, ByRef dDB As DSO.MDStore) As DSO.DataSource
        Return dDB.DataSources.Item(ds)
    End Function

    'Getting the data link connection string
    Public Function GetDataLink(ByVal p As String, ByVal dp As String) As String
        Dim str As String
        str = "Provider=" & p & ";Data Source=" & dp & ";Persist Security Info=False;Jet OLEDB:SFP=True;"
        Return str
    End Function

    'Setting the data link of the data source
    Public Sub SetLinkDataSource(ByVal dLink As String, ByRef ds As DSO.DataSource)
        ds.ConnectionString = dLink
        ds.Update()
    End Sub

    'Creating a database dimension
    Public Function CreateDbaseDimension(ByRef dDbase As DSO.MDStore, ByRef dataSrc As DSO.DataSource, _
            ByVal strDim As String, ByVal strDescr As String, ByVal strFromClause As String, _
            ByVal strJoin As String, ByVal strDimType As String) As DSO.Dimension
        Dim dsoNewDim As DSO.Dimension
        dsoNewDim = dDbase.Dimensions.AddNew(strDim)
        dsoNewDim.DataSource = dataSrc
        dsoNewDim.Description = strDescr
        dsoNewDim.FromClause = strFromClause
        dsoNewDim.JoinClause = strJoin
        dsoNewDim.DimensionType = strDimType
        Return dsoNewDim
    End Function

    'Adding a level to the dimension table
    Public Sub AddLeveltoDim(ByRef dsoDim As DSO.Dimension, ByVal levStr As String, ByVal strDimtbl As String, _
            ByVal ColumnStr As String, ByVal ColType As Short, ByVal colSize As Integer, ByVal EstSize As Integer)
        Dim dsoLev As DSO.Level
        Dim strKeyColumn As String
        dsoLev = dsoDim.Levels.AddNew(levStr)
        strKeyColumn = """" & strDimtbl & """" & "." & """" & ColumnStr & """"
        dsoLev.MemberKeyColumn = strKeyColumn
        dsoLev.ColumnType = ColType
        dsoLev.ColumnSize = colSize
        dsoLev.EstimatedSize = EstSize
        dsoDim.Update()
    End Sub

    'Alternative method for adding a level to the dimension table
    Public Sub AddLeveltoDim1(ByRef dsoDim As DSO.Dimension, ByVal levStr As String, ByVal strDimtbl As String, _
            ByVal ColumnStr As String, ByVal ColType As String)
        Dim dsoLev As DSO.Level
        Dim strKeyColumn As String
        dsoLev = dsoDim.Levels.AddNew(levStr)
        strKeyColumn = """" & strDimtbl & """" & "." & """" & ColumnStr & """"
        dsoLev.MemberKeyColumn = strKeyColumn
        dsoLev.ColumnType = CShort(ColType)
        dsoLev.ColumnSize = 255
        dsoLev.EstimatedSize = 1
        dsoDim.Update()
    End Sub

    'Adding a new cube to the database object
    Public Function AddNewCube(ByRef dSer As DSO.Server, ByVal dDB As String, ByVal DtSrc As String, _
            ByVal dtCube As String) As DSO.MDStore
        Dim dsoCube As DSO.MDStore
        dsoCube = dSer.MDStores.Item(dDB).MDStores.AddNew(dtCube)
        dsoCube.DataSources.AddNew(DtSrc)
        dsoCube.Update()
        Return dsoCube
    End Function

    'Adding the fact table to the cube
    Public Sub AddFactTblToCube(ByRef inCube As DSO.MDStore, ByVal strFactTblName As String)
        inCube.SourceTable = strFactTblName
        inCube.EstimatedRows = 100000
    End Sub

    'Adding a shared dimension to the cube
    Public Sub AddShareDDimToCube(ByRef inCube As DSO.MDStore, ByVal strDimName As String)
        inCube.Dimensions.AddNew(strDimName)
        inCube.Update()
    End Sub

    'Adding a measure to the cube
    Public Sub AddMeasureToCube(ByRef inCube As DSO.MDStore, ByVal inMeaText As String, ByVal inDescr As String, _
            ByVal factTbl As String, ByVal inField As String)
        Dim dsoMeasure As DSO.Measure
        dsoMeasure = inCube.Measures.AddNew(inMeaText)
        dsoMeasure.Description = inDescr
        dsoMeasure.SourceColumn = "" & factTbl & "." & "" & inField & ""
        dsoMeasure.SourceColumnType = ADODB.DataTypeEnum.adDouble
        dsoMeasure.AggregateFunction = DSO.AggregatesTypes.aggSum
        inCube.Update()
    End Sub

    'Processing the cube
    Public Sub ProcessCube(ByRef iCube As DSO.MDStore)
        iCube.Process(DSO.ProcessTypes.processFull)
    End Sub
End Class
APPENDIX D

SOURCE CODE OF CUBEBROWSER

This section consists of the source code for the analysis component, cubeBrowser, which was written in the Visual Basic.NET programming language.

' Visual Basic.NET source code
Public Class cubeBrowser

    ' Declarations
    Public cbServer As String
    Public cbDatabase As String
    Public cbDBconnect As New ADODB.Connection()
    Public cbCellset As New ADOMD.Cellset()
    'Dim conStr As String

    ' Initialization
    Public Sub New(ByVal oSer As String, ByVal oDb As String)
        cbServer = oSer
        cbDatabase = oDb
    End Sub

    ' Getting the connection string for the Catalog object
    Public Function GetConCatalogString() As String
        Dim strTemp As String
        strTemp = " "
        strTemp = strTemp & "Provider=msolap; data source=" & cbServer
        strTemp = strTemp & "; Initial Catalog=" & cbDatabase & ";"
        Return strTemp
    End Function

    ' Connecting to the Catalog object
    Public Function ConnectToCatalog(ByVal conStr As String) As Object
        Dim adomdCatalog As New ADOMD.Catalog()
        adomdCatalog.let_ActiveConnection(conStr)
        Return adomdCatalog
    End Function

    ' Getting the connection string for the Cellset object
    Public Function GetCellConnectString() As String
        Dim strCon As String
        strCon = " "
        strCon = strCon & "Provider=msolap; data source=" & cbServer
        strCon = strCon & "; database=" & cbDatabase & ";"
        Return strCon
    End Function

    ' Attaching the cellset to an open database connection
    Public Function GetConnectToCell(ByVal olapDb As ADODB.Connection) As Object
        cbCellset.ActiveConnection = olapDb
        Return cbCellset
    End Function

    ' Connecting to the Database object
    Public Function ConnectToDB(ByVal oS As String) As Object
        cbDBconnect.Open(oS)
        Return cbDBconnect
    End Function

    ' Connecting to the cube and executing an MDX query
    Public Function ConnectToCube(ByVal oStr As String, ByVal oMdx As String) As Object
        cbDBconnect.Open(oStr)
        cbCellset.ActiveConnection = cbDBconnect
        cbCellset.Open(oMdx)
        Return cbCellset
    End Function

    ' Displaying the cube structure I (a single, named cube)
    Public Sub ViewCubeStruct(ByRef lstBox As Object, ByRef inCat As Object, ByVal inCubeName As String)
        Dim cbDef As ADOMD.CubeDef
        Dim cbDim As ADOMD.Dimension
        Dim strDim As String
        Dim cbHir As ADOMD.Hierarchy
        Dim strLevel As String
        Dim cbLev As ADOMD.Level
        Dim strTemp As String

        cbDef = inCat.CubeDefs(inCubeName)
        strTemp = "Cube: " & inCubeName
        lstBox.Items.Add(strTemp)
        strTemp = " "
        For Each cbDim In cbDef.Dimensions
            strDim = " "
            strDim = " -Dimension: " & cbDim.Name
            lstBox.Items.Add(strDim)
            For Each cbHir In cbDim.Hierarchies
                For Each cbLev In cbHir.Levels
                    strLevel = " -- " & cbLev.Name
                    lstBox.Items.Add(strLevel)
                Next
            Next
        Next
    End Sub

    ' Displaying the cube structure II (all cubes in the catalog)
    Public Sub ViewCubeStruct(ByRef lstBox As Object, ByRef inCat As Object)
        Dim cbDef As ADOMD.CubeDef
        Dim cbDim As ADOMD.Dimension
        Dim strDim As String
        Dim cbHir As ADOMD.Hierarchy
        Dim strLevel As String
        Dim cbLev As ADOMD.Level
        Dim strTemp As String

        For Each cbDef In inCat.CubeDefs
            strTemp = "Cube: " & cbDef.Name
            lstBox.Items.Add(strTemp)
            strTemp = " "
            For Each cbDim In cbDef.Dimensions
                strDim = " "
                strDim = " -Dimension: " & cbDim.Name
                lstBox.Items.Add(strDim)
                For Each cbHir In cbDim.Hierarchies
                    For Each cbLev In cbHir.Levels
                        strLevel = " -- " & cbLev.Name
                        lstBox.Items.Add(strLevel)
                    Next
                Next
            Next
        Next
    End Sub

    ' Closing the object connection
    Public Sub CloseConnection(ByVal iConn As Object)
        iConn.Close()
    End Sub

End Class
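The fragment below sketches how a client form might drive the cubeBrowser component. It is illustrative only: the server name, database name, MDX string, and the ListBox control (lstCubes) are placeholders and do not appear in the thesis code.

```vbnet
' Hypothetical usage sketch; names in quotes and lstCubes are placeholders.
Dim browser As New cubeBrowser("LocalHost", "FoodMart 2000")

' Browse the cube structure through the Catalog object.
Dim catalog As Object = browser.ConnectToCatalog(browser.GetConCatalogString())
browser.ViewCubeStruct(lstCubes, catalog)   ' lstCubes: a ListBox on the form

' Execute an MDX query against a cube and obtain a cellset.
Dim strMdx As String = "SELECT ... FROM Sales"   ' illustrative MDX
Dim cells As Object = browser.ConnectToCube(browser.GetCellConnectString(), strMdx)

' Release the connections when finished.
browser.CloseConnection(browser.cbCellset)
browser.CloseConnection(browser.cbDBconnect)
```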
APPENDIX E

SOURCE CODE OF DMBUILDER

This section consists of the source code for the analysis component, DMBuilder, which was written in the Visual Basic.NET programming language.

' Visual Basic.NET source code
Public Class clsBuildMiningModel

    ' Declarations
    Public dsoCol As DSO.Column

    ' Initialization
    Public Sub New()
    End Sub

    ' Clearing the object
    Public Sub ClearObject(ByRef inObj As Object)
        inObj = Nothing
    End Sub

    ' Connecting to the server
    Public Sub ConnectToServer(ByVal strSer As String, ByRef ser As DSO.Server)
        ser = New DSO.Server()
        ser.Connect(strSer)
    End Sub

    ' Closing the server connection
    Public Sub CloseServerConnection(ByRef s As DSO.Server)
        s.CloseServer()
    End Sub

    ' Checking the server connection status
    Public Function IsServerConnect(ByRef ser As DSO.Server) As Boolean
        If ser.IsValid Then
            Return True
        Else
            Return False
        End If
    End Function

    ' Checking whether the target mining model exists
    Public Function IsExistingModel(ByRef db As DSO.MDStore, ByVal strName As String) As Boolean
        If db.MiningModels.Item(strName) Is Nothing Then
            Return False
        Else
            Return True
        End If
    End Function

    ' Checking whether the target cube exists
    Public Function IsValidCube(ByRef db As DSO.MDStore, ByVal sCube As String) As Boolean
        If db.MDStores.Find(sCube) Then
            Return True
        Else
            Return False
        End If
    End Function

    ' Adding a new mining model
    Public Sub AddNewMiningModel(ByRef db As DSO.MDStore, ByVal mName As String, ByVal dtType As DSO.SubClassTypes, ByRef dMM As DSO.MiningModel)
        dMM = db.MiningModels.AddNew(mName, dtType)
    End Sub

    ' Adding a new model role
    Public Sub AddNewMMRole(ByRef dmm As DSO.MiningModel, ByVal rName As String, ByRef dRole As DSO.Role)
        dRole = dmm.Roles.AddNew(rName)
    End Sub

    ' Setting the properties of the target mining model
    Public Sub SetModelProperty(ByRef dmm As DSO.MiningModel, ByVal dtSrc As String, ByVal mDescr As String, ByVal dtType As DSO.SubClassTypes, ByVal mmAlgo As String, ByVal srcCube As String, ByVal cDim As String, ByVal mTrainQ As String)
        With dmm
            .DataSources.AddNew(dtSrc, DSO.SubClassTypes.sbclsOlap)
            .Description = mDescr
            .MiningAlgorithm = mmAlgo
            .SourceCube = srcCube
            .CaseDimension = cDim
            .TrainingQuery = mTrainQ
            .Update()
        End With
    End Sub

    ' Enabling the column's properties
    Public Sub EnableColumnProperty(ByRef dmm As DSO.MiningModel, ByVal strCol As String, ByVal CheckFlag As Boolean, ByVal InputSelect As Boolean, ByVal PredictSelect As Boolean)
        dsoCol = dmm.Columns.Item(strCol)
        If CheckFlag = True Then
            dsoCol.IsInput = InputSelect
            dsoCol.IsPredictable = PredictSelect
        End If
        dsoCol.IsDisabled = False
    End Sub

    ' Saving the target mining model
    Public Sub SaveMiningModel(ByRef dMM As DSO.MiningModel)
        dMM.LastUpdated = Now
        dMM.Update()
    End Sub

    ' Processing the mining model
    Public Sub ProcessMiningModel(ByRef dsoDMM As DSO.MiningModel, ByRef dsoLockType As DSO.OlapLockTypes, _
            ByRef dsoLockDescr As String, ByVal prcType As DSO.ProcessTypes)
        With dsoDMM
            .LockObject(dsoLockType, dsoLockDescr)
            .Process(prcType)   ' use the caller-supplied process type
            .UnlockObject()
        End With
    End Sub

End Class
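The fragment below sketches one possible calling sequence for the DMBuilder component: connect to the server, create and configure an OLAP mining model, then train it. The server, database, model, cube, and dimension names are placeholders, not values taken from the thesis.

```vbnet
' Hypothetical usage sketch; all quoted names are placeholders.
Dim builder As New clsBuildMiningModel()
Dim dsoServer As DSO.Server
builder.ConnectToServer("LocalHost", dsoServer)

Dim dsoDb As DSO.MDStore = dsoServer.MDStores("FoodMart 2000")
If Not builder.IsExistingModel(dsoDb, "SalesModel") Then
    Dim dmm As DSO.MiningModel
    builder.AddNewMiningModel(dsoDb, "SalesModel", DSO.SubClassTypes.sbclsOlap, dmm)
    builder.SetModelProperty(dmm, "FoodMart", "Sales prediction model", _
        DSO.SubClassTypes.sbclsOlap, "Microsoft_Decision_Trees", _
        "Sales", "Customers", "")
    builder.EnableColumnProperty(dmm, "Member Card", True, False, True)
    builder.SaveMiningModel(dmm)
    builder.ProcessMiningModel(dmm, DSO.OlapLockTypes.olapLockProcess, _
        "Training SalesModel", DSO.ProcessTypes.processFull)
End If
builder.CloseServerConnection(dsoServer)
```

Note that ProcessMiningModel locks the model for the duration of training and unlocks it afterwards, so callers need not manage DSO locks themselves.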