Group2.doc.doc
Transcript

  • 1. ISM3610 Decision Support and Intelligence System: Data Warehousing. By Group B: Chan Chi Leung (03012034), Chan Wing Sze (03012077),
  • 2. Cheung Helios Su Ho (03012107), Fong Yau Shing (03000923), Kong Kevin Tsz Wang (03012239), Lau Ka Wing (03012255), Pong Shuk Ting (04001737), Wong Chi Ho (03012468). 19th April, 2007
  • 3. Table of Contents
      1. Introduction
      1.1 What is data warehouse?
      1.2 Construction
      1.3 Data Acquisition and Collection
      1.4 Metadata
      1.5 Data Marts
      1.6 Trustworthiness and Security
      2. Characteristics of a Data Warehouse
      Subject Oriented
      2.1 Integrated
      2.2 Time Variant
      2.3 Non-Volatile
      3. Data Warehouse Architecture
      Operational Database / External Database Layer
      3.1 Information Access Layer
      3.2 Data Access Layer
      3.3 Data Warehouse (Physical) Layer
      3.4 Application Messaging Layer
  • 4.
      3.5 Process Management Layer
      3.6 Data Directory (Metadata) Layer
      Data Staging Layer
      4. Examples on Data Warehousing Vendors
      IBM
      4.1.1 Introduction to IBM
      4.1.2 Features of DB2 Data Warehouse Edition
      4.1.3 Advantages
      4.1.4 Disadvantages
      4.1.5 Application of DB2 DWE in Copenhagen’s TDC
      Oracle
      4.1.6 Introduction to Oracle
      4.1.7 Features of Oracle Data Warehousing
      4.1.8 Advantages
      4.1.9 Disadvantages
      4.1.10 Application of Oracle Datawarehousing in Absa Group Limited
      4.2 SAS
      4.2.1 Introduction to SAS
      4.2.2 Features of SAS Warehousing Administrator
  • 5.
      4.2.3 Advantages
      4.2.4 Disadvantages
      4.2.5 Application of SAS Data Warehousing in the HK Trade Development Council
      5. How to implement Data Warehouse successfully
      5.1 “If you Built It, They Will Come”
      5.2 Omission of an Architectural Framework
      5.3 Understanding the Importance of Documenting Assumptions
      5.4 Failure to Use the Right Tool for the Job
      5.5 Life Cycle Abuse
      5.6 Ignorance Concerning the Resolution of Data Conflicts
      Failure to Learn from Mistakes
      6. Concerns & Conclusion
      7. References
  • 6. 1. Introduction Since the early 1990s, data warehouses have been at the forefront of information technology applications as a way for organizations to use digital information effectively for business planning and decision making. As information professionals, we will no doubt encounter data warehouses if we have not already been exposed to them in our work. Hence, an understanding of data warehouse system architecture is, or will be, important to our roles and responsibilities in information management. 1.1 What is data warehouse? Simply put, a data warehouse can be thought of as a place for secondhand data that originates either in other corporate applications, such as the one our company uses to solve printer problems reported by customers and by our front-line and second-line support staff, or in some data source external to our company, such as a public database that contains customer support information gathered from our competitors. Technically, a data warehouse is the coordinated, architected, and periodic copying of data from various sources, both inside and outside the enterprise, into an environment optimized for analytical and informational
  • 7. processing. The key here is that the data is copied (duplicated) in a controlled manner and is copied periodically (batch-oriented processing). Data warehousing is also, therefore, the process of creating an architected information-management solution to enable analytical and informational processing despite platform, application, organizational, and other barriers. The key concept here is that barriers are being broken and distributed information is being consolidated for analysis, although no preconceived notion exists for the exact means of doing so, such as duplicating data. As we all know, large companies use software packages that gather and store data in special configurations called data warehouses. Because a data warehouse is an integrated collection of data, it can support management analysis and decision making. For example, in a typical company, data is generated by transaction-based systems, such as order entry, inventory, accounts receivable, and payroll. If a user wants to know the customer number on a particular sales order, they can retrieve the data easily from the order entry application. On the other hand, suppose that a user wants to see May sales results for the sales representative assigned to a specific customer, as shown in Figure 1 for a typical data warehouse.
  • 8. Figure 1 – Typical Data Warehouse Although the information systems are interactive, it is difficult for a user to extract specific data that spans several systems and time frames; the average user might need assistance from the IT staff. The advantage of a data warehouse is that, rather than requiring users to access separate systems, it stores transaction data in a format that allows users to access, combine, and analyze the data. Again, this should help in taming and controlling data volume. A data warehouse allows users to specify certain dimensions, or characteristics. In a
  • 9. consumer products data warehouse, dimensions might include time, customer, and sales representative. By selecting values for each characteristic, a user can obtain multidimensional information from the stored data. Data warehousing is also a collection of decision support technologies aimed at enabling the knowledge worker, who could be an executive, manager, or analyst, to make better and faster decisions. 1.2 Construction The steps in planning a data warehouse are identical to the steps for any other type of computer application. Users must be involved to determine the scope of the warehouse and what business requirements need to be met. After selecting a focus area, for example, analyzing the use of state government records over time, a data warehouse team of business users and information professionals compiles a list of the different types of data that should go into the warehouse. After business requirements have been gathered and validated, data elements are organized into a conceptual data model. The conceptual model is used as a blueprint to develop a physical database design. As in all systems design projects, there are a number of iterations, prototypes, and technical decisions that need to be made between the steps of systems analysis, design, development, implementation, and support.
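To make the idea of dimensions and of a physical design more concrete, the following is a minimal sketch, not a definitive design. It assumes a small star-style layout held in an in-memory SQLite database; the table names (`dim_customer`, `dim_sales_rep`, `fact_sales`), the sample rows, and the helper `sales_for` are all hypothetical. It answers the question raised in section 1.1: the May sales for the sales representative assigned to a specific customer.

```python
import sqlite3

# Hypothetical physical design: one fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_sales_rep (rep_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    sale_month  TEXT,     -- time dimension, stored as 'YYYY-MM'
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    rep_id      INTEGER REFERENCES dim_sales_rep(rep_id),
    amount      REAL
);
""")

# Made-up sample data, for illustration only.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO dim_sales_rep VALUES (?, ?)",
                 [(10, "Lee"), (11, "Wong")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    ("2007-04", 1, 10, 500.0),
    ("2007-05", 1, 10, 700.0),
    ("2007-05", 1, 10, 300.0),
    ("2007-05", 2, 11, 900.0),
])


def sales_for(month, customer_name):
    """Return (sales rep name, total sales) for one month and one customer."""
    return conn.execute("""
        SELECT r.name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_id = f.customer_id
        JOIN dim_sales_rep r ON r.rep_id = f.rep_id
        WHERE f.sale_month = ? AND c.name = ?
        GROUP BY r.name
    """, (month, customer_name)).fetchone()


print(sales_for("2007-05", "Acme"))
```

Choosing a value for each dimension (month and customer here) narrows the fact rows before they are summed, which is the kind of multidimensional selection the text describes.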
  • 10. 1.3 Data Acquisition and Collection The data warehouse team must determine what data should go into the warehouse and where those particular pieces of information can be found. Some of the data will be internal to the organization. In other cases, it can be obtained from another source. Another team of analysts and programmers creates extraction programs to collect data from the various databases, files, and legacy systems that have been identified, copying certain data to a staging area outside of the warehouse. At this point, they check that the data has no errors and then copy it all into the data warehouse. This source data extraction, selection, and transformation process is unique to data warehousing. Source data analysis and the efficient and accurate movement of source data into the warehouse environment are critical to the success of a data warehouse project. 1.4 Metadata Good metadata is essential to the effective operation of a data warehouse, and it is used in data acquisition/collection, data transformation, and data access. Acquisition metadata maps the translation of information from the operational system to the analytical system. This includes an extract history describing data origins, updates, algorithms used to summarize data, and frequency of extractions from operational systems. Transformation metadata includes a history of data transformations,
  • 11. changes in names, and other physical characteristics. Access metadata provides navigation and graphical user interfaces that allow non-technical business users to interact intuitively with the contents of the warehouse. On top of these three types of metadata, a warehouse needs basic operational metadata, such as procedures on how the data warehouse is used and accessed, procedures for monitoring the growth of the data warehouse relative to the available storage space, and authorizations stating who is responsible for, and who has access to, the data in the data warehouse and in the operational systems. 1.5 Data Marts Data in a data warehouse should be reasonably current, but not necessarily up to the minute, although developments in the data warehouse industry have made frequent and incremental data dumps more feasible. Data marts are smaller than data warehouses and generally contain information from a single department of a business or organization. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for specific kinds of queries and reports. 1.6 Trustworthiness and Security As with any information system, trustworthiness of data is determined by
  • 12. the trustworthiness of the hardware, the software, and the procedures that created it. The reliability and authenticity of the data and information extracted from the warehouse will be a function of the reliability and authenticity of the warehouse and the various source systems that it encompasses. In data warehouse environments specifically, there needs to be a means to ensure the integrity of data, first by having procedures to control the movement of data to the warehouse from operational systems, and second by having controls to protect warehouse data from unauthorized changes. Data warehouse trustworthiness and security are contingent upon acquisition, transformation, and access metadata and upon systems documentation.
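The acquisition steps in section 1.3 and the extract history in section 1.4 can be sketched as follows. This is an illustrative outline only: the function names, the validation rule, and the fields recorded in the extract history are assumptions made for the example, not part of any particular product.

```python
from datetime import datetime, timezone


def extract(source_rows):
    """Copy rows from a source system into a staging list outside the warehouse."""
    return [dict(row) for row in source_rows]


def validate(staged_rows):
    """Split staged rows into clean rows and rejected rows (hypothetical rule)."""
    clean, rejected = [], []
    for row in staged_rows:
        amount = row.get("amount")
        ok = (row.get("order_id") is not None
              and isinstance(amount, (int, float))
              and amount >= 0)
        (clean if ok else rejected).append(row)
    return clean, rejected


def load(warehouse, extract_history, source_name, source_rows):
    """Stage, validate, and load one batch; record acquisition metadata."""
    staged = extract(source_rows)
    clean, rejected = validate(staged)
    warehouse.extend(clean)  # only validated rows reach the warehouse
    extract_history.append({
        "source": source_name,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "rows_read": len(staged),
        "rows_loaded": len(clean),
        "rows_rejected": len(rejected),
    })
    return rejected


warehouse, history = [], []
rejected = load(warehouse, history, "order_entry", [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},   # missing key: rejected
    {"order_id": 3, "amount": -1.0},     # negative amount: rejected
])
print(len(warehouse), len(rejected), history[0]["rows_read"])
```

Keeping the extract history next to the load step is one simple way to support the trustworthiness controls discussed in section 1.6, because every movement of data into the warehouse leaves a record.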
  • 13. 2. Characteristics of a Data Warehouse This part focuses on the fundamental characteristics of a data warehouse. Bill Inmon, recognized as the “father of data warehousing”, has defined a data warehouse as a database containing Subject Oriented, Integrated, Time Variant and Non-volatile information used to support the decision making process (Martyn R Jones, 1999). The following sections explain these four fundamental characteristics. Subject Oriented Operational databases, such as order processing and payroll databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts. 2.1 Integrated Integration of data within a warehouse is accomplished by making the
  • 14. data consistent in format, naming, and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representations. For example, a set of operational databases may represent "male" and "female" by using codes such as "m" and "f", by "1" and "2", or by "b" and "g". Often, the inconsistencies are more complex and subtle. In a data warehouse, on the other hand, data is always maintained in a consistent fashion. 2.2 Time Variant Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). Data warehouses, by comparison, contain data that is generally loaded from the operational databases daily, weekly, or monthly and is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments. Historical information is of high importance to decision makers, who often want to understand trends and relationships between data. For example, the product manager for a Liquefied Natural Gas soda drink may want to see the relationship between coupon promotions and sales. This is information that is almost impossible, and certainly in most cases not cost effective, to determine with an operational database.
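The code inconsistency described under "Integrated" can be resolved with a per-source mapping applied during loading. The sketch below is only an illustration: the three code pairs come from the example in the text, while the source system names, the target codes "M" and "F", and the function `integrate_gender` are hypothetical.

```python
# One mapping per (hypothetical) operational source system.
GENDER_CODE_MAP = {
    "payroll": {"m": "M", "f": "F"},
    "orders": {"1": "M", "2": "F"},
    "school": {"b": "M", "g": "F"},
}


def integrate_gender(source_system, raw_code):
    """Translate a source-specific code into the single warehouse representation."""
    mapping = GENDER_CODE_MAP[source_system]
    # Normalize case and whitespace first; unmapped codes are flagged, not guessed.
    return mapping.get(str(raw_code).strip().lower(), "UNKNOWN")


print(integrate_gender("orders", 2))
print(integrate_gender("school", "B"))
print(integrate_gender("payroll", "x"))
```

Flagging unmapped values as "UNKNOWN" rather than dropping them is a design choice for this sketch; it keeps the more subtle inconsistencies visible for later review.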
  • 15. 2.3 Non-Volatile Non-volatility, the final primary aspect of data warehouses, means that after the data warehouse is loaded, no changes, inserts, or deletes are performed against the informational database. The data warehouse is, of course, first loaded with transformed data that originated in the operational databases. It is subsequently reloaded or, more likely, appended to on a periodic basis (usually nightly, weekly, or monthly) with new transformed data from the operational databases. Outside of this loading process, the data warehouse generally stays static. Because of this non-volatility, the data warehouse can be heavily optimized for query processing.
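Time variance and non-volatility fit together naturally: each periodic load appends new rows stamped with a load date, and nothing already loaded is changed. The following is a minimal sketch under that assumption; the class `AppendOnlyTable`, its methods, and the sample data are hypothetical.

```python
class AppendOnlyTable:
    """Warehouse table that only grows through periodic batch loads."""

    def __init__(self):
        self._rows = []

    def load_batch(self, load_date, rows):
        """Append a batch; existing rows are never updated or deleted."""
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def history(self, key, value):
        """Return every loaded version of the rows matching key == value."""
        return [r for r in self._rows if r.get(key) == value]

    def as_of(self, key, value, date):
        """Return the latest version loaded on or before the given ISO date."""
        candidates = [r for r in self.history(key, value)
                      if r["load_date"] <= date]
        return max(candidates, key=lambda r: r["load_date"], default=None)


table = AppendOnlyTable()
table.load_batch("2007-03-01", [{"product": "cola", "price": 5}])
table.load_batch("2007-04-01", [{"product": "cola", "price": 6}])

print(len(table.history("product", "cola")))
print(table.as_of("product", "cola", "2007-03-15"))
```

ISO-formatted date strings are used so that plain string comparison orders them correctly; a real warehouse would more likely use a date type.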
  • 16. 3. Data Warehouse Architecture A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data, communication, processing and presentation that exists for end-user computing within the enterprise. The architecture is made up of several components:
      • Operational Database / External Database Layer
      • Information Access Layer
      • Data Access Layer
      • Data Warehouse Layer
      • Application Messaging Layer
      • Process Management Layer
      • Data Directory (Metadata) Layer
      • Data Staging Layer
    The figure below shows how the different layers are interconnected.
  • 17. Figure 2 – Data Warehouse Architecture Operational Database / External Database Layer Operational systems process data to support critical business operational needs. Operational databases have been created to provide an efficient processing structure for a relatively small number of well-defined business transactions. However, because of the limited scope of operational systems, the databases designed to support them make it difficult to access the data for other management or informational purposes. This difficulty is amplified by the fact that many operational systems are very old, which means that the data access technology available to obtain operational data is itself dated. The goal of data warehousing is to free the information that is locked up in
  • 18. the operational databases and to mix it with information from other external sources of data. Nowadays, many large organizations are acquiring additional data from outside databases. This information includes demographic, economic, competitive and purchasing trends. The so-called "information superhighway" is providing access to more data resources every day. 3.1 Information Access Layer The Information Access Layer of the Data Warehouse Architecture is the layer that the end-user deals with directly. In particular, it represents the tools that the end-user normally uses day to day, e.g., Excel, Lotus 1-2-3, Access, SAS, etc. This layer also includes the hardware and software involved in displaying and printing reports, spreadsheets, graphs and charts for analysis and presentation. Over the past two decades, the Information Access Layer has expanded enormously, especially as end-users have moved to PCs and PC/LANs. Today, more and more sophisticated tools exist on the desktop PC for manipulating, analyzing and presenting data; however, there are significant problems in making the raw data contained in operational systems easily available to end-user tools. One of the key problems is to find a common data language that can be used throughout the enterprise.
  • 19. 3.2 Data Access Layer The Data Access Layer allows the Information Access Layer to communicate with the Operational Layer. Today the common data language that has emerged is SQL. Originally, SQL was developed by IBM as a query language, but over the last twenty years it has become the standard for data interchange. One of the key breakthroughs of the last few years has been the development of a series of data access "filters", such as Enterprise Data Access (EDA)/SQL, that make it possible for SQL to access nearly all DBMSs and data file systems, relational or non-relational. These filters make it possible for Information Access tools to access data stored on database management systems that are even twenty years old. The Data Access Layer not only spans different DBMSs and file systems on the same hardware; it spans manufacturers and network protocols as well. One of the keys to a Data Warehousing strategy is to provide end-users with "universal data access". Universal data access means that, theoretically, end-users, regardless of location or Information Access tool, should be able to access any or all of the data in the enterprise that is necessary for them. In some cases, this is all that certain end-users need. However, in
  • 20. general, organizations are developing a much more sophisticated scheme to support Data Warehousing. 3.3 Data Warehouse (Physical) Layer The core Data Warehouse is where the actual data used for informational purposes resides. In some cases, one can think of the Data Warehouse simply as a logical or virtual view of data. In many instances, the data warehouse may not actually involve storing data. In a Physical Data Warehouse, copies, in some cases many copies, of operational and/or external data are actually stored in a form that is easy to access and highly flexible. Increasingly, Data Warehouses are stored on client/server platforms, but they are often stored on mainframes as well.
  • 21. 3.4 Application Messaging Layer The Application Messaging Layer has to do with transporting information around the enterprise computing network. Application Messaging is also referred to as "middleware", but it can involve more than just networking protocols. Application Messaging can, for example, be used to isolate applications, operational or informational, from the exact data format. It can also be used to collect transactions or messages and deliver them to a certain location at a certain time. Application Messaging is the transport system underlying the Data Warehouse. 3.5 Process Management Layer The Process Management Layer is involved in scheduling the various tasks that must be completed to build and maintain the data warehouse and data directory information. The Process Management Layer can be regarded as the scheduler or the high-level job controller for the many processes that must be run to keep the Data Warehouse up-to-date. 3.6 Data Directory (Metadata) Layer In order to provide for universal data access, it is necessary to maintain some form of data directory or repository of meta-data information.
  • 22. Meta-data is the data about data within the enterprise. Record descriptions in a COBOL program are meta-data. So are DIMENSION statements in a FORTRAN program, or SQL Create statements. In order to have a fully functional warehouse, it is necessary to have a variety of meta-data available: data about the end-user views of data and data about the operational databases. Ideally, end-users should be able to access data from the data warehouse without having to know where that data resides or the form in which it is stored. Data Staging Layer The final component of the Data Warehouse Architecture is Data Staging. Data Staging is also called copy management or replication management. It includes all the processes necessary to select, edit, summarize, combine and load data warehouse and information access data from operational and/or external databases. Data Staging often involves complex programming, but increasingly data warehousing tools provide help in this process. Data Staging may also involve data quality analysis programs and filters that identify patterns and data structures within existing operational data.
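The role of the data directory can be illustrated with a small lookup layer that lets a user ask for data by a business name without knowing where it is stored. This is a sketch under stated assumptions: the directory entries, the store names, the table names, and the function `fetch` are all hypothetical.

```python
# Hypothetical physical stores, each holding named tables as lists of rows.
STORES = {
    "warehouse": {
        "fact_sales_monthly": [{"month": "2007-05", "total": 1900.0}],
    },
    "operational": {
        "ORDERS_CURRENT": [{"order_id": 1, "amount": 700.0}],
    },
}

# Hypothetical data directory: meta-data describing where each view lives.
DATA_DIRECTORY = {
    "monthly_sales": {
        "store": "warehouse",
        "table": "fact_sales_monthly",
        "description": "Sales totals summarized by month",
    },
    "open_orders": {
        "store": "operational",
        "table": "ORDERS_CURRENT",
        "description": "Orders currently held in the order entry system",
    },
}


def fetch(logical_name):
    """Resolve a business-level name through the directory and return its rows."""
    entry = DATA_DIRECTORY[logical_name]
    return STORES[entry["store"]][entry["table"]]


print(fetch("monthly_sales"))
```

If the physical table is later moved or renamed, only the directory entry changes; callers of `fetch` are unaffected, which is the point of keeping meta-data in one place.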
  • 23. 4. Examples on Data Warehousing Vendors IBM, Oracle, and SAS are well-known software vendors, and their data warehouse technology is widely used across different industries. Therefore, we chose data warehouse examples from these companies. IBM 4.1.1 Introduction to IBM IBM is aligned around a single, focused business model: innovation. It takes its breadth and depth of insight on issues, processes and operations across a variety of industries, and invents and applies technology to help solve its clients' most intractable business and competitive problems. It provides different types of data warehouse products to deliver dynamic warehousing. One of these is the DB2 Data Warehouse Edition (DB2 DWE). 4.1.2 Features of DB2 Data Warehouse Edition DB2 DWE integrates and simplifies the data warehouse environment to deliver all of the capabilities needed to consolidate, manage, deliver and analyze business information. It is optimized for reporting and analysis, and data are summarized and stored in a dimension-based
  • 24. model. It gives people a good high-level understanding of what it takes to implement a successful data warehouse project in their business. It represents the IBM offering for implementing integrated Business Intelligence solutions that reduce the cost and time of data analysis for the business. Figure 3 – Platform of IBM DB2 Warehouse 4.1.2.1 Powerful DB2 data server foundation The IBM DB2 platform is the foundation for the DB2 Warehouse solution. With its massively scalable, shared-nothing distributed architecture, DB2 9 provides high performance for mixed workload query processing against
  • 25. both relational and native XML data. Advanced features such as data partitioning, new row compression, multidimensional clustering and materialized query tables (MQTs) make DB2 a powerful engine for dynamic warehousing. 4.1.2.2 Do it right DB2 DWE captures new opportunities with a highly flexible, scalable data warehousing framework, and combines common design tools, advanced compression technology, inline analytics and pre-built mining capabilities. 4.1.2.3 Do it smarter Choosing a high-performance, open-standards-based solution that can be rapidly implemented with reduced risk to the business can increase the return on the data warehouse investment. 4.1.2.4 Modeling and design tool It provides the core components to graphically model data structures, move and transform data within the data warehouse, implement online analytical processing (OLAP), build and score data mining models, and develop embedded analytic application components.
  • 26. 4.1.3 Advantages DB2 DWE is a comprehensive and integrated solution for enterprise data warehouse development. It provides tools to help data warehouse administrators design, deploy and maintain an enterprise data warehouse. DB2 DWE’s multidimensional database provides OLAP (online analytical processing), which allows users to view data in the system dynamically from different points of view. Users can generate statistics by specifying their own requirements in DB2 DWE. In addition, DB2 DWE provides advanced data compression, which lowers the cost of storing large volumes of data. Benchmarks report that DB2 DWE can save 45-69 percent of disk space. Data compression also reduces the read/write frequency of the storage devices, so queries run more efficiently than on uncompressed data. 4.1.4 Disadvantages The disadvantages of using DB2 DWE are the high cost and the high system requirements. A license costs about US $1,000 per year. The hardware requirements are high because of the data compression scheme. Higher processing power is needed for both compression and
  • 27. decompression on data access. Typical personal computers cannot meet the requirement, so a powerful server is needed. The additional hardware raises the cost of using DB2 DWE. 4.1.5 Application of DB2 DWE in Copenhagen’s TDC Here is a real case of a company benefiting from DB2 DWE. Copenhagen’s TDC, Denmark’s leading telecommunications company, can testify to people's ongoing love affair with their telephones. Danish customers make so many calls that each month TDC has to deal with 1.5 terabytes of new raw data. The company’s information technology (IT) team realized that its existing system was rapidly running out of storage capacity. After upgrading to DB2 DWE, TDC found that productivity increased, thanks to higher levels of performance and additional applications for internal users. Customer service improved because TDC could offer the most economical service plan based on usage. Marketing was also enhanced through better customer targeting and campaign tracking. Before the new system, TDC’s batch window hardly afforded enough time for the required data to be processed by the next morning. In the past, if the team lost even one day of productivity, it would take as much as a week to catch up. Now, TDC has no problems receiving, loading and processing data by morning. For more information about the case, please refer to the URL,
  • 28. http://www-306.ibm.com/software/success/cssdb.nsf/CS/SPAT-6ATKAP?OpenDocument&Site=dmbi&cty=en_us .
  • 29. Some screenshots of DB2 DWE are shown below. The following figure shows how DB2 DWE works: it can generate reports according to the user’s needs. Figure 4 – Example on Reports Generation on user’s needs (DB2 DWE)
  • 30. If the user wants a more detailed report, one can be shown with a very simple operation. Figure 5 – Example on Details Report (DB2 DWE) DB2 DWE provides an interactive platform between the system developers and the users. When the developers make changes to the system, a message is displayed to tell users what has been changed.
  • 31. Figure 6 – Example on Prompt Message (DB2 DWE) If users need it, criteria can be set on the reports. Data entries that conflict with the criteria are highlighted, which makes it easier for users to notice the characteristics of the data. Oracle 4.1.6 Introduction to Oracle Oracle Database is well suited to data warehousing because the software’s open interfaces offer easy integration with multiple systems. This is important when a company wants to import data from existing applications into the data warehouse. It also accommodates large amounts of detail, down to, for example, individual flights in specific segments. Furthermore, the Oracle solution offers a flexible structure for reporting, enabling customers to design customized reports and allowing staff to undertake multi-dimensional analysis. Oracle is also highly scalable, ensuring it can cater for future growth.
  • 32. Research from Winter Corporation and Oracle states that the size of data warehouses has tripled every two years since 2001. Several factors have led to this significant growth, and many of them can be explained by industry trends in data warehousing. Oracle stated that the first reason is the development of real-time business (Oracle.com, 2006). Organizations strive to react to market changes as quickly as possible to gain market advantage. Data latency has to be reduced in order to achieve a real-time business model and, as a result, the data size gradually increases. Enterprises also tend to keep detailed logs of their data, because regulatory frameworks such as Basel II (the International Convergence of Capital Measurement and Capital Standards) require organizations to capture and retain detailed transaction histories. Moreover, new types of storage-intensive information, such as data from RFID technology, can create new business opportunities, which is another reason for the huge size of data warehouses. Besides data volume, data warehousing is experiencing growth in other dimensions. Traditionally, data warehouses, or databases, were used only for reporting and analysis, but nowadays they often come with a prediction function and are shared and integrated with applications.
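The growth claim above implies simple compound growth: if size triples every two years, then after t years a warehouse is 3^(t/2) times its starting size. The short sketch below works this out; the function name `projected_size` and the starting size of 1 TB are assumptions chosen only for illustration, not figures from the cited research.

```python
def projected_size(initial_tb, years_elapsed, factor=3.0, period_years=2.0):
    """Size after compound growth by `factor` every `period_years` years."""
    return initial_tb * factor ** (years_elapsed / period_years)


# Hypothetical 1 TB warehouse in 2001, projected over the following six years.
for year in range(2001, 2008, 2):
    print(year, projected_size(1.0, year - 2001))
```

Under this assumption the hypothetical warehouse grows 27-fold in six years, which shows why storage cost and compression feature so prominently in the vendor comparisons.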
  • 33. Furthermore, access to the data warehouse is no longer limited to users within the enterprise; it extends to customers, partners, and suppliers. Together with the increasing complexity of queries issued by sophisticated business intelligence applications, this means a data warehouse must satisfy many criteria to meet today's needs.
    4.1.7 Features of Oracle Data Warehousing
    Oracle, one of the most popular databases for data warehousing, has developed its data warehousing application to fit the above needs. Several key features of the Oracle application increase its capability.
    4.1.7.1 Partitioning
    Partitioning is the "foundation" for achieving effective performance in large-scale Oracle data warehouses. It means splitting data into separate "chunks", which can shorten response time and increase throughput. Several of the features discussed below depend on the partitioning of the data warehouse.
    4.1.7.2 Parallel Operation
    Parallelism enables scalability, which makes large workloads, large databases, and very large data warehouses (VLDW) possible. If not every part of the system operates in parallel, any single-threaded path can become a bottleneck for the throughput of the system and so limit its ability to scale.
  • 34. 4.1.7.3 Materialized views
    Materialized views enable sophisticated data analysis on large data sets. Without materialized views, significant processing would have to be devoted to complicated joins and aggregations to produce the complex summaries required. With them, queries can still be written against the logically designed tables and views, and the database deals with the physical tables. Materialized views often provide the performance boost necessary to turn a runaway query into a powerful analytical tool.
    4.1.7.4 Intelligent Optimization
    The intelligent optimizer selects the best strategy and optimizes the order of operations. As a result, queries that rely on indexing, partitioning, and other data access features run faster.
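The idea behind materialized views in 4.1.7.3 can be illustrated with a precomputed summary table. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module, which has no materialized views, so an ordinary table rebuilt by a hypothetical `refresh_summary` function stands in for one. The table names and sample rows are invented for illustration, and Oracle's automatic rewriting of queries to use a materialized view is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', '2007-01', 100.0),
        ('North', '2007-01', 50.0),
        ('North', '2007-02', 70.0),
        ('South', '2007-01', 30.0);
""")


def refresh_summary(conn):
    """Rebuild the precomputed aggregate (stand-in for a materialized view refresh)."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_region_month;
        CREATE TABLE sales_by_region_month AS
            SELECT region, month, SUM(amount) AS total, COUNT(*) AS n
            FROM sales
            GROUP BY region, month;
    """)


def total_from_detail(conn, region):
    """Answer the question by scanning and aggregating the detail rows."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


def total_from_summary(conn, region):
    """Answer the same question from the much smaller precomputed summary."""
    row = conn.execute(
        "SELECT SUM(total) FROM sales_by_region_month WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


refresh_summary(conn)
print(total_from_detail(conn, "North"), total_from_summary(conn, "North"))
```

Both functions return the same answer, but the second reads one row per region and month instead of one row per sale. The trade-off is that the summary must be refreshed when the detail data changes.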
  • 35. Figure 7 – Example on query (Oracle Datawarehousing)
    4.1.7.5 Table Compression
    Compressing data to save disk space is an attractive way to cut costs by decreasing storage requirements. Whereas traditional data compression tends to degrade query performance, Oracle's table compression feature eliminates duplicate (redundant) data values without a negative impact on query performance.
    4.1.7.6 Online Analytical Processing
    OLAP is deployed to gain better visibility into the business. It helps users to understand what is happening, why it is happening, and what will happen to the business.
  • 36. Thus, the knowledge and information needed for planning, budgeting, forecasting, sales, and marketing functions can be derived from the existing databases. Oracle's OLAP product uses a single database platform for all query processing: both SQL and OLAP API queries can be directed to one data store. Because data does not have to be transferred between environments, users benefit from reduced data latency, faster access to more recent data, and lower cost and complexity.
    Figure 8 – Example on OLAPs (Oracle Datawarehousing)
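The kind of multi-dimensional analysis OLAP supports can be sketched with two basic operations: roll-up (aggregating away dimensions) and slice (fixing one dimension to a value). The example below is plain Python with invented fact rows. The names `facts`, `roll_up`, and `slice_rows` are illustrative assumptions and do not correspond to Oracle's OLAP API.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
DIMENSIONS = ("region", "product", "quarter")
facts = [
    ("North", "Loans", "Q1", 120),
    ("North", "Cards", "Q1", 80),
    ("South", "Loans", "Q1", 60),
    ("North", "Loans", "Q2", 150),
    ("South", "Cards", "Q2", 40),
]


def roll_up(rows, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # index 3 holds the sales measure
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where `dimension` equals `value`."""
    position = DIMENSIONS.index(dimension)
    return [row for row in rows if row[position] == value]


by_region = roll_up(facts, ["region"])
by_region_quarter = roll_up(facts, ["region", "quarter"])
q1_by_product = roll_up(slice_rows(facts, "quarter", "Q1"), ["product"])
print(by_region)
print(q1_by_product)
```

An OLAP engine performs the same aggregations against precomputed or indexed structures, so that analysts can move between levels of detail interactively.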
  • 37. 4.1.7.7 Data Mining
    Data mining sifts through large volumes of data to find hidden patterns. These patterns can yield new business insights that help attract and retain customers, enhance customer and supplier relationships, identify new sales opportunities, or identify potentially fraudulent behavior.
    4.1.8 Advantages
    - Provides an integrated view of the business by building an enterprise data warehouse.
    - Supports decision-making and business analysis at all levels of the company.
    - Improves performance through early detection of market opportunities.
    - Caters for future growth with a scalable solution.
    - Oracle is better known, and management often feels more comfortable with a better-known vendor and product.
    - There are more Oracle DBAs in the job market.
    - There are more books written on supporting Oracle.
    - Vendors of data warehouse products will almost always write their products to support Oracle first. In addition, there is usually more experience of using these products with Oracle.
    - Oracle has a fairly complete suite of products for data warehousing.
    - A company using Oracle's ERP products almost always uses Oracle's RDBMS for its data warehouse.
  • 38. 4.1.9 Disadvantages
    - A data warehouse can be extremely expensive to build and maintain. You may need buy-in from senior management to get approval for one.
    - You need large amounts of storage space, potentially one terabyte or more.
    - Because there is a huge amount of data, it is possible to write queries that seem to run forever and never come back with an answer (the query from the Twilight Zone).
    - The data is not up to date; in some cases it is 24 hours old or more.
    - The data is not easily changed. If you spot an error in the data warehouse, you have to correct it in the source system. If that system cannot be changed, the data warehouse cannot be changed and you have to live with incorrect data. For example, company "ABC Widgets" could be stored in the database as "A.B.C. Widgets", "AB and C Widgets", or "AB&C". Unless you know about these possible irregularities, you will get incomplete results. You may have a difficult time persuading your company to change its procedures to satisfy the data warehouse.
    - Because the data comes from different sources, you may not get the same answer from your OLTP system as you do from the data warehouse.
  • 39. It will be difficult, if not impossible, to identify whether any OLTP transactions are missing from the data warehouse.
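The "ABC Widgets" problem from the disadvantages above can be shown in a few lines. This is a hedged sketch with invented order records. The alias table `ALIASES` is a hypothetical lookup maintained by someone who already knows the irregularities, which is exactly the knowledge the text says is required.

```python
# Hypothetical order records as they might arrive from different source systems.
orders = [
    {"customer": "ABC Widgets", "amount": 500},
    {"customer": "A.B.C. Widgets", "amount": 200},
    {"customer": "AB and C Widgets", "amount": 300},
    {"customer": "AB&C", "amount": 100},
    {"customer": "XYZ Tools", "amount": 900},
]

# Known spelling variants mapped to one canonical name (assumed to be curated by hand).
ALIASES = {
    "A.B.C. Widgets": "ABC Widgets",
    "AB and C Widgets": "ABC Widgets",
    "AB&C": "ABC Widgets",
}


def canonical(name):
    """Return the canonical customer name, or the name itself if no alias is known."""
    return ALIASES.get(name, name)


def total_for(customer, rows, use_aliases):
    """Sum order amounts for `customer`, optionally resolving known aliases first."""
    total = 0
    for row in rows:
        name = canonical(row["customer"]) if use_aliases else row["customer"]
        if name == customer:
            total += row["amount"]
    return total


naive_total = total_for("ABC Widgets", orders, use_aliases=False)
complete_total = total_for("ABC Widgets", orders, use_aliases=True)
print(naive_total, complete_total)
```

The exact-match query silently misses more than half of the customer's orders, while the alias-aware query recovers them. Any variant missing from the alias table would still be lost, which is why the text recommends correcting the data in the source system.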
  • 40. 4.1.10 Application of Oracle Datawarehousing in Absa Group Limited
    Absa Group Limited, one of South Africa's largest financial services organizations, offers a complete range of products and services. Absa has assets of R372 billion (US$62 billion), 686 staffed outlets, 5,468 ATMs and South Africa's largest internet banking customer base. In 2005, Barclays took a majority stake to help Absa become the financial services leader in South Africa and ultimately the pre-eminent bank on the African continent.
    4.1.10.1 Challenges
    - Improve Absa's business responsiveness by consolidating its fragmented business intelligence environment, which required compiling 1,200 reports and 31 business intelligence projects
    - Align business intelligence with corporate strategy by standardizing methodology, architecture, tools and measurement
    - Deliver the reports required by all business units
    - Cut the costs of delivering and printing manually generated reports
    - Replace paper-based reports with electronic intelligence for individual business units to reduce report delivery-to-desk time
    4.1.10.2 Solution
    - Used Oracle Database as the single source of data to make the Enterprise Data Warehouse more efficient
  • 41. - Consolidated the business intelligence environment on Oracle Application Server to reduce duplication of reports
    - Implemented a business intelligence methodology with OLAP and Oracle Balanced Scorecard tools to support common strategic planning across the group
    - Aligned business performance measurement to focus on the causes of problems and thus ensure better business decision making
    - Used Oracle Warehouse Builder to create common processes for extracting data and loading it into the Data Warehouse
    - Became able to source data from 52 core banking systems on a daily, weekly or monthly basis, as well as from external data sources
    - Implemented Oracle Discoverer for end-user analysis
    - Anticipated cost savings from reduced manual reporting and the removal of disparate BI projects represent a possible return on investment of more than 300% over five years
    4.2 SAS
    4.2.1 Introduction to SAS
    SAS Institute Inc. has been a major producer of software since it was founded in 1976. SAS was originally an acronym for Statistical Analysis System, but for many years it has been used as an arbitrary trade name to refer to the company as a whole.
  • 42. The SAS System, originally Statistical Analysis System, is an integrated system of software products provided by SAS Institute that enables the programmer to perform:
    - Data entry, retrieval, management, and mining
    - Report writing and graphics
    - Statistical and mathematical analysis
    - Business planning, forecasting, and decision support
    - Operations research and project management
    - Quality improvement
    - Applications development
    - Data warehousing (extract, transform, load)
    - Platform-independent and remote computing
    In addition, the SAS System integrates with many SAS business solutions that enable large-scale software solutions for areas such as human resource management, financial management, business intelligence, customer relationship management and more.
  • 43. 4.2.2 Features of SAS Warehousing Administrator
    SAS/Warehouse Administrator is designed for the IT professional responsible for creating and managing data warehouse and data mart processes. It provides a customizable solution that offers a single point of control, making it easier to respond to the ever-changing needs of the business community. It also simplifies the creation and maintenance of data warehouses.
    The main benefit of using SAS/Warehouse Administrator is that it simplifies the setup and management of multiple data warehouses and data marts. In detail, it:
    - Integrates extraction, transformation and loading tools for building and managing data warehouses and data marts.
    - Provides a framework for effective warehouse management through a metadata-driven architecture.
    - Facilitates business subject definition, consolidation of business rules, scheduling of processes for warehouse maintenance, and integration with decision-support tools for effective warehouse exploitation.
    - Leverages the strengths of SAS software and rapid warehousing to deliver the well-proven benefits of a data warehouse even faster.
    The graphical user interface simplifies the visualization, navigation and maintenance of the data warehouse and eliminates much of the coding work required to build and manage it.
  • 44. Moreover, it offers the adaptability and manageability you need as your business and information needs change, as more data is added, as processes become more complex, and as users require greater support.
    Figure 9 – Example on user interface (SAS/Warehousing Administrator)
  • 45. Figure 10 – Example on reports (SAS Data Warehouse)
    4.2.2.1 SAS Enterprise Data Integration
    Unlike common data warehousing products, SAS provides fully functional software for data capture, storage, integration and analysis across the enterprise. SAS Enterprise Data Integration obtains and manages consistent, trusted data throughout the organization in a flexible and reliable manner.
  • 46. - A graphical user interface provides technicians with an interactive, single point of control for managing data integration processes, including wizards for building and executing data access, transformation and storage process flows.
    - Connectivity to more data sources on more platforms, such as IBM DB2, Oracle DB, Microsoft Access, Sybase, etc.
    - Data quality is embedded into batch, near-time and real-time processes.
    - Metadata is captured and documented throughout transformation and data integration processes.
    - Data can be migrated or synchronized between database structures, enterprise applications, mainframe legacy files, text, XML and message queues.
    - Data can be joined across these virtual data sources for real-time access and analysis.
    - A business metadata design interface allows data analysts to build a semantic layer quickly.
    - A library of reusable business rules cleans, standardizes, matches and enhances data as it moves into the master reference file, and is reused for downstream processes.
  • 47. Figure 11 – Overview on SAS Data Integration
    4.2.3 Advantages
    4.2.3.1 High Compatibility
    SAS provides access to ERP systems such as Baan, PeopleSoft, and SAP; relational data sources such as DB2, Oracle, Informix, MS SQL Server, Sybase, and Teradata, as well as ODBC connections; and non-relational sources such as Adabas and PC file formats.
    4.2.3.2 Point-and-click interface
    The user-friendly interface enables data management specialists to implement the warehousing application without the assistance of programmers or operators.
  • 48. 4.2.4 Disadvantages
    4.2.4.1 Unknown Implementation cost
    Compared with other vendors such as Oracle, SAS does not have a clear pricing policy. This makes it difficult for customers to choose between the available products.
    4.2.4.2 Unknown difficulty of implementation
    Other companies, such as Oracle, offer data warehouse and analytics services that combine technical leadership and expertise with their own technology to provide a complete business intelligence solution. By comparison, SAS does not state how difficult implementation is.
  • 49. 4.2.5 Application of SAS Data Warehousing in the HK Trade Development Council
    The Hong Kong Trade Development Council launched Business-Stat On-line (BSO) using Data Warehousing and Web Enablement technology from SAS. BSO is an interactive on-line service that allows companies to access monthly trade figures compiled by the Census and Statistics Department. The information available includes Hong Kong's total trade figures, overseas trade, and trade according to specific types of product and service.
    The project involved the design and implementation of a Data Warehouse containing five years of export, domestic export and import data broken down by a wide range of product and market areas. Other trade service data was also imported into the system using customized tools provided by SAS.
    SAS also developed an extensive number of statistical reports. Over 6000 pre-summarized general tables were created for on-line access, designed as a starting point for exploring Hong Kong's general trade performance. In addition, users of the service can view an unlimited number of dynamic reports based on selection criteria such as region, industry and product type. Registration and administration tools provided by SAS allow subscribers to register for the BSO service on-line free of charge.
  • 50. They are then automatically notified by e-mail of their logon ID and password, which gives them full access to the service.
  • 51. 5. How to implement Data Warehouse successfully
    Having discussed the many benefits of applying a data warehouse in an enterprise, how can we implement DW technology successfully in our operational processes? Denis Kozar identified "seven deadly sins" of DW implementation.
    5.1 "If You Build It, They Will Come"
    Blind faith in DW technology leads to a failure to recognize the importance of defining a set of business objectives for the data warehouse before it is implemented. A clearly defined data warehouse plan that addresses the needs of the entire enterprise is important, and a documented set of requirements is necessary to guide the design, construction, and rollout of the project.
    5.2 Omission of an Architectural Framework
    One of the most important factors in a successful data warehouse implementation is the development and maintenance of a comprehensive architectural framework. The framework serves as the blueprint for the construction and use of the various DW components. In the DW architecture, developers need to consider the number of end-users, the volume and diversity of data, the expected data-refresh cycle, and so on.
  • 52. 5.3 Underestimating the Importance of Documenting Assumptions
    The assumptions and potential data conflicts associated with the DW must be included in the architectural framework for the project. Several questions need to be considered during the requirements phase of the project to reveal these important underlying assumptions about the DW. How much data will be loaded into the warehouse? How often does the data need to be refreshed? On what platform will the DW be developed? Answers to these questions are essential to the success of a DW implementation.
    5.4 Failure to Use the Right Tool for the Job
    The design and construction of a DW is very different from that of an operational application system. DW tools can be categorized into four areas:
    - Analysis Tools: assist in the identification of data requirements.
    - Development Tools: responsible for data cleansing, code generation, data integration, and loading of the data into the data repository.
  • 53. - Implementation Tools: contain data acquisition tools to gather, process, clean, replicate, and consolidate data.
    - Delivery Tools: assist in data conversion, derivation, and reporting for the application platform.
    Correct application of these tools helps to implement the DW efficiently and effectively.
    5.5 Life Cycle Abuse
    The life cycle of DW development is a continuous, ongoing set of activities that flow from the initial investigation of DW requirements through data administration and back again. DW development should continue without interruption if the DW is to remain a viable source of decision-making support in an ever-changing business environment.
    5.6 Ignorance Concerning the Resolution of Data Conflicts
    Analysis must be conducted to determine the best data sources available within an organization. Once these systems have been identified, the conflicts associated with disparate naming conventions, file formats and sizes, and value ranges must be resolved. This process may involve working with data owners to establish an understanding with regard to future planned or unplanned changes to the source data.
  • 54. Failure to allow sufficient time and resources to resolve data conflicts can delay a warehouse implementation and result in an organizational deadlock that threatens the success of the project.
    5.7 Failure to Learn from Mistakes
    The ongoing nature of the DW development cycle means that each DW project feeds into the next. Because of this, careful documentation of the mistakes made in previous projects directly affects the quality assurance activities of all future projects. By learning from the past, a strong DW with lasting benefits can be built.
    If developers pay attention to the above areas, the implementation of a Data Warehouse is far more likely to bring great benefits to the business.
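The data conflicts named in 5.6 (disparate naming conventions, formats, and value ranges) are typically resolved by per-source rules applied before loading. The sketch below assumes two hypothetical source systems, `billing` and `crm`, with invented field names, date formats, and gender codes. It illustrates the idea and is not a description of any vendor's tool.

```python
from datetime import datetime

# Hypothetical per-source rules: field renames, date format, and code mappings.
SOURCE_RULES = {
    "billing": {
        "rename": {"cust_no": "customer_id", "dt": "order_date", "sex": "gender"},
        "date_format": "%d/%m/%Y",
        "gender_codes": {"M": "male", "F": "female"},
    },
    "crm": {
        "rename": {"CustomerID": "customer_id", "OrderDate": "order_date",
                   "Gender": "gender"},
        "date_format": "%Y-%m-%d",
        "gender_codes": {"1": "male", "2": "female"},
    },
}


def harmonize(record, source):
    """Map one source record onto the warehouse's common names, formats and codes."""
    rules = SOURCE_RULES[source]
    # Resolve naming conflicts: translate each source field to the common name.
    out = {rules["rename"][field]: value for field, value in record.items()}
    # Resolve format conflicts: integer keys and ISO 8601 dates.
    out["customer_id"] = int(out["customer_id"])
    parsed = datetime.strptime(out["order_date"], rules["date_format"])
    out["order_date"] = parsed.date().isoformat()
    # Resolve value-range conflicts: translate source-specific codes.
    out["gender"] = rules["gender_codes"][out["gender"]]
    return out


from_billing = harmonize({"cust_no": "0042", "dt": "19/04/2007", "sex": "F"}, "billing")
from_crm = harmonize({"CustomerID": "42", "OrderDate": "2007-04-19", "Gender": "2"}, "crm")
print(from_billing == from_crm)
```

Keeping the rules in one documented table per source also serves the point made in 5.3: the assumptions about each source are written down where data owners can review them when the source changes.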
  • 55. 6. Concerns & Conclusion
    A data warehouse can bring many benefits to an enterprise; however, there are concerns about using it.
    - Extracting, cleaning and loading data is time-consuming.
    - The scope of a data warehousing project must be actively managed to deliver a release of defined content and value.
    - There may be compatibility problems with systems already in place.
    - Security could develop into a serious issue, especially if the data warehouse is web accessible.
    - The controversy over data storage design warrants careful consideration, and perhaps prototyping of the data warehouse solution for each project's environment.
    Managers therefore need to be aware of these concerns when using a data warehouse, so that they can obtain the benefits of data warehousing while avoiding these problems.
  • 56. 7. References
    George M. Marakas (1999), Decision Support Systems in the Twenty-First Century: DSS and Data Mining Technologies for Tomorrow's Manager, pp. 343-346
    IBM, "Background", http://www-03.ibm.com/press/us/en/background.wss
    IBM, DB2 Data Warehouse Edition, "Features and benefits", http://www-306.ibm.com/software/data/db2/dwe/features.html?S_CMP=rnav
    IBM, DB2 Data Warehouse Edition, "Overview", http://www-306.ibm.com/software/data/db2/dwe/
    IBM, "Denmark's TDC answers the call of Danish telephone consumers with IBM Data Warehouse", http://www-306.ibm.com/software/success/cssdb.nsf/CS/SPAT-6ATKAP?OpenDocument&Site=dmbi&cty=en_us
    Ken Orr (1996, revised 2000), "Data Warehouse Technology", http://www.kenorrinst.com/dwpaper.html
    Manufacturing Business Technology: Software Finder, "Oracle vs SAS", http://softwarefinder.mbtmag.com/search/for/Oracle-vs-SAS.html
    Martyn R Jones (1999), "Brief defining characteristics of a Data Warehouse", http://www.brint.com/wwwboard/messages/4599.html
    Paul Westerman, Data Warehousing: Using the Wal-Mart Model
    SAS, "Data Integration", http://www.sas.com/technologies/dw/
    Wikipedia, "Bill Inmon", http://en.wikipedia.org/wiki/Bill_Inmon
  • 57. Wikipedia, "SAS Institute", http://en.wikipedia.org/wiki/SAS_Institute
    Wikipedia, "SAS System", http://en.wikipedia.org/wiki/SAS_System