Metadata is an important component of the data warehouse in any scenario, but it takes on an entirely different dimension when external data is stored and managed. It is through the metadata that a manager learns most of what is known about the external data. Properly built and maintained metadata is absolutely essential to the operation of the data warehouse, particularly with regard to external data.
As shown in Figure 8-4, notification data is simply a file, created for users of the system, that records the classifications of data each user is interested in. When data is entered into the data warehouse and into the metadata, a check is made to see who is interested in it, and those users are notified that the external data has been captured.
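The notification check can be sketched in a few lines. This is a minimal illustration, not Inmon's design: the user names, classifications, and the `NOTIFICATION_FILE`, `ExternalDocument`, `users_to_notify`, and `capture` names are all hypothetical, and "notifying" a user is reduced to printing a message.

```python
from dataclasses import dataclass

# Hypothetical notification data: each user registers the classifications
# of external data they want to hear about (illustrative values only).
NOTIFICATION_FILE = {
    "analyst_a": {"competitor pricing", "interest rates"},
    "analyst_b": {"interest rates"},
    "planner_c": {"weather"},
}


@dataclass
class ExternalDocument:
    document_id: str
    classification: str


def users_to_notify(doc, notification_file):
    """Return, sorted, the users whose registered interests include the document's classification."""
    return sorted(
        user
        for user, interests in notification_file.items()
        if doc.classification in interests
    )


def capture(doc, metadata, notification_file):
    """Record the document in the metadata, then notify every interested user."""
    metadata[doc.document_id] = doc
    recipients = users_to_notify(doc, notification_file)
    for user in recipients:
        # Stand-in for a real notification channel (mail, message queue, ...).
        print(f"notify {user}: {doc.document_id} ({doc.classification}) captured")
    return recipients
```

Capturing a document classified as "interest rates" would record it in the metadata and notify `analyst_a` and `analyst_b`, while `planner_c` hears nothing.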
External data can actually be stored in the data warehouse if it is convenient and cost-effective to do so. But in many cases, it will not be possible or economical to store all external data in the data warehouse. Instead, an entry is made in the metadata of the warehouse describing where the actual body of external data can be found.
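A metadata entry that points to external data kept outside the warehouse might look like the sketch below. The record layout is an assumption for illustration: the field names, the `ExternalDataEntry` class, its `locate` method, and the `nearline://` location string are hypothetical, not a schema from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ExternalDataEntry:
    """Hypothetical metadata entry describing one body of external data."""
    document_id: str
    description: str
    source: str
    date_of_entry: date
    classification: str
    stored_in_warehouse: bool
    # Where the data actually lives when it is NOT kept in the warehouse,
    # e.g. a near-line volume or an outside location (illustrative format).
    physical_location: Optional[str] = None

    def locate(self) -> str:
        """Tell the user where the actual body of external data can be found."""
        if self.stored_in_warehouse:
            return "data warehouse"
        if self.physical_location is None:
            raise ValueError(f"{self.document_id}: no location recorded")
        return self.physical_location
```

The design point is that the metadata entry is cheap to keep in the warehouse even when the data it describes is too large or too costly to store there.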
Lecture 08 - External Data and the Data Warehouse
Chapter 8: External Data and the Data Warehouse http://it-slideshares.blogspot.com/
Agenda
1. Introduction
2. External Data in the Data Warehouse
3. Metadata and External Data
4. Storing External Data
5. Different Components of External Data
6. Modeling and External Data
7. Secondary Reports
8. Archiving External Data
9. Comparing Internal Data to External Data
10. Summary
8.1 Introduction
Most organizations build their first data warehouse efforts on data whose source is existing systems (that is, on data internal to the corporation). But a whole host of other data that is not generated by the corporation's own systems is also of legitimate use to the corporation. This class of data is called external data, and it usually enters the corporation in an unpredictable format (Figure 8.1). The data warehouse is the ideal place to store external data. If external data is not stored in a centrally located place, several problems are sure to arise (Figure 8.2).
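8.2 External Data in the Data Warehouse
Several issues relate to the use and storage of external data in the data warehouse. The first problem is the frequency of availability: unlike internal data, external data has no fixed pattern of availability. The second problem is that external data is totally undisciplined; a certain amount of reformatting is needed before it can be placed in the warehouse. The third problem is unpredictability: external data may come from practically any source at almost any time.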
8.2 External Data in the Data Warehouse (cont.)
There are many methods to capture and store external information. If external data is voluminous, one of the best places to locate it is a bulk storage medium such as near-line storage. Another technique that is sometimes effective is to create two stores of external data. The external data becomes an adjunct to the data warehouse.
8.3 Metadata and External Data
Metadata is vital when it comes to the issue of external data.
8.3 Metadata and External Data (cont.)
Associated with metadata is another type of data: notification data.
8.4 Storing External Data
8.5 Different Components of External Data
One of the important design considerations of external data is that it often contains many different components, some of which are of more use than others. To manage the data, an experienced DSS analyst or industrial engineer must determine which units of data are the most important.
8.6 Modeling and External Data
The following question must be answered: what is the relationship between the data model and external data? (See Figure 8.6.)
8.7 Secondary Reports
When data is repetitive in nature, secondary reports can be created from the detailed data over time. For example, take the month-end Dow Jones Industrial Average report shown in Figure 8-7.
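A month-end secondary report of this kind can be derived mechanically from repetitive daily detail. The sketch below assumes the detail arrives as `(date, closing_value)` pairs; the function name `month_end_report` and the sample numbers are hypothetical and are not real index values.

```python
from datetime import date


def month_end_report(daily_closes):
    """Collapse repetitive daily observations into one value per month.

    daily_closes: iterable of (date, closing_value) pairs, in any order.
    Returns {(year, month): value on the last observed day of that month},
    ordered by month.
    """
    last_seen = {}
    for day, value in daily_closes:
        key = (day.year, day.month)
        # Keep only the latest observation seen so far for each month.
        if key not in last_seen or day > last_seen[key][0]:
            last_seen[key] = (day, value)
    return {key: value for key, (_, value) in sorted(last_seen.items())}


# Made-up sample detail, deliberately out of order.
sample = [
    (date(2024, 1, 30), 100.0),
    (date(2024, 1, 31), 101.5),
    (date(2024, 2, 28), 99.0),
    (date(2024, 2, 29), 98.0),
    (date(2024, 1, 2), 97.0),
]
report = month_end_report(sample)
```

Here `report` holds one entry per month, `{(2024, 1): 101.5, (2024, 2): 98.0}`; the detailed daily rows can then be archived while the smaller secondary report stays online.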
8.8 Archiving External Data
Every piece of information, external or otherwise, has a useful lifetime. Once that lifetime is past, it is not economical to keep the information. An essential part of managing external data is deciding what the useful lifetime of the data is. Once that lifetime has expired, there remains the issue of whether the data should be discarded or put into archives.
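The lifetime decision can be expressed as a small rule. This is a sketch under stated assumptions: that each piece of external data carries an entry date, a useful lifetime in days, and an `archive_worthy` flag set by whoever manages it. The function name and these parameters are hypothetical.

```python
from datetime import date, timedelta


def disposition(entry_date, useful_life_days, archive_worthy, today):
    """Return 'keep', 'archive', or 'discard' for one piece of external data.

    Data within its useful lifetime is kept online; once the lifetime is
    past, it is either archived or discarded depending on archive_worthy.
    """
    expires = entry_date + timedelta(days=useful_life_days)
    if today < expires:
        return "keep"
    return "archive" if archive_worthy else "discard"
```

For data entered on 1 January 2024 with a 30-day lifetime, a check on 1 June 2024 returns "archive" or "discard" depending on the flag; with a 365-day lifetime it returns "keep".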
8.9 Comparing Internal Data to External Data
One of the most useful things to do with external data is to compare it to internal data over a period of time. The comparison gives management a unique perspective. Two issues must be addressed when comparing internal data to external data: the comparison must be made on a common key, and the external data needs to be “cleansed” first.
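A minimal sketch of both issues, cleansing first and then matching on a common key, is shown below. The key format and the cleansing rule (trim whitespace, upper-case, drop hyphens) are assumptions made for illustration; real external sources usually need source-specific rules. The names `cleanse_key` and `compare` are hypothetical.

```python
def cleanse_key(raw_key):
    """Normalize an external key to the internal format (hypothetical rule)."""
    return raw_key.strip().upper().replace("-", "")


def compare(internal, external):
    """Pair internal and external figures on a common, cleansed key.

    internal: {key: value} already in the corporation's key format.
    external: {raw_key: value} as received from the outside source.
    Returns {key: (internal_value, external_value)} for keys present in both.
    Note: if two raw external keys cleanse to the same key, the later one wins.
    """
    cleansed = {cleanse_key(k): v for k, v in external.items()}
    return {
        key: (internal[key], cleansed[key])
        for key in sorted(internal.keys() & cleansed.keys())
    }


internal_sales = {"AB101": 120, "CD202": 80}
market_totals = {" ab-101": 1000, "cd-202 ": 400, "ef-303": 50}
paired = compare(internal_sales, market_totals)
```

Without the cleansing step, none of the external keys would match the internal ones and the comparison would come back empty; with it, `paired` lines up each product's internal figure against the external one, `{"AB101": (120, 1000), "CD202": (80, 400)}`.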
8.10 Summary
The data warehouse is capable of holding much more than internal, structured data. Much information relevant to the running of the company comes from sources outside the company. External data is captured, and information about the external data is stored in the data warehouse metadata. External data often undergoes significant editing and transformation as the data is moved from the external environment to the data warehouse environment. The metadata that describes the external data and the unstructured data serves as an executive index to information. External and unstructured data may or may not actually be stored in the data warehouse.