Reading: Chapter 5. Management Information Systems, James O’Brien
MODULE II: INFORMATION TECHNOLOGIES
CHAPTER 5: DATA RESOURCE MANAGEMENT

Chapter Highlights

Section I: Technical Foundations of Database Management
Real World Case: Harrah’s Entertainment and Others: Protecting the Data Jewels
Database Management
Fundamental Data Concepts
Database Structures
Database Development

Section II: Managing Data Resources
Real World Case: Emerson and Sanofi: Data Stewards Seek Data Conformity
Data Resource Management
Types of Databases
Data Warehouses and Data Mining
Traditional File Processing
The Database Management Approach
Real World Case: Acxiom Corporation: Data Demands Respect

Learning Objectives

After reading and studying this chapter, you should be able to:
1. Explain the business value of implementing data resource management processes and technologies in an organization.
2. Outline the advantages of a database management approach to managing the data resources of a business, compared to a file processing approach.
3. Explain how database management software helps business professionals and supports the operations and management of a business.
4. Provide examples to illustrate each of the following concepts:
   a. Major types of databases.
   b. Data warehouses and data mining.
   c. Logical data elements.
   d. Fundamental database structures.
   e. Database development.
SECTION I: Technical Foundations of Database Management

Database Management

Just imagine how difficult it would be to get any information from an information system if data were stored in an unorganized way, or if there were no systematic way to retrieve them. Therefore, in all information systems, data resources must be organized and structured in some logical manner so that they can be accessed easily, processed efficiently, retrieved quickly, and managed effectively. Data structures and access methods ranging from simple to complex have been devised to efficiently organize and access data stored by information systems. In this chapter, we will explore these concepts, as well as the managerial implications and value of data resource management. See Figure 5.1.

Read the Real World Case on data resources in the casino gaming and hospitality industry. We can learn a lot from this case about the importance of protecting the data resources of the organization.

Fundamental Data Concepts

Before we go any further, let’s discuss some fundamental concepts about how data are organized in information systems. A conceptual framework of several levels of data has been devised that differentiates between different groupings, or elements, of data. Thus, data may be logically organized into characters, fields, records, files, and databases, just as writing can be organized in letters, words, sentences, paragraphs, and documents. Examples of these logical data elements are shown in Figure 5.2.

Character

The most basic logical data element is the character, which consists of a single alphabetic, numeric, or other symbol. You might argue that the bit or byte is a more elementary data element, but remember that those terms refer to the physical storage elements provided by the computer hardware, discussed in Chapter 3.
Using that understanding, one way to think of a character is that it is a byte used to represent a particular character. From a user’s point of view (that is, from a logical as opposed to a physical or hardware view of data), a character is the most basic element of data that can be observed and manipulated.

Field

The next higher level of data is the field, or data item. A field consists of a grouping of related characters. For example, the grouping of alphabetic characters in a person’s name may form a name field (or typically, last name, first name, and middle initial fields), and the grouping of numbers in a sales amount forms a sales amount field. Specifically, a data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event). For example, an employee’s salary is an attribute that is a typical data field used to describe an entity who is an employee of a business. Generally speaking, fields are organized such that they represent some logical order, for example, last_name, first_name, address, city, state, zipcode, and so on.

Record

All of the fields used to describe the attributes of an entity are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. An example is a person’s payroll record, which consists of data fields describing attributes such as the person’s name, Social Security number, and rate of pay. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths. Another way of looking at a record is that it represents a single instance of an entity. Each record in an employee file describes one specific employee.

File

A group of related records is a data file, or table. Thus, an employee file would contain the records of the employees of a firm. Files are frequently classified by the application
REAL WORLD CASE 1
Harrah’s Entertainment and Others: Protecting the Data Jewels

In the casino industry, one of the most valuable assets is the dossier that casinos keep on their affluent customers, the high rollers. But in 2003, casino operator Harrah’s Entertainment Inc. filed a lawsuit in Placer County, California, Superior Court charging that a former employee had copied the records of up to 450 wealthy customers before leaving the company to work at competitor Thunder Valley Casino in Lincoln, California.

The complaint said the employee was seen printing the list, which included names, contact information, and credit and account histories, from a Harrah’s database. It also alleged that he tried to lure those players to Thunder Valley. The employee denies the charge of stealing Harrah’s trade secrets, and the case was still pending at this writing, but many similar cases have been filed in the past 20 years, legal experts say.

While savvy companies are using business intelligence and customer relationship management systems to identify their most profitable customers, there’s a genuine danger of that information falling into the wrong hands. Broader access to those applications and the trend toward employees switching jobs more frequently have made protecting customer lists an even greater priority.

Fortunately, there are managerial, legal, and technological steps you can take to help prevent, or at least discourage, departing employees from walking out the door with this vital information.

For starters, organizations should make sure that certain employees, particularly those with frequent access to customer information, sign nondisclosure, noncompete, and nonsolicitation agreements that specifically mention customer lists. Through these documents, employees “acknowledge that they will be introduced to this information and agree not to disclose it on departure from the company,” says Suzanne Labrit, a partner at law firm Shutts & Bowen LLP in West Palm Beach, Florida.

Although most states have enacted trade-secrets laws, Labrit says they have different attitudes about enforcing these laws with regard to customer lists. “But as a starting point, at least you have this understanding [with employees] that the customer information is being treated as confidential,” Labrit says. Then if an employee leaves to work for a competitor and uses this protected customer data, the employer will more likely be able to take legal action to stop the activity. “If you don’t treat it as confidential information internally,” she says, “the court will not treat it as confidential information, either.”

It’s also important to educate employees about the confidentiality of customer lists, because many people wrongly assume they’re public information, says Tim Headley, a partner at the Houston law firm of Gardere Wynne Sewell LLP. “Most people think they can take the lists with them,” he says. “You have to show that you’ve kept it a secret and told employees it’s a valuable secret. [Customer lists] are at the core of how you bring revenue into the company. These are the decision-makers who are willing to buy your product.”

From a management and process standpoint, organizations should try to limit access to customer lists to only employees, such as sales representatives, who need the information to do their jobs. “If you make it broadly available to employees, then it’s not considered confidential,” says Labrit.

Physical security should also be considered, Labrit says. Visitors such as vendors shouldn’t be permitted to roam free in the hallways or into conference rooms. And security policies, such as a requirement that all computer systems have strong password protection, should be strictly enforced.

Companies should instantly shut down access to computers and networks when employees leave, whether the reason is a layoff or a move to a new job. At the exit interview, the employee should be reminded of any signed agreements and corporate policies regarding customer lists and other confidential information. Employees should be told to turn over anything, including data, that belongs to the company.

In addition, employers should track the activities of employees who’ve given notice but will be around for a while. This includes monitoring systems to see if the employee is e-mailing company-owned documents outside the company.

Some organizations rely on technology to help prevent the loss of customer lists and other critical data. Inflow Inc., a Denver-based provider of managed Web hosting services, uses a product from Opsware Inc. in Sunnyvale, California, that lets managers control access to specific systems, such as databases, from a central location.

FIGURE 5.1 While data management is a strategic initiative in every modern organization, those in the gaming industry believe their success lies in the protection and strategic management of their data resources. Source: Jose Luis Palaez, Inc./Corbis.
for which they are primarily used, such as a payroll file or an inventory file, or the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a payroll master file versus a payroll weekly transaction file. A transaction file, therefore, would contain records of all transactions occurring during a period and might be used periodically to update the permanent records contained in a master file. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage called archival storage.

FIGURE 5.2 Examples of the logical data elements in information systems. Note especially the examples of how data fields, records, files, and databases are related.

Human Resource Database
  Payroll File
    Employee Record 1: Name field = Jones T. A.; SS No. field = 275-32-3874; Salary field = 20,000
    Employee Record 2: Name field = Klugman J. L.; SS No. field = 349-88-7913; Salary field = 28,000
  Benefits File
    Employee Record 3: Name field = Alvarez J. S.; SS No. field = 542-40-3718; Insurance field = 100,000
    Employee Record 4: Name field = Porter M. L.; SS No. field = 617-87-7915; Insurance field = 50,000

Database

A database is an integrated collection of logically related data elements. A database consolidates records previously stored in separate files into a common pool of data elements that provides data for many applications. The data stored in a database are independent of the application programs using them and of the type of storage devices on which they are stored. Thus, databases contain data elements describing entities and relationships among entities.
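The hierarchy of logical data elements described above (characters, fields, records, files, databases) can be sketched in a few lines of Python. This is only an illustration, not how a DBMS stores data: the class names and the dictionary used as the "database" are hypothetical, while the field values are the ones shown in Figure 5.2.

```python
from dataclasses import dataclass, fields


@dataclass
class PayrollRecord:
    """One record: the fields that describe a single employee entity."""
    name: str    # a field is a grouping of related characters
    ss_no: str
    salary: int


@dataclass
class BenefitsRecord:
    name: str
    ss_no: str
    insurance: int


# A file (table) is a group of related records.
payroll_file = [
    PayrollRecord("Jones T. A.", "275-32-3874", 20_000),
    PayrollRecord("Klugman J. L.", "349-88-7913", 28_000),
]
benefits_file = [
    BenefitsRecord("Alvarez J. S.", "542-40-3718", 100_000),
    BenefitsRecord("Porter M. L.", "617-87-7915", 50_000),
]

# A database is an integrated collection of logically related files.
# (A plain dict stands in for the database here; that is an assumption
# made for illustration only.)
human_resource_database = {"payroll": payroll_file, "benefits": benefits_file}

first_record = payroll_file[0]
field_names = [f.name for f in fields(first_record)]  # the record's fields
characters = list(first_record.name)                  # the field's characters
```

Reading the sketch from the bottom up mirrors the text: `characters` make up a field, `field_names` lists the fields of one record, each list is a file, and the dictionary groups the files into one database.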
FIGURE 5.3 Some of the entities and relationships in a simplified electric utility database. Note a few of the business applications that access the data in the database.

Electric Utility Database
  Entities: Customers, meters, bills, payments, meter readings
  Relationships: Bills sent to customers, customers make payments, customers use meters, . . .
  Applications accessing the database: Billing, payment processing, meter reading, service start/stop

Source: Adapted from Michael V. Mannino, Database Application Development and Design (Burr Ridge, IL: McGraw-Hill/Irwin, 2001), p. 6.

For example, Figure 5.3 outlines some of the entities and relationships in a
database for an electric utility. Also shown are some of the business applications (billing, payment processing) that depend on access to the data elements in the database.

Database Structures

The relationships among the many individual data elements stored in databases are based on one of several logical data structures, or models. Database management system packages are designed to use a specific data structure to provide end users with quick, easy access to information stored in databases. Five fundamental database structures are the hierarchical, network, relational, object-oriented, and multidimensional models. Simplified illustrations of the first three database structures are shown in Figure 5.4.

FIGURE 5.4 Example of three fundamental database structures. They represent three basic ways to develop and express the relationships among the data elements in a database.

  Hierarchical panel: a Department data element at the top, Project A and Project B data elements below it, and Employee 1 and Employee 2 data elements at the lowest level.
  Network panel: Department A and Department B; Employee 1, Employee 2, and Employee 3; Project A and Project B.
  Relational panel:
    Department Table (columns Deptno, Dname, Dloc, Dmgr): rows for Dept A, Dept B, and Dept C.
    Employee Table (columns Empno, Ename, Etitle, Esalary, Deptno): Emp 1 in Dept A, Emp 2 in Dept A, Emp 3 in Dept B, Emp 4 in Dept B, Emp 5 in Dept C, Emp 6 in Dept B.

Hierarchical Structure

Early mainframe DBMS packages used the hierarchical structure, in which the relationships between records form a hierarchy or treelike structure. In the traditional hierarchical model, all records are dependent and arranged in multilevel structures,
consisting of one root record and any number of subordinate levels. Thus, all of the relationships among records are one-to-many, since each data element is related to only one element above it. The data element or record at the highest level of the hierarchy (the department data element in this illustration) is called the root element. Any data element can be accessed by moving progressively downward from a root and along the branches of the tree until the desired record (for example, the employee data element) is located.

Network Structure

The network structure can represent more complex logical relationships and is still used by some mainframe DBMS packages. It allows many-to-many relationships among records; that is, the network model can access a data element by following one of several paths, because any data element or record can be related to any number of other data elements. For example, in Figure 5.4, departmental records can be related to more than one employee record, and employee records can be related to more than one project record. Thus, you could locate all employee records for a particular department, or all project records related to a particular employee.

Relational Structure

The relational model is the most widely used of the three database structures. It is used by most microcomputer DBMS packages, as well as by most midrange and mainframe systems. In the relational model, all data elements within the database are viewed as being stored in the form of simple two-dimensional tables sometimes referred to as relations. The tables in a relational database have rows and columns. Each row represents a single record in the file, and each column represents a field.

Figure 5.4 illustrates the relational database model with two tables representing some of the relationships among departmental and employee records.
Other tables, or relations, for this organization’s database might represent the data element relationships among projects, divisions, product lines, and so on. Database management system packages based on the relational model can link data elements from various tables to provide information to users. For example, a manager might want to retrieve and display an employee’s name and salary from the employee table in Figure 5.4, and the name of the employee’s department from the department table, by using their common department number field (Deptno) to link or join the two tables. See Figure 5.5. The relational model can relate data in any one file with data in another file if both files share a common data element or field. Because of this, information can be created by retrieving data from multiple files even if they are not all stored in the same physical location.

FIGURE 5.5 Joining the Employee and Department tables in a relational database enables you to selectively access data in both tables at the same time. The figure links the Department table (Deptno, Dname, Dloc, Dmgr; rows Dept A, Dept B, Dept C) to the Employee table (Empno, Ename, Etitle, Esalary, Deptno; Emp 1 and Emp 2 in Dept A; Emp 3, Emp 4, and Emp 6 in Dept B; Emp 5 in Dept C).

Relational Operations

Three basic operations can be performed on a relational database to create useful sets of data. The select operation is used to create a subset of records that meet a stated criterion. For example, a select operation might be used on an employee database to create a subset of records that contain all employees who make more than $30,000 per year and who have been with the company more than three years. Another way to think of the select operation is that it temporarily creates a table whose rows have records that meet the selection criteria.
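As a minimal sketch of the select operation and of the Deptno link described above, the following uses SQL through Python’s built-in sqlite3 module. The employee-to-department assignments follow Figure 5.5; the department names, salaries, and the `years` column are invented for illustration and are not from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (deptno TEXT PRIMARY KEY, dname TEXT);
CREATE TABLE employee (
    empno   INTEGER PRIMARY KEY,
    ename   TEXT,
    esalary INTEGER,   -- hypothetical values
    years   INTEGER,   -- hypothetical column: years with the company
    deptno  TEXT REFERENCES department(deptno)
);
INSERT INTO department VALUES ('A', 'Accounting'), ('B', 'Sales'), ('C', 'Research');
INSERT INTO employee VALUES
    (1, 'Emp 1', 28000, 5, 'A'),
    (2, 'Emp 2', 35000, 2, 'A'),
    (3, 'Emp 3', 42000, 6, 'B'),
    (4, 'Emp 4', 31000, 4, 'B'),
    (5, 'Emp 5', 50000, 1, 'C'),
    (6, 'Emp 6', 29000, 8, 'B');
""")

# Select: the subset of records that meet the stated criteria.
select_rows = conn.execute(
    "SELECT * FROM employee "
    "WHERE esalary > 30000 AND years > 3 ORDER BY empno"
).fetchall()

# Link the two tables through their common Deptno field to show each
# employee's name and salary next to the name of the department.
join_rows = conn.execute(
    "SELECT e.ename, e.esalary, d.dname "
    "FROM employee AS e JOIN department AS d ON e.deptno = d.deptno "
    "ORDER BY e.empno"
).fetchall()
```

With this made-up data, `select_rows` holds only Emp 3 and Emp 4, and `join_rows` has one row per employee with the department name filled in from the other table. Neither result is stored; each is a temporary table built for the request.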
The join operation can be used to temporarily combine two or more tables so that a user can see relevant data in a form that looks like it is all in one big table. Using this operation, a user can ask for data to be retrieved from multiple files or databases without having to go to each one separately.

Finally, the project operation is used to create a subset of the columns contained in the temporary tables created by the select and join operations. Just as the select operation creates a subset of records that meet stated criteria, the project operation creates a subset of the columns, or fields, that the user wants to see. Using a project operation, the user can decide not to view all of the columns in the table but only those that have data necessary to answer a particular question or to construct a specific report.

Because of the widespread use of the relational model, an abundance of commercial products exists to create and manage relational databases. Leading mainframe relational database applications include Oracle 10g from Oracle Corp. and DB2 from IBM. A very popular midrange database application is SQL Server from Microsoft. The most commonly used database application for the PC is Microsoft Access.

Multidimensional Structure

The multidimensional model is a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data. You can visualize multidimensional structures as cubes of data and cubes within cubes of data. Each side of the cube is considered a dimension of the data. Figure 5.6 is an example that shows that each dimension can represent a different category, such as product type, region, sales channel, and time.

FIGURE 5.6 An example of the different dimensions of a multidimensional database.
[Figure 5.6 shows data cubes whose dimensions include product (TV, VCR, Camera, Audio), region (East, West, South; Denver, Los Angeles, San Francisco), time (January, February, March, April, Qtr 1), measure (Sales, COGS, Margin, Total Expenses, Profit), and scenario (Actual, Budget, Forecast, Variance).]
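One way to picture such a cube is as a set of cells, each addressed by one value per dimension. The sketch below is a toy model in plain Python, not the storage format of any multidimensional DBMS. The dimension labels come from Figure 5.6, the sales numbers are invented, and the helper names `slice_cube` and `roll_up` borrow common OLAP vocabulary that the text itself does not use.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "month", "scenario")

# Each key holds one coordinate per dimension; each value is a sales figure.
# All numbers are made up for illustration.
cube = {
    ("TV", "East", "January", "Actual"): 120,
    ("TV", "East", "January", "Budget"): 100,
    ("TV", "West", "January", "Actual"): 90,
    ("VCR", "East", "January", "Actual"): 60,
    ("VCR", "West", "February", "Actual"): 75,
    ("TV", "East", "February", "Actual"): 130,
}


def slice_cube(cells, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    positions = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return {
        key: amount
        for key, amount in cells.items()
        if all(key[i] == value for i, value in positions.items())
    }


def roll_up(cells, dimension):
    """Total the measure for each value of a single dimension."""
    i = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, amount in cells.items():
        totals[key[i]] += amount
    return dict(totals)


actual = slice_cube(cube, scenario="Actual")
sales_by_region = roll_up(actual, "region")
sales_by_product = roll_up(actual, "product")
```

Fixing one dimension (here, the Actual scenario) and totaling along another is the kind of question a cube is organized to answer quickly; the same cells can be totaled by region, by product, or by month without restructuring the data.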
adding object-oriented modules to their relational software. Examples include multimedia object extensions to IBM’s DB2, and Oracle’s object-based “cartridges” for Oracle 10g. See Figure 5.8.

FIGURE 5.8 This claims analysis graphics display provided by the CleverPath enterprise portal is powered by the Jasmine ii object-oriented database management system of Computer Associates. Source: Courtesy of Computer Associates.

Evaluation of Database Structures

The hierarchical data structure was a natural model for the databases used for the structured, routine types of transaction processing characteristic of many business operations in the early years of data processing and computing. Data for these operations can easily be represented by groups of records in a hierarchical relationship. However, as time progressed, there were many cases where information was needed about records that did not have hierarchical relationships. For example, in some organizations, employees from more than one department can work on more than one project (refer back to Figure 5.4). A network data structure could easily handle this many-to-many relationship, whereas a hierarchical model could not. As such, the more flexible network structure became popular for these types of business operations. However, like the hierarchical structure, because its relationships must be specified in advance, the network model was unable to easily handle ad hoc requests for information, thus pointing out the need for the relational model.

Relational databases allow an end user to easily receive information in response to ad hoc requests. That’s because not all of the relationships between the data elements in a relationally organized database need to be specified when the database is created. Database management software (such as Oracle 10g, DB2, Access, and Approach) creates new tables of data relationships by using parts of the data from several tables.
Thus, relational databases are easier for programmers to work with and easier to maintain than the hierarchical and network models.

The major limitation of the relational model is that relational database management systems cannot process large amounts of business transactions as quickly and efficiently as those based on the hierarchical and network models, or process complex, high-volume applications as well as the object-oriented model. This performance gap has narrowed with the development of advanced relational database software with object-oriented extensions. The use of database management software based on the object-oriented and multidimensional models is growing steadily, as these technologies are playing a greater role for OLAP and Web-based applications.
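The contrast drawn in this evaluation can be made concrete with a small sketch. The first half models a hierarchy as a nested Python dictionary, where every record has exactly one parent and access starts at the root. The second half stores the same kind of data relationally, with the many-to-many link between employees and projects held in a table of its own. The placement of employees under projects, the table names, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hierarchical sketch: relationships are fixed in advance, one parent each.
hierarchy = {
    "Department": {
        "Project A": ["Employee 1"],
        "Project B": ["Employee 2"],
    }
}


def employees_on(tree, project):
    """Walk down from the root to the employees under one project."""
    for _department, projects in tree.items():
        if project in projects:
            return projects[project]
    return []


# Relational sketch: a many-to-many relationship is just another table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT, deptno TEXT);
CREATE TABLE assignment (
    empno   INTEGER REFERENCES employee(empno),
    project TEXT
);
INSERT INTO employee VALUES (1, 'Emp 1', 'A'), (2, 'Emp 2', 'A'), (3, 'Emp 3', 'B');
INSERT INTO assignment VALUES
    (1, 'Project A'), (2, 'Project A'), (2, 'Project B'), (3, 'Project B');
""")

# An ad hoc request nobody planned for when the tables were created:
# which departments contribute staff to each project?
ad_hoc = conn.execute(
    "SELECT DISTINCT a.project, e.deptno "
    "FROM assignment AS a JOIN employee AS e ON e.empno = a.empno "
    "ORDER BY a.project, e.deptno"
).fetchall()

project_b_staff = employees_on(hierarchy, "Project B")
```

In the dictionary, an employee who works on two projects would have to be duplicated under both parents, and any question that does not follow the tree from the root needs new traversal code. In the relational version, Emp 2 simply appears in two assignment rows, and the department-by-project question is answered by a query written after the fact.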
Experian Automotive: The Business Value of Relational Database Management

Experian Inc. (www.experian.com), a unit of London-based GUS PLC, runs one of the largest credit reporting agencies in the United States. But Experian wanted to expand its business beyond credit checks for automobile loans. If it could collect vehicle data from the various motor-vehicle departments in the United States and blend that with other data, such as change-of-address records, then its Experian Automotive division could sell the enhanced data to a variety of customers. For example, car dealers could use the data to make sure their inventory matches local buying preferences. And toll collectors could match license plates to addresses to find motorists who sail past tollbooths without paying.

But to offer new services, Experian first needed a way to extract, transfer, and load data from the systems of 50 different U.S. state departments of motor vehicles (DMVs), plus Puerto Rico, into a single database. That was a big challenge. “Unlike the credit industry that writes to a common format, the DMVs do not,” says Ken Kauppila, vice president of IT at Experian Automotive in Costa Mesa, California. Of course, Experian didn’t want to replicate the hodgepodge of file formats it inherited when the project began in January 1999: 175 formats among 18,000 files. So Kauppila decided to transform and map the data to a common relational database format.

Fortunately, off-the-shelf software tools for extracting, transforming, and loading data (called ETL tools) make it economical to combine very large data repositories. Using ETL Extract from Evolutionary Technologies, Experian created a database that can incorporate vehicle information within 48 hours of its entry into any of the nation’s DMV computers. This is one of the areas in which data management software tools can excel, says Guy Creese, analyst at Aberdeen Group in Boston.
“It can simplify the mechanics of multiple data feeds, and it can add to data quality, making fixes possible before errors are propagated to data warehouses,” he says.

Using the ETL extraction and transformation tools along with IBM’s DB2 database system, Experian Automotive created a database that processes 175 million transactions per month and has created a variety of profitable new revenue streams. Experian’s automotive database is the 10th largest database in the world, now with up to 16 billion rows of data. But the company says the relational database is managed by just three IT professionals. Experian says this demonstrates how efficiently database software like DB2 and the ETL tools can work with a large database to handle vast amounts of data quickly.

Database Development

Database management packages like Microsoft Access or Lotus Approach allow end users to easily develop the databases they need. See Figure 5.9. However, large organizations usually place control of enterprisewide database development in the hands of database administrators (DBAs) and other database specialists. This improves the integrity and security of organizational databases. Database developers use the data definition language (DDL) in database management systems like Oracle 10g or IBM’s DB2 to develop and specify the data contents, relationships, and structure of each database, and to modify these database specifications when necessary. Such information is cataloged and stored in a database of data definitions and specifications called a data dictionary, or metadata repository, which is managed by the database management software and maintained by the DBA.

A data dictionary is a database management catalog or directory containing metadata, that is, data about data.
A data dictionary relies on a specialized database software component to manage a database of data definitions, that is, metadata about the structure, data elements, and other characteristics of an organization’s databases. For example, it contains the names and descriptions of all types of data records and their interrelationships, as well as information outlining requirements for end users’ access and use of application programs, and database maintenance and security.
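A small-scale analogue of a DDL and a data dictionary can be tried with Python’s built-in sqlite3 module. SQLite is not one of the products named in the text, and its catalog is far simpler than an enterprise metadata repository, so treat this as a sketch of the idea only. The `employee` table and its columns are hypothetical; `sqlite_master` and `PRAGMA table_info` are SQLite’s own catalog facilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statement: define the contents and structure of a table.
conn.execute("""
    CREATE TABLE employee (
        empno   INTEGER PRIMARY KEY,
        ename   TEXT NOT NULL,
        esalary INTEGER
    )
""")

# Metadata (data about data): which tables does the database contain?
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Metadata about one table: each column's name, declared type, and
# whether a value is required.
columns = [
    (name, declared_type, bool(notnull))
    for _cid, name, declared_type, notnull, _default, _pk
    in conn.execute("PRAGMA table_info(employee)")
]

# DDL is also used to modify a specification when necessary.
conn.execute("ALTER TABLE employee ADD COLUMN etitle TEXT")
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
```

The queries against the catalog return definitions rather than employee data, which is the sense in which a data dictionary can be queried to report on the status of a database’s metadata.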
FIGURE 5.9 Creating a database table using the Table Wizard of Microsoft Access. Source: Courtesy of Microsoft Corp.

Data dictionaries can be queried by the database administrator to report the status of any aspect of a firm’s metadata. The administrator can then make changes to the definitions of selected data elements. Some active (versus passive) data dictionaries automatically enforce standard data element definitions whenever end users and application programs access an organization’s databases. For example, an active data dictionary would not allow a data entry program to use a nonstandard definition of a customer record, nor would it allow an employee to enter a name of a customer that exceeded the defined size of that data element.

Developing a large database of complex data types can be a complicated task. Database administrators and database design analysts work with end users and systems analysts to model business processes and the data they require. Then they determine (1) what data definitions should be included in the database and (2) what structure or relationships should exist among the data elements.

Data Planning and Database Design

As Figure 5.10 illustrates, database development may start with a top-down data planning process. Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise. Then they define the information needs of end users in a business process, such as the purchasing/receiving process that all businesses have.

Next, end users must identify the key data elements that are needed to perform their specific business activities. This frequently involves developing entity relationship diagrams (ERDs) that model the relationships among the many entities involved in business processes. For example, Figure 5.11 illustrates some of the relationships in a purchasing/receiving process.
ERDs are simply graphical models of the various files and their relationships contained within a database system. End users and database designers could use database management or business modeling software to help them develop ERD models for the purchasing/receiving process. This would help identify what supplier and product data are required to automate their purchasing/receiving and other business processes using enterprise resource management (ERM) or supply chain management (SCM) software. You will learn about ERDs and other data modeling tools in much greater detail if you ever take a course in systems analysis and design.
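To suggest how an entity relationship model might eventually be expressed in a relational DBMS, the sketch below turns two entities from the purchasing/receiving example (supplier and product) into tables, with the relationship between them carried by a foreign key. It assumes, for illustration only, that each product comes from one supplier and that supplier names are limited to 30 characters; the column names are hypothetical. The constraints loosely imitate what the text calls an active data dictionary, since the DBMS itself refuses data that breaks the stored definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL CHECK (length(name) <= 30)  -- assumed size limit
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    -- assumed relationship: one supplier supplies many products
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
);
INSERT INTO supplier VALUES (1, 'Acme Supply');
INSERT INTO product VALUES (10, 'Widget', 1), (11, 'Gadget', 1);
""")


def try_insert(sql, params):
    """Attempt one insert and report whether the definitions allowed it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# A product that points at a supplier that does not exist.
orphan = try_insert("INSERT INTO product VALUES (?, ?, ?)", (12, "Gizmo", 99))
# A supplier name longer than the defined size of that data element.
too_long = try_insert("INSERT INTO supplier VALUES (?, ?)", (2, "X" * 31))
# A row that satisfies every definition.
valid = try_insert("INSERT INTO supplier VALUES (?, ?)", (3, "Baker Parts"))

products_of_supplier_1 = conn.execute(
    "SELECT description FROM product WHERE supplier_id = 1 ORDER BY product_id"
).fetchall()
```

Whether a supplier may provide several products, or a product may come from several suppliers, is exactly the kind of question the data modeling process has to settle; a many-to-many answer would call for a separate linking table instead of a single foreign key.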
FIGURE 5.10 Database development involves data planning and database design activities. Data models that support business processes are used to develop databases that meet the information needs of users.

1. Data Planning: Develops a model of business processes.
   Result: Enterprise model of business processes with documentation.
2. Requirements Specification: Defines information needs of end users in a business process.
   Result: Description of users’ needs, which may be represented in natural language or using the tools of a particular design methodology.
3. Conceptual Design: Expresses all information requirements in the form of a high-level model.
   Result: Conceptual data models, often expressed as entity relationship models.
4. Logical Design: Translates the conceptual models into the data model of a DBMS.
   Result: Logical data models, e.g., relational, network, hierarchical, multidimensional, or object-oriented models.
5. Physical Design: Determines the data storage structures and access methods.
   Result: Physical data models, that is, storage representations and access methods.

Such user views are a major part of a data modeling process where the relationships between data elements are identified. Each data model defines the logical relationships among the data elements needed to support a basic business process. For example, can a supplier provide more than one type of product to us? Can a customer have more than one type of account with us? Can an employee have several pay rates or be assigned to several project workgroups?

Answering such questions will identify data relationships that have to be represented in a data model that supports a business process. These data models then serve as logical frameworks (called schemas and subschemas) on which to base the physical design of databases and the development of application programs to support the business processes of the organization.
FIGURE 5.11 This entity relationship diagram illustrates some of the relationships among the entities (product, supplier, warehouse, etc.) in a purchasing/receiving business process.

  Entities shown: Purchase Order Item, Product, Supplier, Purchase Order, Product Stock, Warehouse.
  Relationship labels shown: Ordered on, Supplies, Contains, Stocked as, Holds.

A schema is an overall logical view of the relationships
162 ● Module II / Information TechnologiesFIGURE 5.12 Example of the logical and physical database views and the software interface of a banking servicesinformation system. Installment Checking Savings Loan Application Application Application Logical User Views Checking and Installment Data elements and relationships (the subschemas) needed Savings Loan for checking, savings, or installment loan processing Data Model Data Model Data elements and relationships (the schema) Banking Services Data Model needed for the support of all bank services Software Interface Database Management System The DBMS provides access to the bank’s databases Physical Data Views Organization and location of data on the storage media Bank Databases among the data elements in a database, while the subschema is a logical view of the data relationships needed to support specific end user application programs that will access that database. Remember that data models represent logical views of the data and relationships of the database. Physical database design takes a physical view of the data (also called the internal view) that describes how data are to be physically stored and accessed on the storage devices of a computer system. For example, Figure 5.12 illustrates these dif- ferent database views and the software interface of a bank database processing system. In this example, checking, savings, and installment lending are the business processes whose data models are part of a banking services data model that serves as a logical data framework for all bank services. Aetna: Insuring On a daily basis the operational services central support area at Aetna Inc. is Tons of Data responsible for 21.8 tons of data (174.6 terabytes [TB]). Over 119.2TB reside on mainframe-connected disk drives, while the remaining 55.4TB sit on disks attached to midrange computers. 
Almost all of these data are located in the company’s headquarters in Hartford, Connecticut, with most of the information in relational databases. To make matters even more interesting, outside customers have access to about 20TB of the information. Four interconnected data centers containing 14 mainframes and more than 1,000 midrange servers process the data. It takes more than 4,100 direct-access storage devices to hold Aetna’s key databases.
Most of Aetna’s ever-growing mountain of data is health care information. The insurance company maintains records for both health maintenance organization participants and customers covered by insurance policies. Aetna has detailed records of providers, such as doctors, hospitals, dentists, and pharmacies, and it keeps track of all the claims it has processed. Some of Aetna’s larger customers send tapes containing insured employee data; the firm is moving toward using the Internet to collect such data.

If managing gigabytes of data is like flying a hang glider, managing multiple terabytes of data is like piloting a space shuttle: a thousand times more complex. You can’t just extrapolate from experiences with small and medium data stores to understand how to successfully manage tons of data. Even an otherwise mundane operation such as backing up a database can be daunting if the time needed to finish copying the data exceeds the time available.

Data integrity, backup, security, and availability are collectively the Holy Grail of dealing with large data stores. The sheer volume of data makes these goals a challenge, and a highly decentralized environment complicates matters even more. Developing and adhering to standardized data maintenance procedures always provide an organization with the best return on their data dollar investment [9, 11].
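Returning to the schema and subschema views of the banking example in Figure 5.12, the idea can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustrative sketch, not the design of any real bank system: the table, column, and view names are hypothetical, the base tables stand in for the schema, and two SQL views stand in for the subschemas that individual applications would see.

```python
import sqlite3

# Hypothetical banking schema: all table, column, and view names are
# illustrative assumptions, not taken from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The schema: the overall logical view of all bank services data.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        account_type TEXT NOT NULL
            CHECK (account_type IN ('checking', 'savings', 'installment_loan')),
        balance      REAL NOT NULL
    );

    -- Subschema for the checking and savings applications.
    CREATE VIEW checking_savings_view AS
        SELECT c.name, a.account_type, a.balance
        FROM account AS a
        JOIN customer AS c ON c.customer_id = a.customer_id
        WHERE a.account_type IN ('checking', 'savings');

    -- Subschema for the installment loan application.
    CREATE VIEW installment_loan_view AS
        SELECT c.name, a.balance AS amount_owed
        FROM account AS a
        JOIN customer AS c ON c.customer_id = a.customer_id
        WHERE a.account_type = 'installment_loan';
""")

# One customer with several account types (a one-to-many relationship).
conn.execute("INSERT INTO customer VALUES (1, 'A. Rivera')")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        (10, 1, "checking", 250.0),
        (11, 1, "savings", 1200.0),
        (12, 1, "installment_loan", 5000.0),
    ],
)

# Each application queries only its own logical view of the shared data.
checking_savings = conn.execute(
    "SELECT name, account_type, balance "
    "FROM checking_savings_view ORDER BY account_type"
).fetchall()
loans = conn.execute(
    "SELECT name, amount_owed FROM installment_loan_view"
).fetchall()

print(checking_savings)
print(loans)
```

The sample data also answers one of the modeling questions raised earlier: in this sketch a customer can hold more than one type of account, so the relationship between customer and account is one-to-many.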
SECTION II Managing Data Resources

Data Resource Management
Data are a vital organizational resource that needs to be managed like other important business assets. Today’s business enterprises cannot survive or succeed without quality data about their internal operations and external environment.

With each online mouse click, either a fresh bit of data is created or already-stored data are retrieved from all those business websites. All that’s on top of the heavy demand for industrial-strength data storage already in use by scores of big corporations. What’s driving the growth is a crushing imperative for corporations to analyze every bit of information they can extract from their huge data warehouses for competitive advantage. That has turned the data storage and management function into a key strategic role of the information age.

That’s why organizations and their managers need to practice data resource management, a managerial activity that applies information systems technologies like database management, data warehousing, and other data management tools to the task of managing an organization’s data resources to meet the information needs of their business stakeholders. This chapter will show you the managerial implications of using data resource management technologies and methods to manage an organization’s data assets to meet business information requirements.

Read the Real World Case on data administration. We can learn a lot from this case about the challenges of managing the data within an organization. See Figure 5.13.

Types of Databases
Continuing developments in information technology and its business applications have resulted in the evolution of several major types of databases. Figure 5.14 illustrates several major conceptual categories of databases that may be found in many organizations.
Let’s take a brief look at some of them now.

Operational Databases
Operational databases store detailed data needed to support the business processes and operations of a company. They are also called subject area databases (SADB), transaction databases, and production databases. Examples are a customer database, human resource database, inventory database, and other databases containing data generated by business operations. For example, a human resource database like that shown earlier in Figure 5.2 would include data identifying each employee and his or her time worked, compensation, benefits, performance appraisals, training and development status, and other related human resource data. Figure 5.15 illustrates some of the common operational databases that can be created and managed for a small business using Microsoft Access database management software.

Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases are done to improve database performance at end user worksites. Ensuring that the data in an organization’s distributed databases are consistently and concurrently updated is a major challenge of distributed database management.

Distributed databases have both advantages and disadvantages. One primary advantage of a distributed database lies with the protection of valuable data. If all of an organization’s data reside in a single physical location, any catastrophic event like a fire or damage to the media holding the data would result in an equally catastrophic loss of use of that data.
By having databases distributed in multiple locations, the negative impact of such an event can be minimized.
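The protective effect of keeping a copy at a second location can be illustrated with a small Python sketch. It is a simplification under stated assumptions: two in-memory SQLite databases stand in for servers at two sites, the inventory table and its rows are hypothetical, and the copy is made with the standard library's `Connection.backup` method (available in Python 3.7 and later) rather than with any particular distributed database product.

```python
import sqlite3

# Primary site: an in-memory database stands in for the main server.
# The table name and rows are hypothetical sample data.
primary = sqlite3.connect(":memory:")
primary.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)"
)
primary.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("widget", 40), ("gear", 15)],
)
primary.commit()

# Second site: another in-memory database stands in for a remote server.
remote_copy = sqlite3.connect(":memory:")

# Copy the entire primary database to the second site.
primary.backup(remote_copy)

# Simulate a catastrophic loss of the primary site.
primary.close()

# The copy at the second site is unaffected and can still answer queries.
surviving_rows = remote_copy.execute(
    "SELECT item, quantity FROM inventory ORDER BY item"
).fetchall()
print(surviving_rows)
```

In practice the second copy would live on separate hardware in a different physical location; the sketch only shows that the copy remains usable once the original is gone.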
REAL WORLD CASE 2
Emerson and Sanofi: Data Stewards Seek Data Conformity

A customer is a customer is a customer, right? Actually, it’s not that simple. Just ask Emerson Process Management, an Emerson Electric Co. unit in Austin that supplies process automation products. In 2000 the company attempted to build a data warehouse to store customer information from over 85 countries. The effort failed in large part because the structure of the warehouse couldn’t accommodate the many variations on customers’ names.

For instance, different users in different parts of the world might identify Exxon as Exxon, Mobil, Esso, or ExxonMobil, to name a few variations. The warehouse would see them as separate customers, and that would lead to inaccurate results when business users performed queries.

That’s when the company hired Nancy Rybeck as data administrator. Rybeck is now leading a renewed data warehouse project that ensures not only the standardization of customer names, but also the quality and accuracy of customer data, including postal addresses, shipping addresses, and province codes.

To accomplish this, Emerson has done something unusual: It has started to build a department with 6 to 10 full-time “data stewards” dedicated to establishing and maintaining the quality of data entered into the operational systems that feed the data warehouse.

The practice of having formal data stewards is uncommon. Most companies recognize the importance of data quality, but many treat it as a “find-and-fix” effort, to be conducted at the end of a project by someone in IT. Others casually assign the job to the business users who deal with the data head-on. Still others may throw resources at improving data only when a major problem occurs.

“It’s usually a seesaw effect,” says Chris Enger, formerly manager of information management at Philip Morris USA Inc. “When something goes wrong, they put someone in charge of data quality, and when things get better, they pull those resources away.”

Creating a data quality team requires gathering people with an unusual mix of business, technology, and diplomatic skills. It’s even difficult to agree on a job title. In Rybeck’s department, they’re called “data analysts,” but titles at other companies include “data quality control supervisor,” “data coordinator,” or “data quality manager.”

“When you say you want a data analyst, they’ll come back with a DBA [database administrator]. But it’s not the same at all,” Rybeck says. “It’s not the data structure, it’s the content.”

At Emerson, data analysts in each business unit review data and correct errors before it’s put into the operational systems. They also research customer relationships, locations, and corporate hierarchies; train overseas workers to fix data in their native languages; and serve as the main contact with the data administrator and database architect for new requirements and bug fixes.

As the leader of the group, Rybeck plays a role that includes establishing and communicating data standards, ensuring data integrity is maintained during database conversions, and doing the logical design for the data warehouse tables.

The stewards have their work cut out for them. Bringing together customer records from the 75 business units yielded a 75 percent duplication rate, misspellings, and fields with incorrect or missing data.

“Most of the divisions would have sworn they had great processes and standards in place,” Rybeck says. “But when you show them they entered the customer name 17 different ways, or someone had entered, ‘Loading dock open 8:00–4:00’ into the address field, they realize it’s not as clean as they thought.”

Although the data steward may report to IT, as is the case at Emerson and at pharmaceuticals company Sanofi-Synthelabo Inc., it’s not a job for someone steeped in technical knowledge. Yet it’s not right for a businessperson who’s a technophobe, either.

Seth Cohen is the first data quality control supervisor at Sanofi in New York. He was hired in 2003 to help design automated processes to ensure the data quality of the customer knowledge base that Sanofi was beginning to build.

Data stewards at Sanofi need to have business knowledge because they need to make frequent judgment calls, Cohen says. Indeed, judgment is a big part of the data steward’s job, including the ability to determine where you don’t need 100 percent perfection.

Cohen says that task is one of the biggest challenges of the job. “One-hundred percent accuracy is just not achievable,”

FIGURE 5.13 Source: Flying Colours Ltd./Digital Vision/Getty Images
FIGURE 5.14 Examples of some of the major types of databases used by organizations and end users.
Components shown: External Databases on the Internet and Online Services; Client PC; Network Server; Distributed Databases on Intranets and Other Networks; Operational Databases of the Organization; End User Databases; Data Warehouse; Data Marts.

Another advantage of distributed databases is found in their storage requirements. Often, a large database system may be distributed into smaller databases based on some logical relationship between the data and the location. For example, a company with several branch operations may distribute its data so that each branch operation location is also the location of its branch database. Because multiple databases in a distributed system can be joined together, each location has control of its local data while all other locations can access any database in the company if so desired.

Distributed databases are not without some challenges, however. The primary challenge is the maintenance of data accuracy. If a company distributes its database to multiple locations, any change to the data in one location must be propagated to all other locations. This can be accomplished in one of two ways: replication or duplication.

FIGURE 5.15 Examples of operational databases that can be created and managed for a small business by microcomputer database management software like Microsoft Access. Source: Courtesy of Microsoft Corp.

Updating a distributed database using replication involves using a specialized software application that looks at each distributed database and then finds the changes made to it. Once these changes have been identified, the replication process makes all of the distributed databases look the same by making the appropriate changes to each one. The replication process is very complex and, depending upon the number and size of the distributed databases, can consume a lot of time and computer resources.

The duplication process, in contrast, is much less complicated. It basically identifies one database as a master and then duplicates that database at a prescribed time after hours so that each distributed location has the same data. One drawback to the duplication process is that changes can be made only to the master database; a change made to any other copy would be overwritten during the next duplication. Nonetheless, properly used, duplication and replication can keep all distributed locations current with the latest data.

One additional challenge associated with distributed databases is the extra computing power and bandwidth necessary to access multiple databases in multiple locations. We will look more closely at the issue of bandwidth in Chapter 6 when we focus on telecommunications and networks.

External Databases
Access to a wealth of information from external databases is available for a fee from commercial online services, and with or without charge from many sources on the World Wide Web. Websites provide an endless variety of hyperlinked pages of multimedia documents in hypermedia databases for you to access. Data are available in the form of statistics on economic and demographic activity from statistical databanks.
Or you can view or download abstracts or complete copies of hundreds of newspapers, magazines, newsletters, research papers, and other published material and periodicals from bibliographic and full text databases. Whenever you use a search engine like Google or Yahoo to look up something on the Internet, you are using an external database, and a very, very large one!

Hypermedia Databases
The rapid growth of websites on the Internet and corporate intranets and extranets has dramatically increased the use of databases of hypertext and hypermedia documents. A website stores such information in a hypermedia database consisting of hyperlinked pages of multimedia (text, graphic, and photographic images, video clips, audio segments, and so on). That is, from a database management point of view, the set of interconnected multimedia pages at a website is a database of interrelated hypermedia page elements, rather than interrelated data records.

Figure 5.16 shows how you might use a Web browser on your client PC to connect with a Web network server. This server runs Web server software to access and transfer the Web pages you request. The website illustrated in Figure 5.16 uses a hypermedia database consisting of Web page content described by HTML (Hypertext Markup Language) code or XML (Extensible Markup Language) labels, image files, video files, and audio. The Web server software acts as a database management system to manage the transfer of hypermedia files for downloading by the multimedia plug-ins of your Web browser.

FIGURE 5.16 The components of a Web-based information system include Web browsers, servers, and hypermedia databases.
Components shown: Web Browser on Client PCs; the Internet, Intranets, and Extranets; Web Server Software on the Network Server; Hypermedia Database (HTML and XML Web Pages, Image Files, Video Files, Audio Files).

Data Warehouses and Data Mining
A data warehouse stores data that have been extracted from the various operational, external, and other databases of an organization. It is a central source of the data that have been cleaned, transformed, and cataloged so they can be used by managers and other business professionals for data mining, online analytical processing, and other forms of business analysis, market research, and decision support. (We’ll talk in depth about all of these activities in Chapter 9.) Data warehouses may be subdivided into data marts, which hold subsets of data from the warehouse that focus on specific aspects of a company, such as a department or a business process.

FIGURE 5.17 The components of a complete data warehouse system. Source: Adapted courtesy of Hewlett-Packard.
Components shown: Operational, External, and Other Databases; Data Acquisition (capture, clean, transform, transport, load/apply); Data Management; Enterprise Warehouse; Analytical Data Store; Data Marts; Data Analysis (query, report, analyze, mine, deliver); Metadata Management; Metadata Directory; Metadata Repository; Warehouse Design; Web Information Systems.

Figure 5.17 illustrates the components of a complete data warehouse system. Notice how data from various operational and external databases are captured, cleaned, and transformed into data that can be better used for analysis.
This acquisition process might include activities like consolidating data from several sources, filtering out unwanted data, correcting incorrect data, converting data to new data elements, and aggregating data into new data subsets.

These data are then stored in the enterprise data warehouse, from where they can be moved into data marts or to an analytical data store that holds data in a more useful form for certain types of analysis. Metadata (data that defines the data in the data warehouse) is stored in a metadata repository and cataloged by a metadata directory. Finally, a variety of analytical software tools can be provided to query, report, mine, and analyze the data for delivery via Internet and intranet Web systems to business end users. See Figure 5.18.

Revenue: Closing the Gap with a Data Warehouse
In the late 1990s the state of Iowa had a tax gap, a polite way of describing companies and individuals who either didn’t file state tax returns or who underreported their earnings. To identify noncompliant taxpayers, the Iowa Department of Revenue and Finance (IDRF) relied on a jumble of nonintegrated mainframe applications, file extracts, and over 20 disparate stand-alone systems (databases,