Business Intelligence
March 22nd, 1999
Prepared by Federico Schiavio
Table of Contents
Summary
Introduction
Designing Databases for End-User Access
TENET Business Intelligence
PBAR & PATCOM
PBAR
PBAR Consolidated Version
Interfacing Applications
Current Enhancement Project
PBAR & PATCOM
PATCOM
Proposed Project
Growth Estimates
Methodology
Additional Enhancements
Testimonials
Conclusion
Summary
Why are databases not designed from the start for end-user access? Wouldn’t it be a lot
easier to implement query products if the data were built with the users in mind? Could
the production databases and the end-user databases be one and the same if the design
served both? Or is there too much of a difference between the requirements of the users
and the requirements of the production systems?
I will examine each of these questions in detail to formulate a strategy on how and when
to design databases for end-user access. I will then introduce TENET’s existing business
intelligence system, together with the current enhancement project, which will be
completed on March 31st, 1999. Subsequently, I will list what I believe to be major
deficiencies in both the existing system and the newly enhanced one. Finally, I will
outline my vision of a universal data warehouse architecture, using TENET as the
prototype.
Introduction
The basic notion of production databases is to create atomic row designs with single fact
rows according to third normal form normalization. The sparse rows designed in this
process do not naturally lend themselves to end-user access. There are several reasons for
this: the more the data is spread across multiple tables with various relationships, the less
intuitive the meaning of the data becomes; and the more tables involved in a database, the
more the user must rely on complex functions such as joins just to put the data into the
proper shape.
A simple demographic database for a fraternal association can be used as an example to
demonstrate this. See Figures 1 and 2 for row layouts.
Figure 1. Fraternal Organization Database
Personal     Residence    Occupation
MEMB#        RESCODE      OCCODE
NAME         RESNAME      OCDESCRIPTION
RESCODE      ADDRESS1     etc.
OCCODE       ADDRESS2
Figure 2. Denormalized Fraternal Organization Database
PersonalInfo
MEMB#
NAME
RESCODE
RESNAME
ADDRESS1
ADDRESS2
OCCODE
OCDESCRIPTION
Assume that in this organization, one to 20 people live in the same residence owned by
the organization on behalf of the members. If the address information were kept in the
personnel rows, there would be database anomalies in the form of transitive
dependencies, since the address information depends on the residence, not on the person.
So the natural solution is to split the personnel row into a residence row and a personnel
row. Now, if we are storing the various occupations in which members may be employed,
many members can hold the same occupation or even work for the same company. We
may elect to segregate occupation information because it depends on the occupation, not
on the person. In this case, we would have a personnel table, a residence table, and an
occupation table.
To print a report that shows all of the nurses in the organization with the occupation data
included along with the address of the employer and the member’s name and address, we
would need to join three tables together. Although joining such data is not very difficult
for a trained professional, it is not a talent the average user possesses. Consequently,
there would be an undue degree of difficulty for the user to effect database joins to
produce simple reports.
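As a rough illustration, the three-way join such a report would require might look like the
following sketch, assuming table names taken from the column groups of Figure 1 and a
hypothetical occupation description value:

    -- Illustrative only: table names and the 'NURSE' literal are assumptions.
    SELECT p.MEMB#, p.NAME, r.RESNAME, r.ADDRESS1, r.ADDRESS2, o.OCDESCRIPTION
    FROM   PERSONAL   p
           JOIN RESIDENCE  r ON r.RESCODE = p.RESCODE
           JOIN OCCUPATION o ON o.OCCODE  = p.OCCODE
    WHERE  o.OCDESCRIPTION = 'NURSE';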
Ideally, all of the data that a user desires should be in one row. In our demographic
database, is this possible? Yes, certainly, the data for residences and occupations can be
packed into the same row that contains the member’s name. However, this design is not
suitable for the production system since it would require non-normalized data. But this
design would be ideal for the end-user database because it allows for all data to be
queried without requiring complex functions such as database joins.
Is the solution someplace in-between? In this situation, there is not much that can be in-
between. There are multiple entities each demanding their own database table for
production purposes, yet each wanting to be combined into a larger entity to enable end-
user access. The potential relational database solution for this phenomenon is called a
view. A view is simply a prepackaged projection of a database providing a different look
than the physical data would normally dictate.
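A minimal sketch of such a view, using the same assumed table names as in the earlier
example, might be:

    -- Hypothetical view packaging the three-way join once, on behalf of the users.
    CREATE VIEW MEMBERINFO AS
      SELECT p.MEMB#, p.NAME, r.RESNAME, r.ADDRESS1, r.ADDRESS2,
             o.OCCODE, o.OCDESCRIPTION
      FROM   PERSONAL   p
             JOIN RESIDENCE  r ON r.RESCODE = p.RESCODE
             JOIN OCCUPATION o ON o.OCCODE  = p.OCCODE;

    -- The end user no longer needs any join knowledge:
    SELECT NAME, OCDESCRIPTION FROM MEMBERINFO WHERE OCDESCRIPTION = 'NURSE';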
So, in the above scenario, a row containing all the required data can typically be shaped
and provide the desired view of the data for the end users. If this is so easy, then why is it
not always done? Let’s ask a hypothetical IS manager why this is not always done, since
it seems like it should be part of production system design.
Typical IS manager: “Well, that’s easy for you to say! My first responsibility around here
is to develop and maintain applications that provide major value for the organization.
What I work on is prioritized by the steering committee and there is no time left to be
worrying about what would be good for end-user access. Besides, if I start throwing join
logical tables all over my production database to support ad-hoc queries from end-users,
the maintenance of these indexes will slow down my production system… and that is
even if there never is one query actually run. Phew! If they actually run queries against
those joined tables, I’ve got an even bigger problem. How can I keep my less-than-one-
second response time promise if I don’t even know who will be using the system? Sure,
I’d like to help, but there are too many opportunities for this thing to fail… and bring me
down with it!”
In different words perhaps, but many IS managers would echo those sentiments exactly.
They have a difficult balancing act to perform. They are expected to provide high-quality
clerical function with great performance characteristics. They accept the challenge and do
their jobs on a daily basis. It is not that IS does not want to be part of the team and
provide the users with whatever they want. The very mission of IS is to please. But the
production mission and the user mission are at odds with each other. From the dialogue
above, we can see two fundamental problems in trying to design and implement
databases that must serve both production and end-user access:

 End-user access is not treated as an application.
 Accommodating end-user access can create major performance problems for both
production users and query users.

If end-user access were treated as its own application, or at a minimum were woven into
a production system’s set of requirements, it could be accommodated more readily.
However, very few assimilate this into design thinking. Management never suggests that
we treat it as an application, so, by default, end user access is an afterthought. It is
assumed to be a by-product of application system design, not something that should be
the object of system design. And this is exactly what the end user gets… by-products.
Would we ever consider designing two databases, one for the production application and
one for end-user access? How could we? We just spent all of our database lives living by
one of the major tenets of database design: avoiding data redundancy and duplication. How
could we possibly conceive that the solution to any database problem would be to
duplicate the data or the design? Instead, we very efficiently label the production
database and move on to the next application. If the database design does not fit the end-
user’s requirements, that’s a problem for another day, a day that hopefully will never
come because we are now on to the next application.
But what if the steering committee includes end-user access as one of the requirements of
the application? What would we do? Where would we start? Would we just plan to build
the production database as in the past and declare the end-user job to be done when the
production system goes live? Or would we take our database design to third normal form
for production, and then work with the users to define the best possible views of the data
for ad-hoc queries? Hopefully we would do the latter!
But we do not have to wait for the steering committee to say that end-user access is a
requirement. It is our job to suggest new ways of doing things. We are the change agents.
We intuitively understand this. But heretofore, we have not taken the initiative to justify
the additional design and implementation time necessary to build the proper shaped data
for end users into our production designs. Consequently, we do not design and code the
join logical table rows to support end-user access. If these were part of the perceived real
application requirement set, we would factor their impact on time and performance, and
would propose different time projections and hardware requirements than those necessary
for the production system alone.
The fact is that end-user access must be treated as a real application to get real results.
We cannot very well design an invoicing system and propose hardware that would
perform well enough to calculate all but the invoice total. This would be at best
incomplete, at worst useless. We wouldn’t be satisfied with something that almost did the
job! So also with end-user access. If we put nothing into planning for it and building the
proper structures to support it, then we can likewise expect to get nothing out of it.
It can be argued that the paradigm of the information age was brought on by a shift in
emphasis from the clerical benefit derived from an application to the information rewards
that can be gained by harnessing the power of all the information collected on behalf of
these clerical applications. We made the shift from data processing to management
information systems almost solely on the backs of programmers. The users demand
results and the results are delivered by MIS in the form of report programs.
Now, the industry experts have theoretically carved out and differentiated some
potentially new paradigms, these being Decision Support Systems (DSS) and Executive
Support Systems (ESS). However, the world of information processing in reality has not
yet made the shift. The primary focus today is MIS with lip service to DSS and ESS.
End-user access is part of this lip service. Companies budget and buy tools to provide ad-
hoc information needed by knowledge workers. But there is rarely an end-user project
associated with the purchased tool. The purchase of the tool is an acknowledgement that
management is serious about a solution, but there is no associated funding for the proper
design or re-engineering of the application’s database. And MIS has not caught on to the
fact that end-user access must be treated as another application to succeed. Once we do,
then we will allocate the proper resources for its successful implementation, just as we
allocate the resources necessary for the production applications.
If we were to begin immediately to treat end-user access as another application, we
would be forced to devise some innovative ways of assessing its performance impact on
the production applications in concert with the end-user access application. Although this
would be a difficult task, such work could help determine what additional capacity and
power would be necessary to support end-user computing. This would have the double
benefit of providing a more accurate cost for this service, plus it would give management
the opportunity to vote yes or no without thinking that end-user access was free.
Moreover, if the additional hardware to support the users were installed, there would be
little reason for IS to be as concerned about the impact of the users on the system.
Analysts know that the mere presence of logical tables adds overhead to applications. We
also know that if there is a reasonable number of queries against these logical views,
those views should be given immediate maintenance. But when we give views immediate
access maintenance, we also add a burden to each and every production transaction since,
in addition to doing its normal work, it must also carry the performance hit of access path
maintenance (index updating) for the end-user system. How much better off are we, then,
when we recognize end-user access as a valuable application in its own right, with an
associated system burden? In this way, we can have the horsepower necessary without the
fear that unfunded queries will be the undoing of an otherwise effective IS manager.
And so, if end-user access is treated as an application, and its potential performance
impact is factored into the decision to move forward, there are great prospects for
resounding success. If, on the other hand, end-user access continues to be the leftover
potential of an under-designed production application, it will continue to haunt IS
management until someone, perhaps even a person of PC heritage unburdened by the
outmoded belief that data normalization rules all, gladly takes over the reins. And when
that time comes, the DSS- and ESS-driven information paradigms will have begun the
shift.
Given the charge to produce well-designed tables and rows for end users, certain design
criteria should be followed, with the major objective of making things easier for the user.
Designing Databases for End-User Access
Minimize number of separate tables and eliminate all multitable dependencies.
Going back to the demographic database issue presented at the beginning of this
document, it does not require a substantial amount of thought to conclude that it would be
easier to access data such as name, address, and occupation description from one row
rather than three. If a user faces three times the number of tables necessary to do the job,
his or her productivity will be impacted by more than a factor of three. Instead of concentrating on
the data to be queried, a user joining tables must be concerned about how a join works,
and whether the product supports inner, outer, natural, or other join types. Who cares?
Not the user looking for data, that’s for sure! Let the MIS department worry about the
data, and let the user worry about getting information.
Make attributes similar to archival data.
End-user data is often a combination of master and transaction data. Typical transaction
table row layouts are sparse, at best. In the production system, once the master row has
been accessed, take the information and place it in the transaction row for better archival
information. Such end-user rows by definition must be designed to be comprehensive.
Design to first normal form.
Repeating groups are not conducive to production data nor are they conducive to end-
user access. Design the data for users to the first normal form, but do not go any further.
If the first normal form of the data can be achieved by pre-joining logicals, this is an
effective and easy way to test the validity of the row design without a major amount of
effort. Keep as much data in each row as possible without compromising the one-to-one
data element to key relationship. If logicals do not do the trick, a physical table can be
extracted periodically from all of the underlying production data sources to provide an
effective row layout for user queries. This also has the benefit of being a great performer.
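For instance, a periodic extract of the fraternal organization data might be refreshed with
something like the following sketch (the ENDUSER collection and the MEMBERFLAT
table name are hypothetical):

    -- Rebuild the first-normal-form end-user table from the production sources.
    DELETE FROM ENDUSER.MEMBERFLAT;

    INSERT INTO ENDUSER.MEMBERFLAT
           (MEMB#, NAME, RESCODE, RESNAME, ADDRESS1, ADDRESS2, OCCODE, OCDESCRIPTION)
    SELECT p.MEMB#, p.NAME, p.RESCODE, r.RESNAME, r.ADDRESS1, r.ADDRESS2,
           p.OCCODE, o.OCDESCRIPTION
    FROM   PERSONAL   p
           JOIN RESIDENCE  r ON r.RESCODE = p.RESCODE
           JOIN OCCUPATION o ON o.OCCODE  = p.OCCODE;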
Design with complete information.
Along the way to single fact rows, production databases are split and split and split again.
Related one-to-one attributes like customer data and balance data can be designed into
one row with no production loss.
Design completed rows.
There are two reasons to design completed rows. First, the objective of an end-user
database is to make it easier. Completed rows make it easier too. The second reason is to
enhance performance. It is better to capture the customer name into the transaction table
rather than access the customer master each time it must be retrieved. Also, calculations,
such as extended price, can be performed once during production processing, and the
results can be stored in completed rows rather than performing the calculations each time
the row is read. Besides helping the end user access data, this approach also helps the
production system run more efficiently.
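A sketch of what such a "completed" insert might look like, with purely hypothetical
order tables and values:

    -- Capture the customer name from the master and compute the extended price
    -- once, at transaction time, instead of re-deriving both on every read.
    INSERT INTO ORDERLINE (ORDER#, LINE#, CUST#, CUSTNAME, QTY, UNITPRICE, EXTPRICE)
    SELECT 100345, 1, c.CUST#, c.CUSTNAME, 12, 9.95, 12 * 9.95
    FROM   CUSTOMER c
    WHERE  c.CUST# = 'C0001';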
Capture point-in-time data.
If a piece of data, such as the price in a transaction row, is dependent on the price in the
master row, the price we pay today will be reported as a different price in the future as
the price in the master row changes. This design suggestion is related to completed rows
above. But the intent of this is to assure the accuracy of data through time. When a
transaction occurs involving price and/or discount, it is good systems design to capture
the point-in-time values for price and discount, rather than rely on the master row, or a
calculation to provide such data. This assures the constancy of data.
Avoid ubiquitous codes.
Poorly codified data is another reason why normal users find production data difficult to
use. When time is spent developing self-evident codes such as M for male and F for
female, the user’s job is simplified. Contrast this to the design that codes male as a 10 and
female as an 11. The more intuitive the coding structure, the easier it will be for users to
access and select data in meaningful ways.
Give columns meaningful names and descriptions.
One of the advantages of column names I learned was that you could call a column
anything. If you chose to call the address column COW3, and you knew what COW3
stood for, you were golden … and it would work. Don’t forget about security. It offered
security since nobody could guess what COW3 or SEGGH meant in a million years. (I
guess that was job security.) Just as we want to pick meaningful codes for the contents of
our columns, we want to pick meaningful names and descriptions for our columns. End
users like to know what the data elements are in a table. It does not cost much more to
use a good column heading or some nice text to help a user understand the intent and
purpose of a column.
Expand codes to meaningful text.
In joined rows or in the building of new physical rows to support end users, take the
codes and create a code table (or tables). When doing the join for the user, also join to the
code tables. In this manner, through the joined logical table or through an extraction, the
description of the codes can also be included in the user’s row layout. In the earlier
example, the occupation code in our demographic table could be expanded through a join
or an extraction to also include the description of the occupational code. This gives users
more meaningful data with much less work than performing the joins themselves. If the
code tables do not exist, build them. They are worth the investment both for query and for
further documenting the production system. I would also suggest leaving the codes in the
row design for narrow report queries and deeper analysis.
Of course, we should always use the basic principles of good system design, which
suggest that we start with requirements first. Since end-user requirements are always in
the form of report and display outputs, we use this as a starting point to assure that our
well-designed, first normal form rows provide the information for the end users in a form
they can easily use. In most end-user design projects, however, we are not alone. There is
already a production system in place that maintains the data our users wish to access. The
next part expands the design techniques we have discussed to apply to the most common
databases of all: existing databases.
TENET Business Intelligence
TENET’s business intelligence is obtained from the data of the fifty-nine PBAR and
thirty-three PATCOM hospitals it owns. This data is contained in a combination of
normalized and denormalized tables residing in one hundred fifty-one collections. These
collections are subdivided between PBAR and PATCOM as follows.
PBAR is represented by one hundred eighteen collections: fifty-nine containing Online
Transaction Processing (OLTP) databases and fifty-nine containing primary repository
databases, distributed over seven production AS/400s.
PATCOM is represented by thirty-three collections containing both OLTP and primary
repository databases on one non-production AS/400.
In addition to this, we have a consolidated PBAR version, obtained by merging the fifty-
nine OLTP collections, at the table level, into a single collection residing on a separate
non-production AS/400. To summarize, each collection is equivalent to a data warehouse;
therefore, TENET’s business intelligence is composed of one hundred fifty-two data
warehouses across nine systems. End-user analysis against these data warehouses is made
possible by four distinct applications. Being familiar with the adage that a picture is
worth a thousand words, I will break down the architecture into the following levels of
ever-increasing visual detail.
PBAR & PATCOM
 Enterprise Level
PBAR
 System Level
 Data Warehouse Level
PBAR Consolidated Version
 System Level
 Data Warehouse Level
PATCOM (detailed in the “Current Enhancement Project” portion of this document)
Interfacing Applications
 Showcase Vista
 CASEMIX Reports
 PQS
 Cost Accounting
PBAR & PATCOM

ENTERPRISE LEVEL NETWORK (diagram summarized)

PBAR system: 7 production AS/400s, 59 hospitals, 59 OLTP warehouses, and 59 primary
repository warehouses, each system with its own interfacing applications.

System   Hospitals   OLTP warehouses   PR warehouses
HDCF     12          12                12
MODB     12          12                12
USCA      8           8                 8
SIEB      2           2                 2
HOLA      8           8                 8
DHFB      7           7                 7
MEAB     10          10                10

PBAR consolidated version: 1 non-production AS/400 (HDCA), 59 hospitals, the 59 OLTP
warehouses merged into one, with interfacing applications.

PATCOM system: 1 non-production AS/400 (DAAC), 33 hospitals, 33 OLTP/PR warehouses,
with interfacing applications.
PBAR
System Level
The following depicts the AS/400 (DHFB) system. Because the architecture at the
system and warehouse level is identical for all PBAR hospitals, any PBAR AS/400 could
have been chosen to represent the following.
The following table lists the fourteen warehouses and the seven hospitals they represent,
residing on AS/400 (DHFB) with their respective storage requirements in bytes.
Figure 3. AS/400 DHFB data warehouses
Hospital Collection Size in Bytes Purpose
Trinity DATRI 952,636,024 Normalized end-user access with joins (OLTP)
DATRICDD 239,185,120 Denormalized end-user access (Primary Repository)
Memorial DADED 767,639,552 Normalized end-user access with joins
DADEDCDD 133,165,056 Denormalized end-user access
Doctor’s DADHF 1,165,406,208 Normalized end-user access with joins
DADHFCDD 190,091,264 Denormalized end-user access
Harton DAHAR 741,367,808 Normalized end-user access with joins
DAHARCDD 150,687,744 Denormalized end-user access
Methodist DAJON 440,741,888 Normalized end-user access with joins
DAJONCDD 92,037,120 Denormalized end-user access
Medical Center DAMAH 221,548,544 Normalized end-user access with joins
DAMAHCDD 51,011,584 Denormalized end-user access
University DAUNV 1,177,239,552 Normalized end-user access with joins
DAUNVCDD 229,093,376 Denormalized end-user access
(System diagram: AS/400 DHFB, 7 hospitals, 7 OLTP warehouses, 7 primary repository
warehouses, with interfacing applications.)
Warehouse Level
Figure 4 lists the tables that make up the OLTP warehouse representing Trinity hospital.
Figure 4.
Object Type Collection Attribute Text
ABSTRACT *TABLE DATRI PF DA: Patient Abstract table.
ACTIVITY *TABLE DATRI PF CA: Activity Master
ACTIVJOIN1 *JOIN DATRI LF DA: VISIT/CHARGES/ACTIVITY
APRDESC *TABLE DATRI PF APRDRG Description Table
BROKER S *TABLE DATRI PF DA: Broker Table Table
CDMDESC *TABLE DATRI PF DA: CDM description table
CHARGES *TABLE DATRI PF DA: Patient Charges
CLINIC *TABLE DATRI PF DA: Clinic Code Table
CLINSPTY *TABLE DATRI PF DA: CMM Clinical Specialty
CMMPAYORS *TABLE DATRI PF DA: CMM Payor Group
COSTCTR *TABLE DATRI PF DA: Cost center name
CPT4SURG *TABLE DATRI PF DA: Patient Surgical CPT4
DEMOG *TABLE DATRI PF DA: Patient Demographics
DIAGDESC *TABLE DATRI PF DA: Diagnosis description
DIAGL1 *VIEW DATRI LF DA: Patient Diagnosis by Di
DRGDESC *TABLE DATRI PF DA: DRG Descriptions Table
DRGWR *TABLE DATRI PF DA: DRG Weight & Rate Table
EDLOG *TABLE DATRI PF DA: Emergency Department Lo
FINSUM *TABLE DATRI PF DA: Patient Visit Financial
FUR *TABLE DATRI PF DA: Patient notes detail.
ICD9DIAG *TABLE DATRI PF DA: Patient Diagnosis
ICD9PROC *TABLE DATRI PF DA: Patient Procedure
MDCDESC *TABLE DATRI PF MDC DescriptionTable
MDTABLE *TABLE DATRI PF DA: Physician Table
MDTABLL1 *VIEW DATRI LF DA: Physican Group Code
NC2625P *TABLE DATRI PF MaCS: Work table for program
NONSTFMD *TABLE DATRI PF DA: Patient Physician (Non-
PATDIAG *TABLE DATRI PF DA: All patient diagnosis c
PATINS *TABLE DATRI PF DA: Patient Insurance
PATINSL1 *VIEW DATRI LF DA: Patient Insurance by Pl
PATMDS *TABLE DATRI PF DA: All Patient Physicians
PATPHYS *TABLE DATRI PF DA: Patient Physician
PATPROC *TABLE DATRI PF DA: All patient procedure c
PATTYPE *TABLE DATRI PF DA: Patient type table table
PAYCDDES *TABLE DATRI PF DA: CMM Payor Code Descript
PAYGPDES *TABLE DATRI PF DA: CMM Payor Group Descrip
PAYMENT *TABLE DATRI PF DA: Patient Account Payment
PHYSL1 *VIEW DATRI LF DA: Patient Physician by Ph
PROCDESC *TABLE DATRI PF DA: Procedure description
PROCL1 *VIEW DATRI LF DA: Patient Procedure by Pr
REHABGEN *TABLE DATRI PF DA: Rehab General
REHABREF *TABLE DATRI PF DA: Rehab Referring Facilit
Figure 4. continued.
Object Type Collection Attribute Text
REHABTRN *TABLE DATRI PF DA: Rehab Transferring Facility
VISIT *TABLE DATRI PF DA: Patient Visit
VISITJOIN1 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITJOIN2 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN3 *JOIN DATRI LF DA: VISIT/PATPHYS/ICD9DIAG
VISITJOIN4 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN5 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN6 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN7 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN8 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITL1 *VIEW DATRI LF DA: Patient Visit by DRG
VISIT2 *VIEW DATRI LF DA: Patient Visit in Discharge
VISIT3 *VIEW DATRI LF DA: Patient Visit in MRC#
Figure 5 lists the tables that make up the primary repository warehouse representing
Trinity hospital.
Figure 5.
Object Type Collection Attribute Text
PATACTV *TABLE DATRICDD PF CDD: Patient Information Active
PATFULL *TABLE DATRICDD PF CDD: Patient Information Full
PATLIMT *TABLE DATRICDD PF CDD: Patient Information Limited
PATROOM *TABLE DATRICDD PF CDD: Patient Info Room
PBAR CONSOLIDATED VERSION
System Level
The following depicts the AS/400 (HDCA) system.
The following table lists the consolidated PBAR warehouse representing fifty-nine
hospitals, residing on AS/400 (HDCA) with its storage requirement in bytes. The primary
repositories have not been consolidated, nor do they exist on this system, and there is no
plan that I am aware of to do so.
Figure 6. AS/400 HDCA consolidated data warehouse
Hospital Collection Size in Bytes Purpose
PBAR DACONS 22,123,130,880 Normalized end-user access with joins (OLTP)
(System diagram: AS/400 HDCA, 59 hospitals, 1 consolidated OLTP warehouse, with
interfacing applications.)
Warehouse Level
Figure 7 lists the tables that make up the consolidated OLTP warehouse representing
fifty-nine hospitals.
Figure 7.
Object Type Collection Attribute Text
ABSTRACT *TABLE DACONS PF DA: Patient Abstract table.
ACTIVITY *TABLE DACONS PF CA: Activity Master
ACTIVJOIN1 *JOIN DACONS LF DA: VISIT/CHARGES/ACTIVITY
APRDESC *TABLE DACONS PF APRDRG Description Table
BROKER S *TABLE DACONS PF DA: Broker Table Table
CDMDESC *TABLE DACONS PF DA: CDM description table
CHARGES *TABLE DACONS PF DA: Patient Charges
CLINIC *TABLE DACONS PF DA: Clinic Code Table
CLINSPTY *TABLE DACONS PF DA: CMM Clinical Specialty
CMMPAYORS *TABLE DACONS PF DA: CMM Payor Group
COSTCTR *TABLE DACONS PF DA: Cost center name
CPT4SURG *TABLE DACONS PF DA: Patient Surgical CPT4
DEMOG *TABLE DACONS PF DA: Patient Demographics
DIAGDESC *TABLE DACONS PF DA: Diagnosis description
DIAGL1 *VIEW DACONS LF DA: Patient Diagnosis by Di
DRGDESC *TABLE DACONS PF DA: DRG Descriptions Table
DRGWR *TABLE DACONS PF DA: DRG Weight & Rate Table
EDLOG *TABLE DACONS PF DA: Emergency Department Lo
FINSUM *TABLE DACONS PF DA: Patient Visit Financial
FUR *TABLE DACONS PF DA: Patient notes detail.
ICD9DIAG *TABLE DACONS PF DA: Patient Diagnosis
ICD9PROC *TABLE DACONS PF DA: Patient Procedure
MDCDESC *TABLE DACONS PF MDC DescriptionTable
MDTABLE *TABLE DACONS PF DA: Physician Table
MDTABLL1 *VIEW DACONS LF DA: Physican Group Code
NC2625P *TABLE DACONS PF MaCS: Work table for program
NONSTFMD *TABLE DACONS PF DA: Patient Physician (Non-
PATDIAG *TABLE DACONS PF DA: All patient diagnosis c
PATINS *TABLE DACONS PF DA: Patient Insurance
PATINSL1 *VIEW DACONS LF DA: Patient Insurance by Pl
PATMDS *TABLE DACONS PF DA: All Patient Physicians
PATPHYS *TABLE DACONS PF DA: Patient Physician
PATPROC *TABLE DACONS PF DA: All patient procedure c
PATTYPE *TABLE DACONS PF DA: Patient type table table
PAYCDDES *TABLE DACONS PF DA: CMM Payor Code Descript
PAYGPDES *TABLE DACONS PF DA: CMM Payor Group Descrip
PAYMENT *TABLE DACONS PF DA: Patient Account Payment
PHYSL1 *VIEW DACONS LF DA: Patient Physician by Ph
PROCDESC *TABLE DACONS PF DA: Procedure description
PROCL1 *VIEW DACONS LF DA: Patient Procedure by Pr
REHABGEN *TABLE DACONS PF DA: Rehab General
REHABREF *TABLE DACONS PF DA: Rehab Referring Facilit
Figure 7, continued.
Object Type Collection Attribute Text
REHABTRN *TABLE DACONS PF DA: Rehab Transferring Facility
VISIT *TABLE DACONS PF DA: Patient Visit
VISITJOIN1 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS/CHA
VISITJOIN2 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS
VISITJOIN3 *JOIN DACONS LF DA: VISIT/PATPHYS/ICD9DIAG
VISITJOIN4 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS
VISITJOIN5 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN6 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN7 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN8 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS/CHA
VISITL1 *VIEW DACONS LF DA: Patient Visit by DRG
VISIT2 *VIEW DACONS LF DA: Patient Visit in Discharge
VISIT3 *VIEW DACONS LF DA: Patient Visit in MRC#
Interfacing Applications
The following applications currently interface with the TENET data warehouse and are
supported by us:
 Showcase Vista
Third-party, PC-based end-user analysis tool.
 CASEMIX Reports
Homegrown menu-driven reporting system. If users require modifications of existing
reports, programmer intervention is required, ranging from modifying an existing report
to creating one from scratch.
 PQS System
Homegrown menu-driven reporting system that allows users to play out “what-if”
scenarios.
 Cost Accounting
Homegrown menu-driven reporting system for cost accounting purposes.
None of these systems actually modifies the data residing in the individual collections that
make up TENET’s business intelligence. This creates the opportunity for us to re-design
the current data warehouse architecture to take better advantage of the hardware,
homegrown applications, and PC-based end-user tools, and to create more powerful
applications.
The re-design is addressed with in-depth detail in the “Proposed Project” portion of this
document.
Current Enhancement Project
Due to the negative impact on production response times brought about by the PBAR
warehouses residing on production systems, an enhancement project is currently being
undertaken to move the PBAR warehouses off their current production systems, and on to
the non-production PATCOM system.
Before we proceed any further, I would like to state for the record that I was not
involved in any way in any phase of this project.
Because PATCOM’s primary repository and OLTP databases are located within the same
collection, PBAR will be modified accordingly to create a consistent architecture. The
PBAR primary repository databases will be placed into their respective OLTP
collections, and the primary repository collections will be removed. Consequently, the
enterprise-wide network will be reduced to two non-production AS/400s, the number of
warehouses will be reduced to ninety-three, and the consolidated warehouse will remain
as is. Confusing? Don’t worry, a visual aid follows.
PATCOM
 Enterprise Level
 System Level
 Data Warehouse Level
 Estimates
PBAR & PATCOM

ENTERPRISE LEVEL NETWORK, after the enhancement project (diagram summarized)

The PBAR system (7 production AS/400s: HDCF, MODB, USCA, SIEB, HOLA, DHFB, and
MEAB, with 59 hospitals, 59 OLTP warehouses, and 59 primary repository warehouses) is
merged with the PATCOM system onto the non-production AS/400 (DAAC), which then holds
92 hospitals and 92 OLTP/PR warehouses, with interfacing applications.

The PBAR consolidated version remains on the non-production AS/400 (HDCA): 59 hospitals,
the 59 OLTP warehouses merged into one, with interfacing applications.
PATCOM
System Level
The following depicts the AS/400 (DAAC) system.
For the sake of keeping it short, the following table lists the same sample of seven
hospitals depicted earlier at the PBAR system level. The increased sizes reflect the
inclusion of the primary repository databases in the OLTP collections.
Figure 8. AS/400 DAAC merged data warehouses
Hospital Collection Size in Bytes Purpose
Trinity DATRI 1,191,821,144 Normalized and denormalized end-user access with joins
Memorial DADED 823,873,536 Normalized and denormalized end-user access with joins
Doctor’s DADHF 1,216,704,512 Normalized and denormalized end-user access with joins
Harton DAHAR 818,868,224 Normalized and denormalized end-user access with joins
Methodist DAJON 527,355,904 Normalized and denormalized end-user access with joins
Medical Center DAMAH 244,506,624 Normalized and denormalized end-user access with joins
University DAUNV 1,278,439,424 Normalized and denormalized end-user access with joins
(System diagram: AS/400 DAAC, 92 hospitals, 92 OLTP/PR warehouses, with interfacing
applications.)
Warehouse Level
Figure 9 lists the tables that make up the OLTP warehouse representing Trinity
hospital, which now includes the merged primary repository tables.
Figure 9.
Object Type Collection Attribute Text
ABSTRACT *TABLE DATRI PF DA: Patient Abstract table.
ACTIVITY *TABLE DATRI PF CA: Activity Master
ACTIVJOIN1 *JOIN DATRI LF DA: VISIT/CHARGES/ACTIVITY
APRDESC *TABLE DATRI PF APRDRG Description Table
BROKER S *TABLE DATRI PF DA: Broker Table Table
CDMDESC *TABLE DATRI PF DA: CDM description table
CHARGES *TABLE DATRI PF DA: Patient Charges
CLINIC *TABLE DATRI PF DA: Clinic Code Table
CLINSPTY *TABLE DATRI PF DA: CMM Clinical Specialty
CMMPAYOR *TABLE DATRI PF DA: CMM Payor Group
COSTCTR *TABLE DATRI PF DA: Cost center name
CPT4SURG *TABLE DATRI PF DA: Patient Surgical CPT4
DEMOG *TABLE DATRI PF DA: Patient Demographics
DIAGDESC *TABLE DATRI PF DA: Diagnosis description
DIAGL1 *VIEW DATRI LF DA: Patient Diagnosis by Di
DRGDESC *TABLE DATRI PF DA: DRG Descriptions Table
DRGWR *TABLE DATRI PF DA: DRG Weight & Rate Table
EDLOG *TABLE DATRI PF DA: Emergency Department Lo
FINSUM *TABLE DATRI PF DA: Patient Visit Financial
FUR *TABLE DATRI PF DA: Patient notes detail.
ICD9DIAG *TABLE DATRI PF DA: Patient Diagnosis
ICD9PROC *TABLE DATRI PF DA: Patient Procedure
MDCDESC *TABLE DATRI PF MDC DescriptionTable
MDTABLE *TABLE DATRI PF DA: Physician Table
MDTABLL1 *VIEW DATRI LF DA: Physican Group Code
NC2625P *TABLE DATRI PF MaCS: Work table for program
NONSTFMD *TABLE DATRI PF DA: Patient Physician (Non-
PATDIAG *TABLE DATRI PF DA: All patient diagnosis c
PATINS *TABLE DATRI PF DA: Patient Insurance
PATINSL1 *VIEW DATRI LF DA: Patient Insurance by Pl
PATMDS *TABLE DATRI PF DA: All Patient Physicians
PATPHYS *TABLE DATRI PF DA: Patient Physician
PATPROC *TABLE DATRI PF DA: All patient procedure c
PATTYPE *TABLE DATRI PF DA: Patient type table table
PAYCDDES *TABLE DATRI PF DA: CMM Payor Code Descript
PAYGPDES *TABLE DATRI PF DA: CMM Payor Group Descrip
PAYMENT *TABLE DATRI PF DA: Patient Account Payment
PHYSL1 *VIEW DATRI LF DA: Patient Physician by Ph
PROCDESC *TABLE DATRI PF DA: Procedure description
PROCL1 *VIEW DATRI LF DA: Patient Procedure by Pr
REHABGEN *TABLE DATRI PF DA: Rehab General
REHABREF *TABLE DATRI PF DA: Rehab Referring Facilit
Figure 9. continued.
Object Type Collection Attribute Text
REHABTRN *TABLE DATRI PF DA: Rehab Transferring Facility
VISIT *TABLE DATRI PF DA: Patient Visit
VISITJOIN1 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITJOIN2 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN3 *JOIN DATRI LF DA: VISIT/PATPHYS/ICD9DIAG
VISITJOIN4 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN5 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN6 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN7 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN8 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITL1 *VIEW DATRI LF DA: Patient Visit by DRG
VISIT2 *VIEW DATRI LF DA: Patient Visit in Discharge
VISIT3 *VIEW DATRI LF DA: Patient Visit in MRC#
PATACTV *TABLE DATRI PF CDD: Patient Information Active
PATFULL *TABLE DATRI PF CDD: Patient Information Full
PATLIMT *TABLE DATRI PF CDD: Patient Information Limited
PATROOM *TABLE DATRI PF CDD: Patient Info Room
Note: PATACTV, PATFULL, PATLIMT, and PATROOM are the included primary repository tables.
Proposed Project
The objective is to create a data warehouse architecture that is independent of the
platform upon which it resides, and that takes full advantage of the hardware.
The platform-independent architecture can be achieved by creating collections, base
tables, views, and indexes using Structured Query Language (SQL). Taking advantage of
the hardware is relative to the hardware platform itself. In light of this, I will list a few
highlights regarding AS/400 hardware and defer the detail to a future document.
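As a flavor of what the SQL-defined objects might look like, here is a minimal sketch;
the collection name, column names, data types, and index are illustrative assumptions,
not the actual design:

    CREATE COLLECTION DWHOUSE;

    CREATE TABLE DWHOUSE.CLINICAL (
      PATACCT#   CHAR(9)  NOT NULL,     -- 9-byte TENET patient account number
      SEQUENCE#  SMALLINT NOT NULL,
      HOSPITAL   CHAR(3)  NOT NULL,
      DATELSTCHG DATE     NOT NULL,
      DIAGCODE   CHAR(6),
      PROCCODE   CHAR(6),
      CPT4CODE   CHAR(5),
      PRIMARY KEY (PATACCT#, SEQUENCE#, HOSPITAL)
    );

    CREATE INDEX DWHOUSE.CLINICAL_IX1 ON DWHOUSE.CLINICAL (HOSPITAL, DATELSTCHG);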
SQL is an industry-standard language for defining and manipulating data contained in a
relational database. An IBM research lab developed SQL in the 1970s to explore an
implementation of the relational database model. Since that time, SQL has become a
widely used language that’s included in most relational Database Management Systems
(DBMS), including IBM’s family of DB2 products. Several national and international
standards organizations have published SQL standards, which the major relational DBMS
(including DB2/400) follow for their versions of SQL.
Two advantages come to mind when discussing SQL-based architectures. One is the fact
that, with relatively few modifications, the architecture can be transferred to other
platforms. The other is that SQL exposes catalog views that allow you to query the
structure of the database itself. What this means is that we can use off-the-shelf packages
such as MS Access to automatically draw up a database map, showing such information
as primary keys, indexes, referential integrity, and referential constraints.
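For example, a catalog query along the following lines can list every column in a given
collection; the catalog view and column names shown are those documented for DB2/400's
QSYS2 collection and should be verified for the release in use:

    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, LENGTH
    FROM   QSYS2.SYSCOLUMNS
    WHERE  TABLE_SCHEMA = 'DWHOUSE'
    ORDER BY TABLE_NAME, ORDINAL_POSITION;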
Where the AS/400 is concerned, I truly believe it is the platform of choice, for the
following reasons (remember, I promised to keep it short here). The main strengths are
in two areas. The first area is scalability. It continues to be the only database in the
industry that is fully 64-bit enabled. When you combine that with the new hardware that
is about to ship, especially when we look at main memory sizes, it has a tremendous
competitive advantage over the rest of the industry, especially in a data
next 2-3 years, those machines will ship with half a terabyte of main memory. And that’s
something that’s possible only through 64-bit technology.
The other major area where there is a competitive advantage is ease of use and
administration. It’s fairly common knowledge that there aren’t many AS/400 installations
that have a database administrator. They just don’t require one. A lot of activities that a
normal database administrator would go through just aren’t done on the AS/400. You
manage users from a system perspective, not a database perspective. The majority of
things that you’d normally do as a DBA are fully automated and optimized by the system.
Let us begin by taking a look at some growth estimates and storage requirements.
Growth Estimates
Extrapolations
 Average Per Hospital
Figure 10 depicts a representative sample of seven of the ninety-two hospitals that make
up the TENET data warehouse and their sizes in bytes as of 3/1/1999. Currently we store
four years’ worth of data on-line for each hospital.
Figure 10. Sampling of seven hospitals to extrapolate average hospital size.
Hospital Collection Size in Bytes Purpose
Trinity DATRI 1,191,821,144 Normalized and denormalized end-user access with joins
Memorial DADED 823,873,536 Normalized and denormalized end-user access with joins
Doctor’s DADHF 1,216,704,512 Normalized and denormalized end-user access with joins
Harton DAHAR 818,868,224 Normalized and denormalized end-user access with joins
Methodist DAJON 527,355,904 Normalized and denormalized end-user access with joins
Medical Center DAMAH 244,506,624 Normalized and denormalized end-user access with joins
University DAUNV 1,278,439,424 Normalized and denormalized end-user access with joins
Total 6,101,569,368 Sum of sampled hospitals.
Figure 11 extrapolates the average hospital size based on the seven sample hospitals.
Figure 11. Average hospital size, calculation table.
Calculation Value Result/Description
6,101,569,368 Sum of sampled hospitals.
Divide by 7 Number of sampled hospitals.
Equals 871,652,767 Extrapolated average of 870 Megs per hospital.
 Average Transaction Volume Per Hospital
Figure 12 extrapolates the percentage increase per month from transaction volume, for
the period 3/1/1999 to 4/1/1999, for Trinity hospital. This percentage will be used as a
representative rate to calculate the yearly growth of TENET’s data warehouse from
transaction volume.
Figure 12. Trinity data warehouse monthly transaction volume increase, calculation table.
Calculation Value Result/Description
1,228,697,600 New Trinity Hospital size as of 4/1/1999.
Subtract 1,191,821,144 Old Trinity Hospital size as of 3/1/1999.
Equals 36,876,456 37 Megs increase per month from transaction volume.
Divide by 1,191,821,144 Old Trinity Hospital Size.
Equals .031 Extrapolated average of 3.1% increase per month per hospital.
Current
 Size as of 3/1/1999
Figure 13 estimates the size of the TENET data warehouse as of 3/1/1999.
Figure 13. TENET data warehouse size, calculation table.
Calculation Value Result/Description
871,652,767 Estimated average of 870 Megs per hospital.
Multiply by 92 Number of TENET hospitals.
Equals 80,192,054,564 Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Estimated
Critical success factors for estimating growth are:
 Additional Hospitals
Figure 14 estimates the yearly growth of the TENET data warehouse from additional
hospitals. Currently we are adding 12 hospitals a year.
Figure 14. TENET data warehouse yearly additional hospitals size increase, calculation table.
Calculation Value Result/Description
871,652,767 Average of 870 Megs per hospital.
Multiply by 12 Number of additional hospitals per year.
Equals 10,459,833,204 Estimated 10.5 Gigs increase from additional hospitals per year.
 Transaction Volume
Figure 15 estimates the yearly growth of the TENET data warehouse from the transaction
volume, using the rate calculated in Figure 12.
Figure 15. TENET data warehouse yearly transaction volume size increase, calculation table.
Calculation Value Result/Description
80,192,054,564 Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Multiply .031 Average of 3.1% increase per month per hospital.
Equals 2,485,953,691 Estimated 2.5 Gigs increase per month from transaction volume.
Multiply 12 Months in a year.
Equals 29,831,444,298 Estimated 30 Gigs increase per year from transaction volume.
Historical
The current warehouse holds four years’ worth of data on-line. Say we want to hold ten
years’ worth. Can you imagine the potential of ten years of data on-line? What would it
take, storage-wise, to achieve this goal?
 Ten Years Worth Applied to Current Size.
Figure 16. TENET data warehouse current historical size, calculation table.
Calculation Value Result/Description
80,192,054,564 Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Multiply 2.5 Additional 6 years worth.
Equals 200,480,136,410 Estimated 200 Gigs total, for proposed 10 years worth of data.
Storage Requirements
Considering the estimated yearly growth of the TENET data warehouse, let us determine
the duration of the currently available storage on the AS/400 DAAC upon which it
resides, for both four years’ worth and ten years’ worth of data.
 Average Yearly Growth
Figure 17 estimates the current yearly growth.
Figure 17. TENET data warehouse total yearly size increase, calculation table.
Calculation Value Result/Description
10,459,833,204 Estimated 10.5 Gigs increase from additional hospitals per year.
Add 29,831,444,298 Estimated 30 Gigs increase per year from transaction volume.
Equals 40,291,277,502 Estimated 40 Gigs increase per year total.
Figure 18. AS/400 DAAC duration of current storage with four years worth of data, calculation table.
Calculation Value Result/Description
390,000,000,000 390 Gigs, current AS/400 DAAC size.
Subtract 80,192,054,564 Estimated 80 Gigs for current 4 years worth of data.
Equals 309,807,945,436 Estimated 310 Gigs available storage size.
Divide by 40,291,277,502 Estimated 40 Gigs increase per year total.
Equals 7.5 Estimated 7.5 years duration for current storage with 4 years of data.
Figure 19. AS/400 DAAC duration of current storage with ten years worth of data, calculation table.
Calculation Value Result/Description
390,000,000,000 390 Gigs, current AS/400 DAAC size.
Subtract 200,480,136,410 Estimated 200 Gigs total, for proposed 10 years worth of data.
Equals 189,519,863,590 Estimated 190 Gigs available storage size.
Divide by 40,291,277,502 Estimated 40 Gigs increase per year total.
Equals 5 Estimated 5 years duration for current storage with 10 years of data.
Methodology
To briefly recap, the current architecture consists of ninety-three data warehouses
distributed across two non-production AS/400s. These should be reduced to one data
warehouse on one AS/400.
Proof of Concept
What follows is a partial prototype involving four small transactional tables from the
Trinity hospital collection. The end result is a single table containing the data previously
stored in four different ones; you may call this the denormalization of normalcy. As you
go through the technicalities of the prototype you will come across some of the
transformation procedures discussed later in this document, specifically data aggregation,
data standardization, and data cleansing. Figure 21 details the four transaction-oriented
tables. Figure 22 details one end-user-oriented table.
Key columns, which combined uniquely identify the row, are PATACCT# and SEQUENCE#.
Figure 21. Trinity Hospital Normalized Tables

Patient Diagnosis   Clinic Code Table   Patient Procedure   Patient Surgical
PATACCT#            PATACCT#            PATACCT#            PATACCT#
SEQUENCE#           SEQUENCE#           SEQUENCE#           SEQUENCE#
DIAGCODE            CLINICCODE          PROCCODE            CPT4CODE
DIAGMODI            DATELSTCHG          PROCMODI            CPT4MODI
DATELSTCHG          HOSPITAL            PROCDATE            CPT4MODI2
HOSPITAL                                DATELSTCHG          CPT4DATE
                                        HOSPITAL            DATELSTCHG
                                                            HOSPITAL
Figure 22. Trinity Hospital Denormalized Table
Clinical Data
PATACCT#
SEQUENCE#
HOSPITAL
DATELSTCHG
DIAGCODE
DIAGMODI
CLINICCODE
PROCCODE
PROCMODI
PROCDATE
CPT4CODE
CPT4MODI
CPT4MODI2
CPT4DATE
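One plausible way to load the Figure 22 layout from the four normalized tables is sketched
below. It assumes that the physical file names from Figure 4 (ICD9DIAG, CLINIC, ICD9PROC,
CPT4SURG) correspond to the four layouts above, that a CLINICAL target table already
exists, and that the load can be driven from the diagnosis table; a production load would
also need to pick up detail rows that have no matching diagnosis:

    INSERT INTO DATRI.CLINICAL
    SELECT d.PATACCT#, d.SEQUENCE#, d.HOSPITAL, d.DATELSTCHG,
           d.DIAGCODE, d.DIAGMODI,
           c.CLINICCODE,
           p.PROCCODE, p.PROCMODI, p.PROCDATE,
           s.CPT4CODE, s.CPT4MODI, s.CPT4MODI2, s.CPT4DATE
    FROM   DATRI.ICD9DIAG d
           LEFT OUTER JOIN DATRI.CLINIC   c
                ON c.PATACCT# = d.PATACCT# AND c.SEQUENCE# = d.SEQUENCE#
           LEFT OUTER JOIN DATRI.ICD9PROC p
                ON p.PATACCT# = d.PATACCT# AND p.SEQUENCE# = d.SEQUENCE#
           LEFT OUTER JOIN DATRI.CPT4SURG s
                ON s.PATACCT# = d.PATACCT# AND s.SEQUENCE# = d.SEQUENCE#;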
Figure 23 reproduces rows from the normalized tables. Figure 24 reproduces rows from
the denormalized table.
Figure 23. Trinity Hospital Normalized Tables Rows
Patient Diagnosis
Patient Account Number   Diag Seq Num   Diagnosis Code   Diagnosis Modifier   Last Change Date   Hospital Code
4307914 1 53510 19960619 TRI
4307914 2 4019 19960619 TRI
4307914 3 56984 19960619 TRI
4307914 4 5303 19960619 TRI
4307914 5 04186 19960619 TRI
4307914 16 5781 19960618 TRI
Clinic Code Table
Patient Account Number   Clinic Seq Num   Clinic Code   Last Change Date   Hospital Code
4307914 1 SO 19960619 TRI
Patient Procedure
Patient Account Number   Proc Seq Num   Proc Code   Proc Mod   Procedure Date   Last Change Date   Hospital Code
4307914 1 4516 19960614 19960619 TRI
4307914 2 4523 19960614 19960619 TRI
Patient Surgical
Patient Account Number   CPT4 Seq Num   CPT4 Code   CPT4 Mod.   CPT4 Modifier 2   CPT4 Date   Last Change Date   Hospital Code
4307914 1 43239 19960614 19960619 TRI
4307914 2 45378 19960614 19960619 TRI
Figure 24. Trinity Hospital Denormalized Table Rows
Clinical
Patact#   Seq   Hos Cod   Change Date   Diag Code   Diag Mod   Clinic Code   Proc Cod   Proc Mod   Procedure Date   Cpt4 Code   Cpt4 Mod   Cpt4 Mod2   Cpt4 Date
4307914 1 TRI 1996-06-19 53510 SO 4516 1996-06-14 43239 1996-06-14
4307914 2 TRI 1996-06-19 4019 4523 1996-06-14 45378 1996-06-14
4307914 3 TRI 1996-06-19 56984 0001-01-01 0001-01-01
4307914 4 TRI 1996-06-19 5303 0001-01-01 0001-01-01
4307914 5 TRI 1996-06-19 04186 0001-01-01 0001-01-01
4307914 16 TRI 1996-06-18 5781 0001-01-01 0001-01-01
As you can see from the preceding layout, the retrieval of all of the clinical data
regarding patient 4307914 against the Trinity Hospital normalized tables requires eleven
distinct disk accesses, whereas the Trinity Hospital denormalized table requires only six
distinct accesses. In addition to this, the denormalized version allows row blocking to
gather all six rows in main memory at once, reducing the disk accesses to one. This is not
possible in the normalized version due to the random access algorithms necessary to
retrieve rows from multiple tables.
If we leave the architecture as is, we have, yes, achieved an improvement in access times
and, as you will see later, a storage saving. But it still leaves us with ninety-two data
warehouses. Time for a quick recap, if you will: the table architecture is identical for all
hospitals, therefore we can consolidate like tables into one table. And if we can do it
for all tables, as is the case, we can reduce the ninety-three warehouses into one, like so.
Since I cannot reproduce all ninety-two hospitals and expect you to keep your sanity, I
have chosen two to demonstrate what the layout looks like.
Figure 25. Trinity and Alvarado Hospitals Denormalized Table Rows
Clinical
Patact#   Seq   Hos Cod   Change Date   Diag Code   Diag Mod   Clinic Code   Proc Cod   Proc Mod   Procedure Date   Cpt4 Code   Cpt4 Mod   Cpt4 Mod2   Cpt4 Date
4307914 1 TRI 1996-06-19 53510 SO 4516 1996-06-14 43239 1996-06-14
4307914 2 TRI 1996-06-19 4019 4523 1996-06-14 45378 1996-06-14
4307914 3 TRI 1996-06-19 56984 0001-01-01 0001-01-01
4307914 4 TRI 1996-06-19 5303 0001-01-01 0001-01-01
4307914 5 TRI 1996-06-19 04186 0001-01-01 0001-01-01
4307914 16 TRI 1996-06-18 5781 0001-01-01 0001-01-01
4307914 1 ALV 1996-06-19 53510 SO 4516 1996-06-14 43239 1996-06-14
4307914 2 ALV 1996-06-19 4019 4523 1996-06-14 45378 1996-06-14
4307914 3 ALV 1996-06-19 56984 0001-01-01 0001-01-01
4307914 4 ALV 1996-06-19 5303 0001-01-01 0001-01-01
4307914 5 ALV 1996-06-19 04186 0001-01-01 0001-01-01
4307914 16 ALV 1996-06-18 5781 0001-01-01 0001-01-01
Notice the additional key field, Hospital Code. This is necessary to maintain each row’s
uniqueness in the consolidated data warehouse, in the remote eventuality that identical
Patient Numbers are used in different hospitals, and to be able to distinguish between
hospitals for queries.
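A sketch of the consolidation step itself, assuming each hospital collection holds an
identically shaped CLINICAL table (the DWHOUSE target and the DAALV collection name for
Alvarado are assumptions made for illustration):

    -- Append like tables from each hospital collection into one enterprise table;
    -- HOSPITAL is already carried in every row, so uniqueness is preserved.
    INSERT INTO DWHOUSE.CLINICAL
      SELECT * FROM DATRI.CLINICAL      -- Trinity
      UNION ALL
      SELECT * FROM DAALV.CLINICAL;     -- Alvarado
    -- ...and so on for the remaining hospital collections.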
If you look closely at the data, you will notice date values represented with dashes, and
what in the world is 0001-01-01? Introducing the ‘L’ (date) data type. This attribute
enforces data integrity by allowing only valid dates. Since a string of blanks or zeroes is
not a valid date, the load defaults to the earliest valid date, hence 0001-01-01. You can
specify any valid date as the default, and you may also use the NULL value. The
advantages of using this attribute on all date columns are threefold: one, automatic
editing of the value; two, it requires only 4 bytes of disk storage versus the 8 required by
the current zoned decimal definition; three, it allows the use of special operation codes to
simplify date manipulation and date arithmetic within programs.
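In DDL terms, the idea is simply to declare the columns as dates, for example (a toy
table, not the actual design):

    CREATE TABLE DWHOUSE.DATEDEMO (
      PATACCT#  CHAR(9) NOT NULL,
      PROCDATE  DATE    NOT NULL DEFAULT '0001-01-01',  -- valid default instead of zeroes
      CPT4DATE  DATE    DEFAULT NULL                    -- or allow the NULL value
    );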
Other enhancements to the denormalized table include the elimination of repeating
columns (“Patient Account Number”, “Last Change Date”, “Hospital Code”, and
“Sequence Number”), and the optimization of numeric columns from the standpoint of
disk storage and CPU processing. The AS/400 stores numeric values in a packed format.
If you define numeric columns as zoned decimal, you incur additional CPU processing
time for the translation from one format to the other each time the column is accessed,
and you approximately double the disk storage required for your numeric columns.
By simply applying these design criteria, and without any end-user input, I have
effectively achieved a much more efficient end-user access table. Additional end-user
input will result in an even more efficient design.
An additional benefit of this design is the amount of storage space reclaimed as shown in
figures 26 and 27.
Figure 26. Trinity Hospital Normalized Tables
Required Storage Space in Bytes
TABLE NAME DATA ACCESS PATH TOTAL
Patient Diagnosis 17,827,840 13,438,946 31,275,008
Clinic Code Table 4,722,176 4,526,080 9,254,400
Patient Procedure 2,633,216 1,511,424 2,250,752
Patient Surgical 1,322,496 921,600 2,250,752
Total 26,505,728 20,398,080 46,931,456
Figure 27. Trinity Hospital Denormalized Table
Required Storage Space in Bytes
TABLE NAME DATA ACCESS PATH TOTAL
Clinical01 28,313,600 13,438,976 41,766,912
We see a net saving of five Megs for this one hospital; multiply it by ninety-two, our
currently supported hospitals, and the savings start getting more interesting: 485 Megs.
And that’s not all: I have performed a simple prototyping demonstration on four small
transaction tables. There are thirty-four additional tables that require more in-depth
analysis to determine further aggregation possibilities. I did some quick analysis and can
tell you that ten more can be aggregated, not to mention the elimination of most, if not
all, joins, depending on the results of the aforementioned aggregations. That will
translate into hefty storage savings. In addition, we will gain the tremendous throughput
improvement described previously.
Additional Enhancements
The enhancement project discussed previously accomplishes one objective: the
improvement of response times on the production systems by the removal of the data
warehouses from those systems. If I may say so, it is akin to cutting off one’s hand
because a finger hurts. Other functional areas that need to be addressed are:
 Data Extraction
 Report Mining
 Data Transformation
 Data Propagation
 Data Verification
Data Extraction
Currently we are populating the data warehouses with daily feeds from a mainframe.
Historical data is also obtained from the same mainframe on an as-needed basis. We are
also operating in an environment that requires continuous enhancements to the existing
warehouses as users request additional fields upon which to query. As a perfect example,
I would like to cite the last two projects I was involved in. The first involved adding one
field. The second involved adding six fields and a full historical reload from the
mainframe. Together, both projects lasted about three months. In addition, there is another
project on the sidelines called the “Field Add Project”, which leads me to believe we will
be adding more fields (“why, elementary, my dear Watson!”). It would seem that the
initial user requirements were somewhat incomplete. If so, let’s be proactive and
interview the users now, so we may identify up front all the specific elements that
warrant inclusion in the warehouse. In doing so, we will kill two birds with one stone:
we will please the users, who will be able to extract additional information, and MIS will
automatically become, and be perceived as being, much more productive.
Report Mining
Spooled files (reports that have not yet printed) contain data that has already been extracted from operational databases, and report mining can be used to access this data. Almost every OLTP application, whether canned or homegrown, generates a comprehensive suite of reports. Because they provide valuable information to end users in a relatively intuitive way, reports mask the complexity of the underlying OLTP databases.
Furthermore, report programs have already located, accessed, extracted, and consolidated
valuable operational information. Reports also maintain metadata in the form of column
headings, date ranges, titles, and other descriptive text. It may be worthwhile to invest in
software that allows the integration of data obtained from spooled files into the
warehouse. Currently this option is not even on the drawing board. Yet, all four
interfacing applications produce reports.
34
Data Transformation
Once raw data has been extracted from OLTP databases, it must be reformatted and
refined for the data warehouse. The transformation of this raw data comprises five related
activities: data aggregation, data filtering, data combination, data standardization, and data
cleansing.
Data Aggregation
Aggregation is an essential transformation function that summarizes operational data.
The aggregation process should combine the header and detail records into one record
(interfile aggregation). TENET’s current warehouses don’t make use of this
transformation technique.
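As a sketch of what interfile aggregation could look like here (the table and column names below are hypothetical, not TENET's actual layouts), a visit header and its charge detail collapse into one summarized warehouse row:
-- Hypothetical header/detail aggregation: one output row per visit,
-- carrying the header columns plus the summarized detail charges.
INSERT INTO WHSVISIT
SELECT H.ACCTNUM, H.ADMITDT, H.DISCHDT, SUM(D.CHGAMT), COUNT(*)
FROM VISITHDR H, CHARGEDTL D
WHERE H.ACCTNUM = D.ACCTNUM
GROUP BY H.ACCTNUM, H.ADMITDT, H.DISCHDT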
Data Filtering
Transformational processes also may filter relevant information from OLTP databases.
For example, an executive looking for net revenues would probably have no interest in
patient account numbers. This and other extraneous data elements would not be
transferred to a data warehouse. To the best of my knowledge no data filtering is done for
TENET.
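For illustration, a filtering step can be as simple as carrying forward only the columns and rows an analysis needs; the sketch below uses hypothetical names and propagates summarized net revenue while leaving patient account numbers and other extraneous elements behind:
-- Hypothetical filtering extract: only the elements an executive
-- revenue analysis needs are moved to the warehouse.
INSERT INTO WHSREVENUE (HOSPCODE, FISCALPER, NETREV)
SELECT HOSPCODE, FISCALPER, SUM(NETREVAMT)
FROM STAGEFIN
GROUP BY HOSPCODE, FISCALPER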
Data Combination
A third transformation function may combine OLTP data from separate applications and
platforms. The growth of distributed-processing environments has resulted in operational
databases that are scattered around the world. Data warehouses must be able to combine
data elements from these disparate systems.
This issue has already reared its head: we are expected to integrate the ORNDA system, a separate group of hospitals whose patient account number is 11 bytes long, versus the current TENET standard of 9 bytes.
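A combining step must reconcile such differences before rows from the two systems can share a table. A minimal sketch, assuming the shorter TENET account numbers are simply zero-padded to the 11-byte ORNDA width (all names below are illustrative):
-- Hypothetical combining extract: both sources land in one table with
-- a uniform 11-byte account number and a source-system identifier.
INSERT INTO WHSPATIENT (SRCSYS, PATACCT11, ADMITDT)
SELECT 'TENET', '00' CONCAT PATACCT9, ADMITDT FROM TENETVISIT
UNION ALL
SELECT 'ORNDA', PATACCT11, ADMITDT FROM ORNDAVISIT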
Data Standardization
Data-transformation processes standardize data elements and the metadata that describes
those elements. The difficulties caused by poor or even nonexistent documentation
underscore the need for consistency. Basic field attributes such as content, size, type, and
descriptions often differ across multiple applications, or even within a single application.
The inconsistent use of codes is a frequent problem as well. Of all the transformation activities, this is the one in which TENET lags the least. Nonetheless, there are no technical metadata repositories, nor are there any business metadata repositories.
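Code translation tables are one practical way to enforce that consistency, and they double as a rudimentary form of metadata. A sketch (the codes and names are hypothetical) that maps differing sex codes from two applications to one standardized warehouse value:
-- Hypothetical code-mapping table: application-specific codes on the
-- left, the single standardized warehouse code on the right.
CREATE TABLE SEXCODMAP (SRCSYS CHAR(6), SRCCODE CHAR(2), STDCODE CHAR(1))
INSERT INTO SEXCODMAP VALUES ('PBAR', '1', 'M')
INSERT INTO SEXCODMAP VALUES ('PBAR', '2', 'F')
INSERT INTO SEXCODMAP VALUES ('PATCOM', 'M', 'M')
INSERT INTO SEXCODMAP VALUES ('PATCOM', 'F', 'F')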
Data Cleansing
Another function data-transformation programs perform is data cleansing. Procedures that ensure the accuracy of warehouse repositories must be in place; TENET's warehouse has none.
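At a minimum, a cleansing pass could divert rows that fail basic edits into a suspense table for review rather than loading them; a sketch with hypothetical staging-table names:
-- Hypothetical edit check: rows with impossible dates, negative
-- charges, or missing account numbers go to a suspense table.
INSERT INTO SUSPENSE
SELECT * FROM STAGEVISIT
WHERE DISCHDT < ADMITDT
OR TOTCHG < 0
OR PATACCT IS NULL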
35
Data Propagation
Data-propagation procedures physically move transformed OLTP data to data
warehouses. TENET's data-propagation procedures are performed daily. The biggest problem with them is that they must be monitored manually and constantly: someone on the mainframe side must confirm that the propagation jobs there completed successfully, and someone on the AS/400 side must do the same for the jobs on that side. This is done five days a week, multiple times a day. This area needs to be reviewed as soon as possible.
Data Verification
To maintain warehouse integrity, systematic procedures to periodically compare
warehouse information to operational data must be in place. TENET has none; consequently, we learn of problems only when the users call us on them, or when the propagation procedure crashes because it encountered unreadable data.
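A simple periodic verification could compare record counts and control totals in the warehouse against control figures propagated from the mainframe; a sketch, again with hypothetical names:
-- Hypothetical verification query: any row returned identifies a
-- hospital whose warehouse totals no longer match the source system.
SELECT W.HOSPCODE, W.ROWCNT, C.ROWCNT, W.TOTCHG, C.TOTCHG
FROM WHSCNTL W, SRCCNTL C
WHERE W.HOSPCODE = C.HOSPCODE
AND (W.ROWCNT <> C.ROWCNT OR W.TOTCHG <> C.TOTCHG)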
As I have shown you, extraction, transformation, and propagation, the three processes
that move data into warehouses, as well as verification procedures, are all important
elements in creating and maintaining effective data warehouses. Figure 20 on the
following page presents an overview of these interrelated processes.
36
Figure 20.
Data-Warehouse Maintenance Procedures
Operational Database (Raw Data)
    Extraction Phase: Custom programs and/or replication tools access raw data.
Extracted Data
    Transformation Phase: Custom programs and/or replication tools cleanse, decode, standardize, and aggregate extracted data.
Transformed Data
    Propagation Phase: Custom programs and/or replication tools move transformed data to the data warehouse.
Data Warehouse (Warehouse Data)
    Verification Phase: Customized programs regularly compare warehouse data to operational (source) data, producing printed verification reports.
37
Testimonials
Data Warehouse
If you or a friend have a mortgage loan with Countrywide, feel free to go to WWW.Countrywide.com and pull up your loan, or information on any of the other products such as HELOC, credit cards, and the various insurance offerings. The information displayed is retrieved from the back end I designed on an AS/400 using the previously detailed techniques and procedures. The exceptionally good response time is due mostly to the denormalization technique, which allowed me to reduce a database composed of eighteen normalized tables to a database of two denormalized ones. We also put together a live demo hosting WWW.countrywide.com on the same AS/400 that was hosting the warehouse. Unfortunately, at the time we had some unresolved security issues with the firewall and no time to work them out, so we decided to host the website on the NT server. This actually worked out in my favor in that it emphasized the power of denormalized tables. When you request loan or other product information at WWW.countrywide.com, your request is received by a JAVA program on the NT server where the site is hosted. The JAVA program then submits an SQL read to the denormalized database on the AS/400 back end and returns the requested information with sub-second response time, thanks to the minimized disk I/O. Keep in mind, though, that if the AS/400 that hosts the warehouse also hosted the website, the response time would be further improved by the elimination of the middle layer (the NT server).
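The performance difference comes down to the shape of the read: instead of a join spanning eighteen normalized tables, the web request resolves to a single keyed read of one denormalized row, along the lines of the hypothetical statement below (the real table and column names are not reproduced here):
-- Hypothetical single-table read behind the web inquiry: one keyed
-- access against the denormalized loan table, no joins required.
SELECT *
FROM LOANWHS01
WHERE LOANNUM = ?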
38
DSS & EIS
The initial project requirements also included the creation of a Decision Support System (DSS) and an Executive Information System (EIS). Unfortunately, due to political turmoil, only the “Retrieval of Loan and Related Information on the Web” requirement survived. Nonetheless, we were able to put together a live demo of the DSS/EIS system. The prototype consisted of the not-so-hypothetical query, “How many of our customers have multiple products?” We had to come up with a way of satisfying the executives' thirst for knowledge and their impatience with long response times. After some brainstorming I came up with the following solution:
1. Develop a “Customer to Product Relationship Table,” updated daily from the Product Warehouse.
2. Create a temporary table, using the Relationship table as input, containing two fields:
   Customer#   # of Products
   12345       2
   45689       5
   56423       3
   etc.        etc.
   (Creation time for this table against a 17-million-row warehouse on an AS/400 was 2 minutes.)
3. Count the number of records in the above temporary table, thereby obtaining the answer.
Steps two and three were obtained through the following SQL code:
-- Step 2: build the temporary customer-to-product-count table.
CREATE TABLE TEMP1 (COL1 INT, COL2 INT)
INSERT INTO TEMP1
SELECT T03CUSTNUM, COUNT(T03PRODCOD) FROM WEBT030P
GROUP BY T03CUSTNUM HAVING COUNT(T03PRODCOD) > 1
-- Step 3: count the rows, giving the number of multi-product customers.
SELECT COUNT(*) FROM TEMP1
Assume for a moment that we are able to convince the executives to sit down with us and help us pre-define their queries. We could then set up a series of jobs to run at night whose sole purpose is to create a series of temporary tables, one for each executive query. Come morning, the executives would have but to press a key or click a button to obtain answers with sub-second response time.
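Each pre-defined query would follow the same pattern as the prototype: a nightly job materializes the answer set, and the morning request becomes a trivial read. A sketch for a second, hypothetical executive query (“How many of our customers hold five or more products?”), reusing the warehouse table named earlier; in a recurring job the answer table would be cleared or recreated before each load:
-- Hypothetical nightly job: build tonight's answer table so the
-- executive's morning click is a sub-second read of TEMP2.
CREATE TABLE TEMP2 (COL1 INT, COL2 INT)
INSERT INTO TEMP2
SELECT T03CUSTNUM, COUNT(T03PRODCOD) FROM WEBT030P
GROUP BY T03CUSTNUM HAVING COUNT(T03PRODCOD) >= 5
SELECT COUNT(*) FROM TEMP2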
39
Conclusion
Time is of the essence. We need to overhaul TENET's data warehouse now. Not tomorrow, or the day after, but now. The current architecture does not lend itself to unhindered growth. The difficulties we are encountering in adding the ORNDA hospitals, due to the different “patient account number” field sizes, confirm this. Add to that the lack of edits, which causes the notorious garbage-in, garbage-out situation, the wasted storage space, the absence of metadata, and the inefficient propagation procedures. If we maintain the status quo, it is only a matter of time before we jeopardize our relationship with TENET. This scenario is particularly undesirable in light of the recent IPO.
There are several major roadblocks to implementing this overhaul. First and foremost, “if it ain't broke, don't fix it.” Well, it's about to break. Second, if we get the go-ahead to modernize the architecture, the three homegrown applications, CASEMIX, PQS, and Cost Accounting, will have to be rewritten, and the users will have to be retrained to access the new data warehouse. So we are looking at embarking on a project of epic proportions which, frankly, TENET may not be interested in. In which case, amen. But if my vision makes sense to you, and you feel, as I do, that there is a need for powerful data warehousing solutions such as the one I have depicted in this document, then we can lay down the foundations for PEROT Systems to become a major player in providing customized data warehousing solutions for our current and future clients.

More Related Content

What's hot

Deployment guide-for-share point-2013
Deployment guide-for-share point-2013Deployment guide-for-share point-2013
Deployment guide-for-share point-2013prconcepcion
 
Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...
Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...
Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...
Dana Gardner
 
Migrating data centric applications to windows azure
Migrating data centric applications to windows azureMigrating data centric applications to windows azure
Migrating data centric applications to windows azureSteve Xu
 
Data mining extensions dmx - reference
Data mining extensions   dmx - referenceData mining extensions   dmx - reference
Data mining extensions dmx - referenceSteve Xu
 
Sap on windows_server_2012_and_sql_server_2012_white_paper_final
Sap on windows_server_2012_and_sql_server_2012_white_paper_finalSap on windows_server_2012_and_sql_server_2012_white_paper_final
Sap on windows_server_2012_and_sql_server_2012_white_paper_final
Manikanta Kota
 
BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...
BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...
BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...
BetterCloud
 
Deployment guide for Microsoft Office 2010 for IT professionals.
Deployment guide for Microsoft Office 2010 for IT professionals.Deployment guide for Microsoft Office 2010 for IT professionals.
Deployment guide for Microsoft Office 2010 for IT professionals.
Компания Робот Икс
 
Integration services extending packages with scripting
Integration services   extending packages with scriptingIntegration services   extending packages with scripting
Integration services extending packages with scriptingSteve Xu
 
Deployment guide-for-office-2013
Deployment guide-for-office-2013Deployment guide-for-office-2013
Deployment guide-for-office-2013
Heo Gòm
 
Volunteer Management Reporting System
Volunteer Management Reporting SystemVolunteer Management Reporting System
Volunteer Management Reporting SystemDainSanye
 
Data models and ro
Data models and roData models and ro
Data models and ro
Diana Diana
 
Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16
MILL5
 
Deployment guide-for-office-2013
Deployment guide-for-office-2013Deployment guide-for-office-2013
Deployment guide-for-office-2013Steve Xu
 
Migrating Data-Centric Applications to Windows Azure
Migrating Data-Centric Applications to Windows AzureMigrating Data-Centric Applications to Windows Azure
Migrating Data-Centric Applications to Windows AzureBrian Bendera
 
BI Project report
BI Project reportBI Project report
BI Project report
hlel
 
DotNetnuke
DotNetnukeDotNetnuke
DotNetnuke
kaushal123
 
The analytics-stack-guidebook
The analytics-stack-guidebookThe analytics-stack-guidebook
The analytics-stack-guidebook
Ashish Tiwari
 
Agm application virtualization_(app-v)_5.0
Agm application virtualization_(app-v)_5.0Agm application virtualization_(app-v)_5.0
Agm application virtualization_(app-v)_5.0Steve Xu
 
Agm bit locker_administration_and_monitoring_1.0
Agm bit locker_administration_and_monitoring_1.0Agm bit locker_administration_and_monitoring_1.0
Agm bit locker_administration_and_monitoring_1.0Steve Xu
 
Sql server community_fa_qs_manual
Sql server community_fa_qs_manualSql server community_fa_qs_manual
Sql server community_fa_qs_manualSteve Xu
 

What's hot (20)

Deployment guide-for-share point-2013
Deployment guide-for-share point-2013Deployment guide-for-share point-2013
Deployment guide-for-share point-2013
 
Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...
Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...
Complex Carrier Network Performance Data on Vertica Yields Performance and Cu...
 
Migrating data centric applications to windows azure
Migrating data centric applications to windows azureMigrating data centric applications to windows azure
Migrating data centric applications to windows azure
 
Data mining extensions dmx - reference
Data mining extensions   dmx - referenceData mining extensions   dmx - reference
Data mining extensions dmx - reference
 
Sap on windows_server_2012_and_sql_server_2012_white_paper_final
Sap on windows_server_2012_and_sql_server_2012_white_paper_finalSap on windows_server_2012_and_sql_server_2012_white_paper_final
Sap on windows_server_2012_and_sql_server_2012_white_paper_final
 
BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...
BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...
BetterCloud Whitepaper: Fixing IT's Blindspots – 8 Critical Security and Mana...
 
Deployment guide for Microsoft Office 2010 for IT professionals.
Deployment guide for Microsoft Office 2010 for IT professionals.Deployment guide for Microsoft Office 2010 for IT professionals.
Deployment guide for Microsoft Office 2010 for IT professionals.
 
Integration services extending packages with scripting
Integration services   extending packages with scriptingIntegration services   extending packages with scripting
Integration services extending packages with scripting
 
Deployment guide-for-office-2013
Deployment guide-for-office-2013Deployment guide-for-office-2013
Deployment guide-for-office-2013
 
Volunteer Management Reporting System
Volunteer Management Reporting SystemVolunteer Management Reporting System
Volunteer Management Reporting System
 
Data models and ro
Data models and roData models and ro
Data models and ro
 
Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16Sql server bi poweredby pw_v16
Sql server bi poweredby pw_v16
 
Deployment guide-for-office-2013
Deployment guide-for-office-2013Deployment guide-for-office-2013
Deployment guide-for-office-2013
 
Migrating Data-Centric Applications to Windows Azure
Migrating Data-Centric Applications to Windows AzureMigrating Data-Centric Applications to Windows Azure
Migrating Data-Centric Applications to Windows Azure
 
BI Project report
BI Project reportBI Project report
BI Project report
 
DotNetnuke
DotNetnukeDotNetnuke
DotNetnuke
 
The analytics-stack-guidebook
The analytics-stack-guidebookThe analytics-stack-guidebook
The analytics-stack-guidebook
 
Agm application virtualization_(app-v)_5.0
Agm application virtualization_(app-v)_5.0Agm application virtualization_(app-v)_5.0
Agm application virtualization_(app-v)_5.0
 
Agm bit locker_administration_and_monitoring_1.0
Agm bit locker_administration_and_monitoring_1.0Agm bit locker_administration_and_monitoring_1.0
Agm bit locker_administration_and_monitoring_1.0
 
Sql server community_fa_qs_manual
Sql server community_fa_qs_manualSql server community_fa_qs_manual
Sql server community_fa_qs_manual
 

Viewers also liked

impuestos sobre la renta
impuestos sobre la rentaimpuestos sobre la renta
impuestos sobre la renta
selenajaimes
 
Impacto de las_tic´s_en_la_sociedad-1[1]
Impacto de las_tic´s_en_la_sociedad-1[1]Impacto de las_tic´s_en_la_sociedad-1[1]
Impacto de las_tic´s_en_la_sociedad-1[1]
amodegon1003
 
ofimática
ofimática ofimática
ofimática
EduardoForeroPaez
 
Hayez - Estratto saggio Mazzocca
Hayez - Estratto saggio MazzoccaHayez - Estratto saggio Mazzocca
Hayez - Estratto saggio Mazzocca
Iniziativa 21058
 
Universidad fermín toro
Universidad fermín toroUniversidad fermín toro
Universidad fermín toro
Jaime Alvarez
 
Alejandro rivas
Alejandro rivasAlejandro rivas
Alejandro rivas
Necronec4l
 
Industry Summit Keynote - Evolution Of Car Dealership's Profit Centers
Industry Summit Keynote - Evolution Of Car Dealership's Profit CentersIndustry Summit Keynote - Evolution Of Car Dealership's Profit Centers
Industry Summit Keynote - Evolution Of Car Dealership's Profit Centers
Sean Bradley
 

Viewers also liked (8)

impuestos sobre la renta
impuestos sobre la rentaimpuestos sobre la renta
impuestos sobre la renta
 
Impacto de las_tic´s_en_la_sociedad-1[1]
Impacto de las_tic´s_en_la_sociedad-1[1]Impacto de las_tic´s_en_la_sociedad-1[1]
Impacto de las_tic´s_en_la_sociedad-1[1]
 
ofimática
ofimática ofimática
ofimática
 
Award of excellence and report
Award of excellence and reportAward of excellence and report
Award of excellence and report
 
Hayez - Estratto saggio Mazzocca
Hayez - Estratto saggio MazzoccaHayez - Estratto saggio Mazzocca
Hayez - Estratto saggio Mazzocca
 
Universidad fermín toro
Universidad fermín toroUniversidad fermín toro
Universidad fermín toro
 
Alejandro rivas
Alejandro rivasAlejandro rivas
Alejandro rivas
 
Industry Summit Keynote - Evolution Of Car Dealership's Profit Centers
Industry Summit Keynote - Evolution Of Car Dealership's Profit CentersIndustry Summit Keynote - Evolution Of Car Dealership's Profit Centers
Industry Summit Keynote - Evolution Of Car Dealership's Profit Centers
 

Similar to BusinessIntelligence

Essay Database
Essay DatabaseEssay Database
Make compliance fulfillment count double
Make compliance fulfillment count doubleMake compliance fulfillment count double
Make compliance fulfillment count double
Dirk Ortloff
 
How To Plan a Software Project
How To Plan a Software ProjectHow To Plan a Software Project
How To Plan a Software Project
HowToPlanASoftwareProject
 
How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management
Abhishek Sood
 
Project report
Project report Project report
Project report
MansiKulkarni18
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
Thomas Rones
 
The Case for Business Modeling
The Case for Business ModelingThe Case for Business Modeling
The Case for Business Modeling
Neil Raden
 
Business Case for Data Mashup
Business Case for Data MashupBusiness Case for Data Mashup
Business Case for Data Mashup
ArleneWatson
 
Mr bi
Mr biMr bi
Mr bi
renjan131
 
Acc 340 Preview Full Course
Acc 340 Preview Full CourseAcc 340 Preview Full Course
Acc 340 Preview Full Course
fasthomeworkhelpdotcome
 
Lean approach to IT development
Lean approach to IT developmentLean approach to IT development
Lean approach to IT developmentMark Krebs
 
Acc 340 Preview Full Course
Acc 340 Preview Full Course Acc 340 Preview Full Course
Acc 340 Preview Full Course
fasthomeworkhelpdotcome
 
Top Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability ComparsionTop Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability Comparsion
Erin
 
Top Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability ComparsionTop Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability Comparsion
Erin
 
Big data and hadoop ecosystem essentials for managers
Big data and hadoop ecosystem essentials for managersBig data and hadoop ecosystem essentials for managers
Big data and hadoop ecosystem essentials for managers
Manjeet Singh Nagi
 
INTRODUCTION TO Database Management System (DBMS)
INTRODUCTION TO Database Management System (DBMS)INTRODUCTION TO Database Management System (DBMS)
INTRODUCTION TO Database Management System (DBMS)
Prof Ansari
 
Ems
EmsEms
Working on Tasks in Microsoft Project Web Access
Working on Tasks in Microsoft Project Web AccessWorking on Tasks in Microsoft Project Web Access
Working on Tasks in Microsoft Project Web AccessDavid J Rosenthal
 
Structure of Database MAnagement System
Structure of Database MAnagement SystemStructure of Database MAnagement System
Structure of Database MAnagement System
nitish sandhawar
 

Similar to BusinessIntelligence (20)

Essay Database
Essay DatabaseEssay Database
Essay Database
 
Make compliance fulfillment count double
Make compliance fulfillment count doubleMake compliance fulfillment count double
Make compliance fulfillment count double
 
How To Plan a Software Project
How To Plan a Software ProjectHow To Plan a Software Project
How To Plan a Software Project
 
How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management
 
Project report
Project report Project report
Project report
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
 
The Case for Business Modeling
The Case for Business ModelingThe Case for Business Modeling
The Case for Business Modeling
 
Business Case for Data Mashup
Business Case for Data MashupBusiness Case for Data Mashup
Business Case for Data Mashup
 
Mr bi
Mr biMr bi
Mr bi
 
Acc 340 Preview Full Course
Acc 340 Preview Full CourseAcc 340 Preview Full Course
Acc 340 Preview Full Course
 
Lean approach to IT development
Lean approach to IT developmentLean approach to IT development
Lean approach to IT development
 
Acc 340 Preview Full Course
Acc 340 Preview Full Course Acc 340 Preview Full Course
Acc 340 Preview Full Course
 
Top Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability ComparsionTop Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability Comparsion
 
Top Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability ComparsionTop Three Data Modeling Tools Usability Comparsion
Top Three Data Modeling Tools Usability Comparsion
 
Big data and hadoop ecosystem essentials for managers
Big data and hadoop ecosystem essentials for managersBig data and hadoop ecosystem essentials for managers
Big data and hadoop ecosystem essentials for managers
 
INTRODUCTION TO Database Management System (DBMS)
INTRODUCTION TO Database Management System (DBMS)INTRODUCTION TO Database Management System (DBMS)
INTRODUCTION TO Database Management System (DBMS)
 
Fulltext01
Fulltext01Fulltext01
Fulltext01
 
Ems
EmsEms
Ems
 
Working on Tasks in Microsoft Project Web Access
Working on Tasks in Microsoft Project Web AccessWorking on Tasks in Microsoft Project Web Access
Working on Tasks in Microsoft Project Web Access
 
Structure of Database MAnagement System
Structure of Database MAnagement SystemStructure of Database MAnagement System
Structure of Database MAnagement System
 

BusinessIntelligence

  • 2. 2 Table of Contents Summary ………………………3 Introduction ………………………4 Designing Databases for End-User Access ………………………8 TENET Business Intelligence ………………………10 Pbar & Patcom ………………………11 Pbar ………………………12 Pbar Consolidated Version ………………………15 Interfacing Applications ………………………18 Current Enhancement Project ………………………19 Pbar & Patcom ………………………20 Patcom ………………………21 Proposed Project ………………………24 Growth Estimates ………………………25 Methodology ……………………….29 Additional Enhancements ……………………….33 Testimonials ………………………..37 Conclusion ………………………..39
  • 3. 3 Summary Why are databases not designed right from the start for end-user-access? Wouldn’t it be a lot easier to implement query products if the data were built with the users in mind? Could it be that both the production databases and the end-user databases could be the same databases if the database design were for both? Or is there too much of a difference between the requirements of the users and the requirements of the production systems? I will examine each of these questions in detail to formulate a strategy on how and when to design databases for end-user access. I will then introduce TENET’s existing business intelligence system, together with the current enhancement project, which will be completed on March 31st , 1999. Subsequently, I will list what I believe to be major deficiencies in both the existing system and the newly enhanced one. Finally, I will outline my vision of a universal data warehouse architecture, using TENET as the prototype.
  • 4. 4 Introduction The basic notion of production databases is to create atomic row designs with single fact rows according to third normal form normalization. The sparse rows designed in this process do not naturally lend themselves to end-user access. There are several reasons for this: The more data is spread between multiple tables with various relationships, the less intuitive is the meaning of the data. And the more tables involved in a database, the more complex functions such as join are required to first put the data in the proper shape. A simple demographic database for a fraternal association can be used as an example to demonstrate this. See (Figures 1 and 2) for row layouts. Figure 1. Fraternal Organization Database Personal Residence Occupation MEMB# RESCODE OCCODE NAME RESNAME OCDESCRIPTION RESCODE ADDRESS1 etc. OCCODE ADDRESS2 Figure 2. Denormalized Fraternal Organization Database PersonalInfo MEMB# NAME RESCODE RESNAME ADDRESS1 ADDRESS2 OCCODE OCDESCRIPTION Assume that in this organization, one to 20 people live in the same residence owned by the organization on behalf of the members. If the address information were kept in the personnel rows, there would be database anomalies in the form of transitive dependencies, since the address information depends on the residence not on the person. So the natural solution is to split the personnel row into both a residence row and a personnel row. Now, if we are storing the various occupations in which members may be employed, many of the members can be employed in the same occupation, or the same company. We may elect to segregate occupation information because it does not depend on the person but on the occupation. In this case, we would have a personnel table, a residence table, and an occupation table. To print a report that shows all of the nurses in the organization with the occupation data included along with the address of the employer and the member’s name and address, we would need to join three tables together. Although joining such data is not very difficult for a trained professional, it is not a talent the average user possesses. Consequently, there would be an undue degree of difficulty for the user to effect database joins to produce simple reports.
  • 5. 5 Ideally, all of the data that a user desires should be in one row. In our demographic database, is this possible? Yes, certainly, the data for residences and occupations can be packed into the same row that contains the member’s name. However, this design is not suitable for the production system since it would require non-normalized data. But this design would be ideal for the end-user database because it allows for all data to be queried without requiring complex functions such as database joins. Is the solution someplace in-between? In this situation, there is not much that can be in- between. There are multiple entities each demanding their own database table for production purposes, yet each wanting to be combined into a larger entity to enable end- user access. The potential relational database solution for this phenomenon is called a view. A view is simply a prepackaged projection of a database providing a different look than the physical data would normally dictate. So, in the above scenario, a row containing all the required data can typically be shaped and provide the desired view of the data for the end users. If this is so easy, then why is it not always done? Let’s ask a hypothetical IS manager why this is not always done, since it seems like it should be part of production system design. Typical IS manager: “Well, that’s easy for you to say! My first responsibility around here is to develop and maintain applications that provide major value for the organization. What I work on is prioritized by the steering committee and there is no time left to be worrying about what would be good for end-user access. Besides, if I start throwing join logical tables all over my production database to support ad-hoc queries from end-users, the maintenance of these indexes will slow down my production system… and that is even if there never is one query actually run. Phew! If they actually run queries against those joined tables; I’ve got an even bigger problem. How can I keep my less than one second response time promise if I don’t even know who will be using the system. Sure. I’d like to help, but there are too many opportunities for this thing to fail… and bring me down with it! In different words perhaps, but many IS managers would echo those sentiments exactly. They have a difficult balancing act to perform. They are expected to provide high-quality clerical function with great performance characteristics. They accept the challenge and do their jobs on a daily basis. It is not that IS does not want to be part of the team and provide the users with whatever they want. The very mission of IS is to please. But the production mission and the user mission are at odds with each other. From the dialogue above, we can see two fundamental problems in trying to design and implement databases for production end-user access: If end –user access were treated as its own application, or at a minimum was woven into a production system’s set of requirements, it could be accommodated more readily. End-User access is not treated as an application. Accommodating end-user access can create major performance problems with production users and query users.
  • 6. 6 However, very few assimilate this into design thinking. Management never suggests that we treat it as an application, so, by default, end user access is an afterthought. It is assumed to be a by-product of application system design, not something that should be the object of system design. And this is exactly what the end user gets… by-products. Would we ever consider designing two databases, one for the production application and one for end-user access? How could we? We just spent all of our database lives living by one of the major reasons for database: avoiding data redundancy and duplicity. How could we possibly conceive that the solution to any database problem would be to duplicate the data or the design? Instead, we very efficiently label the production database and move on to the next application. If the database design does not fit the end- user’s requirements, that’s a problem for another day, a day that hopefully will never come because we are now on to the next application. But what if the steering committee includes end-user access as one of the requirements of the application. What would we do? Where would we start? Would we just plan to build the production database as in the past and declare the end-user job to be done when the production system goes live? Or would we take our database design to third normal form for production, and then work with the users to define the best possible views of the data for ad-hoc queries. Hopefully we would do the latter! But we do not have to wait for the steering committee to say that end-user access is a requirement. It is our job to suggest new ways of doing things. We are the change agents. We intuitively understand this. But heretofore, we have not taken the initiative to justify the additional design and implementation time necessary to build the proper shaped data for end users into our production designs. Consequently, we do not design and code the join logical table rows to support end-user access. If these were part of the perceived real application requirement set, we would factor their impact on time and performance, and would propose different time projections and hardware requirements than that necessary for only the production system. The fact is that end-user access must be treated as a real application to get real results. We cannot very well design an invoicing system and propose hardware that would perform well enough to calculate all but the invoice total. This would be at best incomplete, at worst useless. We wouldn’t be satisfied with something that almost did the job! So also with end-user access. If we put nothing into planning for it and building the proper structures to support it, then we can likewise expect to get nothing out of it. It can be argued that the paradigm of the information age was brought on by a shift in emphasis from the clerical benefit derived from an application to the information rewards that can be gained by harnessing the power of all the information collected on behalf of these clerical applications. We made the shift from data processing to management information systems almost solely on the backs of programmers. The users demand results and the results are delivered by MIS in the form of report programs.
  • 7. 7 Now, the industry experts have theoretically carved out and differentiated some potentially new paradigms, these being Decision Support Systems (DSS) and Executive Support Systems (ESS). However, the world of information processing in reality has not yet made the shift. The primary focus today is MIS with lip service to DSS and ESS. End-user access is part of this lip service. Companies budget and buy tools to provide ad- hoc information needed by knowledge workers. But there is rarely an end-user project associated with the purchased tool. The purchase of the tool is an acknowledgement that management is serious about a solution, but there is no associated funding for the proper design or re-engineering of the application’s database. And MIS has not caught on to the fact that end-user access must be treated as another application to succeed. Once we do, then we will allocate the proper resources for its successful implementation, just as we allocate the resources necessary for the production applications. If we were to begin immediately to treat end-user access as another application, we would be forced to devise some innovative ways of assessing its performance impact on the production applications in concert with the end-user access application. Although this would be a difficult task, such work could help determine what additional capacity and power would be necessary to support end-user computing. This would have the double benefit of providing a more accurate cost for this service, plus it would give management the opportunity to vote yes or no without thinking that end-user access was free. Moreover, if the additional hardware to support the users were installed, there would be little reason for IS to be as concerned about the impact of the users on the system. Analysts know that the mere presence of logical tables adds overhead to applications. We also know that if there is a reasonable amount of queries against these logical views, those views should be given immediate maintenance. But when we give views immediate access maintenance, we also add a burden to each and every production transaction since, in addition to doing its normal work, it must also carry the performance hit of access path maintenance (index updating) for the end-user system. How much better are we therefore to recognize end-user access as a valuable application of its own with an associated system burden? In this way, we can have the horsepower necessary without the fear that unfunded queries will be the undoing of an otherwise effective IS manager. And so, if end-user access is treated as an application, and its potential performance impact is factored into the decision to move forward, there are great prospects for resounding success. If, on the other hand, end-user access continues to be allowed to be the leftover potential of an under-designed production application, it will continue to haunt IS management. Until someone, perhaps even a PC heritage person without the outmoded beliefs that data normalization rules all, gladly will take over the reins. And when this time comes, the DSS and ESS driven information paradigms will have begun the shift. Given the charge to produce well-designed tables and rows for end-users, certain design criteria should be followed with the major objective to make it easier for the user.
  • 8. 8 Designing Databases for End-User Access Minimize number of separate tables and eliminate all multitable dependencies. Going back to the demographic database issue presented at the beginning of this document, it does not require a substantial amount of thought to conclude that it would be easier to access data such as name, address, and occupation description from one row rather than three. If a user faces three times the number necessary to do the job, his or her productivity will be impacted by more than a factor of three. Instead of concentrating on the data to be queried, a user joining tables must be concerned about how a join works, and whether the product supports inner, outer, natural, or other join types. Who cares? Not the user looking for data, that’s for sure! Let the MIS department worry about the data, and let the user worry about getting information. Make attributes similar to archival data. End-user data is often a combination of master and transaction data. Typical transaction table row layouts are sparse, at best. In the production system, once the master row has been accessed, take the information and place it in the transaction row for better archival information. Such end-user rows by definition must be designed to be comprehensive. Design to first normal form. Repeating groups are not conducive to production data nor are they conducive to end- user access. Design the data for users to the first normal form, but do not go any further. If the first normal form of the data can be achieved by pre-joining logicals, this is an effective and easy way to test the validity of the row design without a major amount of effort. Keep as much data in each row as possible without compromising the one-to-one data element to key relationship. If logicals do not do the trick, a physical table can be extracted periodically from all of the underlying production data sources to provide an effective row layout for user queries. This also has the benefit of being a great performer. Design with complete information. Along the way to single fact rows, production databases are split and split and split again. Related one-to-one attributes like customer data and balance data can be designed into one row with no production loss. Design completed rows. There are two reasons to design completed rows. First, the objective of an end-user database is to make it easier. Completed rows make it easier too. The second reason is to enhance performance. It is better to capture the customer name into the transaction table rather than access the customer master each time it must be retrieved. Also, calculations, such as extended price, can be performed once during production processing, and the results can be stored in completed rows rather than performing the calculations each time the row is read. Besides helping the end user access data, this approach also helps the production system run more efficiently.
  • 9. 9 Capture point-in-time data. If a piece of data, such as the price in a transaction row, is dependent on the price in the master row, the price we pay today will be reported as a different price in the future as the price in the master row changes. This design suggestion is related to completed rows above. But the intent of this is to assure the accuracy of data through time. When a transaction occurs involving price and/or discount, it is good systems design to capture the point-in-time values for price and discount, rather than rely on the master row, or a calculation to provide such data. This assures the constancy of data. Avoid ubiquitous codes. Poorly codified data is another reason why normal users find production data difficult to use. When time is spent developing self-evident codes such as M for male and F for female, the user’s job is simplified. Contrast this to the design that codes male as a10 and female as an 11. The more intuitive the coding structure, the easier it will be for users to access and select data in meaningful ways. Give columns meaningful names and descriptions. One of the advantages of column names I learned was that you could call a column anything. If you chose to call the address column COW3, and you knew what COW3 stood for, you were golden … and it would work. Don’t forget about security. It offered security since nobody could guess what COW3 or SEGGH meant in a million years. (I guess that was job security.) Just as we want to pick meaningful codes for the contents of our columns, we want to pick meaningful names and descriptions for our columns. End users like to know what the data elements are in a table. It does not cost much more to use a good column heading or some nice text to help a user understand the intent and purpose of a column. Expand codes to meaningful text. In joined rows or in the building of new physical rows to support end users, take the codes and create a code table or table. When doing the join for the user, also join to the code tables. In this manner, through the joined logical table or through an extraction, the description of the codes can also be included in the user’s row layout. In the earlier example, the occupation code in our demographic table could be expanded through a join or an extraction to also include the description of the occupational code. This gives users more meaningful data with much less work than performing the joins themselves. If the code tables do not exist, build them. They are worth the investment both for query and for further documenting the production system. I would also suggest leaving the codes in the row design for narrow report queries and deeper analysis. Of course, we should always use the basic principles of good system design, which suggest that we start with requirements first. Since end-user requirements are always in the form of report and display outputs, we use this as a staring point to assure that our well-designed, first normal form rows provide the information for the end users in a form they can easily use. In most end-user design projects, however, we are not alone. There is already a production system in place that maintains the data our users wish to access. The next part expands the design techniques we have discussed to apply to the most common databases of all: existing databases.
  • 10. 10 TENET Business Intelligence TENET’s business intelligence is obtained from the data of the fifty-nine PBAR and thirty-three PATCOM hospitals it owns. This data is contained in a combination of normalized and denormalized tables residing in one hundred fifty-one collections. These collections are subdivided between PBAR and PATCOM as follows. PBAR is represented by one hundred eighteen collections. Fifty-nine containing Online Transaction Processing (OLTP) databases, and fifty-nine containing primary repository databases, distributed over seven production AS/400s. PATCOM is represented by thirty-three collections containing both OLTP and primary repository databases on one non-production AS/400. In addition to this, we have a consolidated PBAR version, obtained by merging the fifty- nine OLTP collections, at the table level, into a unique collection residing on a separate non-production AS/400. To summarize, each collection is equivalent to a data warehouse therefore TENET’s business intelligence is composed of one hundred fifty-two data warehouses across nine systems. End-user analysis against these data warehouses is made possible by four distinct applications. Being familiar with the adage that a picture is worth a thousand words, I will breakdown the architecture into the following levels of ever increasing visual detail. PBAR & PATCOM  Enterprise Level PBAR  System Level  Data Warehouse Level PBAR Consolidated Version  System Level  Data Warehouse Level  PATCOM Interfacing Applications  Showcase Vista  CASEMIX Reports  PQS  Cost Accounting  PATCOM will be detailed in the “Current Enhancement Project” portion of this document
  • 11. 11 PBAR & PATCOM HDCA 59 hospitals 1 OLTP Interfacing Applications PBAR system Consolidated version, 1 non- production AS/400, 59 OLTP warehouses merged into one. PATCOM system, 1 non-production AS/400, 33 OLTP/PR warehouses. DAAC 33 hospitals 33 OLTP/PR Interfacing Applications ENTERPRISE LEVEL NETWORK HDCF 12 hospitals 12 OLTP MODB 12 hospitals 12 OLTP USCA 8 hospitals 8 OLTP SIEB 2 hospitals 2 OLTP HOLA 8 hospitals 8 OLTP DHFB 7 hospitals 7 OLTP MEAB 10 hospitals 10 OLTP Interfacing Applications Interfacing Applications Interfacing Applications Interfacing Applications Interfacing Applications Interfacing Applications Interfacing Applications PBAR system, 7 production AS/400s, 59 hospitals, 59 OLTP warehouses, and 59 primary repository warehouses. 12 PR 12 PR 8 PR 2 PR 8 PR 7 PR 10 PR
  • 12. 12 PBAR System Level The following depicts the AS/400 (DHFB), system. Because the architecture at the system and warehouse level is identical for all PBAR hospitals, any PBAR AS/400 could have been chosen to represent the following. The following table lists the fourteen warehouses and the seven hospitals they represent, residing on AS/400 (DHFB) with their respective storage requirements in bytes. Figure 3. AS/400 DHFB data warehouses Hospital Collection Size in Bytes Purpose Trinity DATRI 952,636,024 Normalized end-user access with joins (OLTP) DATRICDD 239,185,120 Denormalized end-user access (Primary Repository) Memorial DADED 767,639,552 Normalized end-user access with joins DADEDCDD 133,165,056 Denormalized end-user access Doctor’s DADHF 1,165,406,208 Normalized end-user access with joins DADHFCDD 190,091,264 Denormalized end-user access Harton DAHAR 741,367,808 Normalized end-user access with joins DAHARCDD 150,687,744 Denormalized end-user access Methodist DAJON 440,741,888 Normalized end-user access with joins DAJONCDD 92,037,120 Denormalized end-user access Medical Center DAMAH 221,548,544 Normalized end-user access with joins DAMAHCDD 51,011,584 Denormalized end-user access University DAUNV 1,177,239,552 Normalized end-user access with joins DAUNVCDD 229,093,376 Denormalized end-user access DHFB 7 hospitals 7 OLTP Interfacing Applications 7 PR
  • 13. 13 Warehouse Level Figure 4 lists the tables that make up the OLTP warehouse representing Trinity hospital. Figure 4. Object Type Collection Attribute Text ABSTRACT *TABLE DATRI PF DA: Patient Abstract table. ACTIVITY *TABLE DATRI PF CA: Activity Master ACTIVJOIN1 *JOIN DATRI LF DA: VISIT/CHARGES/ACTIVITY APRDESC *TABLE DATRI PF APRDRG Description Table BROKER S *TABLE DATRI PF DA: Broker Table Table CDMDESC *TABLE DATRI PF DA: CDM description table CHARGES *TABLE DATRI PF DA: Patient Charges CLINIC *TABLE DATRI PF DA: Clinic Code Table CLINSPTY *TABLE DATRI PF DA: CMM Clinical Specialty CMMPAYORS *TABLE DATRI PF DA: CMM Payor Group COSTCTR *TABLE DATRI PF DA: Cost center name CPT4SURG *TABLE DATRI PF DA: Patient Surgical CPT4 DEMOG *TABLE DATRI PF DA: Patient Demographics DIAGDESC *TABLE DATRI PF DA: Diagnosis description DIAGL1 *VIEW DATRI LF DA: Patient Diagnosis by Di DRGDESC *TABLE DATRI PF DA: DRG Descriptions Table DRGWR *TABLE DATRI PF DA: DRG Weight & Rate Table EDLOG *TABLE DATRI PF DA: Emergency Department Lo FINSUM *TABLE DATRI PF DA: Patient Visit Financial FUR *TABLE DATRI PF DA: Patient notes detail. ICD9DIAG *TABLE DATRI PF DA: Patient Diagnosis ICD9PROC *TABLE DATRI PF DA: Patient Procedure MDCDESC *TABLE DATRI PF MDC DescriptionTable MDTABLE *TABLE DATRI PF DA: Physician Table MDTABLL1 *VIEW DATRI LF DA: Physican Group Code NC2625P *TABLE DATRI PF MaCS: Work table for program NONSTFMD *TABLE DATRI PF DA: Patient Physician (Non- PATDIAG *TABLE DATRI PF DA: All patient diagnosis c PATINS *TABLE DATRI PF DA: Patient Insurance PATINSL1 *VIEW DATRI LF DA: Patient Insurance by Pl PATMDS *TABLE DATRI PF DA: All Patient Physicians PATPHYS *TABLE DATRI PF DA: Patient Physician PATPROC *TABLE DATRI PF DA: All patient procedure c PATTYPE *TABLE DATRI PF DA: Patient type table table PAYCDDES *TABLE DATRI PF DA: CMM Payor Code Descript PAYGPDES *TABLE DATRI PF DA: CMM Payor Group Descrip PAYMENT *TABLE DATRI PF DA: Patient Account Payment PHYSL1 *VIEW DATRI LF DA: Patient Physician by Ph PROCDESC *TABLE DATRI PF DA: Procedure description PROCL1 *VIEW DATRI LF DA: Patient Procedure by Pr REHABGEN *TABLE DATRI PF DA: Rehab General REHABREF *TABLE DATRI PF DA: Rehab Referring Facilit
  • 14. 14 Figure 4. continued. Object Type Collection Attribute Text REHABTRN *TABLE DATRI PF DA: Rehab Transferring Facility VISIT *TABLE DATRI PF DA: Patient Visit VISITJOIN1 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA VISITJOIN2 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS VISITJOIN3 *JOIN DATRI LF DA: VISIT/PATPHYS/ICD9DIAG VISITJOIN4 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS VISITJOIN5 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN6 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN7 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN8 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA VISITL1 *VIEW DATRI LF DA: Patient Visit by DRG VISIT2 *VIEW DATRI LF DA: Patient Visit in Discharge VISIT3 *VIEW DATRI LF DA: Patient Visit in MRC# Figure 5 lists the tables that make up the primary repository warehouse representing Trinity hospital. Figure 5. Object Type Collection Attribute Text PATACTV *TABLE DATRICDD PF CDD: Patient Information Active PATFULL *TABLE DATRICDD PF CDD: Patient Information Full PATLIMT *TABLE DATRICDD PF CDD: Patient Information Limited PATROOM *TABLE DATRICDD PF CDD: Patient Info Room .
  • 15. 15 PBAR CONSOLIDATED VERSION System Level The following depicts the AS/400 (HDCA) system. The following table lists the consolidated PBAR warehouse representing fifty-nine hospitals, residing on AS/400 (HDCA) with its storage requirement in bytes. The primary repositories have not been consolidated, nor do they exist on this system. And there is no plan to do so that I am aware of. Figure 6. AS/400 HDCA consolidated data warehouse Hospital Collection Size in Bytes Purpose PBAR DACONS 22,123,130,880 Normalized end-user access with joins (OLTP) HDCA 59 hospitals 1 OLTP Interfacing Applications
  • 16. 16 Warehouse Level Figure 7 lists the tables that make up the consolidated OLTP warehouse representing fifty-nine hospitals. Figure 7. Object Type Collection Attribute Text ABSTRACT *TABLE DACONS PF DA: Patient Abstract table. ACTIVITY *TABLE DACONS PF CA: Activity Master ACTIVJOIN1 *JOIN DACONS LF DA: VISIT/CHARGES/ACTIVITY APRDESC *TABLE DACONS PF APRDRG Description Table BROKER S *TABLE DACONS PF DA: Broker Table Table CDMDESC *TABLE DACONS PF DA: CDM description table CHARGES *TABLE DACONS PF DA: Patient Charges CLINIC *TABLE DACONS PF DA: Clinic Code Table CLINSPTY *TABLE DACONS PF DA: CMM Clinical Specialty CMMPAYORS *TABLE DACONS PF DA: CMM Payor Group COSTCTR *TABLE DACONS PF DA: Cost center name CPT4SURG *TABLE DACONS PF DA: Patient Surgical CPT4 DEMOG *TABLE DACONS PF DA: Patient Demographics DIAGDESC *TABLE DACONS PF DA: Diagnosis description DIAGL1 *VIEW DACONS LF DA: Patient Diagnosis by Di DRGDESC *TABLE DACONS PF DA: DRG Descriptions Table DRGWR *TABLE DACONS PF DA: DRG Weight & Rate Table EDLOG *TABLE DACONS PF DA: Emergency Department Lo FINSUM *TABLE DACONS PF DA: Patient Visit Financial FUR *TABLE DACONS PF DA: Patient notes detail. ICD9DIAG *TABLE DACONS PF DA: Patient Diagnosis ICD9PROC *TABLE DACONS PF DA: Patient Procedure MDCDESC *TABLE DACONS PF MDC DescriptionTable MDTABLE *TABLE DACONS PF DA: Physician Table MDTABLL1 *VIEW DACONS LF DA: Physican Group Code NC2625P *TABLE DACONS PF MaCS: Work table for program NONSTFMD *TABLE DACONS PF DA: Patient Physician (Non- PATDIAG *TABLE DACONS PF DA: All patient diagnosis c PATINS *TABLE DACONS PF DA: Patient Insurance PATINSL1 *VIEW DACONS LF DA: Patient Insurance by Pl PATMDS *TABLE DACONS PF DA: All Patient Physicians PATPHYS *TABLE DACONS PF DA: Patient Physician PATPROC *TABLE DACONS PF DA: All patient procedure c PATTYPE *TABLE DACONS PF DA: Patient type table table PAYCDDES *TABLE DACONS PF DA: CMM Payor Code Descript PAYGPDES *TABLE DACONS PF DA: CMM Payor Group Descrip PAYMENT *TABLE DACONS PF DA: Patient Account Payment PHYSL1 *VIEW DACONS LF DA: Patient Physician by Ph PROCDESC *TABLE DACONS PF DA: Procedure description PROCL1 *VIEW DACONS LF DA: Patient Procedure by Pr REHABGEN *TABLE DACONS PF DA: Rehab General REHABREF *TABLE DACONS PF DA: Rehab Referring Facilit
  • 17. 17 Figure 7, continued. Object Type Collection Attribute Text REHABTRN *TABLE DACONS PF DA: Rehab Transferring Facility VISIT *TABLE DACONS PF DA: Patient Visit VISITJOIN1 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS/CHA VISITJOIN2 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS VISITJOIN3 *JOIN DACONS LF DA: VISIT/PATPHYS/ICD9DIAG VISITJOIN4 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS VISITJOIN5 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN6 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN7 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN8 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS/CHA VISITL1 *VIEW DACONS LF DA: Patient Visit by DRG VISIT2 *VIEW DACONS LF DA: Patient Visit in Discharge VISIT3 *VIEW DACONS LF DA: Patient Visit in MRC#
  • 18. 18 Interfacing Applications The following applications currently interface with the TENET data warehouse and are supported by us:  Showcase Vista Third party PC based end-user analysis tool.  CASEMIX Reports Homegrown menu driven reporting system, if users require modifications of existing reports, programmer intervention is required. Ranging from modifying an existing one to creating one from scratch.  PQS System Homegrown menu driven reporting system that allows users to play out “what if scenarios”.  Cost Accounting Homegrown menu driven reporting system for cost accounting purposes. None of these systems actually modify the data residing in the individual collections that make up TENET’s business Intelligence. This creates the opportunity for us to re-design the current data warehouse architecture to better take advantage of the hardware, homegrown applications, PC based end-user tools, and to create more powerful applications. The re-design is addressed with in-depth detail in the “Proposed Project” portion of this document.
• 19. 19 Current Enhancement Project
Due to the negative impact on production response times caused by the PBAR warehouses residing on production systems, an enhancement project is currently under way to move the PBAR warehouses off their production systems and onto the non-production PATCOM system. Before we proceed any further, I would like to state for the record that I was not involved in any phase of this project.
Because PATCOM's primary repository and OLTP databases are located within the same collection, PBAR will be modified accordingly to create a consistent architecture. The PBAR primary repository databases will be placed into their respective OLTP collections, and the primary repository collections will be removed. Consequently, the enterprise-wide network will be reduced to two non-production AS/400s, the number of warehouses will be reduced to ninety-three, and the consolidated warehouse will remain as is. Confusing? Don't worry, a visual aid is a page away.
PATCOM
 Enterprise Level
 System Level
 Data Warehouse Level
 Estimates
• 20. 20 PBAR & PATCOM
[Enterprise-level network diagram]
Before the enhancement: PBAR system, 7 production AS/400s, 59 hospitals, 59 OLTP warehouses, and 59 primary repository warehouses, each system with its own interfacing applications: HDCF (12 hospitals, 12 OLTP, 12 PR), MODB (12 hospitals, 12 OLTP, 12 PR), USCA (8 hospitals, 8 OLTP, 8 PR), SIEB (2 hospitals, 2 OLTP, 2 PR), HOLA (8 hospitals, 8 OLTP, 8 PR), DHFB (7 hospitals, 7 OLTP, 7 PR), MEAB (10 hospitals, 10 OLTP, 10 PR).
After the enhancement: the PBAR system is merged with the PATCOM system onto the non-production AS/400 DAAC (92 hospitals, 92 OLTP/PR warehouses, interfacing applications), while the PBAR consolidated version remains on the non-production AS/400 HDCA (59 hospitals, 59 OLTP warehouses merged into one, interfacing applications).
• 21. 21 PATCOM System Level
The following depicts the AS/400 (DAAC) system: 92 hospitals, 92 OLTP/PR warehouses, and the interfacing applications. For the sake of keeping it short, the following table lists the same sample of seven hospitals depicted earlier at the PBAR system level. The increased size reflects the inclusion of the primary repository databases in the OLTP collections.
Figure 8. AS/400 DAAC merged data warehouses
Hospital          Collection   Size in Bytes    Purpose
Trinity           DATRI        1,191,821,144    Normalized and denormalized end-user access with joins
Memorial          DADED          823,873,536    Normalized and denormalized end-user access with joins
Doctor's          DADHF        1,216,704,512    Normalized and denormalized end-user access with joins
Harton            DAHAR          818,868,224    Normalized and denormalized end-user access with joins
Methodist         DAJON          527,355,904    Normalized and denormalized end-user access with joins
Medical Center    DAMAH          244,506,624    Normalized and denormalized end-user access with joins
University        DAUNV        1,278,439,424    Normalized and denormalized end-user access with joins
  • 22. 22 Warehouse Level Figure 9 lists the tables that make up of the OLTP warehouse representing Trinity hospital, which now includes the merged primary repository tables. Figure 9. Object Type Collection Attribute Text ABSTRACT *TABLE DATRI PF DA: Patient Abstract table. ACTIVITY *TABLE DATRI PF CA: Activity Master ACTIVJOIN1 *JOIN DATRI LF DA: VISIT/CHARGES/ACTIVITY APRDESC *TABLE DATRI PF APRDRG Description Table BROKER S *TABLE DATRI PF DA: Broker Table Table CDMDESC *TABLE DATRI PF DA: CDM description table CHARGES *TABLE DATRI PF DA: Patient Charges CLINIC *TABLE DATRI PF DA: Clinic Code Table CLINSPTY *TABLE DATRI PF DA: CMM Clinical Specialty CMMPAYOR *TABLE DATRI PF DA: CMM Payor Group COSTCTR *TABLE DATRI PF DA: Cost center name CPT4SURG *TABLE DATRI PF DA: Patient Surgical CPT4 DEMOG *TABLE DATRI PF DA: Patient Demographics DIAGDESC *TABLE DATRI PF DA: Diagnosis description DIAGL1 *VIEW DATRI LF DA: Patient Diagnosis by Di DRGDESC *TABLE DATRI PF DA: DRG Descriptions Table DRGWR *TABLE DATRI PF DA: DRG Weight & Rate Table EDLOG *TABLE DATRI PF DA: Emergency Department Lo FINSUM *TABLE DATRI PF DA: Patient Visit Financial FUR *TABLE DATRI PF DA: Patient notes detail. ICD9DIAG *TABLE DATRI PF DA: Patient Diagnosis ICD9PROC *TABLE DATRI PF DA: Patient Procedure MDCDESC *TABLE DATRI PF MDC DescriptionTable MDTABLE *TABLE DATRI PF DA: Physician Table MDTABLL1 *VIEW DATRI LF DA: Physican Group Code NC2625P *TABLE DATRI PF MaCS: Work table for program NONSTFMD *TABLE DATRI PF DA: Patient Physician (Non- PATDIAG *TABLE DATRI PF DA: All patient diagnosis c PATINS *TABLE DATRI PF DA: Patient Insurance PATINSL1 *VIEW DATRI LF DA: Patient Insurance by Pl PATMDS *TABLE DATRI PF DA: All Patient Physicians PATPHYS *TABLE DATRI PF DA: Patient Physician PATPROC *TABLE DATRI PF DA: All patient procedure c PATTYPE *TABLE DATRI PF DA: Patient type table table PAYCDDES *TABLE DATRI PF DA: CMM Payor Code Descript PAYGPDES *TABLE DATRI PF DA: CMM Payor Group Descrip PAYMENT *TABLE DATRI PF DA: Patient Account Payment PHYSL1 *VIEW DATRI LF DA: Patient Physician by Ph PROCDESC *TABLE DATRI PF DA: Procedure description PROCL1 *VIEW DATRI LF DA: Patient Procedure by Pr REHABGEN *TABLE DATRI PF DA: Rehab General REHABREF *TABLE DATRI PF DA: Rehab Referring Facilit
  • 23. 23 Figure 9. continued. Object Type Collection Attribute Text REHABTRN *TABLE DATRI PF DA: Rehab Transferring Facility VISIT *TABLE DATRI PF DA: Patient Visit VISITJOIN1 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA VISITJOIN2 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS VISITJOIN3 *JOIN DATRI LF DA: VISIT/PATPHYS/ICD9DIAG VISITJOIN4 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS VISITJOIN5 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN6 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN7 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT VISITJOIN8 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA VISITL1 *VIEW DATRI LF DA: Patient Visit by DRG VISIT2 *VIEW DATRI LF DA: Patient Visit in Discharge VISIT3 *VIEW DATRI LF DA: Patient Visit in MRC# PATACTV *TABLE DATRI PF CDD: Patient Information Active PATFULL *TABLE DATRI PF CDD: Patient Information Full PATLIMT *TABLE DATRI PF CDD: Patient Information Limited PATROOM *TABLE DATRI PF CDD: Patient Info Room 1 1 Included primary repository tables
• 24. 24 Proposed Project
The proposal is to create a data warehouse architecture that is independent of the platform upon which it resides, and that takes full advantage of the hardware. The platform-independent architecture can be achieved by creating the collections, base tables, views, and indexes using Structured Query Language (SQL). Taking advantage of the hardware is relative to the hardware platform itself; in light of this, I will list a few highlights regarding AS/400 hardware and defer the details to a future document.
SQL is an industry-standard language for defining and manipulating data contained in a relational database. An IBM research lab developed SQL in the 1970s to explore an implementation of the relational database model. Since that time, SQL has become a widely used language that is included in most relational Database Management Systems (DBMS), including IBM's family of DB2 products. Several national and international standards organizations have published SQL standards, which the major relational DBMSs (including DB2/400) follow for their versions of SQL.
Two advantages come to mind when discussing SQL-based architectures. One, with relatively few modifications the architecture can be transferred to other platforms. Two, SQL defines catalog views that allow you to query the structure of the database itself. This means we can use off-the-shelf packages such as MS Access to automatically draw up a database map, showing such information as primary keys, indexes, referential integrity, and referential constraints.
Where the AS/400 is concerned, I truly believe it is the platform of choice for the following reasons (remember, I promised to keep it short here). The main strengths are in two areas. The first is scalability. It continues to be the only database in the industry that is fully 64-bit enabled. When you combine that with the new hardware that is about to ship, especially the main memory sizes, it has a tremendous competitive advantage over the rest of the industry, especially in a data warehouse environment. Today the machines come with 40 GB of main memory. In the next two to three years, those machines will ship with half a terabyte of main memory, and that is something that is possible only through 64-bit technology. The other major area of competitive advantage is ease of use and administration. It is fairly common knowledge that there aren't many AS/400 installations that have a database administrator; they just don't require one. A lot of activities that a normal database administrator would go through just aren't done on the AS/400. You manage users from a system perspective, not a database perspective. The majority of things that you would normally do as a DBA are fully automated and optimized by the system.
Let us begin by taking a look at some growth estimates and storage requirements.
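To make the SQL-based approach concrete, here is a minimal sketch in DB2/400-flavored SQL. The collection, table, and column names are mine and purely illustrative, not the actual PBAR definitions, and the catalog view and column names are as I recall them for DB2/400 and should be verified against the installed OS/400 release.

   -- Create a collection (schema); on the AS/400 this also builds the
   -- collection's own catalog views describing its tables and columns.
   CREATE COLLECTION DAWHSE;

   -- A small illustrative table and supporting index, defined entirely in SQL
   -- so the same DDL can be moved to another platform with few changes.
   CREATE TABLE DAWHSE.DRGDESC
         (DRGCODE   CHAR(3)      NOT NULL,
          DRGDESC   CHAR(50)     NOT NULL,
          DRGWEIGHT DECIMAL(7,4) NOT NULL,
          PRIMARY KEY (DRGCODE));

   CREATE INDEX DAWHSE.DRGDESCX1 ON DAWHSE.DRGDESC (DRGWEIGHT);

   -- The catalog can then be queried like any other table, which is what lets
   -- a PC tool such as MS Access draw a map of the warehouse over ODBC.
   SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, LENGTH
     FROM QSYS2.SYSCOLUMNS
    WHERE TABLE_SCHEMA = 'DAWHSE'
    ORDER BY TABLE_NAME, ORDINAL_POSITION;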
• 25. 25 Growth Estimates
Extrapolations
 Average Per Hospital
Figure 10 depicts a representative sample of seven of the ninety-two hospitals that make up the TENET data warehouse and their size in bytes as of 3/1/1999. Currently we store four years' worth of data on-line for each hospital.
Figure 10. Sampling of seven hospitals to extrapolate average hospital size.
Hospital          Collection   Size in Bytes    Purpose
Trinity           DATRI        1,191,821,144    Normalized and denormalized end-user access with joins
Memorial          DADED          823,873,536    Normalized and denormalized end-user access with joins
Doctor's          DADHF        1,216,704,512    Normalized and denormalized end-user access with joins
Harton            DAHAR          818,868,224    Normalized and denormalized end-user access with joins
Methodist         DAJON          527,355,904    Normalized and denormalized end-user access with joins
Medical Center    DAMAH          244,506,624    Normalized and denormalized end-user access with joins
University        DAUNV        1,278,439,424    Normalized and denormalized end-user access with joins
Total                          6,101,569,368    Sum of sampled hospitals.
Figure 11 extrapolates the average hospital size based on the seven sample hospitals.
Figure 11. Average hospital size, calculation table.
Calculation   Value            Result/Description
              6,101,569,368    Sum of sampled hospitals.
Divide by     7                Number of sampled hospitals.
Equals        871,652,767      Extrapolated average of 870 Megs per hospital.
 Average Transaction Volume Per Hospital
Figure 12 extrapolates the percentage increase per month from transaction volume, for the period 3/1/1999 to 4/1/1999, for Trinity hospital. This percentage will be used as a median to calculate the yearly growth of TENET's data warehouse from transaction volume.
Figure 12. Trinity data warehouse monthly transaction volume increase, calculation table.
Calculation   Value            Result/Description
              1,228,697,600    New Trinity Hospital size as of 4/1/1999.
Subtract      1,191,821,144    Old Trinity Hospital size as of 3/1/1999.
Equals        36,876,456       37 Megs increase per month from transaction volume.
Divide by     1,191,821,144    Old Trinity Hospital size.
Equals        .031             Extrapolated average of 3.1% increase per month per hospital.
• 26. 26 Current
 Size as of 3/1/1999
Figure 13 estimates the size of the TENET data warehouse as of 3/1/1999.
Figure 13. TENET data warehouse size, calculation table.
Calculation   Value             Result/Description
              871,652,767       Estimated average of 870 Megs per hospital.
Multiply by   92                Number of TENET hospitals.
Equals        80,192,054,564    Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Estimated
Critical success factors for estimating growth are:
 Additional Hospitals
Figure 14 estimates the yearly growth of the TENET data warehouse from additional hospitals. Currently we are adding 12 hospitals a year.
Figure 14. TENET data warehouse yearly additional hospitals size increase, calculation table.
Calculation   Value             Result/Description
              871,652,767       Average of 870 Megs per hospital.
Multiply by   12                Number of additional hospitals per year.
Equals        10,459,833,204    Estimated 10.5 Gigs increase from additional hospitals per year.
 Transaction Volume
Figure 15 estimates the yearly growth of the TENET data warehouse from transaction volume, using the median calculated in Figure 12.
Figure 15. TENET data warehouse yearly transaction volume size increase, calculation table.
Calculation   Value             Result/Description
              80,192,054,564    Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Multiply by   .031              Average of 3.1% increase per month per hospital.
Equals        2,485,953,691     Estimated 2.5 Gigs increase per month from transaction volume.
Multiply by   12                Months in a year.
Equals        29,831,444,298    Estimated 30 Gigs increase per year from transaction volume.
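If we kept a small work table of measured collection sizes (the table and column names below are hypothetical, used only for illustration), the extrapolations in Figures 11, 13, and 14 could be reproduced with a single SQL statement instead of by hand:

   -- Hypothetical work table holding the sampled collection sizes from Figure 10.
   CREATE TABLE DAWHSE.HOSPSIZE
         (HOSPITAL CHAR(15)      NOT NULL,
          COLLNAME CHAR(10)      NOT NULL,
          SIZBYTES DECIMAL(15,0) NOT NULL);

   -- Average hospital size (Figure 11), estimated warehouse size for all 92
   -- hospitals (Figure 13), and yearly growth from 12 new hospitals (Figure 14).
   SELECT AVG(SIZBYTES)      AS AVG_HOSPITAL,
          AVG(SIZBYTES) * 92 AS EST_WAREHOUSE,
          AVG(SIZBYTES) * 12 AS EST_NEW_HOSP_PER_YEAR
     FROM DAWHSE.HOSPSIZE;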
• 27. 27 Historical
The current warehouse holds four years' worth of data on-line. Say we want to hold ten years' worth. Can you imagine the potential of ten years' worth of data on-line? What would it take, storage-space wise, to achieve this goal?
 Ten Years' Worth Applied to Current Size
Figure 16. TENET data warehouse current historical size, calculation table.
Calculation   Value              Result/Description
              80,192,054,564     Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Multiply by   2.5                Factor to go from 4 years to 10 years of data (6 additional years).
Equals        200,480,136,410    Estimated 200 Gigs total, for proposed 10 years' worth of data.
• 28. 28 Storage Requirements
Considering the estimated yearly growth of the TENET data warehouse, let us determine how long the currently available storage on the AS/400 DAAC will last, for both four years' worth and ten years' worth of data.
 Average Yearly Growth
Figure 17 estimates the current yearly growth.
Figure 17. TENET data warehouse total yearly size increase, calculation table.
Calculation   Value             Result/Description
              10,459,833,204    Estimated 10.5 Gigs increase from additional hospitals per year.
Add           29,831,444,298    Estimated 30 Gigs increase per year from transaction volume.
Equals        40,291,277,502    Estimated 40 Gigs increase per year total.
Figure 18. AS/400 DAAC duration of current storage with four years' worth of data, calculation table.
Calculation   Value              Result/Description
              390,000,000,000    390 Gigs, current AS/400 DAAC size.
Subtract      80,192,054,564     Estimated 80 Gigs for current 4 years' worth of data.
Equals        309,807,945,436    Estimated 310 Gigs of available storage.
Divide by     40,291,277,502     Estimated 40 Gigs increase per year total.
Equals        7.7                Estimated 7.7 years duration for current storage with 4 years of data.
Figure 19. AS/400 DAAC duration of current storage with ten years' worth of data, calculation table.
Calculation   Value              Result/Description
              390,000,000,000    390 Gigs, current AS/400 DAAC size.
Subtract      200,480,136,410    Estimated 200 Gigs total, for proposed 10 years' worth of data.
Equals        189,519,863,590    Estimated 190 Gigs of available storage.
Divide by     40,291,277,502     Estimated 40 Gigs increase per year total.
Equals        4.7                Estimated 4.7 years duration for current storage with 10 years of data.
• 29. 29 Methodology
To briefly recap, the current architecture consists of ninety-three data warehouses distributed across two non-production AS/400s. These should be reduced to one data warehouse on one AS/400.
Proof of Concept
What follows is a partial prototype involving four small, transaction-type tables from the Trinity hospital collection. The end result is a single table containing the data previously stored in four different ones. You may call this the denormalization of normalcy. As you go through the technicalities of the prototype you will come across some of the previously discussed transformation procedures, specifically data aggregation, data standardization, and data cleansing.
Figure 21 details four transaction-oriented tables. Figure 22 details one end-user-oriented table. In the original document the key columns, which combined uniquely identify the row, are highlighted in green.
Figure 21. Trinity Hospital Normalized Tables
Patient Diagnosis: PATACCT#, SEQUENCE#, DIAGCODE, DIAGMODI, DATELSTCHG, HOSPITAL
Clinic Code Table: PATACCT#, SEQUENCE#, CLINICCODE, DATELSTCHG, HOSPITAL
Patient Procedure: PATACCT#, SEQUENCE#, PROCCODE, PROCMODI, PROCDATE, DATELSTCHG, HOSPITAL
Patient Surgical: PATACCT#, SEQUENCE#, CPT4CODE, CPT4MODI, CPT4MODI2, CPT4DATE, DATELSTCHG, HOSPITAL
Figure 22. Trinity Hospital Denormalized Table
Clinical Data: PATACCT#, SEQUENCE#, HOSPITAL, DATELSTCHG, DIAGCODE, DIAGMODI, CLINICCODE, PROCCODE, PROCMODI, PROCDATE, CPT4CODE, CPT4MODI, CPT4MODI2, CPT4DATE
• 30. 30 Figure 23 reproduces rows from the normalized tables. Figure 24 reproduces rows from the denormalized table.
Figure 23. Trinity Hospital Normalized Tables Rows
Patient Diagnosis (Patient Account Number, Diag Seq Num, Diagnosis Code, Diagnosis Modifier, Last Change Date, Hospital Code):
4307914   1    53510          19960619   TRI
4307914   2    4019           19960619   TRI
4307914   3    56984          19960619   TRI
4307914   4    5303           19960619   TRI
4307914   5    04186          19960619   TRI
4307914   16   5781           19960618   TRI
Clinic Code Table (Patient Account Number, Clinic Seq Num, Clinic Code, Last Change Date, Hospital Code):
4307914   1    SO             19960619   TRI
Patient Procedure (Patient Account Number, Proc Seq Num, Proc Code, Proc Mod, Procedure Date, Last Change Date, Hospital Code):
4307914   1    4516        19960614   19960619   TRI
4307914   2    4523        19960614   19960619   TRI
Patient Surgical (Patient Account Number, CPT4 Seq Num, CPT4 Code, CPT4 Mod, CPT4 Modifier 2, CPT4 Date, Last Change Date, Hospital Code):
4307914   1    43239          19960614   19960619   TRI
4307914   2    45378          19960614   19960619   TRI
Figure 24. Trinity Hospital Denormalized Table Rows
Clinical (Patact#, Seq, Hos Cod, Change Date, Diag Code, Diag Mod, Clinic Code, Proc Cod, Proc Mod, Procedure Date, Cpt4 Code, Cpt4 Mod, Cpt4 Mod2, Cpt4 Date):
4307914   1    TRI   1996-06-19   53510   SO   4516   1996-06-14   43239   1996-06-14
4307914   2    TRI   1996-06-19   4019         4523   1996-06-14   45378   1996-06-14
4307914   3    TRI   1996-06-19   56984               0001-01-01           0001-01-01
4307914   4    TRI   1996-06-19   5303                0001-01-01           0001-01-01
4307914   5    TRI   1996-06-19   04186               0001-01-01           0001-01-01
4307914   16   TRI   1996-06-18   5781                0001-01-01           0001-01-01
As you can see from the preceding layout, retrieving all of the clinical data for patient 4307914 from the Trinity Hospital normalized tables requires eleven distinct disk accesses, whereas the Trinity Hospital denormalized table requires only six. In addition, the denormalized version allows row blocking to gather all six rows into main memory at once, reducing the disk accesses to one. This is not possible in the normalized version due to the random access algorithms necessary to retrieve rows from multiple tables.
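To put the access pattern in SQL terms, here is a hedged sketch. The table names come from the Figure 9 catalog listing, but the column names are illustrative stand-ins for the layouts in Figures 21 and 22, not the actual PBAR definitions.

   -- Denormalized: one keyed retrieval, and the six rows are physically
   -- adjacent, so row blocking can bring them into main memory together.
   SELECT *
     FROM DATRI.CLINICAL
    WHERE PATACCT = 4307914
      AND HOSPITAL = 'TRI'
    ORDER BY SEQNO;

   -- Normalized: the same clinical picture touches four separate tables.
   SELECT D.SEQNO, D.DIAGCODE, C.CLINICCODE,
          P.PROCCODE, P.PROCDATE, S.CPT4CODE, S.CPT4DATE
     FROM DATRI.ICD9DIAG D
     LEFT OUTER JOIN DATRI.CLINIC   C ON C.PATACCT = D.PATACCT AND C.SEQNO = D.SEQNO
     LEFT OUTER JOIN DATRI.ICD9PROC P ON P.PATACCT = D.PATACCT AND P.SEQNO = D.SEQNO
     LEFT OUTER JOIN DATRI.CPT4SURG S ON S.PATACCT = D.PATACCT AND S.SEQNO = D.SEQNO
    WHERE D.PATACCT = 4307914;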
• 31. 31 If we leave the architecture as is, we have, yes, achieved an improvement in access times and, as you will see later, a storage saving. But it still leaves us with ninety-two data warehouses. Time for a quick recap, if you will. The tables' architecture is identical for all hospitals, therefore we can consolidate the like tables into one table. And if we can do it for all tables, as is the case, we can reduce ninety-three warehouses into one, like so. Since I cannot reproduce all ninety-two hospitals and expect you to keep your sanity, I have chosen two to demonstrate what the layout looks like.
Figure 25. Trinity and Alvarado Hospitals Denormalized Table Rows
Clinical (Patact#, Seq, Hos Cod, Change Date, Diag Code, Diag Mod, Clinic Code, Proc Cod, Proc Mod, Procedure Date, Cpt4 Code, Cpt4 Mod, Cpt4 Mod2, Cpt4 Date):
4307914   1    TRI   1996-06-19   53510   SO   4516   1996-06-14   43239   1996-06-14
4307914   2    TRI   1996-06-19   4019         4523   1996-06-14   45378   1996-06-14
4307914   3    TRI   1996-06-19   56984               0001-01-01           0001-01-01
4307914   4    TRI   1996-06-19   5303                0001-01-01           0001-01-01
4307914   5    TRI   1996-06-19   04186               0001-01-01           0001-01-01
4307914   16   TRI   1996-06-18   5781                0001-01-01           0001-01-01
4307914   1    ALV   1996-06-19   53510   SO   4516   1996-06-14   43239   1996-06-14
4307914   2    ALV   1996-06-19   4019         4523   1996-06-14   45378   1996-06-14
4307914   3    ALV   1996-06-19   56984               0001-01-01           0001-01-01
4307914   4    ALV   1996-06-19   5303                0001-01-01           0001-01-01
4307914   5    ALV   1996-06-19   04186               0001-01-01           0001-01-01
4307914   16   ALV   1996-06-18   5781                0001-01-01           0001-01-01
Notice the additional key column, Hospital Code. It is necessary to maintain each row's uniqueness in the consolidated data warehouse, in the remote eventuality that identical patient numbers are used in different hospitals, and to be able to distinguish between hospitals in queries.
If you look closely at the data you will notice the date values represented with dashes, and you may wonder what in the world 0001-01-01 is. Introducing the 'L' date data type. This attribute enforces data integrity by allowing only valid dates. Since a string of blanks or zeroes is not a valid date, the system automatically defaults to the earliest valid date, hence 0001-01-01. You can specify any default date as long as it is a valid date, and you may also use the NULL value. The advantages of using this attribute on all date columns are threefold. One, the value is automatically edited. Two, it requires only 4 bytes of disk storage versus the 8 required by the current zoned decimal definition. Three, it allows the use of special operation codes to simplify date manipulation and date arithmetic within programs.
Other enhancements to the denormalized table include the elimination of the repeating columns "Patient Account Number", "Last Change Date", "Hospital Code", and "Sequence Number", and the optimization of numeric columns from the standpoint of disk storage and CPU processing. The AS/400 stores numeric values in a packed format. If you define numeric columns as zoned decimal, you incur additional CPU processing time for the translation from one format to the other each time the column is accessed, and you approximately double the disk storage requirements for your numeric columns.
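As a rough illustration of these design choices (the DDS 'L' data type corresponds to the SQL DATE type), the consolidated table could be defined along the following lines. This is a sketch only: the column lengths are assumptions on my part, and the actual definitions would come out of the full analysis and the end-user interviews.

   CREATE TABLE DACONS.CLINICAL
         (PATACCT    DECIMAL(9,0) NOT NULL,   -- packed decimal: roughly half the disk space of zoned, no conversion overhead
          SEQNO      DECIMAL(3,0) NOT NULL,
          HOSPITAL   CHAR(3)      NOT NULL,   -- additional key column for the consolidated warehouse
          CHGDATE    DATE         NOT NULL DEFAULT '0001-01-01',   -- date type: 4 bytes, valid dates only
          DIAGCODE   CHAR(5),
          DIAGMODI   CHAR(2),
          CLINICCODE CHAR(2),
          PROCCODE   CHAR(4),
          PROCMODI   CHAR(2),
          PROCDATE   DATE         NOT NULL DEFAULT '0001-01-01',
          CPT4CODE   CHAR(5),
          CPT4MODI   CHAR(2),
          CPT4MODI2  CHAR(2),
          CPT4DATE   DATE         NOT NULL DEFAULT '0001-01-01',
          PRIMARY KEY (PATACCT, SEQNO, HOSPITAL));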
• 32. 32 By simply applying these design criteria, and without any end-user input, I have effectively achieved a much more efficient end-user access table. Additional end-user input will result in an even more efficient design. A further benefit of this design is the amount of storage space reclaimed, as shown in Figures 26 and 27.
Figure 26. Trinity Hospital Normalized Tables Required Storage Space in Bytes
TABLE NAME           DATA          ACCESS PATH    TOTAL
Patient Diagnosis    17,827,840    13,438,946     31,275,008
Clinic Code Table     4,722,176     4,526,080      9,254,400
Patient Procedure     2,633,216     1,511,424      2,250,752
Patient Surgical      1,322,496       921,600      2,250,752
Total                26,505,728    20,398,080     46,931,456
Figure 27. Trinity Hospital Denormalized Table Required Storage Space in Bytes
TABLE NAME           DATA          ACCESS PATH    TOTAL
Clinical01           28,313,600    13,438,976     41,766,912
We see a net saving of roughly five Megs for this one hospital. Multiply it by ninety-two, our currently supported hospitals, and the savings start getting more interesting: roughly 475 Megs. And that is not all. I have performed a simple prototyping demonstration on four small transaction tables. There are thirty-four additional tables that require more in-depth analysis to determine further aggregation possibilities. I did some quick analysis and can tell you that there are ten more that can be aggregated, not to mention the elimination of most, if not all, joins, depending on the results of the aforementioned aggregations. That will translate into hefty storage savings. In addition, we will have a tremendous performance throughput improvement, as described previously.
• 33. 33 Additional Enhancements
The enhancement project discussed previously accomplishes one objective: improving response times on the production systems by removing the data warehouses from those systems. If I may say so, it is akin to cutting off one's hand because one's finger hurts. Other functional areas that need to be addressed are:
 Data Extraction
 Report Mining
 Data Transformation
 Data Propagation
 Data Verification
Data Extraction
Currently we populate the data warehouses with daily feeds from a mainframe. Historical data is also obtained from the same mainframe on an as-needed basis. We are also operating in an environment that requires continuous enhancements to the existing warehouses as users request additional fields upon which to query. As a perfect example, I would like to cite the last two projects I was involved in. The first involved adding one field. The second involved adding six fields and a full historical reload from the mainframe. Together, both projects lasted about three months. In addition, there is another project on the sidelines called the "Field Add Project", which leads me to believe we will be adding more fields. Why? Elementary, my dear Watson: the initial user requirements were somewhat incomplete. If so, let us be proactive and interview the users now, so we may identify up front all the specific elements that warrant inclusion in the warehouse. In doing so, we will have killed two birds with one stone: the users will be pleased, because they will be able to extract additional information, and MIS will become, and be perceived as, much more productive.
Report Mining
Spooled files (reports that have not yet printed) contain data that has already been extracted from operational databases, and report mining can be used to access this data. Almost every OLTP application, whether canned or homegrown, generates a comprehensive suite of reports. Because they provide valuable information to end users in a relatively intuitive way, reports mask the complexity of the underlying OLTP databases. Furthermore, report programs have already located, accessed, extracted, and consolidated valuable operational information. Reports also carry metadata in the form of column headings, date ranges, titles, and other descriptive text. It may be worthwhile to invest in software that allows the integration of data obtained from spooled files into the warehouse. Currently this option is not even on the drawing board, yet all four interfacing applications produce reports.
• 34. 34 Data Transformation
Once raw data has been extracted from OLTP databases, it must be reformatted and refined for the data warehouse. The transformation of this raw data comprises five related activities: data aggregation, data filtering, data combining, data standardization, and data cleansing.
Data Aggregation
Aggregation is an essential transformation function that summarizes operational data. The aggregation process should combine header and detail records into one record (interfile aggregation). TENET's current warehouses do not make use of this transformation technique.
Data Filtering
Transformation processes may also filter relevant information from OLTP databases. For example, an executive looking for net revenues would probably have no interest in patient account numbers. This and other extraneous data elements would not be transferred to a data warehouse. To the best of my knowledge, no data filtering is done for TENET.
Data Combining
A third transformation function may combine OLTP data from separate applications and platforms. The growth of distributed-processing environments has resulted in operational databases that are scattered around the world. Data warehouses must be able to combine data elements from these disparate systems. This issue has already reared its head, as we are supposed to integrate the ORNDA system, made up of another group of hospitals, which uses an 11-byte patient account number versus the current TENET standard of 9 bytes.
Data Standardization
Data-transformation processes standardize data elements and the metadata that describes those elements. The difficulties caused by poor or even nonexistent documentation underscore the need for consistency. Basic field attributes such as content, size, type, and description often differ across multiple applications, or even within a single application. The inconsistent use of codes is a frequent problem as well. Of all the transformation activities, this is the one in which TENET lags the least. Nonetheless, there are no technical metadata repositories, nor are there any business metadata repositories.
Data Cleansing
The final function data-transformation programs perform is data cleansing. Data-transformation procedures that ensure the accuracy of warehouse repositories must be in place. TENET's warehouse has none.
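As a hedged illustration of standardization and cleansing in SQL (all table and column names here are hypothetical, and a real transformation program would also have to validate month and day values), a staging step might widen the 9-byte TENET account number to the 11-byte ORNDA format and substitute the default date for unreadable date values:

   INSERT INTO STAGE.CLINDATA (PATACCT, SEQNO, HOSPITAL, CHGDATE, DIAGCODE)
   SELECT '00' || DIGITS(T.PATACCT),   -- pad 9-digit TENET numbers to the 11-byte ORNDA width
          T.SEQNO,
          T.HOSPITAL,
          CASE WHEN T.DATELSTCHG BETWEEN 19000101 AND 20991231
               THEN DATE(SUBSTR(DIGITS(T.DATELSTCHG), 1, 4) || '-' ||
                         SUBSTR(DIGITS(T.DATELSTCHG), 5, 2) || '-' ||
                         SUBSTR(DIGITS(T.DATELSTCHG), 7, 2))
               ELSE DATE('0001-01-01')   -- cleanse blanks and zeroes to the default date
          END,
          T.DIAGCODE
     FROM TENETEXT.ICD9DIAG T;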
• 35. 35 Data Propagation
Data-propagation procedures physically move transformed OLTP data to the data warehouses. TENET's data-propagation procedures run on a daily cycle. The biggest problem with them is that they must be constantly monitored by hand: someone on the mainframe side must verify the successful completion of the propagation jobs there, and someone on the AS/400 side must verify the successful completion of the propagation jobs there. This is done five days a week, multiple times a day. This area needs to be reviewed as soon as possible.
Data Verification
To maintain warehouse integrity, systematic procedures to periodically compare warehouse information to operational data must be in place. TENET has none; therefore we only learn of problems when the users call us on them, or when a propagation procedure crashes because it encountered unreadable data.
As I have shown, extraction, transformation, and propagation, the three processes that move data into warehouses, as well as verification procedures, are all important elements in creating and maintaining effective data warehouses. Figure 20 on the following page presents an overview of these interrelated processes.
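A simple verification step could also be driven by SQL. The sketch below assumes a hypothetical control-total table (EXTRACTS.CONTROLTOT) delivered alongside each mainframe feed; table and column names are illustrative.

   -- List the hospitals whose warehouse charge rows no longer match the
   -- row counts recorded when the data was extracted from the mainframe.
   SELECT C.HOSPITAL, C.EXTROWS, COUNT(*) AS WAREHOUSE_ROWS
     FROM EXTRACTS.CONTROLTOT C, DACONS.CHARGES W
    WHERE W.HOSPITAL = C.HOSPITAL
    GROUP BY C.HOSPITAL, C.EXTROWS
   HAVING COUNT(*) <> C.EXTROWS;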
• 36. 36 Figure 20. Data-Warehouse Maintenance Procedures
Operational Database (Raw Data)
   Extraction Phase: Custom programs and/or replication tools access raw data.
Extracted Data
   Transformation Phase: Custom programs and/or replication tools cleanse, decode, standardize, and aggregate extracted data.
Transformed Data
   Propagation Phase: Custom programs and/or replication tools move transformed data to the data warehouse.
Data Warehouse (Warehouse Data)
   Verification Phase: Customized programs regularly compare warehouse data to source operational data, producing printed verification reports.
• 37. 37 Testimonials
Data Warehouse
If you or a friend have a mortgage loan with Countrywide, feel free to go to WWW.Countrywide.com and pull up your loan, or any other information on any of the other products such as HELOC, credit card, and the various insurance offerings. The information displayed is retrieved from the back-end I designed on an AS/400 using the previously detailed techniques and procedures. The exceptionally good response time is due mostly to the denormalization technique that allowed me to reduce a database composed of eighteen normalized tables into a database of two denormalized ones.
We also came up with a live demo hosting WWW.countrywide.com on the same AS/400 that was hosting the warehouse. Unfortunately, at the time we had some unresolved security issues with the firewall and no time to work them out, so we decided to host the website on the NT server. This actually worked out in my favor, in that it emphasized the power of denormalized tables. When you request any loan, or other product information, at WWW.countrywide.com, your request is received by a JAVA program on the NT server where the site is hosted. The JAVA program then submits a SQL read to the denormalized database on the AS/400 back-end and returns the requested information with sub-second response time, thanks to the minimized disk I/O. Keep in mind, though, that if the AS/400 that hosts the warehouse also hosted the website, the response time would improve further, because the middle layer (the NT server) would be eliminated.
• 38. 38 DSS & EIS
The initial project requirements also included the creation of a Decision Support System (DSS) and an Executive Information System (EIS). Unfortunately, due to political turmoil, only the "Retrieval of Loan and Related Information on the Web" requirement survived. Nonetheless, we were able to come up with a live demo of the DSS/EIS system. The prototype consisted of the not-so-hypothetical query, "How many of our customers have multiple products?" We had to come up with a way of satisfying the executives' thirst for knowledge and their impatience with long response times. After some brainstorming I came up with the following solution:
1. Develop a "Customer to Product Relationship Table", updated daily from the Product Warehouse.
2. Create a temporary table, using the Relationship table as input, containing two fields:
Customer#   # of Products
12345       2
45689       5
56423       3
etc.        etc.
Creation time for this table against a 17-million-row warehouse on an AS/400 was 2 minutes.
3. Count the number of rows in the above temporary table, thereby obtaining the answer.
Steps two and three were accomplished with the following SQL code:
CREATE TABLE TEMP1 (COL1 INT, COL2 INT)
INSERT INTO TEMP1
  SELECT T03CUSTNUM, COUNT(T03PRODCOD)
    FROM WEBT030P
   GROUP BY T03CUSTNUM
  HAVING COUNT(T03PRODCOD) > 1
SELECT COUNT(*) FROM TEMP1
Assume for a moment that we are able to convince the executives to sit down with us and help us pre-define their queries. We could then set up a series of jobs that would run at night, whose sole purpose would be to create a series of temporary tables, one for each executive query. Come morning, the executives would have but to press a key or click a button to obtain answers with sub-second response time.
• 39. 39 Conclusion
Time is of the essence. We need to overhaul TENET's data warehouse now. Not tomorrow, or the day after, but now. The current architecture does not lend itself to unhindered growth. Confirming this are the difficulties we are encountering in adding the ORNDA hospitals, due to the different "patient account number" field sizes. Add to that the lack of edits, which causes the notorious garbage-in, garbage-out situation, the wasted storage space, the absence of metadata, and the inefficient propagation procedures. If we maintain the status quo, it is only a matter of time before we jeopardize our relationship with TENET. This scenario is particularly undesirable in light of the recent IPO.
There are several major roadblocks to implementing this overhaul. First and foremost: "If it ain't broke, don't fix it." Well, it's about to break. Second, if we get the go-ahead to modernize the architecture, the three homegrown applications, CASEMIX, PQS, and Cost Accounting, will have to be re-written, and the users will have to be re-trained to access the new data warehouse. So we are looking at embarking on a project of epic proportions which, frankly, TENET may not be interested in. In which case, amen. But if my vision makes sense to you, and you feel, as I do, that there is a need for powerful data warehousing solutions such as the one I have depicted in this novel, then we can lay the foundations for PEROT Systems to become a major player in providing customized data warehousing solutions for our current and future clients.