This document discusses designing databases for both production and end-user access. It argues that end-user access should be treated as its own application, with databases designed to support both production and user needs. The document outlines strategies for database design that minimize tables, include complete information in rows, and make the data intuitive for users. These strategies are aimed at making it easier for users to access and understand the data without requiring complex queries. The document uses an example database for a fraternal organization to illustrate challenges with production database designs for end-user access.
Summary
Why are databases not designed right from the start for end-user-access? Wouldn’t it be a
lot easier to implement query products if the data were built with the users in mind?
Could it be that both the production databases and the end-user databases could be the
same databases if the database design were for both? Or is there too much of a difference
between the requirements of the users and the requirements of the production systems?
I will examine each of these questions in detail to formulate a strategy on how and when
to design databases for end-user access. I will then introduce TENET’s existing business
intelligence system, together with the current enhancement project, which will be
completed on March 31, 1999. Subsequently, I will list what I believe to be the major
deficiencies in both the existing system and the newly enhanced one. Finally, I will
outline my vision of a universal data warehouse architecture, using TENET as the
prototype.
Introduction
The basic notion of production databases is to create atomic row designs with single-fact
rows according to third normal form. The sparse rows produced by this process do not
naturally lend themselves to end-user access, for several reasons: the more the data is
spread across multiple tables with various relationships, the less intuitive its meaning
becomes; and the more tables a database involves, the more complex the operations, such
as joins, required to first put the data into the proper shape.
A simple demographic database for a fraternal association demonstrates this. See
Figures 1 and 2 for row layouts.
Figure 1. Fraternal Organization Database

Personal: MEMB#, NAME, RESCODE, OCCODE
Residence: RESCODE, RESNAME, ADDRESS1, ADDRESS2
Occupation: OCCODE, OCDESCRIPTION, etc.
Figure 2. Denormalized Fraternal Organization Database

PersonalInfo: MEMB#, NAME, RESCODE, RESNAME, ADDRESS1, ADDRESS2, OCCODE, OCDESCRIPTION
Assume that in this organization, one to twenty people live in the same residence, owned
by the organization on behalf of the members. If the address information were kept in the
personnel rows, there would be database anomalies in the form of transitive
dependencies, since the address information depends on the residence, not on the person.
So the natural solution is to split the personnel row into both a residence row and a
personnel row. Now, if we are storing the various occupations in which members may be
employed, many of the members can be employed in the same occupation, or the same
company. We may elect to segregate occupation information because it does not depend
on the person but on the occupation. In this case, we would have a personnel table, a
residence table, and an occupation table.
To print a report that shows all of the nurses in the organization with the occupation data
included along with the address of the employer and the member’s name and address, we
would need to join three tables together. Although joining such data is not very difficult
for a trained professional, it is not a talent the average user possesses. Consequently,
there would be an undue degree of difficulty for the user to effect database joins to
produce simple reports.
Ideally, all of the data that a user desires should be in one row. In our demographic
database, is this possible? Yes, certainly, the data for residences and occupations can be
packed into the same row that contains the member’s name. However, this design is not
suitable for the production system since it would require non-normalized data. But this
design would be ideal for the end-user database because it allows for all data to be
queried without requiring complex functions such as database joins.
Is the solution someplace in between? In this situation, there is not much that can be in
between. There are multiple entities, each demanding its own database table for
production purposes, yet each wanting to be combined into a larger entity to enable end-
user access. The relational database solution for this situation is called a view. A view is
simply a prepackaged projection of a database, providing a different look than the
physical data would normally dictate.
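A minimal sketch of such a view, again using sqlite3 with table and column names assumed from Figures 1 and 2: the join is defined once, and end users then query the view as if it were a single wide table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE personal  (memb INTEGER PRIMARY KEY, name TEXT,
                        rescode TEXT, occode TEXT);
CREATE TABLE residence (rescode TEXT PRIMARY KEY, resname TEXT);
CREATE TABLE occupation(occode TEXT PRIMARY KEY, ocdescription TEXT);

-- The view packages the three-way join once, on behalf of the users.
CREATE VIEW memberinfo AS
    SELECT p.memb, p.name, r.resname, o.ocdescription
    FROM personal p
    JOIN residence  r ON r.rescode = p.rescode
    JOIN occupation o ON o.occode  = p.occode;

INSERT INTO residence  VALUES ('R1', 'Elm House');
INSERT INTO occupation VALUES ('O1', 'Nurse');
INSERT INTO personal   VALUES (1, 'Jane Doe', 'R1', 'O1');
""")

# The end user sees one flat "table" and never writes a join.
result = con.execute("SELECT name, ocdescription FROM memberinfo").fetchall()
print(result)   # [('Jane Doe', 'Nurse')]
```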
So, in the above scenario, a row containing all the required data can typically be shaped
and provide the desired view of the data for the end users. If this is so easy, then why is it
not always done? Let’s ask a hypothetical IS manager why this is not always done, since
it seems like it should be part of production system design.
Typical IS manager: “Well, that’s easy for you to say! My first responsibility around here
is to develop and maintain applications that provide major value for the organization.
What I work on is prioritized by the steering committee, and there is no time left to be
worrying about what would be good for end-user access. Besides, if I start throwing join
logical tables all over my production database to support ad-hoc queries from end users,
the maintenance of these indexes will slow down my production system… and that is
even if no query is ever actually run. Phew! If they actually run queries against
those joined tables, I’ve got an even bigger problem. How can I keep my less-than-one-
second response-time promise if I don’t even know who will be using the system? Sure,
I’d like to help, but there are too many opportunities for this thing to fail… and bring me
down with it!”
In different words perhaps, but many IS managers would echo those sentiments exactly.
They have a difficult balancing act to perform. They are expected to provide high-quality
clerical function with great performance characteristics. They accept the challenge and do
their jobs on a daily basis. It is not that IS does not want to be part of the team and
provide the users with whatever they want. The very mission of IS is to please. But the
production mission and the user mission are at odds with each other. From the dialogue
above, we can see two fundamental problems in trying to design and implement
databases for production end-user access:

End-user access is not treated as an application.

Accommodating end-user access can create major performance
problems for both production users and query users.

If end-user access were treated as its own application, or at a minimum were woven into
a production system’s set of requirements, it could be accommodated more readily.
However, very few assimilate this into design thinking. Management never suggests that
we treat it as an application, so, by default, end user access is an afterthought. It is
assumed to be a by-product of application system design, not something that should be
the object of system design. And this is exactly what the end user gets… by-products.
Would we ever consider designing two databases, one for the production application and
one for end-user access? How could we? We have spent our entire database careers living
by one of the major tenets of the database approach: avoiding data redundancy and
duplication. How could we possibly conceive that the solution to any database problem
would be to duplicate the data or the design? Instead, we very efficiently declare the
production database finished and move on to the next application. If the database design
does not fit the end-user’s requirements, that is a problem for another day, a day that
hopefully will never come because we are now on to the next application.
But what if the steering committee included end-user access as one of the requirements of
the application? What would we do? Where would we start? Would we just plan to build
the production database as in the past and declare the end-user job done when the
production system goes live? Or would we take our database design to third normal form
for production, and then work with the users to define the best possible views of the data
for ad-hoc queries? Hopefully, we would do the latter!
But we do not have to wait for the steering committee to say that end-user access is a
requirement. It is our job to suggest new ways of doing things. We are the change agents.
We intuitively understand this. But heretofore, we have not taken the initiative to justify
the additional design and implementation time necessary to build properly shaped data
for end users into our production designs. Consequently, we do not design and code the
join logical tables to support end-user access. If these were part of the perceived real
application requirement set, we would factor in their impact on time and performance,
and would propose different time projections and hardware requirements than those
necessary for the production system alone.
The fact is that end-user access must be treated as a real application to get real results.
We cannot very well design an invoicing system and propose hardware that would
perform well enough to calculate all but the invoice total. This would be at best
incomplete, at worst useless. We wouldn’t be satisfied with something that almost did the
job! So also with end-user access. If we put nothing into planning for it and building the
proper structures to support it, then we can likewise expect to get nothing out of it.
It can be argued that the paradigm of the information age was brought on by a shift in
emphasis from the clerical benefit derived from an application to the information rewards
that can be gained by harnessing the power of all the information collected on behalf of
these clerical applications. We made the shift from data processing to management
information systems almost solely on the backs of programmers. The users demand
results and the results are delivered by MIS in the form of report programs.
Now, the industry experts have theoretically carved out and differentiated some
potentially new paradigms, these being Decision Support Systems (DSS) and Executive
Support Systems (ESS). However, the world of information processing in reality has not
yet made the shift. The primary focus today is MIS with lip service to DSS and ESS.
End-user access is part of this lip service. Companies budget and buy tools to provide
ad-hoc information needed by knowledge workers. But there is rarely an end-user project
associated with the purchased tool. The purchase of the tool is an acknowledgement that
management is serious about a solution, but there is no associated funding for the proper
design or re-engineering of the application’s database. And MIS has not caught on to the
fact that end-user access must be treated as another application to succeed. Once we do,
then we will allocate the proper resources for its successful implementation, just as we
allocate the resources necessary for the production applications.
If we were to begin immediately to treat end-user access as another application, we
would be forced to devise some innovative ways of assessing its performance impact on
the production applications in concert with the end-user access application. Although this
would be a difficult task, such work could help determine what additional capacity and
power would be necessary to support end-user computing. This would have the double
benefit of providing a more accurate cost for this service, plus it would give management
the opportunity to vote yes or no without thinking that end-user access was free.
Moreover, if the additional hardware to support the users were installed, there would be
little reason for IS to be as concerned about the impact of the users on the system.
Analysts know that the mere presence of logical tables adds overhead to applications. We
also know that if there is a reasonable number of queries against these logical views,
those views should be given immediate maintenance. But when we give views immediate
access-path maintenance, we also add a burden to each and every production transaction
since, in addition to doing its normal work, it must also carry the performance hit of
access-path maintenance (index updating) for the end-user system. How much better off
are we, therefore, to recognize end-user access as a valuable application of its own, with
an associated system burden? In this way, we can have the horsepower necessary without
the fear that unfunded queries will be the undoing of an otherwise effective IS manager.
And so, if end-user access is treated as an application, and its potential performance
impact is factored into the decision to move forward, there are great prospects for
resounding success. If, on the other hand, end-user access continues to be the leftover
potential of an under-designed production application, it will continue to haunt IS
management until someone, perhaps a PC-heritage person without the outmoded belief
that data normalization rules all, gladly takes over the reins. And when that time comes,
the DSS- and ESS-driven information paradigms will have begun their shift.
Given the charge to produce well-designed tables and rows for end users, certain design
criteria should be followed, with the major objective of making things easier for the user.
Designing Databases for End-User Access
Minimize the number of separate tables and eliminate all multitable dependencies.
Going back to the demographic database presented at the beginning of this document, it
does not require a substantial amount of thought to conclude that it would be easier to
access data such as name, address, and occupation description from one row rather than
three. If a user faces three times the number of tables necessary to do the job, his or her
productivity will be impacted by more than a factor of three. Instead of concentrating on
the data to be queried, a user joining tables must be concerned about how a join works,
and whether the product supports inner, outer, natural, or other join types. Who cares?
Not the user looking for data, that’s for sure! Let the MIS department worry about the
data, and let the user worry about getting information.
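The join-type question the user is being asked to care about is not academic. The following sketch (hypothetical tables and data, via sqlite3) shows how an inner join silently drops a member with no occupation on file, while a left outer join keeps everyone:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE personal  (memb INTEGER PRIMARY KEY, name TEXT, occode TEXT);
CREATE TABLE occupation(occode TEXT PRIMARY KEY, ocdescription TEXT);
INSERT INTO occupation VALUES ('O1', 'Nurse');
INSERT INTO personal   VALUES (1, 'Jane Doe', 'O1');
INSERT INTO personal   VALUES (2, 'John Roe', NULL);  -- no occupation on file
""")

# An inner join silently loses the member whose occupation is missing...
inner = con.execute("""
    SELECT p.name FROM personal p
    JOIN occupation o ON o.occode = p.occode""").fetchall()

# ...while a left outer join keeps every member.
outer = con.execute("""
    SELECT p.name FROM personal p
    LEFT JOIN occupation o ON o.occode = p.occode""").fetchall()

print(inner)   # [('Jane Doe',)]
print(outer)   # [('Jane Doe',), ('John Roe',)]
```

Pre-joining the data on the user's behalf means the designer, not the user, makes that choice once and correctly.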
Make attributes similar to archival data.
End-user data is often a combination of master and transaction data. Typical transaction
table row layouts are sparse, at best. In the production system, once the master row has
been accessed, take the information and place it in the transaction row for better archival
information. Such end-user rows by definition must be designed to be comprehensive.
Design to first normal form.
Repeating groups are not conducive to production data, nor are they conducive to end-
user access. Design the data for users to first normal form, but do not go any further.
If the first normal form of the data can be achieved by pre-joining logicals, this is an
effective and easy way to test the validity of the row design without a major amount of
effort. Keep as much data in each row as possible without compromising the one-to-one
data-element-to-key relationship. If logicals do not do the trick, a physical table can be
extracted periodically from all of the underlying production data sources to provide an
effective row layout for user queries. This also has the benefit of being a great performer.
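The periodic extraction can be sketched as follows: a first-normal-form physical table is rebuilt from the production sources on a schedule, and end users query it without any joins. Table names and the refresh helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE personal  (memb INTEGER PRIMARY KEY, name TEXT, occode TEXT);
CREATE TABLE occupation(occode TEXT PRIMARY KEY, ocdescription TEXT);
INSERT INTO occupation VALUES ('O1', 'Nurse');
INSERT INTO personal   VALUES (1, 'Jane Doe', 'O1');
""")

def refresh_query_table(con):
    """Periodically rebuild a flat, first-normal-form physical table
    from the underlying production sources (hypothetical helper)."""
    con.executescript("""
        DROP TABLE IF EXISTS member_query;
        CREATE TABLE member_query AS
            SELECT p.memb, p.name, p.occode, o.ocdescription
            FROM personal p JOIN occupation o ON o.occode = p.occode;
    """)

refresh_query_table(con)

# End users now hit a single physical table: no joins, no index burden
# on the production tables at query time.
rows = con.execute("SELECT name, ocdescription FROM member_query").fetchall()
print(rows)   # [('Jane Doe', 'Nurse')]
```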
Design with complete information.
Along the way to single fact rows, production databases are split and split and split again.
Related one-to-one attributes like customer data and balance data can be designed into
one row with no production loss.
Design completed rows.
There are two reasons to design completed rows. First, the objective of an end-user
database is to make access easier, and completed rows do exactly that. The second reason
is to enhance performance. It is better to capture the customer name into the transaction
table than to access the customer master each time it must be retrieved. Also,
calculations, such as extended price, can be performed once during production processing
and the results stored in completed rows, rather than performing the calculations each
time the row is read. Besides helping the end user access data, this approach also helps
the production system run more efficiently.
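A sketch of a completed-row write, with hypothetical customer and sale tables: the customer name is copied and the extended price computed once, at write time, so reads never need a join or a recalculation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (custno INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE sale (saleno INTEGER PRIMARY KEY, custno INTEGER,
               custname TEXT, qty INTEGER, unit_price REAL, ext_price REAL)""")
con.execute("INSERT INTO customer VALUES (7, 'Acme Corp')")

def record_sale(con, saleno, custno, qty, unit_price):
    # Complete the row at write time: copy the customer name and store
    # the extended price once, instead of recomputing on every read.
    (name,) = con.execute(
        "SELECT name FROM customer WHERE custno = ?", (custno,)).fetchone()
    con.execute("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?)",
                (saleno, custno, name, qty, unit_price, qty * unit_price))

record_sale(con, 1, 7, 3, 9.50)
result = con.execute("SELECT custname, ext_price FROM sale").fetchall()
print(result)   # [('Acme Corp', 28.5)]
```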
Capture point-in-time data.
If a piece of data, such as the price in a transaction row, is dependent on the price in the
master row, the price we pay today will be reported as a different price in the future as
the price in the master row changes. This design suggestion is related to the completed
rows above, but its intent is to assure the accuracy of data through time. When a
transaction occurs involving a price and/or discount, it is good systems design to capture
the point-in-time values for price and discount rather than rely on the master row, or a
calculation, to provide such data. This assures the constancy of the data.
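The point can be sketched with a hypothetical item master and sale table: the master price later changes, but the transaction still reports what was actually paid at the time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item (itemno INTEGER PRIMARY KEY, price REAL);
CREATE TABLE sale (saleno INTEGER PRIMARY KEY, itemno INTEGER,
                   price_at_sale REAL);   -- point-in-time copy of the price
INSERT INTO item VALUES (1, 10.00);
INSERT INTO sale SELECT 100, 1, price FROM item WHERE itemno = 1;
""")

# The master price changes after the transaction occurred...
con.execute("UPDATE item SET price = 12.50 WHERE itemno = 1")

# ...but the transaction row still reports the price actually paid.
(paid,) = con.execute(
    "SELECT price_at_sale FROM sale WHERE saleno = 100").fetchone()
print(paid)   # 10.0
```

Had the report joined back to the item master instead, the same sale would be reported at 12.50 next month.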
Avoid ubiquitous codes.
Poorly codified data is another reason why normal users find production data difficult to
use. When time is spent developing self-evident codes, such as M for male and F for
female, the user’s job is simplified. Contrast this with a design that codes male as a 10
and female as an 11. The more intuitive the coding structure, the easier it will be for
users to access and select data in meaningful ways.
Give columns meaningful names and descriptions.
One of the first things I learned about column names was that you could call a column
anything. If you chose to call the address column COW3, and you knew what COW3
stood for, you were golden… and it would work. It even offered a kind of security, since
nobody could guess what COW3 or SEGGH meant in a million years. (I guess that was
job security.) Just as we want to pick meaningful codes for the contents of our columns,
we want to pick meaningful names and descriptions for the columns themselves. End
users like to know what the data elements in a table are. It does not cost much more to
use a good column heading or some descriptive text to help a user understand the intent
and purpose of a column.
Expand codes to meaningful text.
In joined rows, or in the building of new physical rows to support end users, take the
codes and create code tables. When doing the join for the user, also join to the code
tables. In this manner, through the joined logical table or through an extraction, the
descriptions of the codes can be included in the user’s row layout. In the earlier example,
the occupation code in our demographic table could be expanded, through a join or an
extraction, to also include the description of the occupation code. This gives users more
meaningful data with much less work than performing the joins themselves. If the code
tables do not exist, build them; they are worth the investment, both for query purposes
and for further documenting the production system. I would also suggest leaving the
codes themselves in the row design for narrow report queries and deeper analysis.
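The code-expansion join can be sketched as follows (hypothetical table names and codes): the user-facing row keeps both the code and its expanded description, so the text is readable while the code remains available for narrow report queries.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE personal (memb INTEGER PRIMARY KEY, name TEXT, occode TEXT);
CREATE TABLE occdesc  (occode TEXT PRIMARY KEY, ocdescription TEXT);
INSERT INTO occdesc  VALUES ('RN', 'Registered Nurse');
INSERT INTO personal VALUES (1, 'Jane Doe', 'RN');
""")

# Join to the code table on the user's behalf, keeping the code AND its
# expansion in the row layout delivered to end users.
rows = con.execute("""
    SELECT p.name, p.occode, d.ocdescription
    FROM personal p JOIN occdesc d ON d.occode = p.occode
""").fetchall()
print(rows)   # [('Jane Doe', 'RN', 'Registered Nurse')]
```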
Of course, we should always apply the basic principles of good system design, which
suggest that we start with requirements first. Since end-user requirements are almost
always in the form of report and display outputs, we use these as a starting point to
assure that our well-designed, first-normal-form rows provide information in a form end
users can easily use. In most end-user design projects, however, we are not alone. There
is already a production system in place that maintains the data our users wish to access.
The next part extends the design techniques we have discussed to the most common
databases of all: existing databases.
TENET Business Intelligence
TENET’s business intelligence is obtained from the data of the fifty-nine PBAR and
thirty-three PATCOM hospitals it owns. This data is contained in a combination of
normalized and denormalized tables residing in one hundred fifty-one collections. These
collections are subdivided between PBAR and PATCOM as follows.
PBAR is represented by one hundred eighteen collections: fifty-nine containing Online
Transaction Processing (OLTP) databases and fifty-nine containing primary repository
databases, distributed over seven production AS/400s.
PATCOM is represented by thirty-three collections containing both OLTP and primary
repository databases on one non-production AS/400.
In addition, we have a consolidated PBAR version, obtained by merging the fifty-nine
OLTP collections, at the table level, into a single collection residing on a separate
non-production AS/400. To summarize, each collection is equivalent to a data warehouse;
therefore, TENET’s business intelligence is composed of one hundred fifty-two data
warehouses across nine systems. End-user analysis against these data warehouses is made
possible by four distinct applications. Being familiar with the adage that a picture is
worth a thousand words, I will break down the architecture into the following levels of
ever-increasing visual detail.
PBAR & PATCOM
    Enterprise Level
PBAR
    System Level
    Data Warehouse Level
PBAR Consolidated Version
    System Level
    Data Warehouse Level
PATCOM
Interfacing Applications
    Showcase Vista
    CASEMIX Reports
    PQS
    Cost Accounting
PATCOM will be detailed in the “Current Enhancement Project” portion of this document.
PBAR
System Level
The following depicts the AS/400 (DHFB) system. Because the architecture at the
system and warehouse levels is identical for all PBAR hospitals, any PBAR AS/400
could have been chosen as representative.
The following table lists the fourteen warehouses, and the seven hospitals they represent,
residing on AS/400 (DHFB), with their respective storage requirements in bytes.
Figure 3. AS/400 DHFB data warehouses
Hospital Collection Size in Bytes Purpose
Trinity DATRI 952,636,024 Normalized end-user access with joins (OLTP)
DATRICDD 239,185,120 Denormalized end-user access (Primary Repository)
Memorial DADED 767,639,552 Normalized end-user access with joins
DADEDCDD 133,165,056 Denormalized end-user access
Doctor’s DADHF 1,165,406,208 Normalized end-user access with joins
DADHFCDD 190,091,264 Denormalized end-user access
Harton DAHAR 741,367,808 Normalized end-user access with joins
DAHARCDD 150,687,744 Denormalized end-user access
Methodist DAJON 440,741,888 Normalized end-user access with joins
DAJONCDD 92,037,120 Denormalized end-user access
Medical Center DAMAH 221,548,544 Normalized end-user access with joins
DAMAHCDD 51,011,584 Denormalized end-user access
University DAUNV 1,177,239,552 Normalized end-user access with joins
DAUNVCDD 229,093,376 Denormalized end-user access
[Diagram: AS/400 DHFB hosting 7 hospitals, with 7 OLTP and 7 primary repository (PR)
collections feeding the interfacing applications]
Warehouse Level
Figure 4 lists the tables that make up the OLTP warehouse representing Trinity hospital.
Figure 4.
Object Type Collection Attribute Text
ABSTRACT *TABLE DATRI PF DA: Patient Abstract table.
ACTIVITY *TABLE DATRI PF CA: Activity Master
ACTIVJOIN1 *JOIN DATRI LF DA: VISIT/CHARGES/ACTIVITY
APRDESC *TABLE DATRI PF APRDRG Description Table
BROKERS *TABLE DATRI PF DA: Broker Table
CDMDESC *TABLE DATRI PF DA: CDM description table
CHARGES *TABLE DATRI PF DA: Patient Charges
CLINIC *TABLE DATRI PF DA: Clinic Code Table
CLINSPTY *TABLE DATRI PF DA: CMM Clinical Specialty
CMMPAYORS *TABLE DATRI PF DA: CMM Payor Group
COSTCTR *TABLE DATRI PF DA: Cost center name
CPT4SURG *TABLE DATRI PF DA: Patient Surgical CPT4
DEMOG *TABLE DATRI PF DA: Patient Demographics
DIAGDESC *TABLE DATRI PF DA: Diagnosis description
DIAGL1 *VIEW DATRI LF DA: Patient Diagnosis by Di
DRGDESC *TABLE DATRI PF DA: DRG Descriptions Table
DRGWR *TABLE DATRI PF DA: DRG Weight & Rate Table
EDLOG *TABLE DATRI PF DA: Emergency Department Lo
FINSUM *TABLE DATRI PF DA: Patient Visit Financial
FUR *TABLE DATRI PF DA: Patient notes detail.
ICD9DIAG *TABLE DATRI PF DA: Patient Diagnosis
ICD9PROC *TABLE DATRI PF DA: Patient Procedure
MDCDESC *TABLE DATRI PF MDC Description Table
MDTABLE *TABLE DATRI PF DA: Physician Table
MDTABLL1 *VIEW DATRI LF DA: Physician Group Code
NC2625P *TABLE DATRI PF MaCS: Work table for program
NONSTFMD *TABLE DATRI PF DA: Patient Physician (Non-
PATDIAG *TABLE DATRI PF DA: All patient diagnosis c
PATINS *TABLE DATRI PF DA: Patient Insurance
PATINSL1 *VIEW DATRI LF DA: Patient Insurance by Pl
PATMDS *TABLE DATRI PF DA: All Patient Physicians
PATPHYS *TABLE DATRI PF DA: Patient Physician
PATPROC *TABLE DATRI PF DA: All patient procedure c
PATTYPE *TABLE DATRI PF DA: Patient type table table
PAYCDDES *TABLE DATRI PF DA: CMM Payor Code Descript
PAYGPDES *TABLE DATRI PF DA: CMM Payor Group Descrip
PAYMENT *TABLE DATRI PF DA: Patient Account Payment
PHYSL1 *VIEW DATRI LF DA: Patient Physician by Ph
PROCDESC *TABLE DATRI PF DA: Procedure description
PROCL1 *VIEW DATRI LF DA: Patient Procedure by Pr
REHABGEN *TABLE DATRI PF DA: Rehab General
REHABREF *TABLE DATRI PF DA: Rehab Referring Facilit
Figure 4. continued.
Object Type Collection Attribute Text
REHABTRN *TABLE DATRI PF DA: Rehab Transferring Facility
VISIT *TABLE DATRI PF DA: Patient Visit
VISITJOIN1 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITJOIN2 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN3 *JOIN DATRI LF DA: VISIT/PATPHYS/ICD9DIAG
VISITJOIN4 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN5 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN6 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN7 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN8 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITL1 *VIEW DATRI LF DA: Patient Visit by DRG
VISIT2 *VIEW DATRI LF DA: Patient Visit in Discharge
VISIT3 *VIEW DATRI LF DA: Patient Visit in MRC#
Figure 5 lists the tables that make up the primary repository warehouse representing
Trinity hospital.
Figure 5.
Object Type Collection Attribute Text
PATACTV *TABLE DATRICDD PF CDD: Patient Information Active
PATFULL *TABLE DATRICDD PF CDD: Patient Information Full
PATLIMT *TABLE DATRICDD PF CDD: Patient Information Limited
PATROOM *TABLE DATRICDD PF CDD: Patient Info Room
PBAR CONSOLIDATED VERSION
System Level
The following depicts the AS/400 (HDCA) system.
The following table lists the consolidated PBAR warehouse representing fifty-nine
hospitals, residing on the AS/400 (HDCA), with its storage requirement in bytes. The
primary repositories have not been consolidated, nor do they exist on this system, and
there is no plan to do so that I am aware of.
Figure 6. AS/400 HDCA consolidated data warehouse
Hospital Collection Size in Bytes Purpose
PBAR DACONS 22,123,130,880 Normalized end-user access with joins (OLTP)
[Diagram: AS/400 HDCA system: 59 hospitals consolidated into 1 OLTP collection, and the interfacing applications.]
Warehouse Level
Figure 7 lists the tables that make up the consolidated OLTP warehouse representing
fifty-nine hospitals.
Figure 7.
Object Type Collection Attribute Text
ABSTRACT *TABLE DACONS PF DA: Patient Abstract table.
ACTIVITY *TABLE DACONS PF CA: Activity Master
ACTIVJOIN1 *JOIN DACONS LF DA: VISIT/CHARGES/ACTIVITY
APRDESC *TABLE DACONS PF APRDRG Description Table
BROKER S *TABLE DACONS PF DA: Broker Table Table
CDMDESC *TABLE DACONS PF DA: CDM description table
CHARGES *TABLE DACONS PF DA: Patient Charges
CLINIC *TABLE DACONS PF DA: Clinic Code Table
CLINSPTY *TABLE DACONS PF DA: CMM Clinical Specialty
CMMPAYORS *TABLE DACONS PF DA: CMM Payor Group
COSTCTR *TABLE DACONS PF DA: Cost center name
CPT4SURG *TABLE DACONS PF DA: Patient Surgical CPT4
DEMOG *TABLE DACONS PF DA: Patient Demographics
DIAGDESC *TABLE DACONS PF DA: Diagnosis description
DIAGL1 *VIEW DACONS LF DA: Patient Diagnosis by Di
DRGDESC *TABLE DACONS PF DA: DRG Descriptions Table
DRGWR *TABLE DACONS PF DA: DRG Weight & Rate Table
EDLOG *TABLE DACONS PF DA: Emergency Department Lo
FINSUM *TABLE DACONS PF DA: Patient Visit Financial
FUR *TABLE DACONS PF DA: Patient notes detail.
ICD9DIAG *TABLE DACONS PF DA: Patient Diagnosis
ICD9PROC *TABLE DACONS PF DA: Patient Procedure
MDCDESC *TABLE DACONS PF MDC DescriptionTable
MDTABLE *TABLE DACONS PF DA: Physician Table
MDTABLL1 *VIEW DACONS LF DA: Physican Group Code
NC2625P *TABLE DACONS PF MaCS: Work table for program
NONSTFMD *TABLE DACONS PF DA: Patient Physician (Non-
PATDIAG *TABLE DACONS PF DA: All patient diagnosis c
PATINS *TABLE DACONS PF DA: Patient Insurance
PATINSL1 *VIEW DACONS LF DA: Patient Insurance by Pl
PATMDS *TABLE DACONS PF DA: All Patient Physicians
PATPHYS *TABLE DACONS PF DA: Patient Physician
PATPROC *TABLE DACONS PF DA: All patient procedure c
PATTYPE *TABLE DACONS PF DA: Patient type table table
PAYCDDES *TABLE DACONS PF DA: CMM Payor Code Descript
PAYGPDES *TABLE DACONS PF DA: CMM Payor Group Descrip
PAYMENT *TABLE DACONS PF DA: Patient Account Payment
PHYSL1 *VIEW DACONS LF DA: Patient Physician by Ph
PROCDESC *TABLE DACONS PF DA: Procedure description
PROCL1 *VIEW DACONS LF DA: Patient Procedure by Pr
REHABGEN *TABLE DACONS PF DA: Rehab General
REHABREF *TABLE DACONS PF DA: Rehab Referring Facilit
Figure 7, continued.
Object Type Collection Attribute Text
REHABTRN *TABLE DACONS PF DA: Rehab Transferring Facility
VISIT *TABLE DACONS PF DA: Patient Visit
VISITJOIN1 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS/CHA
VISITJOIN2 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS
VISITJOIN3 *JOIN DACONS LF DA: VISIT/PATPHYS/ICD9DIAG
VISITJOIN4 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS
VISITJOIN5 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN6 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN7 *JOIN DACONS LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN8 *JOIN DACONS LF DA: VISIT/FINSUM/PATINS/CHA
VISITL1 *VIEW DACONS LF DA: Patient Visit by DRG
VISIT2 *VIEW DACONS LF DA: Patient Visit in Discharge
VISIT3 *VIEW DACONS LF DA: Patient Visit in MRC#
Interfacing Applications
The following applications currently interface with the TENET data warehouse and are
supported by us:
Showcase Vista
Third-party, PC-based end-user analysis tool.
CASEMIX Reports
Homegrown, menu-driven reporting system. If users require modifications to existing
reports, programmer intervention is required, ranging from modifying an existing report
to creating one from scratch.
PQS System
Homegrown, menu-driven reporting system that allows users to play out "what-if"
scenarios.
Cost Accounting
Homegrown, menu-driven reporting system for cost accounting purposes.
None of these systems actually modifies the data residing in the individual collections
that make up TENET's business intelligence. This creates the opportunity for us to
re-design the current data warehouse architecture to better take advantage of the
hardware, the homegrown applications, and the PC-based end-user tools, and to create
more powerful applications.
The re-design is addressed in depth in the "Proposed Project" portion of this document.
Current Enhancement Project
Due to the negative impact on production response times brought about by the PBAR
warehouses residing on production systems, an enhancement project is currently under
way to move the PBAR warehouses off their current production systems and onto the
non-production PATCOM system.
Before we proceed any further, I would like to state for the record that I was not
involved in any phase of this project.
Because PATCOM's primary repository and OLTP databases are located within the same
collection, PBAR will be modified accordingly to create a consistent architecture. The
PBAR primary repository databases will be placed into their respective OLTP
collections, and the primary repository collections will be removed. Consequently, the
enterprise-wide network will be reduced to two non-production AS/400s, the number of
warehouses will be reduced to ninety-three, and the consolidated warehouse will remain
as is. Confusing? Don't worry, a visual aid is a page away.
PATCOM
Enterprise Level
System Level
Data Warehouse Level
Estimates
PATCOM
System Level
The following depicts the AS/400 (DAAC) system.
For the sake of brevity, the following table lists the same sample of seven hospitals
depicted earlier at the PBAR system level. The increased sizes reflect the inclusion of
the primary repository databases in the OLTP collections.
Figure 8. AS/400 DAAC merged data warehouses
Hospital Collection Size in Bytes Purpose
Trinity DATRI 1,191,821,144 Normalized and denormalized end-user access with joins
Memorial DADED 823,873,536 Normalized and denormalized end-user access with joins
Doctor’s DADHF 1,216,704,512 Normalized and denormalized end-user access with joins
Harton DAHAR 818,868,224 Normalized and denormalized end-user access with joins
Methodist DAJON 527,355,904 Normalized and denormalized end-user access with joins
Medical Center DAMAH 244,506,624 Normalized and denormalized end-user access with joins
University DAUNV 1,278,439,424 Normalized and denormalized end-user access with joins
[Diagram: AS/400 DAAC system: 92 hospitals, 92 merged OLTP/PR collections, and the interfacing applications.]
Warehouse Level
Figure 9 lists the tables that make up the OLTP warehouse representing Trinity
hospital, which now includes the merged primary repository tables.
Figure 9.
Object Type Collection Attribute Text
ABSTRACT *TABLE DATRI PF DA: Patient Abstract table.
ACTIVITY *TABLE DATRI PF CA: Activity Master
ACTIVJOIN1 *JOIN DATRI LF DA: VISIT/CHARGES/ACTIVITY
APRDESC *TABLE DATRI PF APRDRG Description Table
BROKER S *TABLE DATRI PF DA: Broker Table Table
CDMDESC *TABLE DATRI PF DA: CDM description table
CHARGES *TABLE DATRI PF DA: Patient Charges
CLINIC *TABLE DATRI PF DA: Clinic Code Table
CLINSPTY *TABLE DATRI PF DA: CMM Clinical Specialty
CMMPAYOR *TABLE DATRI PF DA: CMM Payor Group
COSTCTR *TABLE DATRI PF DA: Cost center name
CPT4SURG *TABLE DATRI PF DA: Patient Surgical CPT4
DEMOG *TABLE DATRI PF DA: Patient Demographics
DIAGDESC *TABLE DATRI PF DA: Diagnosis description
DIAGL1 *VIEW DATRI LF DA: Patient Diagnosis by Di
DRGDESC *TABLE DATRI PF DA: DRG Descriptions Table
DRGWR *TABLE DATRI PF DA: DRG Weight & Rate Table
EDLOG *TABLE DATRI PF DA: Emergency Department Lo
FINSUM *TABLE DATRI PF DA: Patient Visit Financial
FUR *TABLE DATRI PF DA: Patient notes detail.
ICD9DIAG *TABLE DATRI PF DA: Patient Diagnosis
ICD9PROC *TABLE DATRI PF DA: Patient Procedure
MDCDESC *TABLE DATRI PF MDC DescriptionTable
MDTABLE *TABLE DATRI PF DA: Physician Table
MDTABLL1 *VIEW DATRI LF DA: Physican Group Code
NC2625P *TABLE DATRI PF MaCS: Work table for program
NONSTFMD *TABLE DATRI PF DA: Patient Physician (Non-
PATDIAG *TABLE DATRI PF DA: All patient diagnosis c
PATINS *TABLE DATRI PF DA: Patient Insurance
PATINSL1 *VIEW DATRI LF DA: Patient Insurance by Pl
PATMDS *TABLE DATRI PF DA: All Patient Physicians
PATPHYS *TABLE DATRI PF DA: Patient Physician
PATPROC *TABLE DATRI PF DA: All patient procedure c
PATTYPE *TABLE DATRI PF DA: Patient type table table
PAYCDDES *TABLE DATRI PF DA: CMM Payor Code Descript
PAYGPDES *TABLE DATRI PF DA: CMM Payor Group Descrip
PAYMENT *TABLE DATRI PF DA: Patient Account Payment
PHYSL1 *VIEW DATRI LF DA: Patient Physician by Ph
PROCDESC *TABLE DATRI PF DA: Procedure description
PROCL1 *VIEW DATRI LF DA: Patient Procedure by Pr
REHABGEN *TABLE DATRI PF DA: Rehab General
REHABREF *TABLE DATRI PF DA: Rehab Referring Facilit
Figure 9. continued.
Object Type Collection Attribute Text
REHABTRN *TABLE DATRI PF DA: Rehab Transferring Facility
VISIT *TABLE DATRI PF DA: Patient Visit
VISITJOIN1 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITJOIN2 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN3 *JOIN DATRI LF DA: VISIT/PATPHYS/ICD9DIAG
VISITJOIN4 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS
VISITJOIN5 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN6 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN7 *JOIN DATRI LF DA: VISIT/DEMOG/FINSUM/PAT
VISITJOIN8 *JOIN DATRI LF DA: VISIT/FINSUM/PATINS/CHA
VISITL1 *VIEW DATRI LF DA: Patient Visit by DRG
VISIT2 *VIEW DATRI LF DA: Patient Visit in Discharge
VISIT3 *VIEW DATRI LF DA: Patient Visit in MRC#
PATACTV *TABLE DATRI PF CDD: Patient Information Active
PATFULL *TABLE DATRI PF CDD: Patient Information Full
PATLIMT *TABLE DATRI PF CDD: Patient Information Limited
PATROOM *TABLE DATRI PF CDD: Patient Info Room
Note: PATACTV, PATFULL, PATLIMT, and PATROOM are the merged primary repository tables.
Proposed Project
The objective is to create a data warehouse architecture that is independent of the
platform upon which it resides, and that takes full advantage of the hardware.
The platform-independent architecture can be achieved by creating collections, base
tables, views, and indexes using Structured Query Language (SQL). Taking advantage of
the hardware is relative to the hardware platform itself. In light of this, I will list
a few highlights regarding AS/400 hardware and defer the detail to a future document.
SQL is an industry-standard language for defining and manipulating data contained in a
relational database. An IBM research lab developed SQL in the 1970s to explore an
implementation of the relational database model. Since that time, SQL has become a
widely used language that’s included in most relational Database Management Systems
(DBMS), including IBM’s family of DB2 products. Several national and international
standards organizations have published SQL standards, which the major relational DBMS
(including DB2/400) follow for their versions of SQL.
Two advantages come to mind when discussing SQL-based architectures. One is the fact
that, with relatively few modifications, the architecture can be transferred to other
platforms. The other is that SQL defines catalog views that let you query the structure
of the database itself. This means we can use off-the-shelf packages such as MS Access
to automatically draw up a database map, showing such information as primary keys,
indexes, and referential integrity constraints.
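DB2/400 exposes this self-describing structure through SQL catalog views such as SYSTABLES and SYSCOLUMNS. As an illustration of the idea only, the sketch below queries SQLite's catalog the same way a PC tool would query a database's catalog to draw its map; the schema and names are invented for the sketch, not the DA* warehouse.

```python
import sqlite3

# Build a tiny sample schema, then query the engine's own catalog to map
# it, the same idea a tool like MS Access uses to draw a database diagram.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE visit (
    patacct  TEXT NOT NULL,
    hospital TEXT NOT NULL,
    PRIMARY KEY (patacct, hospital)
);
CREATE TABLE charges (
    patacct  TEXT NOT NULL,
    hospital TEXT NOT NULL,
    amount   NUMERIC,
    FOREIGN KEY (patacct, hospital) REFERENCES visit (patacct, hospital)
);
""")

# sqlite_master is SQLite's catalog table; DB2/400 has SYSTABLES etc.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)

# PRAGMA table_info plays the role of a SYSCOLUMNS query here.
cols = [row[1] for row in con.execute("PRAGMA table_info(visit)")]
print(cols)
```

The same two queries, pointed at the warehouse's catalog views, are all a mapping tool needs to enumerate every table and column.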
Where the AS/400 is concerned, I truly believe it is the platform of choice for the
following reasons (remember, I promised to keep it short here). The main strengths are
in two areas. The first is scalability. It continues to be the only database in the
industry that is fully 64-bit enabled. When you combine that with the new hardware that
is about to ship, especially the main memory sizes, it has a tremendous competitive
advantage over the rest of the industry, especially in a data warehouse environment.
Today the machines come with 40 GB of main memory. In the next two to three years,
those machines will ship with half a terabyte of main memory, something that is
possible only through 64-bit technology.
The other major area of competitive advantage is ease of use and administration. It is
fairly common knowledge that few AS/400 installations have a database administrator;
they simply do not require one. A lot of the activities a typical database
administrator would perform just are not done on the AS/400. You manage users from a
system perspective, not a database perspective, and the majority of things you would
normally do as a DBA are fully automated and optimized by the system.
Let us begin by taking a look at some growth estimates and storage requirements.
Growth Estimates
Extrapolations
Average Per Hospital
Figure 10 depicts a representative sample of seven of the ninety-two hospitals that
make up the TENET data warehouse, and their sizes in bytes as of 3/1/1999. Currently we
store four years' worth of data on-line for each hospital.
Figure 10. Sampling of seven hospitals to extrapolate average hospital size.
Hospital Collection Size in Bytes Purpose
Trinity DATRI 1,191,821,144 Normalized and denormalized end-user access with joins
Memorial DADED 823,873,536 Normalized and denormalized end-user access with joins
Doctor’s DADHF 1,216,704,512 Normalized and denormalized end-user access with joins
Harton DAHAR 818,868,224 Normalized and denormalized end-user access with joins
Methodist DAJON 527,355,904 Normalized and denormalized end-user access with joins
Medical Center DAMAH 244,506,624 Normalized and denormalized end-user access with joins
University DAUNV 1,278,439,424 Normalized and denormalized end-user access with joins
Total 6,101,569,368 Sum of sampled hospitals.
Figure 11 extrapolates the average hospital size based on the seven sample hospitals.
Figure 11. Average hospital size, calculation table.
Calculation Value Result/Description
6,101,569,368 Sum of sampled hospitals.
Divide by 7 Number of sampled hospitals.
Equals 871,652,767 Extrapolated average of 870 Megs per hospital.
Average Transaction Volume Per Hospital
Figure 12 extrapolates the percentage increase per month from transaction volume, for
the period 3/1/1999 to 4/1/1999, for Trinity hospital. This percentage will be used as a
median to calculate the yearly growth of TENET’s data warehouse from transaction
volume.
Figure 12. Trinity data warehouse monthly transaction volume increase, calculation table.
Calculation Value Result/Description
1,228,697,600 New Trinity Hospital size as of 4/1/1999.
Subtract 1,191,821,144 Old Trinity Hospital size as of 3/1/1999.
Equals 36,876,456 37 Megs increase per month from transaction volume.
Divide by 1,191,821,144 Old Trinity Hospital Size.
Equals .031 Extrapolated average of 3.1% increase per month per hospital.
Current
Size as of 3/1/1999
Figure 13 estimates the size of the TENET data warehouse as of 3/1/1999.
Figure 13. TENET data warehouse size, calculation table.
Calculation Value Result/Description
871,652,767 Estimated average of 870 Megs per hospital.
Multiply by 92 Number of TENET hospitals.
Equals 80,192,054,564 Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Estimated
Critical success factors for estimating growth are:
Additional Hospitals
Figure 14 estimates the yearly growth of the TENET data warehouse from additional
hospitals. Currently we are adding 12 hospitals a year.
Figure 14. TENET data warehouse yearly additional hospitals size increase, calculation table.
Calculation Value Result/Description
871,652,767 Average of 870 Megs per hospital.
Multiply by 12 Number of additional hospitals per year.
Equals 10,459,833,204 Estimated 10.5 Gigs increase from additional hospitals per year.
Transaction Volume
Figure 15 estimates the yearly growth of the TENET data warehouse from the transaction
volume, using the median calculated in figure 12.
Figure 15. TENET data warehouse yearly transaction volume size increase, calculation table.
Calculation Value Result/Description
80,192,054,564 Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Multiply by .031 Average of 3.1% increase per month per hospital.
Equals 2,485,953,691 Estimated 2.5 Gigs increase per month from transaction volume.
Multiply by 12 Months in a year.
Equals 29,831,444,298 Estimated 30 Gigs increase per year from transaction volume.
Historical
The current warehouse holds four years' worth of data on-line. Say we want to hold ten
years' worth. Can you imagine the potential of ten years' worth of data on-line? What
would it take, storage-wise, to achieve this goal?
Ten Years Worth Applied to Current Size.
Figure 16. TENET data warehouse current historical size, calculation table.
Calculation Value Result/Description
80,192,054,564 Estimated 80 Gigs TENET data warehouse size, as of 3/1/1999.
Multiply by 2.5 Factor to extend 4 years' worth of data to 10 (an additional 6 years).
Equals 200,480,136,410 Estimated 200 Gigs total, for proposed 10 years worth of data.
Storage Requirements
Considering the estimated yearly growth of the TENET data warehouse, let us determine
how long the currently available storage on the AS/400 DAAC will last, with both four
years' worth and ten years' worth of data on-line.
Average Yearly Growth
Figure 17 estimates the current yearly growth.
Figure 17. TENET data warehouse total yearly size increase, calculation table.
Calculation Value Result/Description
10,459,833,204 Estimated 10.5 Gigs increase from additional hospitals per year.
Add 29,831,444,298 Estimated 30 Gigs increase per year from transaction volume.
Equals 40,291,277,502 Estimated 40 Gigs increase per year total.
Figure 18. AS/400 DAAC duration of current storage with four years worth of data, calculation table.
Calculation Value Result/Description
390,000,000,000 390 Gigs, current AS/400 DAAC size.
Subtract 80,192,054,564 Estimated 80 Gigs for current 4 years worth of data.
Equals 309,807,945,436 Estimated 310 Gigs available storage size.
Divide by 40,291,277,502 Estimated 40 Gigs increase per year total.
Equals 7.5 Estimated 7.5 years duration for current storage with 4 years of data.
Figure 19. AS/400 DAAC duration of current storage with ten years worth of data, calculation table.
Calculation Value Result/Description
390,000,000,000 390 Gigs, current AS/400 DAAC size.
Subtract 200,480,136,410 Estimated 200 Gigs total, for proposed 10 years worth of data.
Equals 189,519,863,590 Estimated 190 Gigs available storage size.
Divide by 40,291,277,502 Estimated 40 Gigs increase per year total.
Equals 5 Estimated 5 years duration for current storage with 10 years of data.
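The extrapolations in figures 11 through 18 reduce to a few lines of arithmetic. The sketch below reruns them from the document's own inputs, with Python used purely as a calculator; because the figures round intermediate values, the final result differs slightly from figure 18.

```python
# Reproduce the growth and storage-duration estimates of figures 11-18.
sample_total = 6_101_569_368                  # figure 10: sum of 7 sampled hospitals
avg_hospital = sample_total / 7               # figure 11: ~870 Megs per hospital

old_trinity, new_trinity = 1_191_821_144, 1_228_697_600
monthly_rate = (new_trinity - old_trinity) / old_trinity  # figure 12: ~3.1%/month

warehouse    = avg_hospital * 92              # figure 13: ~80 Gigs as of 3/1/1999
new_hosp_gr  = avg_hospital * 12              # figure 14: ~10.5 Gigs/yr, new hospitals
txn_growth   = warehouse * monthly_rate * 12  # figure 15: ~30 Gigs/yr, transactions

yearly_growth = new_hosp_gr + txn_growth      # figure 17: ~40 Gigs/yr total
available     = 390_000_000_000 - warehouse   # figure 18: headroom, 4 years on-line
years_left    = available / yearly_growth     # years of headroom at this growth rate
print(round(years_left, 1))
```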
Methodology
To briefly recap, the current architecture consists of ninety-three data warehouses
distributed across two, non-production AS/400s. These should be reduced to one data
warehouse on one AS/400.
Proof of Concept
What follows is a partial prototype involving four small transactional tables from the
Trinity hospital collection. The end result is a single table containing the data
previously stored in four different ones. You may call this the denormalization of
normalcy. As you go through the technicalities of the prototype you will come across
some of the previously discussed transformation procedures, specifically data
aggregation, data standardization, and data cleansing. Figure 21 details the four
transaction-oriented tables. Figure 22 details the one end-user-oriented table.
GREEN: identifies the key columns which, combined, uniquely identify the row.
Figure 21. Trinity Hospital Normalized Tables
Patient Diagnosis: PATACCT#, SEQUENCE#, DIAGCODE, DIAGMODI, DATELSTCHG, HOSPITAL
Clinic Code Table: PATACCT#, SEQUENCE#, CLINICCODE, DATELSTCHG, HOSPITAL
Patient Procedure: PATACCT#, SEQUENCE#, PROCCODE, PROCMODI, PROCDATE, DATELSTCHG, HOSPITAL
Patient Surgical: PATACCT#, SEQUENCE#, CPT4CODE, CPT4MODI, CPT4MODI2, CPT4DATE, DATELSTCHG, HOSPITAL
Figure 22. Trinity Hospital Denormalized Table
Clinical Data
PATACCT#
SEQUENCE#
HOSPITAL
DATELSTCHG
DIAGCODE
DIAGMODI
CLINICCODE
PROCCODE
PROCMODI
PROCDATE
CPT4CODE
CPT4MODI
CPT4MODI2
CPT4DATE
Figure 23 reproduces rows from the normalized tables. Figure 24 reproduces rows from
the denormalized table.
Figure 23. Trinity Hospital Normalized Tables Rows

Patient Diagnosis
Patient Account Number | Diag Seq Num | Diagnosis Code | Diagnosis Modifier | Last Change Date | Hospital Code
4307914 | 1  | 53510 |  | 19960619 | TRI
4307914 | 2  | 4019  |  | 19960619 | TRI
4307914 | 3  | 56984 |  | 19960619 | TRI
4307914 | 4  | 5303  |  | 19960619 | TRI
4307914 | 5  | 04186 |  | 19960619 | TRI
4307914 | 16 | 5781  |  | 19960618 | TRI

Clinic Code Table
Patient Account Number | Clinic Seq Num | Clinic Code | Last Change Date | Hospital Code
4307914 | 1 | SO | 19960619 | TRI

Patient Procedure
Patient Account Number | Proc Seq Num | Proc Code | Proc Mod | Procedure Date | Last Change Date | Hospital Code
4307914 | 1 | 4516 |  | 19960614 | 19960619 | TRI
4307914 | 2 | 4523 |  | 19960614 | 19960619 | TRI

Patient Surgical
Patient Account Number | CPT4 Seq Num | CPT4 Code | CPT4 Mod. | CPT4 Modifier 2 | CPT4 Date | Last Change Date | Hospital Code
4307914 | 1 | 43239 |  |  | 19960614 | 19960619 | TRI
4307914 | 2 | 45378 |  |  | 19960614 | 19960619 | TRI
Figure 24. Trinity Hospital Denormalized Table Rows

Clinical
Patact# | Seq | Hos Cod | Change Date | Diag Code | Diag Mod | Clinic Code | Proc Cod | Proc Mod | Procedure Date | Cpt4 Code | Cpt4 Mod | Cpt4 Mod2 | Cpt4 Date
4307914 | 1  | TRI | 1996-06-19 | 53510 |  | SO | 4516 |  | 1996-06-14 | 43239 |  |  | 1996-06-14
4307914 | 2  | TRI | 1996-06-19 | 4019  |  |    | 4523 |  | 1996-06-14 | 45378 |  |  | 1996-06-14
4307914 | 3  | TRI | 1996-06-19 | 56984 |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 4  | TRI | 1996-06-19 | 5303  |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 5  | TRI | 1996-06-19 | 04186 |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 16 | TRI | 1996-06-18 | 5781  |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
As you can see from the preceding layout, retrieving all of the clinical data for
patient 4307914 from the Trinity Hospital normalized tables requires eleven distinct
disk accesses, whereas the Trinity Hospital denormalized table requires only six. In
addition, the denormalized version allows row blocking to gather all six rows into main
memory at once, reducing the disk accesses to one. This is not possible in the
normalized version due to the random access patterns necessary to retrieve rows from
multiple tables.
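The figure 23 to figure 24 merge folds the four normalized row sets into one Clinical row per key (patient account, sequence number, hospital). The sketch below reproduces that merge in Python as a stand-in for the actual SQL load logic; the modifier columns are omitted to keep it short, and the helper names are invented.

```python
# Fold the four normalized tables of figure 23 into one denormalized
# Clinical row per (patient account, sequence) key, as in figure 24.
DEFAULT_DATE = "0001-01-01"  # the 'L' date type's earliest valid date

diag = {("4307914", 1): ("53510", "19960619"), ("4307914", 2): ("4019", "19960619"),
        ("4307914", 3): ("56984", "19960619"), ("4307914", 4): ("5303", "19960619"),
        ("4307914", 5): ("04186", "19960619"), ("4307914", 16): ("5781", "19960618")}
clinic = {("4307914", 1): "SO"}
proc = {("4307914", 1): ("4516", "19960614"), ("4307914", 2): ("4523", "19960614")}
cpt4 = {("4307914", 1): ("43239", "19960614"), ("4307914", 2): ("45378", "19960614")}

def as_date(yyyymmdd):
    """Zoned-decimal date string to ISO form, defaulting like the 'L' type."""
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:]}" if yyyymmdd else DEFAULT_DATE

clinical = []
for key in sorted(set(diag) | set(clinic) | set(proc) | set(cpt4)):
    d_code, chg = diag.get(key, ("", ""))
    p_code, p_date = proc.get(key, ("", ""))
    c_code, c_date = cpt4.get(key, ("", ""))
    clinical.append((key[0], key[1], "TRI", as_date(chg), d_code,
                     clinic.get(key, ""), p_code, as_date(p_date),
                     c_code, as_date(c_date)))

print(len(clinical))   # 6 denormalized rows replace 11 normalized ones
```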
If we leave the architecture as is, we have, yes, achieved an improvement in access
times and, as you will see later, a storage saving. But it still leaves us with
ninety-two data warehouses. Time for a quick recap, if you will. The tables'
architecture is identical for all hospitals; therefore we can consolidate like tables
into one table. And since we can do it for all tables, as is the case, we can reduce
ninety-three warehouses into one, like so. Since I cannot reproduce all ninety-two
hospitals and expect you to keep your sanity, I have chosen two to demonstrate what the
layout looks like.
Figure 25. Trinity and Alvarado Hospitals Denormalized Table Rows

Clinical
Patact# | Seq | Hos Cod | Change Date | Diag Code | Diag Mod | Clinic Code | Proc Cod | Proc Mod | Procedure Date | Cpt4 Code | Cpt4 Mod | Cpt4 Mod2 | Cpt4 Date
4307914 | 1  | TRI | 1996-06-19 | 53510 |  | SO | 4516 |  | 1996-06-14 | 43239 |  |  | 1996-06-14
4307914 | 2  | TRI | 1996-06-19 | 4019  |  |    | 4523 |  | 1996-06-14 | 45378 |  |  | 1996-06-14
4307914 | 3  | TRI | 1996-06-19 | 56984 |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 4  | TRI | 1996-06-19 | 5303  |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 5  | TRI | 1996-06-19 | 04186 |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 16 | TRI | 1996-06-18 | 5781  |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 1  | ALV | 1996-06-19 | 53510 |  | SO | 4516 |  | 1996-06-14 | 43239 |  |  | 1996-06-14
4307914 | 2  | ALV | 1996-06-19 | 4019  |  |    | 4523 |  | 1996-06-14 | 45378 |  |  | 1996-06-14
4307914 | 3  | ALV | 1996-06-19 | 56984 |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 4  | ALV | 1996-06-19 | 5303  |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 5  | ALV | 1996-06-19 | 04186 |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
4307914 | 16 | ALV | 1996-06-18 | 5781  |  |    |      |  | 0001-01-01 |       |  |  | 0001-01-01
Notice the additional key field, Hospital Code. It is necessary to maintain each row's
uniqueness in the consolidated data warehouse, in the remote eventuality that identical
patient numbers are used in different hospitals, and to be able to distinguish between
hospitals in queries.
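The consolidation itself can be expressed in plain SQL. As a sketch only, the example below uses Python's sqlite3 as a stand-in for DB2/400, with invented table names; the UNION ALL tags each source row with its hospital code so the combined key stays unique.

```python
import sqlite3

# Consolidate per-hospital tables into one warehouse table: each source
# row is tagged with its hospital code so the combined key
# (hospital, account, sequence) remains unique across hospitals.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clinical_tri (patacct TEXT, seq INTEGER, diagcode TEXT);
CREATE TABLE clinical_alv (patacct TEXT, seq INTEGER, diagcode TEXT);
INSERT INTO clinical_tri VALUES ('4307914', 1, '53510');
INSERT INTO clinical_alv VALUES ('4307914', 1, '53510');

CREATE TABLE clinical AS
    SELECT 'TRI' AS hospital, * FROM clinical_tri
    UNION ALL
    SELECT 'ALV' AS hospital, * FROM clinical_alv;
""")

rows = list(con.execute(
    "SELECT hospital, patacct, seq FROM clinical ORDER BY hospital"))
print(rows)   # [('ALV', '4307914', 1), ('TRI', '4307914', 1)]
```

The same identical patient number appears in both hospitals, yet the hospital code keeps the two rows distinct.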
If you look closely at the data, you will notice date values represented with dashes,
and what in the world is 0001-01-01? Introducing the 'L' date data type. This attribute
enforces data integrity by allowing only valid dates. Since a string of blanks or
zeroes is not a valid date, the system automatically defaults to the earliest date,
ergo 0001-01-01. You can specify any valid date as the default, and you may also use
the NULL value. The advantages of utilizing this attribute on all date columns are
threefold: one, automatic editing of the value; two, it requires only 4 bytes of disk
storage versus the 8 required by the current zoned decimal definition; three, it allows
the use of special operation codes to simplify date manipulation and date arithmetic
within programs.
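A small sketch of the defaulting behaviour described above, with Python's datetime standing in for the DB2/400 'L' date column (the function name is invented):

```python
from datetime import date

EARLIEST = date(1, 1, 1)   # the system default when no valid date exists

def to_l_date(zoned):
    """Mimic loading a zoned-decimal YYYYMMDD value into an 'L' date column.

    Blanks and zeroes are not valid dates, so, like the database, we fall
    back to the earliest date, 0001-01-01.
    """
    try:
        s = str(zoned).strip()
        return date(int(s[:4]), int(s[4:6]), int(s[6:8]))
    except ValueError:
        return EARLIEST

print(to_l_date(19960619))    # 1996-06-19
print(to_l_date(0))           # 0001-01-01
print(to_l_date("        "))  # 0001-01-01
```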
Other enhancements to the denormalized table include the elimination of the repeating
columns "Patient Account Number", "Last Change Date", "Hospital Code", and "Sequence
Number", and the optimization of numeric columns from the standpoint of disk storage
and CPU processing. The AS/400 stores numeric values in a packed format. If you define
numeric columns as zoned decimal, you incur additional CPU processing time for the
translation from one format to the other each time the column is accessed, and you
approximately double the disk storage requirements for your numeric columns.
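The storage claim can be checked with quick arithmetic: zoned decimal stores one digit per byte, while packed decimal stores two digits per byte plus a sign nibble. A sketch:

```python
def zoned_bytes(digits):
    # One digit per byte; the sign rides in the last byte's zone nibble.
    return digits

def packed_bytes(digits):
    # Two digits per byte, plus one nibble for the sign.
    return digits // 2 + 1

# A 9-digit field (e.g. a patient account number) nearly halves in size.
for n in (7, 9, 15):
    print(n, zoned_bytes(n), packed_bytes(n))
```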
By simply applying these design criteria, and without any end-user input, I have
effectively achieved a much more efficient end-user access table. Additional end-user
input will result in an even more efficient design.
An additional benefit of this design is the amount of storage space reclaimed as shown in
figures 26 and 27.
Figure 26. Trinity Hospital Normalized Tables
Required Storage Space in Bytes
TABLE NAME DATA ACCESS PATH TOTAL
Patient Diagnosis 17,827,840 13,438,946 31,275,008
Clinic Code Table 4,722,176 4,526,080 9,254,400
Patient Procedure 2,633,216 1,511,424 2,250,752
Patient Surgical 1,322,496 921,600 2,250,752
Total 26,505,728 20,398,080 46,931,456
Figure 27. Trinity Hospital Denormalized Table
Required Storage Space in Bytes
TABLE NAME DATA ACCESS PATH TOTAL
Clinical01 28,313,600 13,438,976 41,766,912
We see a net saving of five Megs for this one hospital; multiply that by ninety-two,
our currently supported hospitals, and the savings start getting more interesting:
approximately 475 Megs. And that is not all. I have performed a simple prototyping
demonstration on four small transaction tables. There are thirty-four additional tables
that require more in-depth analysis to determine further aggregation possibilities;
some quick analysis tells me that at least ten more can be aggregated. Not to mention
the elimination of most, if not all, joins, depending on the results of the
aforementioned aggregations. That will translate into hefty storage savings. In
addition, we will get the tremendous performance throughput improvement described
previously.
Additional Enhancements
The enhancement project discussed previously accomplishes one objective: the
improvement of response times on the production systems by removing the data
warehouses from them. If I may say so, it is akin to cutting off one's hand because a
finger hurts. Other functional areas that need to be addressed are:
Data Extraction
Report Mining
Data Transformation
Data Propagation
Data Verification
Data Extraction
Currently we populate the data warehouses with daily feeds from a mainframe.
Historical data is also obtained from the same mainframe on an as-needed basis. We are
also operating in an environment that requires continuous enhancements to the existing
warehouses as users request additional fields upon which to query. As a perfect
example, I would like to cite the last two projects I was involved in. The first
involved adding one field. The second involved adding six fields and a full historical
reload from the mainframe. Together, both projects lasted about three months. In
addition, there is another project on the sidelines called the "Field Add Project",
which leads me to believe we will be adding more fields. Why? Elementary, my dear
Watson: it would seem that the initial user requirements were somewhat incomplete. If
so, let's be proactive and interview the users now, so we may identify up front all the
specific elements that warrant inclusion in the warehouse. In doing so, we will have
killed two birds with one stone: we will have pleased the users, who will be able to
extract additional information, and MIS will automatically become, and be perceived as,
much more productive.
Report Mining
Spooled files (reports that have not yet printed) contain data that has already been
extracted from operational databases, and report mining can be used to access this
data. Almost every OLTP application, whether canned or homegrown, generates a
comprehensive suite of reports. Because they provide valuable information to end users
in a relatively intuitive way, reports mask the complexity of the underlying OLTP
databases. Furthermore, report programs have already located, accessed, extracted, and
consolidated valuable operational information. Reports also carry metadata in the form
of column headings, date ranges, titles, and other descriptive text. It may be
worthwhile to invest in software that allows the integration of data obtained from
spooled files into the warehouse. Currently this option is not even on the drawing
board, yet all four interfacing applications produce reports.
Data Transformation
Once raw data has been extracted from OLTP databases, it must be reformatted and
refined for the data warehouse. The transformation of this raw data comprises five related
activities: data aggregation, data filtering, data combining, data standardization, and data
cleansing.
Data Aggregation
Aggregation is an essential transformation function that summarizes operational data.
The aggregation process should combine the header and detail records into one record
(interfile aggregation). TENET’s current warehouses don’t make use of this
transformation technique.
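A minimal sketch of interfile aggregation follows, rolling hypothetical header and detail records up into one warehouse record per header. The table and column names are invented; the point is only that the detail lines are summarized into the header row so users never have to join the two files themselves.

```python
import sqlite3

# Interfile aggregation sketch: combine a header file and its detail file
# into one summarized record per header. Schema is illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoice_header (inv_no INT, cust_no INT, inv_date TEXT);
CREATE TABLE invoice_detail (inv_no INT, line_no INT, amount REAL);
INSERT INTO invoice_header VALUES (1, 100, '1997-01-15');
INSERT INTO invoice_detail VALUES (1, 1, 250.0), (1, 2, 125.5);
""")

# One warehouse row per header: detail lines become a count and a total.
rows = con.execute("""
SELECT h.inv_no, h.cust_no, h.inv_date,
       COUNT(d.line_no) AS line_count,
       SUM(d.amount)    AS inv_total
FROM invoice_header h
JOIN invoice_detail d ON d.inv_no = h.inv_no
GROUP BY h.inv_no, h.cust_no, h.inv_date
""").fetchall()
```

The join and GROUP BY run once, at load time, so warehouse queries read the pre-aggregated row instead of repeating the work.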
Data Filtering
Transformational processes also may filter relevant information from OLTP databases.
For example, an executive looking for net revenues would probably have no interest in
patient account numbers. This and other extraneous data elements would not be
transferred to a data warehouse. To the best of my knowledge no data filtering is done for
TENET.
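As a sketch of the filtering step, the extract can simply drop every field the warehouse schema does not carry before the record is transferred. The field names here are hypothetical, not TENET's actual layout.

```python
# Data filtering sketch: keep only the fields the warehouse schema carries;
# extraneous elements (e.g. patient account numbers) never leave the extract.
# Field names are illustrative only.
WAREHOUSE_FIELDS = {"facility", "period", "net_revenue"}

def filter_record(record: dict) -> dict:
    """Return a copy of the record restricted to warehouse fields."""
    return {k: v for k, v in record.items() if k in WAREHOUSE_FIELDS}

extract = {"patient_acct": "123456789", "facility": "USC",
           "period": "1997-06", "net_revenue": 1820.0}
filtered = filter_record(extract)
```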
Data Combination
A third transformation function may combine OLTP data from separate applications and
platforms. The growth of distributed-processing environments has resulted in operational
databases that are scattered around the world. Data warehouses must be able to combine
data elements from these disparate systems.
This issue has already reared its head: we are expected to integrate the ORNDA system, another group of hospitals whose patient account number is 11 bytes long, versus the current TENET standard of 9 bytes.
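One simple way to reconcile the two field sizes is to widen every account number to the larger width at transformation time. Whether zero-padding on the left is the right mapping for TENET's numbering is an assumption; the sketch only illustrates standardizing on one width.

```python
# Field-width standardization sketch: widen 9-byte TENET account numbers
# to the 11-byte ORNDA width so both populations share one key format.
# Left zero-padding is an assumed convention, not a confirmed TENET rule.
ACCT_WIDTH = 11

def standardize_acct(acct: str) -> str:
    """Left-pad shorter account numbers with zeros to the common width."""
    acct = acct.strip()
    if len(acct) > ACCT_WIDTH:
        raise ValueError(f"account number wider than {ACCT_WIDTH} bytes: {acct!r}")
    return acct.rjust(ACCT_WIDTH, "0")
```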
Data Standardization
Data-transformation processes standardize data elements and the metadata that describes
those elements. The difficulties caused by poor or even nonexistent documentation
underscore the need for consistency. Basic field attributes such as content, size, type, and
descriptions often differ across multiple applications, or even within a single application.
The inconsistent use of codes is a frequent problem as well. Of all the transformation
activities, this is the one in which TENET lags the least. Nonetheless, there is neither a
technical metadata repository nor a business metadata repository.
Data Cleansing
Another function data-transformation programs perform is data cleansing. Procedures
that ensure the accuracy of warehouse repositories must be in place; TENET's
warehouse has none.
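A minimal sketch of what such an edit might look like: validate each incoming value, load the ones that parse, and divert the rest to an exception report instead of letting them crash the load. The field chosen and its format are illustrative only.

```python
# Data cleansing sketch: an edit that keeps unreadable values out of the
# warehouse. Instead of crashing the propagation job, bad values come back
# as None and can be routed to an exception report. Field is hypothetical.
def cleanse_revenue(raw: str):
    """Parse an edited numeric field; return None for unreadable values."""
    raw = raw.strip().replace(",", "")
    try:
        return float(raw)
    except ValueError:
        return None  # candidate for the exception report, not the warehouse

raw_rows = ["1,250.00", "  300.5", "N/A", ""]
clean = [cleanse_revenue(r) for r in raw_rows]
```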
Data Propagation
Data-propagation procedures physically move transformed OLTP data to data
warehouses. TENET's data-propagation procedures run daily. Their biggest problem is
that they must be monitored manually and constantly: someone on the mainframe side
must confirm the successful completion of the propagation jobs there, and someone on
the AS/400 side must do the same, multiple times a day, five days a week. This area
needs to be reviewed as soon as possible.
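That manual watching could be replaced by having each propagation job record its outcome and letting one scheduled check raise the alarm. This is only a sketch of the idea; the job names and the OK/FAIL status convention are invented.

```python
# Propagation monitoring sketch: each job writes a status entry; a single
# scheduled check flags anything that failed or never reported in, replacing
# the twice-daily human watch on both platforms. Job names are hypothetical.
EXPECTED_JOBS = {"MF_EXTRACT", "AS400_LOAD"}

def failed_or_missing(status_log: dict) -> list:
    """status_log maps job name -> 'OK' or 'FAIL'; return jobs needing attention."""
    missing = EXPECTED_JOBS - status_log.keys()
    failed = {j for j, s in status_log.items() if j in EXPECTED_JOBS and s != "OK"}
    return sorted(missing | failed)

alerts = failed_or_missing({"MF_EXTRACT": "OK", "AS400_LOAD": "FAIL"})
```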
Data Verification
To maintain warehouse integrity, systematic procedures to periodically compare
warehouse information to operational data must be in place. TENET has none; therefore
we learn of problems only when the users call us about them, or when a propagation
procedure crashes because it encountered unreadable data.
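A simple systematic check is to compare row counts and control totals between a source table and its warehouse copy on a schedule, rather than waiting for a user call. The table and column names below are invented for illustration.

```python
import sqlite3

# Verification sketch: compare record counts and a control total between an
# operational table and its warehouse copy. Schema is illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE source_charges (acct TEXT, amount REAL);
CREATE TABLE wh_charges (acct TEXT, amount REAL);
INSERT INTO source_charges VALUES ('A', 100.0), ('B', 50.0);
INSERT INTO wh_charges VALUES ('A', 100.0), ('B', 50.0);
""")

def control_totals(table: str):
    """Row count plus summed amount: a cheap fingerprint of the table."""
    return con.execute(
        f"SELECT COUNT(*), ROUND(COALESCE(SUM(amount), 0), 2) FROM {table}"
    ).fetchone()

in_sync = control_totals("source_charges") == control_totals("wh_charges")
```

A mismatch would trigger a verification report naming the table, before any user notices.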
As I have shown you, extraction, transformation, and propagation, the three processes
that move data into warehouses, as well as verification procedures, are all important
elements in creating and maintaining effective data warehouses. Figure 20 on the
following page presents an overview of these interrelated processes.
Figure 20.
Data-Warehouse Maintenance Procedures

Operational Database (Raw Data)
  → Extraction Phase: Custom programs and/or replication tools access raw data.
Extracted Data
  → Transformation Phase: Custom programs and/or replication tools cleanse, decode,
    standardize, and aggregate extracted data.
Transformed Data
  → Propagation Phase: Custom programs and/or replication tools move transformed data
    to the data warehouse.
Data Warehouse (Warehouse Data)
  → Verification Phase: Customized programs regularly compare warehouse data to
    operational source data, producing printed Verification Reports.
Testimonials
Data Warehouse
If you or a friend have a mortgage loan with Countrywide, feel free to go to
WWW.Countrywide.com and pull up your loan, or information on any of the other
products such as HELOC, credit card, and the various insurance offerings. The
information displayed is retrieved from the back-end I designed on an AS/400 using the
techniques and procedures detailed previously. The exceptionally good response time is
due mostly to the denormalization technique, which allowed me to reduce a database
composed of eighteen normalized tables to one of two denormalized tables. We also put
together a live demo hosting WWW.countrywide.com on the same AS/400 that hosted
the warehouse. Unfortunately, at the time we had some unresolved security issues with
the firewall and no time to work them out, so we decided to host the website on the NT
server. This actually worked out in my favor, in that it emphasized the power of
denormalized tables. When you request loan or other product information at
WWW.countrywide.com, your request is received by a Java program on the NT server
where the site is hosted. The Java program then submits an SQL read to the
denormalized database on the AS/400 back-end and returns the requested information
with sub-second response time, thanks to the minimized disk I/O. Keep in mind, though,
that if the AS/400 hosting the warehouse also hosted the website, the response time
would improve further through the elimination of the middle layer (the NT server).
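The shape of that web lookup can be sketched as follows: because every loan is one wide, pre-joined row, the request becomes a single keyed read. The schema below is invented, not Countrywide's actual layout, and sqlite3 stands in for the AS/400 database.

```python
import sqlite3

# Denormalized lookup sketch: one wide row per loan means the web request
# is a single indexed read instead of an eighteen-table join.
# Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE loan_wide (
    loan_no TEXT PRIMARY KEY, cust_name TEXT, balance REAL, rate REAL)""")
con.execute("INSERT INTO loan_wide VALUES ('L001', 'J. Smith', 92500.0, 7.25)")

def lookup_loan(loan_no: str):
    """The SQL read the web-tier program submits: one keyed SELECT."""
    return con.execute(
        "SELECT cust_name, balance, rate FROM loan_wide WHERE loan_no = ?",
        (loan_no,)).fetchone()

row = lookup_loan("L001")
```

The disk I/O saved at query time is exactly the join work that was paid once, at load time, when the eighteen tables were collapsed into two.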
DSS & EIS
The initial project requirements also included the creation of a Decision Support System
(DSS) and an Executive Information System (EIS). Unfortunately, due to political
turmoil only the “Retrieval of Loan and Related Information on the Web” requirement
survived. Nonetheless we were able to come up with a live demo of the DSS/EIS system.
The prototype consisted of the not-so-hypothetical query, "How many of our customers
have multiple products?" We had to find a way to satisfy the executives' thirst for
knowledge and their impatience with long response times. After some brainstorming I
came up with the following solution:
1. Develop a "Customer-to-Product Relationship Table," updated daily from the Product
Warehouse.
2. Create a temporary table, using the Relationship table as input, containing two
fields:

Customer#   # of Products
12345       2
45689       5
56423       3
etc.        etc.

Creation time for this table against a 17-million-row warehouse on an AS/400 was
2 minutes.

3. Count the number of records in the above temporary table, thereby obtaining the
answer.
Steps two and three were accomplished with the following SQL:
CREATE TABLE TEMP1 (COL1 INT, COL2 INT)
INSERT INTO TEMP1
SELECT T03CUSTNUM, COUNT(T03PRODCOD) FROM WEBT030P
GROUP BY T03CUSTNUM HAVING COUNT (T03PRODCOD) > 1
SELECT COUNT (*) FROM TEMP1
Assume for a moment that we are able to convince the executives to sit down with us and
help us pre-define their queries. We could then set up a series of jobs that would run at
night whose sole purpose was to create a series of temporary tables, one for each
executive query. At that point, come morning the executives would have but to press a
key or click a button to obtain answers with sub-second response time.
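The nightly pre-computation idea can be exercised end to end with the document's own SQL. Here sqlite3 stands in for the AS/400: the overnight job materializes TEMP1, and the "morning click" is a trivial count. The sample rows are invented.

```python
import sqlite3

# Pre-computed executive query sketch, using the WEBT030P/TEMP1 SQL from
# the text. sqlite3 stands in for the AS/400; sample data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE WEBT030P (T03CUSTNUM INT, T03PRODCOD INT);
INSERT INTO WEBT030P VALUES (12345, 1), (12345, 2),
                            (45689, 1), (45689, 2), (45689, 3),
                            (77777, 1);
""")

# Nightly job: materialize the answer table for this canned query.
con.executescript("""
CREATE TABLE TEMP1 (COL1 INT, COL2 INT);
INSERT INTO TEMP1
SELECT T03CUSTNUM, COUNT(T03PRODCOD) FROM WEBT030P
GROUP BY T03CUSTNUM HAVING COUNT(T03PRODCOD) > 1;
""")

# Morning click: the expensive GROUP BY already ran overnight, so the
# executive's answer is a sub-second COUNT(*) on the small table.
(multi_product_customers,) = con.execute("SELECT COUNT(*) FROM TEMP1").fetchone()
```

One such TEMP table per pre-defined query is all the overnight job series would need to produce.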
Conclusion
Time is of the essence. We need to overhaul TENET's data warehouse now; not
tomorrow, or the day after, but now. The current architecture does not lend itself to
unhindered growth, as confirmed by the difficulties we are encountering in adding the
ORNDA hospitals due to the different "patient account number" field sizes. In addition,
there is the lack of edits, which causes the notorious garbage-in, garbage-out situation,
along with wasted storage space, no metadata, and inefficient propagation procedures. If
we maintain the status quo, it is only a matter of time before we jeopardize our
relationship with TENET. This scenario is particularly undesirable in light of the recent
IPO.
There are several major roadblocks to implementing this overhaul. First and foremost,
"If it ain't broke, don't fix it." Well, it's about to break. Second, if we get the go-ahead to
modernize the architecture, the three homegrown applications, CASEMIX, PQS, and
Cost Accounting, will have to be rewritten, and the users will have to be retrained to
access the new data warehouse. So we are looking at embarking on a project of epic
proportions which, frankly, TENET may not be interested in. In which case, amen. But if
my vision makes sense to you, and you feel, as I do, that there is a need for powerful
data warehousing solutions such as the one I have depicted in this novel, then we can lay
down the foundations for PEROT Systems to become a major player in providing
customized data warehousing solutions for our current and future clients.