1. PIMS Data Warehouse
COMSATS Institute of Information Technology, Islamabad
Patient Information and Monitoring
System Using Data Warehousing
By
Tahir Ayoub
SP08-BCS-052
Faraz Ahmed
SP08-BCS-015
Supervisor: Muhammad Mustafa Khattak
Bachelor of Computer Science (2008-2012)
The candidates confirm that the work submitted is their own and that appropriate credit has
been given where reference has been made to the work of others.
DECLARATION
We hereby declare that this software, neither as a whole nor in part, has been
copied from any source. It is further declared that we have developed this
software and the accompanying report entirely on the basis of our personal efforts,
made under the sincere guidance of our seniors and teachers. If any part of this report
is proved to be copied or found to have been reproduced, we shall stand by the consequences.
No portion of the work presented in this report has been submitted in support of any
other degree or qualification of this or any other university or institute of learning.
Tahir Ayoub
SP08-BCS-052
Faraz Ahmed
SP08-BCS-015
CERTIFICATE OF APPROVAL
This is to certify that the final year BS (CS) project “PATIENT INFORMATION AND
MONITORING SYSTEM USING DATA WAREHOUSE” was developed by “Tahir
Ayoub (CIIT/SP08-BCS-052)” and “Faraz Ahmed (CIIT/SP08-BCS-015)” under the
supervision of “Muhammad Mustafa Khattak”, and that in their opinion it is fully
adequate, in scope and quality, for the degree of Bachelor of Science in Computer
Science.
---------------------------------------
Supervisor
---------------------------------------
External Examiner
---------------------------------------
Head of Department
(Department of Computer Science)
EXECUTIVE SUMMARY
There are a number of reasons why migration from a Relational Database
Management System (RDBMS) to a Data Warehouse is required.
A data warehouse is an informational environment that:
Provides an integrated and total view of the enterprise.
Makes the enterprise’s current and historical information easily available for
decision making.
Makes decision-support transactions possible without hindering operational
systems.
Renders the organization’s information consistent.
Presents a flexible and interactive source of strategic information.
This is a solution for a user with prior knowledge of data warehouse design
concepts. The warehouse will support an intelligent user in creating a data
warehouse schema from existing OLTP systems consisting of relational data sources,
namely MS Notepad (text files), MS Excel, MS Access, and SQL Server 2008. The target
system, i.e. the warehouse, will also be implemented in SQL Server 2008 R2.
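To make the schema-generation idea concrete, the following sketch shows the general shape of the dimensional schema such a warehouse step produces: one fact table referencing several dimension tables by surrogate key. It is purely illustrative — sqlite3 stands in for SQL Server 2008 R2, and every table and column name here is a hypothetical example, not the project’s actual DDL (the real snowflake schema appears in Chapter 8).

```python
# Illustrative dimensional-schema sketch. sqlite3 is a stand-in for
# SQL Server 2008 R2; all table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT,
                          month_name TEXT, year INTEGER);
CREATE TABLE dim_doctor  (doctor_key INTEGER PRIMARY KEY, doctor_name TEXT,
                          city TEXT);
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, patient_name TEXT,
                          city TEXT);
-- The fact table references each dimension by its surrogate key.
CREATE TABLE fact_visit (
    date_key    INTEGER REFERENCES dim_date(date_key),
    doctor_key  INTEGER REFERENCES dim_doctor(doctor_key),
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    fee         REAL
);
""")

# List the tables that the schema step created.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['dim_date', 'dim_doctor', 'dim_patient', 'fact_visit']
```

Reports and cubes then aggregate the fact table’s measures (here, `fee`) grouped by attributes of the joined dimensions.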
ACKNOWLEDGMENT
ALLAH the Almighty! We are thankful to You for giving us the courage to take on
this project, and for Your infinite help in completing it. Without Your help we would
never have been able to complete this project.
Thanks to all the teachers for guiding us throughout our stay at this university, and to
all the friends for the beautiful company that we will never forget.
And last but not least, thanks to our parents, because without their love, affection,
and prayers, our studies and this project would not have been achievable.
-------------------------------- --------------------------------
Tahir Ayoub Faraz Ahmed
Abbreviations
ODS Operational Data Source
SSIS SQL Server Integration Services
SSAS SQL Server Analysis Services
SSRS SQL Server Reporting Services
DWH Data Warehouse
OLAP Online Analytical Processing
OLTP Online Transaction Processing
ETL Extract, Transform, Load
Contents
1.Introduction........................................................................................................ 13
1.1 Brief .................................................................................................................................13
1.2 Relevance to Course Modules............................................................................................. 13
1.3 Project Background............................................................................................................ 14
1.4 Literature Review .............................................................................................................. 14
1.4.1 Area of Knowledge....................................................................................................... 14
1.4.2 Decision Support Systems (DSS)................................................................................... 15
1.4.3 Data Warehouse ........................................................................................................... 16
1.4.4 Development Lifecycle .................................................................................................16
1.4.5 Data Warehouse SDLC.................................................................................................18
1.4.6 Classical SDLC............................................................................................................ 18
1.4.7 Overview of ETL.......................................................................................................... 18
1.4.8 Major Functions ........................................................................................................... 19
1.5 Methodology and Software Life Cycle ................................................................................ 20
2 Problem Definition.............................................................................................. 22
2.1 Purpose............................................................................................................................. 22
2.2 Product Functions .............................................................................................................. 22
2.3 Proposed Architecture........................................................................................................ 23
2.3.1 Basics of Data Warehouse and ETL............................................................................... 23
2.3.2 Data Warehouse Architectures....................................................................................... 25
2.3.3 Basic Data Warehouse Architecture............................................................................... 25
Figure 3 Basic Data Warehouse Architecture.............................................................................. 25
2.3.4 Data Warehouse Architecture with Staging Area ............................................................ 26
Figure 4 Data Warehouse Architecture with Staging Area........................................................... 26
2.3.5 Data Warehouse Architecture with Staging Area and Data Marts..................................... 26
Figure 5 Data Warehouse Architecture with Staging Area and Data Marts....................................26
2.3.6 Data Warehouse Modeling ............................................................................................ 27
2.3.7 ETL Operation ............................................................................................................. 27
3 Requirements Analysis........................................................................................ 32
3.1 Project Overview............................................................................................................... 32
3.1.1 Data Profiling............................................................................................................... 32
3.1.2 Warehouse Schema Generation ..................................................................................... 32
3.1.3 Data Extraction............................................................................................................. 32
3.1.4 Data Transformation..................................................................................................... 32
3.1.5 Data Loading................................................................................................................ 32
3.2 Functional Requirements ....................................................................................................33
3.2.1 Data Profiling............................................................................................................... 33
3.2.2 Warehouse Schema Generation ..................................................................................... 33
3.2.3 Data Extraction............................................................................................................. 33
3.2.4 Data Transformation..................................................................................................... 34
3.2.5 Data Loading................................................................................................................ 35
3.3 Nonfunctional Requirements .............................................................................................. 35
3.3.1 Performance Requirements............................................................................................ 35
3.3.2 Safety Requirements ..................................................................................................... 35
3.3.3 Reliability Requirements............................................................................................... 35
3.4 External Interface Requirements ......................................................................................... 35
3.4.1 User Interface............................................................................................................... 35
3.4.2 Hardware Resources..................................................................................................... 35
3.4.3 Hardware Interfaces...................................................................................................... 36
3.5 Use Case Specifications ..................................................................................................... 36
3.5.1 Connect RDBMS User..................................................................................................36
3.5.2 Connect Data Warehouse Schema.................................................................................. 36
3.5.3 Connect database User..................................................................................................37
3.5.4 Load Relational Database Model................................................................................... 37
3.5.5 Identify Table Names....................................................................................................38
3.5.6 Identify Columns with Data Types................................................................................. 38
3.5.7 Identify Relationships between Tables ........................................................................... 39
3.5.8 Load Warehouse Schema.............................................................................................. 39
3.5.9 Map Columns............................................................................................................... 40
3.5.10 Extract Data from RDBMS........................................................................................... 41
3.5.11 Transform Extracted Data ............................................................................................. 41
3.5.12 Load Data in Warehouse............................................................................................... 42
4 The Design.......................................................................................................... 44
4.1 Modules............................................................................................................................ 44
4.1.1 Connectivity................................................................................................................. 44
4.1.2 RDBMS Details ........................................................................................................... 44
4.1.3 Schema Generation....................................................................................................... 44
4.1.4 Column Mappings ........................................................................................................ 45
4.1.5 Extraction .................................................................................................................... 45
4.1.6 Transformation............................................................................................................. 45
4.1.7 Loading ....................................................................................................................... 45
5 UML Structure Diagram....................................................................................... 45
5.1 Class Diagram................................................................................................................... 45
5.1.1 City lab........................................................................................................................ 45
Figure 7 City lab....................................................................................................................... 45
5.1.2 Health ways ................................................................................................................. 46
5.1.3 Clinic........................................................................................................................... 47
5.1.4 CMH hospital............................................................................................................... 48
5.1.5 Urwah lab.................................................................................................................... 49
5.2 Object diagram .................................................................................................................. 50
5.3 Component Diagram.......................................................................................................... 51
5.4 Deployment Diagram......................................................................................................... 52
5.5 Composite Structure Diagram............................................................................................. 53
5.6 Package diagram................................................................................................................ 54
6 UML Behavior Diagrams...................................................................................... 56
6.1 Use Case Diagram ............................................................................................................. 56
6.2 Activity Diagram ............................................................................................................... 57
6.2.1 Create new project........................................................................................................ 57
6.2.2 Open existing project....................................................................................................57
6.2.3 Close project................................................................................................................ 58
Figure 20 Close project............................................................................................................. 58
6.2.4 Create mapping ............................................................................................................ 58
6.2.5 Load RDBMS.............................................................................................................. 59
6.3 State Machine diagram....................................................................................................... 60
6.3.1 Report.......................................................................................................................... 60
6.3.2 ETL............................................................................................................................. 61
7 UML Interaction Diagrams................................................................................... 63
7.1 Sequence Diagram............................................................................................................. 63
7.1.1 Create NewProject....................................................................................................... 63
7.1.2 Open Existing Project ...................................................................................................64
7.1.3 Close Project................................................................................................................ 65
7.1.4 Load RDBMS Details ...................................................................................................66
7.1.5 Create Schema.............................................................................................................. 67
7.1.6 Create Mappings .......................................................................................................... 69
7.1.7 Data Extraction............................................................................................................. 70
7.1.8 Data Transformations....................................................................................................71
7.1.9 Report Generation ........................................................................................................ 73
7.1.10 ETL............................................................................................................................. 74
7.2 Communication Diagram ...................................................................................................75
7.2.1 ETL............................................................................................................................. 75
7.2.2 Report.......................................................................................................................... 75
7.3 Interaction Overview.......................................................................................................... 76
7.3.1 ETL............................................................................................................................. 76
7.3.2 Warehouse Interaction ..................................................................................................76
7.3.3 Access Model............................................................................................................... 77
8 Implementation.................................................................................................. 79
8.1 System Implementation...................................................................................................... 79
8.2 Back end software SQL Server 2008 R2................................................................ 79
8.2.1 Snowflake Schema ....................................................................................................... 79
Figure 41 Snowflake Schema ....................................................................................................79
8.2.2 ETL SSIS..................................................................................................................... 80
8.2.3 Overview of PIMS ETL................................................................................................ 80
8.2.4 General Overview of PIMS City Lab ETL...................................................................... 81
8.2.5 General Overview of PIMS Clinics ETL ........................................................................ 81
8.2.6 General Overview of PIMS CMH Hospital ETL............................................................. 82
8.2.7 General Overview of PIMS Fact table ETL ....................................................................83
8.2.8 General Overview of PIMS Health ways ETL .................................................................84
8.2.9 General Overview of PIMS Urwah Lab ETL..................................................................85
8.2.10 General Overview of PIMS ETL of date format.............................................................. 86
8.2.11 PIMS ETL of date format.............................................................................................. 86
8.3 SSAS................................................................................................................................ 88
8.3.1 Overview of OLAP Cube.............................................................................................. 88
8.3.2 Overview of OLAP Cube drill down.............................................................................. 89
8.4 General Overview of Dimensions ....................................................................................... 90
8.4.1 Overview of Date time dimensions ................................................................................ 90
8.4.2 Date time English month calculation.............................................................................. 91
8.4.3 Overview of Doctor Dimension ..................................................................................... 92
8.5 Graphical User Interface..................................................................................................... 93
8.5.1 Overview of SSRS report 1........................................................................................... 93
8.5.2 SSRS report 1 month wise............................................................................................. 94
8.5.3 SSRS report 2 total Patients in different cities w.r.t year.................................................. 95
8.5.4 SSRS report 3 disease wise ........................................................................................... 96
8.5.5 SSRS report 3a Total patients affected by disease in different years.................................97
8.5.6 SSRS report 3c tabular disease report............................................................................. 98
8.5.7 SSRS report 4 Patient Detail.......................................................................................... 99
8.5.8 SSRS report 5 Dr. Detail............................................................................................. 100
8.5.9 SSRS report 5a Dr. Detail .......................................................................................... 101
8.5.10 SSRS report 5b Patient Detail...................................................................................... 101
8.5.11 SSRS report 5c Report Patient Detail........................................................................... 102
9 Testing and Evaluation.......................................................................................104
9.1 Testing............................................................................................................................ 104
9.1.1 Black Box Testing ...................................................................................................... 104
9.2 Testing of PIMS Data warehouse...................................................................................... 105
9.3 Test Cases....................................................................................................................... 106
10 Future Work...............................................................................................111
11 References.................................................................................................113
Table of Figures
Figure 1 Methodology and Software Life Cycle................................................................. 20
Figure 2 Contrasting OLTP and Data Warehousing Environments....................................... 24
Figure 3 Basic Data Warehouse Architecture.................................................................... 25
Figure 4 Data Warehouse Architecture with Staging Area................................................. 26
Figure 5 Data Warehouse Architecture with Staging Area and Data Marts ......................... 26
Figure 6 ETL Operation.................................................................................................... 28
Figure 7 City lab.............................................................................................................. 45
Figure 8 Healthways....................................................................................................... 46
Figure 9 Clinic................................................................................................................. 47
Figure 10 CMH hospital................................................................................................... 48
Figure 11 Urwah lab........................................................................................................ 49
Figure 12 Object diagram................................................................................................ 50
Figure 13 Component Diagram........................................................................................ 51
Figure 14 Deployment Diagram....................................................................................... 52
Figure 15 Composite Structure Diagram........................................................................... 53
Figure 16 Package diagram.............................................................................................. 54
Figure 17 Use Case Diagram............................................................................................ 56
Figure 18 Activity Diagram(Create new project)............................................................... 57
Figure 19 Activity Diagram(Open existing project)............................................................ 57
Figure 20 Close project.................................................................................................... 58
Figure 21 Activity Diagram(Create mapping).................................................................... 58
Figure 22 Activity Diagram(Load RDBMS) ........................................................................ 59
Figure 23 State Machine Diagram (Report)....................................................................... 60
Figure 24 State Machine Diagram (ETL)............................................................................ 61
Figure 25 Sequence Diagram (Create New Project)........................................................... 63
Figure 26 Sequence Diagram (Open Existing Project) ........................................................ 64
Figure 27 Sequence Diagram (Close Project)..................................................................... 65
Figure 28 Sequence Diagram (Load RDBMS Details).......................................................... 66
Figure 29 Sequence Diagram (Create Schema).................................................................. 68
Figure 30 Sequence Diagram (Create Mappings)............................................................... 69
Figure 31 Sequence Diagram (Data Extraction)................................................................. 70
Figure 32 Sequence Diagram (Data Transformations).................................................... 71
Figure 33 Sequence Diagram (Data Loading)..................................................................... 72
Figure 34 Sequence Diagram (Report Generation)............................................................ 73
Figure 35 Sequence Diagram (ETL)................................................................................... 74
Figure 36 Communication Diagram (ETL).......................................................................... 75
Figure 37 Communication Diagram (Report)..................................................................... 75
Figure 38 Interaction Overview (ETL) ............................................................................... 76
Figure 39 Interaction Overview (Warehouse Interaction)............................................... 76
Figure 40 Web Diagrams (Access Model).......................................................................... 77
Figure 41 Snowflake Schema........................................................................................... 79
Figure 42 Overview of PIMS ETL....................................................................................... 80
Figure 43 General Overview of PIMS City Lab ETL ............................................................. 81
Figure 44 General Overview of PIMS Clinics ETL................................................................ 81
Figure 45 General Overview of PIMS CMH Hospital ETL..................................................... 82
Figure 46 General Overview of PIMS Fact table ETL .......................................................... 83
Figure 47 General Overview of PIMS Health ways ETL........................................................ 84
Figure 48 General Overview of PIMS Urwah Lab ETL ......................................................... 85
Figure 49 General Overview of PIMS ETL of date format................................................... 86
Figure 50 PIMS ETL of date format................................................................................... 87
Figure 51 Overview of OLAP Cube.................................................................................... 88
Figure 52 Overview of OLAP Cube drill down.................................................................... 89
Figure 53 Overview of Date time dimensions.................................................................... 90
Figure 54 Date time English month calculation................................................................. 91
Figure 55 Overview of Doctor Dimension......................................................................... 92
Figure 56 Overview of SSRS report 1................................................................................ 93
Figure 57 SSRS report 1 month wise................................................................................. 94
Figure 58 SSRS report 2 total Patients in different cities w.r.t year..................................... 95
Figure 59 SSRS report 3 disease wise ............................................................................... 96
Figure 60 SSRS report 3a Total patients affected by disease in different years.................... 97
Figure 61 SSRS report Total revenue of CMH Hospitals yearly............................................ 98
Figure 62 SSRS report 4 Patient Detail.............................................................................. 99
Figure 63 SSRS report 5 Dr. Detail...................................................................................100
Figure 64 SSRS report 5b Patient Detail...........................................................................101
Figure 65 SSRS report 5c Report Patient Detail................................................................102
1. Introduction
1.1 Brief
This document includes a detailed description of “Patient Information and
Monitoring System Using Data Warehousing”. It covers all the phases of system
development, including requirements analysis, design, implementation, and
testing.
The aim of this project is to build a patient information and monitoring system using
data warehousing. Worldwide, the healthcare industry is looking for technology
that can aid in establishing on-line clinical repositories, enabling rapid access to
shared information that can help find cures for prevalent medical conditions. Such a
system facilitates not only hospitals and doctors but also the government
sector in making critical decisions; our project shall help them in decision making. The
project is divided into several parts:
First come the operational data sources (ODS); our project has five different ODSs, namely:
Healthways.xlsx
Urwahlab.xlsx
CMH Hospitals SQL
Clinics.txt
City_Lab.accdb
Second comes the ETL part. The tool used for ETL is SSIS, which extracts data
from the ODSs, transforms it into our standard formats, and loads it into the DWH.
Third, after the ETL we built OLAP cubes; the tool used for this was SSAS.
Fourth, after building the OLAP cubes we created reports (with queries against
both the OLAP cubes and the DWH); the tool used to make the reports is SSRS.
Finally, these reports have to be shown to front-end users; the tool used for this
is ASP.NET.
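The SSIS packages themselves cannot be shown here, but the first two steps, pulling heterogeneous ODSs into one standard record format, can be sketched in Python; the file layout, table, and sample rows below are invented for illustration.

```python
import csv
import io
import sqlite3

# Two hypothetical ODS extracts: a pipe-delimited text file (like Clinics.txt)
# and a relational table (like the CMH Hospitals database).
clinics_txt = "patient_id|name|city\n1|Ali|Islamabad\n2|Sara|Lahore\n"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cmh_patients(patient_id INT, name TEXT, city TEXT)")
con.execute("INSERT INTO cmh_patients VALUES (3, 'Bilal', 'Karachi')")

def extract_clinics(text):
    """Extract rows from the pipe-delimited clinic export into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

def extract_cmh(con):
    """Extract rows from the hospital database into the same dict shape."""
    cur = con.execute("SELECT patient_id, name, city FROM cmh_patients")
    return [dict(zip(["patient_id", "name", "city"], row)) for row in cur]

# All sources now share one record shape in the staging area; note the
# patient_id types still differ (text vs integer), which the
# transformation step must reconcile.
staging = extract_clinics(clinics_txt) + extract_cmh(con)
```

The point of the sketch is only the shape of the step: many formats in, one staging format out, with type reconciliation deferred to transformation.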
1.2 Relevance to Course Modules
The “Database” course provided us the basic knowledge of databases, which is one
of the fundamental requirements of our project. “Human Computer Interaction”
helped us in designing user-friendly GUIs and reports. Most important was the
self-study through internet tutorials, from which we learned a great deal about data
warehousing; this was very beneficial during the project.
1.3 Project Background
Health care has become one of the most important service industries undergoing
rapid structural transformation. Yet healthcare remains a paper-intensive,
minimally automated and digitized industry: CBS MarketWatch reported that an
estimated 90% of all patient information remains on paper. Worldwide, the
healthcare industry is therefore looking for technology that can aid in establishing
online clinical repositories, enabling rapid access to shared information that can
help find cures for prevalent medical conditions.
As there is no such concept of a DWH in Pakistan's healthcare sector, we undertook
this project; the basic theme behind it is to build a patient information and
monitoring system using data warehousing. It facilitates not only hospitals and
doctors but also the government sector in making critical decisions; our project
shall help them in decision making.
1.4 Literature Review
The origins of DSS processing hark back to the very early days of computers and
information systems. It is interesting that decision support system (DSS) processing
developed out of a long and complex evolution of information technology. Its
evolution continues today.
The Data warehouse architecture has evolved throughout the history of the different
stages of information processing. The information contained in a warehouse flows
from the same operational systems that could not be directly used to produce strategic
information. The data-warehouse user also called the DSS analyst is a business person
first and foremost, and a technician second. The primary job of the DSS analyst is to
define and discover information used in corporate decision-making.
To develop a complete understanding, this chapter starts by explaining the data
warehouse and the basics of ETL operations.
1.4.1 Area of Knowledge
This project is mainly concerned with data warehouse design for healthcare and the
ETL operations required to populate the data warehouse. A complete and clear
knowledge of decision support systems (DSS) and data warehouse design is essential
for understanding this project.
1.4.2 Decision Support Systems (DSS)
The origins of DSS processing hark back to the very early days of computers and
information systems. It is interesting that decision support system (DSS) processing
developed out of a long and complex evolution of information technology. Its
evolution continues today. By the mid-1970s, online transaction processing (OLTP)
made faster access to data possible, opening whole new vistas for business and
processing. The computer could now be used for tasks not previously possible,
including driving reservations systems, bank teller systems, manufacturing control
systems, and the like.
Throughout this period, organizations accumulated growing amounts of data stored in
their operational databases. Now that such systems are commonplace,
organizations are focusing on ways to use operational data to support
decision-making, as a means of gaining competitive advantage. Business executives
have become desperate for information to stay competitive and improve the bottom
line. Although operational systems provide information to run the day-to-day
operations, these cannot be readily used to make strategic decisions. Businesses,
therefore, are compelled to turn to new ways of getting strategic information. IT
departments have been attempting to provide information to the key business
personnel in their companies for making strategic decisions. Sometimes an IT
department could produce ad hoc reports from a single application. In most cases, the
reports would need data from multiple systems, requiring the rewriting of existing
programs to create intermediary files that could be used to produce ad hoc reports.
Most of the attempts by the IT in the past ended in failure. The users could not clearly
define what they wanted in the first place. Once they saw the first set of reports, they
wanted more data in different formats. The chain continued. This was mainly because
of the very nature of the process of making strategic decisions. We have been trying
all along to provide strategic information from the operational systems. Information
needed for strategic decisions making has to be available in an interactive manner.
The user must be able to query online, get results, and query some more. The
information must be in a format suitable for analysis. If we need the ability to provide
strategic information, we must get the information from altogether different types of
systems. For example, the following queries cannot be answered by a simple
operational system, as it holds only current, operational data:
How profitable shall the company be next quarter?
Who are the top ten customers during the last six months?
What was the profit last month and how much did it differ from the profit of
the same month during the last three years?
What is the relationship between the total annual revenue generated by each
branch office and the total number of sales staff assigned to each branch
office?
Operational systems support the business processes of the company. Analysts watch
how the business runs through these systems, and then make strategic decisions to
improve the business. The concept of the data warehouse is deemed the solution to
meet the requirements of a system capable of supporting decision making, receiving
data from multiple data sources.
1.4.3 Data Warehouse
A data warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived from
transaction data, but it can include data from other sources. It separates analysis
workload from transaction workload and enables an organization to consolidate data
from several sources.
In addition to a relational database, a data warehouse environment includes an
extraction, transportation, transformation, and loading (ETL) solution, an online
analytical processing (OLAP) engine, client analysis tools, and other applications that
manage the process of gathering data and delivering it to business users.
A data warehouse is an informational environment that:
Provides an integrated and total view of the enterprise.
Makes the enterprise's current and historical information easily available for
decision making.
Makes decision-support transactions possible without hindering operational
systems.
Renders the organization's information consistent.
Presents a flexible and interactive source of strategic information.
The users of the data warehouse environment have a completely different approach to
using the system. Unlike operational users who have a straightforward approach to
defining their requirements, the data warehouse user operates in a mindset of
discovery. The end user of the data warehouse says, “Give me what I say I want, and
then I can tell you what I really want.”
1.4.4 Development Lifecycle
Operational data is usually application oriented and, as a consequence,
unintegrated, whereas data warehouse data must be integrated.
between the operational level of data and processing and the data warehouse level of
data and processing. The development life cycles of these systems differ
profoundly. The operational environment is supported by the classical systems
development life cycle (the SDLC). The SDLC is often called the “waterfall”
development approach because the different activities are specified and one
activity, upon its completion, spills down into the next activity and triggers its start.
The development of the data warehouse operates under a very different life cycle,
sometimes called the CLDS (the reverse of the SDLC). The classical SDLC is driven
by requirements.
The CLDS is almost exactly the reverse: The CLDS starts with data. Once the data is
in hand, it is integrated and then tested to see what bias there is to the data, if any.
Programs are then written against the data. The results of the programs are analyzed,
and finally the requirements of the system are understood. The CLDS is usually called
a “spiral” development methodology.
The classical system development life cycle (SDLC) does not work in the world of the
DSS analyst. The SDLC assumes that requirements are known at the start of design
(or at least can be discovered). In the world of the DSS analyst, though, new
requirements usually are the last thing to be discovered in the DSS development life
cycle. The DSS analyst starts with existing requirements, but factoring in new
requirements is almost an impossibility. A very different development life cycle is
associated with the data warehouse.
1.4.5 Data Warehouse SDLC
Implement Warehouse
Integrate data
Test for bias
Program against data
Design DSS system
Analyze results
Understand requirements
1.4.6 Classical SDLC
Requirements gathering
Analysis
Design
Programming
Testing
Integration
Implementation
The CLDS is a classic data-driven development life cycle, while the SDLC is a classic
requirements-driven development life cycle.
1.4.7 Overview of ETL
Data must be loaded into the data warehouse regularly so that it can serve its
purpose of facilitating business analysis. To do this, data from one or more
operational systems needs to be extracted and copied into the warehouse. The process
of extracting data from source systems and bringing it into the data warehouse is
commonly called ETL, which stands for extraction, transformation, and loading. The
acronym ETL is perhaps too simplistic, because it omits the transportation phase and
implies that each of the other phases of the process is distinct. We refer to the entire
process, including data loading, as ETL. The ETL refers to a broad process, and not
three well-defined steps.
The methodology and tasks of ETL have been well known for many years, and are not
necessarily unique to data warehouse environments: a wide variety of proprietary
applications and database systems are the IT backbone of any enterprise. Data has to
be shared between applications or systems, trying to integrate them, giving at least
two applications the same picture of the world. This data sharing was mostly
addressed by mechanisms similar to what is now called ETL.
Data warehouse environments face the same challenge with the additional burden that
they not only have to exchange but to integrate, rearrange and consolidate data over
many systems, thereby providing a new unified information base for business
intelligence. Additionally, the data volume in data warehouse environments tends to
be very large.
What happens during the ETL process? During extraction, the desired data is
identified and extracted from many different sources, including database systems and
applications. Very often, it is not possible to identify the specific subset of interest;
therefore more data than necessary has to be extracted, so the identification of the
relevant data shall be done at a later point in time. Depending on the source system's
capabilities (for example, operating system resources), some transformations may
take place during this extraction process. The size of the extracted data varies from
hundreds of kilobytes up to gigabytes, depending on the source system and the
business situation. The same is true for the time delta between two (logically)
identical extractions: the time span may vary between days/hours and minutes to near
real-time. Web server log files for example can easily become hundreds of megabytes
in a very short period of time.
1.4.8 Major Functions
The basic purpose of the project is to build a healthcare system that helps not only
doctors and patients but also the government sector in taking major decisions for
the health department. The main features provided by the software use ETL to:
Create a definition of a data warehouse.
Configure the definitions for a physical instance of the data warehouse.
Validate the set of definitions and their configurations.
Create and populate the data warehouse instance.
Data transformations.
Deploy and initially load the data warehouse instance.
Maintain the physical instance by conditionally refreshing.
ETL supports the design of relational database schemas, ETL processes and End User
tool environments through the client.
Source systems play an important role in an ETL solution. Instead of creating metadata
manually, ETL provides integrated components that import the relevant information
into its repository.
To ensure the quality and completeness of the data in the repository ETL provides
extensive validation within the repository. Validation helps to keep a complex system
in an accurate and coherent state.
1.5 Methodology and Software Life Cycle
Figure 1 Methodology and Software Life Cycle
2 Problem Definition
2.1 Purpose
As noted in the introduction, Pakistan's healthcare sector has no established
concept of a data warehouse, so the basic theme of this project is to build a
patient information and monitoring system using data warehousing. Worldwide, the
healthcare industry is looking for technology that can aid in establishing online
clinical repositories, enabling rapid access to shared information that can help
find cures for prevalent medical conditions. The system facilitates not only
hospitals and doctors but also the government sector in making critical decisions;
our project shall help them in decision making.
Health care has become one of the most important service industries undergoing
rapid structural transformation, yet it remains a paper-intensive, minimally
automated and digitized industry: CBS MarketWatch reported that an estimated 90%
of all patient information remains on paper.
2.2 Product Functions
A PIMS based on a data warehouse not only contains historical data but also helps
specific users in decision making:
It helps users take decisions in the healthcare department.
It helps users make decisions about the cities patients come from, so that
disease-cure efforts can be focused there.
The system provides not only graphical reports but also drill-down tabular
reports to help understand healthcare issues.
The system calculates the spread of a disease not only in the required cities
but also with respect to time (year, quarter, month, and day).
The system tells doctors which patient age groups to focus on for a given
disease.
The system also tells users how much revenue they generate from which patient
in a year, quarter, month, day, hour, minute, and second.
The system informs doctors not only how many patients they are handling but
also each patient's gender, previous reports, disease status, and so on.
2.3 Proposed Architecture
2.3.1 Basics of Data Warehouse and ETL
2.3.1.1 What is a Data Warehouse?
A common way of introducing data warehousing is to refer to the characteristics of a
data warehouse as set forth:
Subject Oriented
Integrated
Nonvolatile
Time Variant
2.3.1.2 Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more
about your company’s sales data, you can build a warehouse that concentrates on
sales. Using this warehouse, you can answer questions like "Who was our best
customer for this item last year?" This ability to define a data warehouse by
subject matter (sales, in this case) makes the data warehouse subject oriented.
2.3.1.3 Integrated
Integration is closely related to subject orientation. Data warehouses must put data
from disparate sources into a consistent format. They must resolve such problems as
naming conflicts and inconsistencies among units of measure. When they achieve
this, they are said to be integrated.
2.3.1.4 Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change.
This is logical because the purpose of a warehouse is to enable you to analyze what
has occurred.
2.3.1.5 Time Variant
In order to discover trends in business, analysts need large amounts of data. This is
very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A data
warehouse’s focus on change over time is what is meant by the term time variant.
2.3.1.6 Contrasting OLTP and Data Warehousing Environments
Figure 2 Contrasting OLTP and Data Warehousing Environments
Data warehouses and OLTP systems have very different requirements. Here are some
examples of differences between typical data warehouses and OLTP systems:
2.3.1.7 Workload
Data warehouses are designed to accommodate ad hoc queries. You might not know
the workload of your data warehouse in advance, so a data warehouse should be
optimized to perform well for a wide variety of possible query operations.
OLTP systems support only predefined operations. Your applications might be
specifically tuned or designed to support only these operations.
2.3.1.8 Data modifications
A data warehouse is updated on a regular basis by the ETL process (run nightly,
weekly, monthly, or yearly) using bulk data modification techniques. The end users of
a data warehouse do not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification statements to
the database. The OLTP database is always up to date, and reflects the current state of
each business transaction.
2.3.1.9 Schema design
Data warehouses often use denormalized or partially denormalized schemas (such as a
star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize update/insert/delete
performance, and to guarantee data consistency.
2.3.1.10 Typical operations
A typical data warehouse query scans thousands or millions of rows. For example,
"Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example, "Retrieve
the current order for this customer."
2.3.1.11 Historical data
Data warehouses usually store many months or years of data. This is to support
historical analysis.
OLTP systems usually store data from only a few weeks or months. The OLTP
system stores only historical data as needed to successfully meet the requirements of
the current transaction.
2.3.2 Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an
organization's situation. Three common architectures are:
Data Warehouse Architecture (Basic).
Data Warehouse Architecture (with a Staging Area).
Data Warehouse Architecture (with a Staging Area and Data Marts).
2.3.3 Basic Data Warehouse Architecture
Figure 3 Basic Data Warehouse Architecture
2.3.4 Data Warehouse Architecture with Staging Area
Figure 4 Data Warehouse Architecture with Staging Area
2.3.5 Data Warehouse Architecture with Staging Area and Data Marts
Figure 5 Data Warehouse Architecture with Staging Area and Data
Marts
2.3.6 Data Warehouse Modeling
One question that very often arises in data warehousing discussions is: which
data modeling tool is best for data warehousing? The answer is simple: your brain.
While all the various data modeling tools have their pros and cons, none of them is so
intrinsically better than the rest for data warehousing as to rate a recommendation. For
example, none of the current data modeling tools cleanly diagrams or records any
meta-data regarding how facts and aggregates might use partitioning and/or
materialized views. For data warehousing, the physical data model is useful merely as
a roadmap for the ETL programmers. The real physical object implementation is far
too complex for modeling tools to handle.
Some basic steps for transforming an OLTP model into a star schema design are:
Denormalize lookup relationships.
Denormalize parent/child relationships.
Create and populate a time dimension.
Create hierarchies of data within dimensions.
Consider using surrogate or meaningless keys.
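The third step, creating and populating a time dimension, can be sketched as follows; the column set (a surrogate key plus a year > quarter > month > day hierarchy) is one common choice, not the only one.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """One row per calendar day, with a surrogate key and a
    year > quarter > month > day hierarchy for drill-down reports."""
    rows, key, d = [], 1, start
    while d <= end:
        rows.append({
            "time_key": key,                 # surrogate (meaningless) key
            "full_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day": d.day,
        })
        key += 1
        d += timedelta(days=1)
    return rows

# Populate one year of the dimension (2011 is not a leap year: 365 rows).
dim = build_time_dimension(date(2011, 1, 1), date(2011, 12, 31))
```

The surrogate key, not the calendar date, is what fact rows reference, which is exactly the last step in the list above.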
In dimensional modeling of a Data Warehouse, there are generally only two kinds of
tables:
2.3.6.1 Dimensions
Dimensions are relatively small, denormalized lookup tables containing
business-descriptive columns that end users reference to define their restriction
criteria for ad hoc business intelligence queries.
2.3.6.2 Facts
Facts are extremely large tables whose primary keys are formed from the
concatenation of all the columns that are foreign keys referencing related dimension
tables. Facts also possess numerically additive, non-key columns utilized to satisfy
calculations required by end-user ad-hoc business intelligence queries. The key point
is that to be successful, fact table implementations must accommodate the different
requirements.
2.3.7 ETL Operation
ETL involves three (3) major operations:
Data Extraction.
Transformation.
Loading.
Figure 6 ETL Operation (OLTP system; extraction; transformation and schema
design; loading; data warehouse)
2.3.7.1 Data Extraction
Extraction is the operation of extracting data from a source system for further use in a
data warehouse environment. This is the first step of the ETL process. After the
extraction, this data can be transformed and loaded into the data warehouse.
The source systems for a data warehouse are typically transaction processing
applications. For example, one of the source systems for a sales analysis data
warehouse might be an order entry system that records all of the current order
activities.
Designing and creating the extraction process is often one of the most time consuming
tasks in the ETL process and, indeed, in the entire data warehousing process. The
source systems might be very complex and poorly documented, and thus determining
which data needs to be extracted can be difficult. The data has to be extracted
normally not only once, but several times in a periodic manner to supply all changed
data to the warehouse and keep it up-to-date. Moreover, the source system typically
cannot be modified, nor can its performance or availability be adjusted, to
accommodate the needs of the data warehouse extraction process.
These are important considerations for extraction and ETL in general. This section
assumes that the data warehouse team has already identified the data to be
extracted, and discusses common techniques used for extracting data from source
databases.
Designing this process means making decisions about the following two main aspects:
Which extraction method do I choose?
This influences the source system, the transportation process, and the time
needed for refreshing the warehouse.
How do I provide the extracted data for further processing?
This influences the transportation method, and the need for cleaning and
transforming the data.
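For instance, one common extraction method is an incremental (delta) extract driven by a last-modified timestamp. A minimal sketch, with an invented `visits` table and sample dates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits(id INT, patient TEXT, updated_at TEXT)")
con.executemany("INSERT INTO visits VALUES (?, ?, ?)",
                [(1, "A", "2011-01-01"), (2, "B", "2011-06-01"),
                 (3, "C", "2011-09-15")])

def extract_delta(con, last_extracted):
    """Pull only rows changed since the previous extraction run."""
    return con.execute(
        "SELECT id, patient, updated_at FROM visits WHERE updated_at > ?",
        (last_extracted,)).fetchall()

# A prior run covered everything up to 2011-05-31; this run gets the delta.
changed = extract_delta(con, "2011-05-31")
```

The alternative, a full extract every cycle, answers the first design question differently and shifts the burden to transportation and loading.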
2.3.7.2 Data Transformation
Data transformations are often the most complex and, in terms of processing time, the
most costly part of the ETL process. They can range from simple data conversions to
extremely complex data scrubbing techniques. Many, if not all, data transformations
can occur within a database, although transformations are often implemented outside
of the database (for example, on flat files) as well.
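A small illustration of such transformations: standardizing gender codes and date formats that differ across sources. The code tables below are hypothetical.

```python
import datetime

# Hypothetical code tables: each ODS encodes gender and dates differently.
GENDER_MAP = {"m": "Male", "male": "Male", "1": "Male",
              "f": "Female", "female": "Female", "2": "Female"}
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d-%b-%Y"]

def standardize_record(rec):
    """Convert one extracted record to the warehouse's standard format."""
    out = dict(rec)
    out["gender"] = GENDER_MAP.get(str(rec["gender"]).strip().lower(), "Unknown")
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.datetime.strptime(rec["admit_date"], fmt)
            out["admit_date"] = parsed.date().isoformat()  # ISO 8601 standard
            break
        except ValueError:
            continue
    return out
```

Unmapped codes fall back to "Unknown" rather than failing the load, one of the simpler data-scrubbing policies.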
2.3.7.3 Data Loading
Data is loaded into a data warehouse in two fundamental ways: a record at a time
through a language interface or en masse with a utility. As a rule, loading data by
means of a utility is much faster. In addition, indexes must be efficiently loaded at the
same time the data is loaded. In some cases, the loading of the indexes may be
deferred in order to spread the workload evenly.
As the burden of the volume of loading becomes an issue, the load is often
parallelized. When this happens, the data being loaded is divided into one of several
job streams. Once the input data is divided, each job stream is executed independently
of the other job streams. In doing so, the elapsed time needed for loading is reduced
by the number of job streams (roughly speaking).
Another related approach to the efficient loading of very large amounts of data is
staging the data prior to loading. As a rule, large amounts of data are gathered into a
buffer area before being processed by extract/transfer/load (ETL) software. The
staged data is merged, perhaps edited, summarized, and so forth, before it passes into
the ETL layer. Staging of data is needed only where the amount of data is large and
the complexity of processing is high.
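The record-at-a-time versus en-masse distinction, and the deferral of index building, can be sketched like this (table and rows invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_visit(patient_key INT, time_key INT, fee REAL)")

# Rows already staged (gathered, edited, summarized) before loading.
staged = [(1, 20110101, 500.0), (2, 20110101, 750.0), (3, 20110102, 300.0)]

# En-masse load inside a single transaction, rather than committing
# one record at a time through a language interface.
with con:
    con.executemany("INSERT INTO fact_visit VALUES (?, ?, ?)", staged)

# Index creation deferred until after the load, so the bulk insert
# is not slowed by per-row index maintenance.
con.execute("CREATE INDEX ix_fact_time ON fact_visit(time_key)")
count = con.execute("SELECT COUNT(*) FROM fact_visit").fetchone()[0]
```

Parallelizing would simply partition `staged` into several such job streams, each loaded independently.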
2.3.7.4 Data Profiling
Data profiling is not a glamorous task. It is also not something that you can do once
and forget about it. Proper data profiling methodology must become a standard part of
both your business and IT infrastructure to allow you to diagnose the health of your
systems.
Today, many organizations attempt to conduct data profiling tasks manually. With
very few columns and minimal rows to profile, this may be practical. But
organizations today have thousands of columns and millions (or billions) of records.
Profiling this data manually would require an inordinate amount of human
intervention that would still be error-prone and subjective.
In practice, your organization needs a data profiling tool that can automatically
process data from many data sources and handle hundreds or thousands of columns
across them. Data profiling in practice consists of three distinct phases:
Initial profiling and data assessment.
Integration of profiling into automated processes.
Handoff of profiling results to data quality and data integration processes.
The most effective data management tools can address all of these initiatives. Data
analysis reporting alone is just a small part of your overall data initiative. The results
from data profiling serve as the foundation for data quality and data integration
initiatives. Look for a data profiling solution that allows you to construct data
correction, validation and verification routines directly from the profiling reports. This
shall help you combine the effort of data inspection and correction phases, helping to
streamline your data management process.
2.3.7.5 Assumptions and Dependencies
2.3.7.5.1 Assumptions
The software can be used round the clock.
The requirements can change with time.
2.3.7.5.2 Dependencies
Users should have domain knowledge of computers.
The DWH should be maintained from time to time.
2.3.7.6 Project Deliverables
Executable in running condition
Detailed Final Draft (Report)
2.3.7.7 Operating Environment
2.3.7.7.1 Software
Operating system: Windows XP/7
Adobe Dreamweaver, Visual Studio
Back-end data mart
2.3.7.7.2 Web Browsers
Opera
Mozilla Firefox
Microsoft Internet Explorer
Apple Safari
Google Chrome
3 Requirements Analysis
The analysis phase defines the requirements of the system, independent of how these
requirements shall be accomplished. This phase defines the problem that the end user
is trying to solve. The deliverable result at the end of this phase is a requirement
document. Ideally, this document states in a clear and precise fashion what is to be
built. This analysis represents the “what” phase. The requirement document tries to
capture the requirements from the end user’s perspective by defining goals and
interactions at a level removed from the implementation details.
3.1 Project Overview
The product shall provide following functionality regarding ETL Operations:
3.1.1 Data Profiling
Analyzing the column properties.
Analyzing the relationships.
3.1.2 Warehouse Schema Generation
The user shall provide an OLTP source system, from which a target warehouse
schema shall be generated; the generated schema shall be a star schema.
3.1.3 Data Extraction
The data from the OLTP system shall be extracted into files.
Extracted data shall be further processed before loading into the warehouse.
3.1.4 Data Transformation
The extracted data is raw and cannot be placed in the data warehouse without
enriching it.
The extracted data shall be processed within the staging area according to the
required format.
Its quality shall be improved, and
Shall be made ready to be loaded into the data warehouse.
3.1.5 Data Loading
This process again is quite cumbersome and shall require special techniques and
methods so that all the records are applied successfully to the data warehouse.
The data prepared after transformation shall be applied to the data warehouse
database and shall be stored there.
Load images are created to correspond to the target files to be loaded in the
data warehouse database.
Mapping functions shall be provided, which shall map the source system
records to the target warehouse
3.2 Functional Requirements
Following are some basic requirements described briefly.
3.2.1 Data Profiling
PIMS.DP.F.0010:
Identify the Data Type of the columns.
PIMS.DP.F.0020:
Identify the Maximum Length of the columns.
PIMS.DP.F.0030:
Identify the Null Rule of the columns.
PIMS.DP.F.0040:
Identify the Unique Rule of the columns.
PIMS.DP.F.0050:
Identify the relationships between the Tables.
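The four column-level rules above can be inferred with a single pass over each column's values. This sketch recognizes only two data types and is far cruder than a real profiling tool, but it shows what each rule means:

```python
def profile_column(values):
    """Infer basic profile facts for one column: dominant data type,
    maximum length, null rule, and unique rule."""
    non_null = [v for v in values if v not in (None, "")]
    inferred = "INTEGER"
    for v in non_null:
        try:
            int(v)                      # every value parses as an integer?
        except (TypeError, ValueError):
            inferred = "TEXT"           # fall back to text otherwise
            break
    return {
        "data_type": inferred,
        "max_length": max((len(str(v)) for v in non_null), default=0),
        "nullable": len(non_null) < len(values),  # nulls observed in the data
        "unique": len(set(non_null)) == len(non_null),
    }
```

For example, `profile_column(["101", "102", None, "103"])` reports an integer column of maximum length 3 that allows nulls and holds unique values, which is the raw material for the validation rules in PIMS.DP.F.0010 through PIMS.DP.F.0040.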
3.2.2 Warehouse Schema Generation
PIMS.WSG.F.0010:
Support for a star schema (a logical structure that has a fact table in the center,
surrounded by dimension tables) shall be provided for the warehouse schema design.
PIMS.WSG.F.0020:
Functions shall be provided so that the user can transform the source data into the
target system
PIMS.WSG.F.0030:
Generated schema shall be implemented in Microsoft Visual Studio R2 as warehouse
objects such as facts, dimension tables.
3.2.3 Data Extraction
PIMS.DE.F.0010:
The source systems for a data warehouse are typically OLTP systems; in this
project the source shall be Microsoft Visual Studio R2.
PIMS.DE.F.0020:
The data from the OLTP system shall be extracted into files.
PIMS.DE.F.0030:
Data has to be extracted for each incremental load as well as for one time initial full
load.
PIMS.DE.F.0040:
Extracted data shall be further processed before loading into the warehouse.
3.2.4 Data Transformation
PIMS.DT.F.0010:
Quality of the data is to be improved.
PIMS.DT.F.0020:
Transformation functions provided shall transform the data format.
PIMS.DT.F.0030:
Standardization of data for different sources shall be done.
PIMS.DT.F.0040:
Selection takes place at the beginning of the whole process of data transformation.
Either whole records or parts of several records can be selected from the source
system.
PIMS.DT.F.0050:
Splitting/joining covers the types of data manipulation that need to be performed on
the selected parts of the source records.
PIMS.DT.F.0060:
Conversion includes a wide variety of rudimentary conversions of single fields, for
two primary reasons: to standardize the data extracted from different sources, and to
make the fields usable and understandable to the user.
PIMS.DT.F.0070:
For summarization, the transformation function is used for summarizing the facts.
PIMS.DT.F.0080:
Enrichment is the rearrangement and simplification of individual fields to make them
more useful for the data warehouse.
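The transformation types above (selection, splitting/joining, conversion, summarization) can each be sketched in a few lines of Python. The field names and the gender code table below are assumptions for illustration, not the project's actual rules:

```python
# Conversion: standardize a single field across sources (PIMS.DT.F.0060).
GENDER_CODES = {"m": "Male", "f": "Female", "male": "Male", "female": "Female"}

def convert_gender(value):
    return GENDER_CODES.get(str(value).strip().lower(), "Unknown")

# Splitting: break one source field into usable parts (PIMS.DT.F.0050).
def split_name(full_name):
    first, _, last = full_name.partition(" ")
    return {"first_name": first, "last_name": last}

# Summarization: roll individual facts up to a coarser grain (PIMS.DT.F.0070).
def summarize(rows, key, measure):
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[measure]
    return totals

visits = [{"city": "Islamabad", "bill": 500}, {"city": "Islamabad", "bill": 300}]
city_totals = summarize(visits, "city", "bill")  # {"Islamabad": 800}
```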
3.2.5 Data Loading
PIMS.DL.F.0010:
Load images are created to correspond to the target files to be loaded in the data
warehouse database.
PIMS.DL.F.0020:
Identification of source system’s table fields mapping to warehouse table.
PIMS.DL.F.0030:
Loading must be efficient.
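The mapping and load-image requirements above fit together as follows. This is a minimal sketch in which the source-to-warehouse column map is a hypothetical example:

```python
# Hypothetical mapping of source-system fields to warehouse columns
# (PIMS.DL.F.0020).
COLUMN_MAP = {"pat_name": "patient_name", "pat_city": "city"}

def build_load_image(source_row, column_map):
    """Create a load image: a record shaped exactly like the target
    warehouse table (PIMS.DL.F.0010), dropping unmapped fields."""
    return {target: source_row[src] for src, target in column_map.items()}

row = {"pat_name": "Ali", "pat_city": "Islamabad", "internal_flag": 1}
image = build_load_image(row, COLUMN_MAP)  # ready to apply to the warehouse
```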
3.3 Nonfunctional Requirements
3.3.1 Performance Requirements
This shall be a very important system used throughout the development cycle of the
ETL, so it must be efficient. As it shall communicate with SQL Server 2008 R2, and
heavy resources are required for processing, it must utilize the hardware resources
optimally.
3.3.2 Safety Requirements
Running the process may take a very long time, so it is important to keep track of
intermediate results so that, in case of any failure, the work already done is not
lost.
3.3.3 Reliability Requirements
The process involves different phases and the data after each phase should be secure
and reliable.
3.4 External Interface Requirements
3.4.1 User Interface
The users of this application shall be professionals as well as normal users, so the
user interface has to be comprehensive, giving the user the ability to control
everything and to extract the required information easily and as quickly as possible.
3.4.2 Hardware Resources
The application requires heavy processing resources. Latest hardware resources are
required for the efficient and effective working of the application.
3.4.3 Hardware Interfaces
The software shall not interact directly with any hardware; it shall only use the
operating system's services for connecting to the SQL database.
3.5 Use Case Specifications
3.5.1 Connect RDBMS User
Pre-Condition: Required Schema and User must Exist
Description: Connect to an existing schema using a user name and
password.
Actor: User
Success Scenario: User provides the user name and password.
User is connected to the schema with the user name provided.
User can view the required details of the schema.
Alternate Scenarios: None
Post-Condition: The details of the RDBMS should be loaded for the viewing
purpose.
3.5.2 Connect Data Warehouse Schema
Pre-Condition: A proper Data Warehouse schema should exist.
Description: Open a previously created Data Warehouse with the provided
user name and password.
Actor: User
Success Scenario: User provides the user name and password.
User is connected to the schema with the user name provided.
User can view the details of the schema.
Alternate Scenarios: None
Post-Condition: The details of the Data Warehouse should be loaded.
3.5.3 Connect database User
Pre-Condition: The user and the schema to be connected must exist
Description: Connect to an existing database user with required schema.
Actor: User
Success Scenario: User should be connected to the required schema with the user
rights assigned to the user.
Alternate Scenarios: None.
Post-Condition: The details of the schema are populated so that the connected
user can perform the further operations required.
3.5.4 Load Relational Database Model
Pre-Condition: Relational Database model must exist.
User must have the rights to connect to the database.
Description: Load an existing RDBMS for the required operations.
Actor: User
Success Scenario: User connects the relational database by specifying the user
name and password.
The entire ERD Model of the RDBMS is loaded.
Relationships between these tables are populated.
User can view the details of the tables.
User can perform the desired operations.
Alternate Scenarios: None
Post-Condition: None
3.5.5 Identify Table Names
Pre-Condition: Database must exist.
The tables must exist in the database.
Description: Identify all the tables in the database.
Actor: User
Success Scenario: User selects to view the tables of the Database.
All the table names in the database are loaded after
transformation.
Alternate Scenarios: None
Post-Condition: None
3.5.6 Identify Columns with Data Types
Pre-Condition: The tables must be specified for which the columns are to be
populated.
Description: Identify the column names, with the data type of each column,
for the specified tables.
Actor: User
Success Scenario: User selects to view the complete database.
All the required table column names and their data types are
populated.
Alternate Scenarios: None.
Post-Condition: None.
3.5.7 Identify Relationships between Tables
Pre-Condition: All table names and the column names of each table must be
populated to identify the relationships.
Description: Identify relationships between the tables.
Actor: User
Success Scenario: User selects to view the complete database design.
All the relationships are loaded with the details of the Primary
Keys and Foreign Keys of the complete database.
Alternate Scenarios: None.
Post-Condition: None.
3.5.8 Load Warehouse Schema
Pre-Condition: The Warehouse Schema must be created before and the fact
and dimension tables must exist.
Description: Load an existing Data Warehouse Schema for cubes and
reports.
Actor: User
Success Scenario: User loads the existing data warehouse schema.
The details of the schema are loaded.
The Details are displayed to the user.
Alternate Scenarios: None.
Post-Condition: None.
3.5.9 Map Columns
Pre-Condition: A complete Data Warehouse schema must exist with all the
facts and dimensions properly defined.
Description: The Mapping of columns between the columns of Data
Warehouse Facts/Dimensions and the columns of Tables from
an RDBMS.
Actor: User
Success Scenario: User maps each column of the facts and dimension tables to
the relational database model columns as required.
The mapping between the columns is maintained for the ETL
operations.
All the mappings are stored in the file.
Alternate Scenarios: Load an existing Data Warehouse schema with the already
defined mappings.
Post-Condition: None.
3.5.10 Extract Data from RDBMS
Pre-Condition: All the mappings should be done for the effective and efficient
load.
Description: Extract Data from the Database to be loaded in the Data
Warehouse.
Actor: User
Success Scenario: User selects to extract the data from the source system.
The data is extracted in accordance with the
target system.
The extracted data is saved in SQL Server 2008.
On the Completion of extractions of data from the source, the
data is ready for the transformation or loading.
Alternate Scenarios: None.
Post-Condition: None.
3.5.11 Transform Extracted Data
Pre-Condition: All the data must be extracted from the source system.
Description: Transform the extracted data, as required, before loading the
data into the target Warehouse.
Actors: User.
Success Scenario: Select the transformation functions required to be performed
before loading the data into the Data Warehouse.
User specifies the transformation functions to be performed.
User selects to apply the transformation functions.
Alternate Scenarios: If the user does not provide any transformation function, the
data shall in some cases be loaded without any changes.
Post-Condition: None.
3.5.12 Load Data in Warehouse
Pre-Condition: All the data to be loaded should be maintained in SQL
Server 2008.
Description: Loading the Extracted Data from different ODS’s to the Target
Data Warehouse.
Actor: User
Success Scenario: User selects to load the data into the target system.
The transformations to be performed are applied to the
extracted data.
After the successful transformation of data the data is loaded
to the target data warehouse.
All the changes are saved.
Alternate Scenarios: None
Post-Condition: The user is prompted that the ETL completed successfully.
4 The Design
In the design phase the architecture is established. This phase starts with the
requirement document delivered by the requirement phase and maps the requirements
into architecture. The architecture defines the components, their interfaces and
behaviors. The deliverable design document is the architecture. The design document
describes a plan to implement the requirements. This phase represents the "how"
phase. Details on computer programming languages and environments, machines,
packages, application architecture, distributed architecture layering, memory size,
platform, algorithms, data structures, global type definitions, interfaces, and many
other engineering details are established. The design may include the usage of
existing components.
4.1 Modules
The current system can easily be divided into seven modules. These modules are quite
independent of each other and have very simple and well-defined interfaces between
them. This independence plays a very important role in the successful working of the
modules. The basic modules in our system are as follows.
4.1.1 Connectivity
The functionality of this module shall be to provide the communication between
Database and its client. Connectivity shall provide the interface for communication
for providing full operational environment in order to achieve the efficient interaction
within the system.
4.1.2 RDBMS Details
The detailed schema of the relational database model shall be loaded with the help of
this module. This module loads:
Table Name
Columns Names
Data Types
Constraints
Relationships
Path Details of Database Traversals.
4.1.3 Schema Generation
This module shall provide its users a very easy way to create the schema of the Data
Warehouse. Schema Generator provides the functionality for defining:
Facts
Dimensions
4.1.4 Column Mappings
This module shall provide the functionality to map the columns from relational
database model with the Data Warehouse Facts or Dimension Tables. The
modifications in the columns shall also be managed in this module.
4.1.5 Extraction
The procedure for the data extraction shall be defined with the help of the provided
mappings. This module shall extract the data from the relational database in
accordance with the defined procedures of the effective extractions using the
mappings.
4.1.6 Transformation
Different Transformation functions shall be provided. The user shall select the type of
transformations and the desired output of the transformation. These transformations
procedures shall be maintained for the preload transformations.
4.1.7 Loading
The loading mechanism shall be defined to incorporate the defined transformations.
The data shall be transformed and loaded in a single step: initially the data shall be
transformed according to the specified functions, and then it shall be loaded into the
designed Data Warehouse schema.
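The single-step transform-and-load described above can be sketched as a small pipeline. The transformation function and the in-memory "warehouse" list are illustrative assumptions, standing in for the SQL Server target:

```python
def transform_and_load(rows, transforms, warehouse):
    """Apply the specified transformation functions to each extracted
    row, in order, and append the result to the target in one step."""
    for row in rows:
        for fn in transforms:  # preload transformations
            row = fn(row)
        warehouse.append(row)
    return len(rows)  # number of rows loaded

# Hypothetical transformation: upper-case the name field.
upper_name = lambda r: {**r, "name": r["name"].upper()}

warehouse = []
loaded = transform_and_load([{"name": "ali"}], [upper_name], warehouse)
```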
5.3 Component Diagram
There shall be four different database components from which data shall be extracted,
transformed, and loaded (ETL) into a single data warehouse (data mart) component.
Figure 13 Component Diagram
5.4 Deployment Diagram
Deployment shall be divided into four levels: a Database Server maintaining SQL Server, a
Data Warehouse Server maintaining the data warehouse in SQL Server, an Application Server
maintaining the web application, and Client Workstations that view the web application
through its interface.
Figure 14 Deployment Diagram
5.5 Composite Structure Diagram
The Administrator sends a request to the Reporting Manager; the request then passes through
SQL Server, the ETL tool and the reporting tools to the Query Manager, and the report is
shown in the provided interface.
Figure 15 Composite Structure Diagram
5.6 Package diagram
Through the interface the user either registers or logs in; after a successful login the
user can view reports produced with the help of data mining over the OLAP process.
The Data Mart (PIMS) stores all information about patients, which arrives through the ETL
process from four different Operational Data Stores.
Figure 16 Package diagram
6.2 Activity Diagram
6.2.1 Create new project
The user creates a new project and a new connection, opens the connection to the ODS's,
sets the project information and the global values, and loads the project.
Figure 18 Activity Diagram (Create new project)
6.2.2 Open existing project
The user opens an existing project; the connections to the ODS's are checked and opened,
the project information and global values are set, and the project is loaded.
Figure 19 Activity Diagram (Open existing project)
6.2.3 Close project
On closing an open project, the changes are finalized, saved and committed, and the
connection is closed.
Figure 20 Close project
6.2.4 Create mapping
To create a mapping, the source and target details are fetched, the user provides the
mapping, and the mappings are validated and finalized.
Figure 21 Activity Diagram (Create mapping)
6.3 State Machine diagram
6.3.1 Report
A query to generate a report causes the required data to be processed and sent, the
defined rules to be applied, and the report to be generated.
Figure 23 State Machine Diagram (Report)
6.3.2 ETL
The project is loaded, its information and metadata are checked, and the ETL (extract,
transform, load) is performed into the DWH.
Figure 24 State Machine Diagram (ETL)
7 UML Interaction Diagrams
7.1 Sequence Diagram
A sequence diagram shows an interaction arranged in time sequence. In particular, it shows
the instances participating in the interaction by their "lifelines" and the stimuli they
exchange, arranged in time sequence.
7.1.1 Create New Project
The main form creates a new project and a new connection; once the connection is opened,
the project information and global values are set and the project is loaded.
Figure 25 Sequence Diagram (Create New Project)
7.1.9 Report Generation
The doctor shall select specific criteria to generate a report and send the request to the
data warehouse manager, which shall acknowledge it and forward the request to the report
generation tool; the tool shall then generate the report according to the criteria set by
the doctor.
Figure 34 Sequence Diagram (Report Generation)
7.1.10 ETL
Data coming from the operational data stores shall be extracted by the extract manager,
then sent to the transform manager, which sets the data into a standard format, then to
the cleaning manager to retain only the useful information, then to the load manager,
and finally it shall be loaded into the data warehouse.
Figure 35 Sequence Diagram (ETL)
7.2 Communication Diagram
7.2.1 ETL
The data source responds to the ETL request; the data shall then be extracted with the
help of the extract manager, sent to the transform manager, which sets the data into a
standard format, then to the cleaning manager to retain only the useful information, then
to the load manager, and finally it shall be loaded into the data warehouse.
Figure 36 Communication Diagram (ETL)
7.2.2 Report
The user requests to generate or view a report; depending on the user's need, the request
shall be sent to the reporting tool to generate the required report, or the required
report shall be viewed through the reporting manager.
Figure 37 Communication Diagram (Report)
7.3 Interaction Overview
7.3.1 ETL
The extract manager extracts the data, which is then sent to the transform manager, which
sets the data into a standard format, then to the cleaning manager to retain only the
useful information, then to the load manager, and finally it shall be loaded into the
Data Warehouse.
Figure 38 Interaction Overview (ETL)
7.3.2 Warehouse Interaction
The doctor views the patient history through the query manager, which analyzes the
history; the reporting tool then suggests a prescription depending on the analysis of the
patient history.
Figure 39 Interaction Overview (Warehouse Interaction)
Web Diagrams
7.3.3 Access Model
Users and staff navigate from the home page through the sign-in, sign-up, contact-us and
about-us pages. Authorized staff can register members and upload reports; both staff and
users can view reports, change their password, and sign out.
Figure 40 Web Diagrams (Access Model)
8 Implementation
8.1 System Implementation
In this chapter the project details are discussed, along with the implementation of the
Patient Information and Monitoring System based on a data warehouse. The implementation
of the system is done in two parts:
Back-end software (SQL Server 2008 R2)
GUI
8.2 Back-end software: SQL Server 2008 R2
The back end is further subdivided into two parts, namely
SSIS (SQL Server Integration Services)
SSAS (SQL Server Analysis Services)
8.2.1 Snowflake Schema
Figure 41 Snowflake Schema
8.2.2 ETL SSIS
First comes the ETL part, for which we have used SQL Server Integration Services (SSIS).
Data needs to be loaded into the data warehouse regularly so that it can serve its purpose
of facilitating business analysis. To do this, data from one or more operational systems
needs to be extracted and copied into the warehouse. The process of extracting data from
source systems and bringing it into the data warehouse is commonly called ETL, which
stands for extraction, transformation, and loading. The acronym ETL is perhaps too
simplistic, because it omits the transportation phase and implies that each of the other
phases of the process is distinct. We refer to the entire process, including data loading,
as ETL; it denotes a broad process, not three well-defined steps.
What happens during the ETL process? During extraction, the desired data is identified and
extracted from many different sources, including database systems and applications. Very often,
it is not possible to identify the specific subset of interest, so more data than
necessary has to be extracted, and the identification of the relevant data is done at a
later point in time.
Depending on the source system's capabilities (for example, operating system resources), some
transformations may take place during this extraction process. The size of the extracted data
varies from hundreds of kilobytes up to gigabytes, depending on the source system and the
business situation. The same is true for the time delta between two (logically) identical
extractions: the time span may vary between days/hours and minutes to near real-time. Web
server log files for example can easily become hundreds of megabytes in a very short period of
time.
8.2.3 Overview of PIMS ETL
Figure 42 Overview of PIMS ETL