A case study showing how to approach a basic scheduling problem within the operations research field. More info: https://www.researchgate.net/publication/275097742_Visitation_time_scheduling
1. Visitation time scheduling
Alfonso de la Fuente Ruiz
2013
Alfonso de la Fuente Ruiz – http://www.linkedin.com/in/alfonsofr/es - Licensed under Creative Commons BY-NC-SA (7/Sept/2013)
2. Content index
Scenario
The O.R. Problem
Initial considerations
First approach: Microsoft Excel
Importing data from CSV into MS Excel
Exploring the dataset
Data order by client
Vouching for data validity
Alternatives and decision making
Coding software and choosing tools
Microsoft Excel Macros
Open Office Suite: Calc
Structured Query Language
Open Office Suite: Base
Visual Studio Express
Oracle and PL/SQL
Using Transact-SQL in Microsoft SQL Server 2k+
Cleaning the data
Pseudocode for data cleaning
Result after data cleaning
PERT and GANTT
Scheduling schemes
Scheduling scheme chosen
Coding the scheme
Reporting output
ACID Compliant DBMS
ACID Compliancy in MS SQL Server (I)
ACID Compliancy in MS SQL Server (II)
ACID Compliancy in MS SQL Server (III)
Database design: a bird's eye view
Database normalization
Database map, visually
Database map: Table definition
Database map: Procedures and functions
References
Conclusion
3. Scenario:
The small test project we were asked to prepare is described in a PDF (Portable Document Format) file, and the required data is supplied in a CSV (Comma-Separated Values) file.
One calendar week was given to find a solution and to prepare a presentation to be delivered remotely to an audience in the UK.
4. The O.R. problem
The problem, from the Operations Research perspective, is a very simple case of “visitation time scheduling” with multiple clients and a single server that can attend only one request at a time.
Therefore, a number of solution schemes are readily available, such as First-Come First-Served, priority queues, Gantt techniques and others.
The difficulty of the problem seems to lie not in the algorithm coding stage, but in the data formatting stage (for both input and output) and in the database design stage.
The precise software tools to be used were left unspecified, so a large number of alternatives are all possible choices. SQL Server and PostgreSQL were suggested.
In our approach, we will first use Microsoft Excel to study the data and to perform basic filtering, after which we will consider a number of solutions available on the software market.
5. Initial considerations
This problem is a typical Computer Science project for Business or Engineering students during their first years at university.
Students are usually asked to solve this kind of problem within one term, having a couple of months (up to a semester, depending on academic workload) to solve it and to prepare a written project, alone or in small teams, to be handed in at the end of the term.
Preparing the project case involves careful design considerations, ranging from plagiarism avoidance to speeding up marking processes and exception control.
This kind of knowledge can also come in handy for real business applications at the SME (Small and Medium-sized Enterprise) level or larger.
In most scenarios, only a subset of the information contained in these pages will be documented and presented to students or staff, so as to avoid information overload and to improve operational understanding.
6. First approach: Microsoft Excel
Since SQL Server was the first option suggested, and Microsoft's SQL Server (MSSQLS) is a very popular package on the market, our first approach is to load the CSV data file into Microsoft Excel (2013, Spanish version) to take a look at it.
To do so, we need to import the data from the file using the “Data/Import/From textfile…” feature, select the “simple.csv” file and follow the wizard.
In the wizard window we select the delimited file type, with headers, Windows (ANSI) file origin, “Comma” (,) as the separator character, and the “General” data type for every column so that Excel autodetects it.
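For readers who prefer to script this step, the same import can be sketched with Python's standard `csv` module. The CSV text below is a hypothetical reconstruction of “simple.csv” from the slide 9 table (same five fields, one row per reservation, assumed to be in id order); the real file's header spelling, row order and datetime format may differ. Datetimes are deliberately kept as text so that the invalid February 30th entry survives the import and can be flagged later.

```python
import csv
import io

# Hypothetical reconstruction of "simple.csv" (rows taken from the slide 9 table).
SIMPLE_CSV = """id,client_id,datetime_from,datetime_to,name
1,1,2013-01-01 09:00,2013-01-01 10:00,gary doades
2,4,2013-01-01 01:00,2013-01-01 01:01,olivia groom-smith
3,1,2013-01-01 09:45,2013-01-01 10:45,gary doades
4,2,2013-01-01 23:00,2013-01-02 06:00,richard ward
5,2,2013-01-02 04:00,2013-01-02 04:15,richard ward
6,1,2013-01-01 12:00,2013-01-01 12:30,gary doades
7,3,2013-01-01 01:00,2013-01-01 02:00,natasha lunt
8,1,2013-01-01 09:01,2013-01-01 09:00,gary doades
9,4,2013-01-01 01:00,2013-01-01 01:01,olivia groom-smith
10,2,2013-01-02 05:00,2013-01-02 06:00,richard ward
11,2,2013-02-30 01:00,2013-02-30 02:00,richard ward
12,4,2013-01-01 18:00,2013-01-02 19:00,olivia groom-smith
"""


def load_reservations(text):
    """Parse comma-delimited text with a header row into a list of dicts."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Integer keys are converted; datetimes stay as text so that the
        # invalid "2013-02-30" value is not lost at import time.
        record["id"] = int(record["id"])
        record["client_id"] = int(record["client_id"])
        rows.append(record)
    return rows


reservations = load_reservations(SIMPLE_CSV)
print(len(reservations))                  # 12
print(reservations[10]["datetime_from"])  # 2013-02-30 01:00
```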
7. Importing data from CSV into MS Excel
As a result, we obtain a set of columns whose headers offer the “autofilter” option, which we often use to sort alphabetically or numerically.
Here we sorted the data by the “datetime_from” field, so that we can inspect the information and form some hypotheses about its contents.
We can easily observe several kinds of plausible anomalies in the data, which force us to make some decisions at design time.
8. Exploring the dataset
At this point, we sketch the problem on paper to gain further insight before moving on to the software tools.
This yields some data diagrams and timetables that will be discussed later on.
Among other things, we observe that the total time for all visitations does not exceed the total time available for service under any set of assumptions. This is a good sign, for it means that we will be able to provide the service without overbooking.
9. Data order by client
We now apply a second ordering to the data, by the client_id field.
We name the rows c#t#, where the hashes stand for the client number and the task number for that particular client.
We thus obtain the following set:
{c1t1,c1t2,c1t3,c1t4 ; c2t1,c2t2,c2t3,c2t4 ; c3t1 ; c4t1,c4t2,c4t3}
id | client_id | datetime_from    | datetime_to      | Name               | Rep? | Inv? | >24h?
1  | 1         | 2013-01-01 09:00 | 2013-01-01 10:00 | gary doades        | 0    | 0    | 0.00
8  | 1         | 2013-01-01 09:01 | 2013-01-01 09:00 | gary doades        | 0    | 1    | 0.00
3  | 1         | 2013-01-01 09:45 | 2013-01-01 10:45 | gary doades        | 0    | 0    | 0.00
6  | 1         | 2013-01-01 12:00 | 2013-01-01 12:30 | gary doades        | 0    | 0    | 0.00
4  | 2         | 2013-01-01 23:00 | 2013-01-02 06:00 | richard ward       | 0    | 0    | 1.00
5  | 2         | 2013-01-02 04:00 | 2013-01-02 04:15 | richard ward       | 0    | 0    | 0.00
10 | 2         | 2013-01-02 05:00 | 2013-01-02 06:00 | richard ward       | 0    | 0    | 0.00
11 | 2         | 2013-02-30 01:00 | 2013-02-30 02:00 | richard ward       | 0    | 0    | #VALUE!
7  | 3         | 2013-01-01 01:00 | 2013-01-01 02:00 | natasha lunt       | 0    | 0    | 0.00
2  | 4         | 2013-01-01 01:00 | 2013-01-01 01:01 | olivia groom-smith | 1    | 0    | 0.00
9  | 4         | 2013-01-01 01:00 | 2013-01-01 01:01 | olivia groom-smith | 0    | 0    | 0.00
12 | 4         | 2013-01-01 18:00 | 2013-01-02 19:00 | olivia groom-smith | 0    | 0    | 1.00
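The c#t# naming can also be derived mechanically. A minimal Python sketch, assuming ISO-formatted datetime strings (which sort chronologically as plain text, even the invalid February 30th one) and breaking ties between identical start times by id, which matches the order in the table above:

```python
from itertools import groupby

# (id, client_id, datetime_from) for every reservation in the table above.
ROWS = [
    (1, 1, "2013-01-01 09:00"), (8, 1, "2013-01-01 09:01"),
    (3, 1, "2013-01-01 09:45"), (6, 1, "2013-01-01 12:00"),
    (4, 2, "2013-01-01 23:00"), (5, 2, "2013-01-02 04:00"),
    (10, 2, "2013-01-02 05:00"), (11, 2, "2013-02-30 01:00"),
    (7, 3, "2013-01-01 01:00"), (2, 4, "2013-01-01 01:00"),
    (9, 4, "2013-01-01 01:00"), (12, 4, "2013-01-01 18:00"),
]


def label_tasks(rows):
    """Map each reservation id to 'c<client>t<task>', numbering each client's
    tasks by starting time (ties broken by id)."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    labels = {}
    # groupby needs its input sorted by the grouping key (client_id), as it is here.
    for client_id, tasks in groupby(ordered, key=lambda r: r[1]):
        for task_no, (res_id, _, _) in enumerate(tasks, start=1):
            labels[res_id] = f"c{client_id}t{task_no}"
    return labels


labels = label_tasks(ROWS)
print(labels[8], labels[11], labels[9])  # c1t2 c2t4 c4t2
```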
10. Vouching for data validity
To detect anomalies, we sorted the data by “datetime_from” and then implemented a few quick Boolean tests (written here with the comma argument separator of English-locale Excel):
REPETITION “Rep?”: =IF(AND(C2=C3,D2=D3),1,0)
Checks whether the same visitation time frame (columns C and D) is repeated in two consecutive rows, as it is for instances #2 and #9 for Olivia Groom-Smith. The test obviously does not apply to the last row.
INVERSION “Inv?”: =IF([@[datetime_from]]>=[@[datetime_to]],1,0)
Flags rows whose end time does not strictly follow the beginning. Instance #8 for Gary Doades is flagged.
MORE THAN ONE DAY “>24h?”: =DAYS([@[datetime_to]],[@[datetime_from]])
Despite its label, this test counts calendar days: it checks whether a visitation begins and ends on different days. Instances #12 and #4 do, but only #12 lasts for more than 24 hours (25 hours); #4 lasts just 7 hours.
Instance #11 returns an error code instead, because its date is invalid: February never has 30 days.
id | client_id | datetime_from    | datetime_to      | Name               | Rep? | Inv? | >24h?
7  | 3         | 2013-01-01 01:00 | 2013-01-01 02:00 | natasha lunt       | 0    | 0    | 0.00
2  | 4         | 2013-01-01 01:00 | 2013-01-01 01:01 | olivia groom-smith | 1    | 0    | 0.00
9  | 4         | 2013-01-01 01:00 | 2013-01-01 01:01 | olivia groom-smith | 0    | 0    | 0.00
1  | 1         | 2013-01-01 09:00 | 2013-01-01 10:00 | gary doades        | 0    | 0    | 0.00
8  | 1         | 2013-01-01 09:01 | 2013-01-01 09:00 | gary doades        | 0    | 1    | 0.00
3  | 1         | 2013-01-01 09:45 | 2013-01-01 10:45 | gary doades        | 0    | 0    | 0.00
6  | 1         | 2013-01-01 12:00 | 2013-01-01 12:30 | gary doades        | 0    | 0    | 0.00
12 | 4         | 2013-01-01 18:00 | 2013-01-02 19:00 | olivia groom-smith | 0    | 0    | 1.00
4  | 2         | 2013-01-01 23:00 | 2013-01-02 06:00 | richard ward       | 0    | 0    | 1.00
5  | 2         | 2013-01-02 04:00 | 2013-01-02 04:15 | richard ward       | 0    | 0    | 0.00
10 | 2         | 2013-01-02 05:00 | 2013-01-02 06:00 | richard ward       | 0    | 0    | 0.00
11 | 2         | 2013-02-30 01:00 | 2013-02-30 02:00 | richard ward       | 0    | 0    | #VALUE!
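The three spreadsheet tests can be reproduced outside Excel. The following Python sketch is an illustration, not the author's implementation, and assumes the “YYYY-MM-DD HH:MM” format shown in the table. One deliberate difference: rows are sorted by start time, then end time, then id, so that identical frames are guaranteed to be adjacent for the consecutive-row repetition test (a plain sort on “datetime_from” could place #7 between #2 and #9). An invalid date is reported as None, playing the role of Excel's #VALUE!.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# (id, datetime_from, datetime_to) for every reservation in the table above.
ROWS = [
    (7, "2013-01-01 01:00", "2013-01-01 02:00"), (2, "2013-01-01 01:00", "2013-01-01 01:01"),
    (9, "2013-01-01 01:00", "2013-01-01 01:01"), (1, "2013-01-01 09:00", "2013-01-01 10:00"),
    (8, "2013-01-01 09:01", "2013-01-01 09:00"), (3, "2013-01-01 09:45", "2013-01-01 10:45"),
    (6, "2013-01-01 12:00", "2013-01-01 12:30"), (12, "2013-01-01 18:00", "2013-01-02 19:00"),
    (4, "2013-01-01 23:00", "2013-01-02 06:00"), (5, "2013-01-02 04:00", "2013-01-02 04:15"),
    (10, "2013-01-02 05:00", "2013-01-02 06:00"), (11, "2013-02-30 01:00", "2013-02-30 02:00"),
]


def parse(text):
    """Return a datetime, or None if the text is not a valid date (e.g. Feb 30)."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError:
        return None


def check(rows):
    """Compute the Rep?, Inv? and day-count tests for every row, keyed by id."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    flags = {}
    for i, (res_id, start_txt, end_txt) in enumerate(ordered):
        start, end = parse(start_txt), parse(end_txt)
        valid = start is not None and end is not None
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None  # none for the last row
        flags[res_id] = {
            "rep": nxt is not None and (start_txt, end_txt) == (nxt[1], nxt[2]),
            "inv": valid and start >= end,
            # Like Excel's DAYS(): whole calendar days between the two dates.
            "days": (end.date() - start.date()).days if valid else None,
            # The stricter elapsed-time test used later for cleaning.
            "over_24h": valid and end - start > timedelta(hours=24),
        }
    return flags


flags = check(ROWS)
print([i for i, f in flags.items() if f["rep"]])           # [2]
print([i for i, f in flags.items() if f["days"] is None])  # [11]
```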
11. Alternatives and decision making
The first observation we made is that these data show some conflicts that require decision-making:
There are 4 clients (customers) and 12 tasks a priori.
Task c1t2 defines a visitation that ends before it begins. This could only be understood as a reverse visitation (server visiting client) or as a quantum effect.
We assume that both of these interpretations lie outside the scope of the problem. Setting them aside, the choice is either to swap the times or to remove the reservation row.
Some tasks, such as c1t1 and c1t3, already overlap in the order given a priori, so rearrangement is required.
Task c2t1 occurs overnight, so it begins and ends on different dates.
Task c2t4 occurs in a different month from all the others, so it may be an outlier or mistaken data. Furthermore, the date is not valid, since February cannot have 30 days.
The course of action here could be either to remove the whole row or to correct the month to January.
Since there is no certainty that this table must contain data from a single month, the whole row will be treated as invalid.
Client #3 has only one visitation task defined for her, being the only client with a single visitation.
Tasks c4t1 and c4t2 are repeated, so one of them could be deleted, or they could be ordered by their id number. Furthermore, they last for only one minute, so they may be outliers or mistakes.
Task c4t3 lasts for more than 24 hours, so it may be an outlier or a mistake. Like c2t1, it also occurs overnight.
The output will be an array of at most max_id (12) elements, from which conflicting rows are to be deleted.
12. Coding software and choosing tools
A large number of readily available alternatives on the market provide the software framework needed to deal with these kinds of problems.
To name just a few: Microsoft Excel macros, MS SQL Server, MS Visual Studio Express, MS Access, MS Project, OpenOffice Base, MySQL, the SAS (originally Statistical Analysis System) GANTT module, Visual Basic, MicroGPSS, FORTRAN, Borland C++, Delphi, Java, PHP,…
From here on we show a brief selection among those tools. The decision is usually taken out of convenience, with criteria such as availability (having the software package already installed and configured on the machine), but there are multiple choices, all of them valid solutions.
Whenever possible, specialised freeware 4GT (Fourth Generation Techniques) will be used, since these are generally considered cheaper and more efficient, optimize internal computations and work at a higher level of abstraction, which greatly simplifies coding.
Finally, we will deal with the database design issue according to analogous principles.
13. Open Office Suite: Calc
In case no budget is allocated for software licensing, universities and other organizations often use the OpenOffice suite for teaching and operational applications.
OpenOffice offers a range of solutions, such as the “Calc” spreadsheet program and the “Base” database management program.
Upon importing the data into OpenOffice Calc in the same way as we did in Excel, the invalid “February 30th” date is immediately detected.
Some other tools (such as OpenProj) will automatically detect the mistake in the data and roll the field over to the next valid date (two days past February 28th, i.e. March 2nd).
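The two behaviours can be contrasted with Python's standard `datetime` module. This is only an illustration: `strict_date` rejects impossible dates, as Calc does, while the hypothetical `rollover_date` mimics the kind of correction attributed to OpenProj above by counting an overflowing day forward from the first of the month; it is not OpenProj's actual code.

```python
from datetime import date, timedelta


def strict_date(year, month, day):
    """Return the date, or None if it does not exist (e.g. February 30th)."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def rollover_date(year, month, day):
    """Hypothetical lenient parser: count the day forward from the first of the
    month, so that days past the end of the month spill into the next one."""
    return date(year, month, 1) + timedelta(days=day - 1)


print(strict_date(2013, 2, 30))    # None
print(rollover_date(2013, 2, 30))  # 2013-03-02
```

Silent rollover hides the error, which is why this case study treats row #11 as invalid rather than correcting it.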
14. Microsoft Excel Macros
Another option is to record a macro in Microsoft Excel. To do so, we need to activate the “Developer” tab.
Recording a macro is a straightforward process, but the syntax of the generated source code is quite complex should we need to amend anything in it. To keep the code as readable as possible, we may prefer some other means.
The logical course of action seems to be to use SQL code to reach the required scheduling solution.
15. Structured Query Language
Structured Query Language (SQL) code is the market standard for solving these kinds of problems.
Therefore, some SQL programming expertise is assumed in order to reach a solution.
16. Open Office Suite: Base
OpenOffice Base can be used to process the data and to query the table for the requested output, in the same way that the Microsoft Access software package would.
In OO Base, we can quickly create the table that we need, with the advantage that it is open source software and implements SQL.
To do so, we first need to specify the field names and types. Then we need to populate the table with the actual data from OO Calc or MS Excel.
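The same two steps (define the fields, then populate the table) can be sketched with SQLite through Python's built-in `sqlite3` module, which here stands in for OO Base. The table and column names follow the CSV headers; storing datetimes as text is an assumption, since SQLite has no dedicated date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Step 1: specify field names and types.
conn.execute("""
    CREATE TABLE reservations (
        id            INTEGER PRIMARY KEY,
        client_id     INTEGER NOT NULL,
        datetime_from TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM' as text
        datetime_to   TEXT    NOT NULL,
        name          TEXT    NOT NULL
    )
""")

# Step 2: populate it with the rows exported from the spreadsheet.
ROWS = [
    (1, 1, "2013-01-01 09:00", "2013-01-01 10:00", "gary doades"),
    (2, 4, "2013-01-01 01:00", "2013-01-01 01:01", "olivia groom-smith"),
    (3, 1, "2013-01-01 09:45", "2013-01-01 10:45", "gary doades"),
    (4, 2, "2013-01-01 23:00", "2013-01-02 06:00", "richard ward"),
    (5, 2, "2013-01-02 04:00", "2013-01-02 04:15", "richard ward"),
    (6, 1, "2013-01-01 12:00", "2013-01-01 12:30", "gary doades"),
    (7, 3, "2013-01-01 01:00", "2013-01-01 02:00", "natasha lunt"),
    (8, 1, "2013-01-01 09:01", "2013-01-01 09:00", "gary doades"),
    (9, 4, "2013-01-01 01:00", "2013-01-01 01:01", "olivia groom-smith"),
    (10, 2, "2013-01-02 05:00", "2013-01-02 06:00", "richard ward"),
    (11, 2, "2013-02-30 01:00", "2013-02-30 02:00", "richard ward"),
    (12, 4, "2013-01-01 18:00", "2013-01-02 19:00", "olivia groom-smith"),
]
conn.executemany("INSERT INTO reservations VALUES (?, ?, ?, ?, ?)", ROWS)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0])  # 12
```

Note that with text columns SQLite accepts “2013-02-30” without complaint, unlike Calc or Visual Studio Express, so validation has to happen elsewhere.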
17. Visual Studio Express
Another Microsoft tool that can be used is Visual Studio Express (available as a free download).
VSE also detects the invalidity of one of the dates (February 30th).
Visual Studio Express can also be used to process the data and to query the table for the requested output. It also supports SQL and is designed for seamless data exchange with Microsoft SQL Server.
18. Oracle and PL/SQL
Oracle is a very powerful tool used by larger organizations, such as city councils or international corporations. It has its own language extension for database management: PL/SQL.
PL/SQL stands for “Procedural Language Extensions to SQL.” PL/SQL extends SQL by adding the programming structures and subroutines available in any high-level language.
Its syntax and capabilities are very similar to those of T-SQL and other derivatives of standard SQL.
Many Oracle applications are built using a client-server architecture. The Oracle database resides on the server. The program that makes requests against this database resides on the client machine. This program can be written in C, Java, or PL/SQL.
Like any other programming language, PL/SQL has syntax and rules that determine how programming statements work together. It is important to realize that PL/SQL is not a stand-alone programming language. PL/SQL is a part of the Oracle RDBMS, and it can reside in two environments, the client and the server. As a result, it is very easy to move PL/SQL modules between server-side and client-side applications.
Oracle also supplies a command-line SQL tool called SQL*Plus.
19. Using Transact-SQL in Microsoft SQL Server 2k+
Microsoft SQL Server 2000 (and above) is one of the most popular software tools used to solve these kinds of problems at the business level, especially where large numbers of tables and instances are involved.
MSSQLS uses a powerful extension of standard SQL, originally developed by Sybase, called Transact-SQL. T-SQL code can be bundled into a variety of software applications: web pages, Visual Basic, Visual C# and so on.
Newer MS SQL Server versions, such as 2005, can work with CSV files and interoperate with Visual Studio, MS Excel and MS Project.
MS SQL Server requires a moderate investment in licensing.
An example in the literature (cf. References) shows how to use the ORDER BY and GROUP BY clauses in T-SQL to aggregate data.
For our exercise, T-SQL is a very useful tool to design code that orders the preprocessed visitation list by starting date and returns results ordered by client, once a scheduling scheme has been agreed upon and implemented.
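The idea can be sketched in SQLite (via Python's `sqlite3`) rather than T-SQL. `ORDER BY` and `GROUP BY` behave the same way here, while `julianday()` is SQLite-specific (T-SQL would use a function such as `DATEDIFF` instead). The sample rows are the eight reservations left after cleaning, with their original times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE visitations (
    id INTEGER PRIMARY KEY, client_id INTEGER, datetime_from TEXT, datetime_to TEXT)""")
conn.executemany("INSERT INTO visitations VALUES (?, ?, ?, ?)", [
    (1, 1, "2013-01-01 09:00", "2013-01-01 10:00"),
    (2, 4, "2013-01-01 01:00", "2013-01-01 01:01"),
    (3, 1, "2013-01-01 09:45", "2013-01-01 10:45"),
    (4, 2, "2013-01-01 23:00", "2013-01-02 06:00"),
    (5, 2, "2013-01-02 04:00", "2013-01-02 04:15"),
    (6, 1, "2013-01-01 12:00", "2013-01-01 12:30"),
    (7, 3, "2013-01-01 01:00", "2013-01-01 02:00"),
    (10, 2, "2013-01-02 05:00", "2013-01-02 06:00"),
])

# ORDER BY: list visitations by client and, within each client, by start time.
ordered = [row[0] for row in conn.execute(
    "SELECT id FROM visitations ORDER BY client_id, datetime_from, id")]

# GROUP BY: number of visitations and total booked minutes per client
# (julianday differences are in days, hence the factor 1440).
per_client = conn.execute("""
    SELECT client_id,
           COUNT(*),
           CAST(ROUND(SUM((julianday(datetime_to) - julianday(datetime_from)) * 1440)) AS INTEGER)
    FROM visitations
    GROUP BY client_id
    ORDER BY client_id""").fetchall()

print(ordered)     # [1, 3, 6, 4, 5, 10, 7, 2]
print(per_client)  # [(1, 3, 150), (2, 3, 495), (3, 1, 60), (4, 1, 1)]
```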
20. Cleaning the data
As we have observed several irregularities within the input data, we need to clean them by deleting all affected rows.
To do so, we can either use the built-in tools of the software package of our choice, or write some code to do it for us.
Given that the number of instances (rows) in our table is very small, we choose to clean it by hand (with the software package's built-in tools) to speed up the process.
If the number of instances were higher (say dozens, hundreds or even millions of records), we would need to code a clean-up routine for this task.
According to the validity analysis performed at a previous stage, and given the time available and the scope, we choose to simplify as much as possible by completely removing any instances that show any of the following conflicts:
REPETITION: All reservations must be DISTINCT, so second and further identical reservations are deleted. Only the one with the lowest id is kept.
INVERSION: Reservations with null or negative time lapses are deleted.
MORE THAN 24 HOURS SERVICE TIME: Reservations that span more than one day are deleted only if the total service time is greater than 24 hours. Otherwise they are kept, assuming they occur over a night shift. We will also keep visitations lasting for just one minute, assuming they represent a quick status check.
WRONG DATA INPUT: Reservations with a wrong date, or any other wrong piece of data in any field, are deleted.
21. Pseudocode for data cleaning
Since no tool was specified in the problem's requirements, leaving a wide range of options including several variants and extensions of SQL, we will use pseudocode to show how to program the main cleaning routine.
Once a tool has been chosen, we can easily translate this pseudocode into the grammar of the chosen language without any loss of generality.
We assume that the language provides a few simple subroutines for ordering, deletion and so on.
We assume ROWS (for short) is a table that is to contain the RESERVATIONS.
ROWS := SELECT DISTINCT FROM RESERVATIONS  -- Removes duplicates (comparing all fields except 'rows.id', the master key; the lowest id is kept)
ORDER ROWS BY DATETIME_FROM  -- Orders all rows by starting time
FOR ID IN ROWS LOOP:  -- For every distinct row, repeat:
  IF DATE(ROWS[ID].DATETIME_FROM) < 0 OR DATE(ROWS[ID].DATETIME_TO) < 0  -- DATE() is assumed to return a negative value for any invalid date
    THEN DELETE(ROWS[ID])  -- Cleans wrongly timestamped rows
  ELSE IF ROWS[ID].DATETIME_FROM >= ROWS[ID].DATETIME_TO
    THEN DELETE(ROWS[ID])  -- Cleans rows with non-positive visitation time spans
  ELSE IF (ROWS[ID].DATETIME_TO - ROWS[ID].DATETIME_FROM) > 24 HOURS
    THEN DELETE(ROWS[ID])  -- Cleans rows with visitations lasting more than 24 hours (overnight visits of 24 hours or less are kept)
END LOOP  -- End of loop
COMMIT_WRITE(ROWS, RESERVATIONS)  -- Replaces all initial rows with the result of this cleaning routine
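As one possible translation, here is a Python sketch of the cleaning routine above. It applies the four rules of the previous slide (repetition, wrong input, inversion, more than 24 hours) to (id, client_id, datetime_from, datetime_to) tuples; the “YYYY-MM-DD HH:MM” format and the use of `strptime` as the validity check are assumptions standing in for the pseudocode's DATE() test.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"
MAX_SPAN = timedelta(hours=24)

RESERVATIONS = [
    (1, 1, "2013-01-01 09:00", "2013-01-01 10:00"), (2, 4, "2013-01-01 01:00", "2013-01-01 01:01"),
    (3, 1, "2013-01-01 09:45", "2013-01-01 10:45"), (4, 2, "2013-01-01 23:00", "2013-01-02 06:00"),
    (5, 2, "2013-01-02 04:00", "2013-01-02 04:15"), (6, 1, "2013-01-01 12:00", "2013-01-01 12:30"),
    (7, 3, "2013-01-01 01:00", "2013-01-01 02:00"), (8, 1, "2013-01-01 09:01", "2013-01-01 09:00"),
    (9, 4, "2013-01-01 01:00", "2013-01-01 01:01"), (10, 2, "2013-01-02 05:00", "2013-01-02 06:00"),
    (11, 2, "2013-02-30 01:00", "2013-02-30 02:00"), (12, 4, "2013-01-01 18:00", "2013-01-02 19:00"),
]


def parse(text):
    """Return a datetime, or None for an invalid date such as February 30th."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError:
        return None


def clean(reservations):
    """Drop rows that break any cleaning rule; return the rest ordered by start time."""
    kept, seen = [], set()
    for res_id, client_id, start_txt, end_txt in sorted(reservations):  # lowest id first
        frame = (client_id, start_txt, end_txt)
        if frame in seen:                  # REPETITION: keep only the lowest id
            continue
        seen.add(frame)
        start, end = parse(start_txt), parse(end_txt)
        if start is None or end is None:   # WRONG DATA INPUT
            continue
        if start >= end:                   # INVERSION: null or negative time lapse
            continue
        if end - start > MAX_SPAN:         # MORE THAN 24 HOURS of service time
            continue
        kept.append((res_id, client_id, start, end))
    kept.sort(key=lambda row: (row[2], row[0]))  # by datetime_from, then id
    return kept


print([row[0] for row in clean(RESERVATIONS)])  # [2, 7, 1, 3, 6, 4, 5, 10]
```

The surviving ids are the same eight, {1, 2, 3, 4, 5, 6, 7, 10}, that the manual cleaning keeps.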
22. Result after data cleaning
Subsets to be subtracted:
Repetition: candidate subset {c4t1,c4t2}; c4t1 (id 2, the lowest id) is kept, so the subset removed is {c4t2}
Inversion: one negative time lapse {c1t2}
>24 hours: {c4t3}
Wrong input: date out of range (February 30th) {c2t4}
Subtraction set: {c1t2,c2t4,c4t2,c4t3}
We end up with 8 instances after cleaning:
{c1t1,c1t2,c1t3,c1t4, c2t1,c2t2,c2t3,c2t4, c3t1, c4t1,c4t2,c4t3}
– {c1t2,c2t4,c4t2,c4t3}
= {c1t1,c1t3,c1t4, c2t1,c2t2,c2t3, c3t1, c4t1}
Or, according to the master key “Reservation ‘id’”:
{1,8,3,6,4,5,10,11,7,2,9,12}
– {8,9,11,12}
= {7,2,1,3,6,4,5,10}
23. PERT and GANTT
PERT (Program Evaluation and Review Technique) and Gantt charts are project management tools commonly used in scheduling environments.
The Gantt bar chart, probably the most widely known of them, lets us define tasks to be executed in parallel, in series or with interdependencies.
Again, a number of tools can read an input, generate a Gantt chart and apply scheduling schemes to the data, such as Microsoft Project, GanttProject, OpenProj and several others. Or we can just use a general-purpose RDBMS with SQL.
24. Scheduling schemes
After cleaning the data, several issues come to mind that we should consider when scheduling the visitations. Here are just a few of the most relevant:
We could want all of the visitations to be scheduled as soon as possible.
The first reservation (id 1) starts at 9:00 am, so we could schedule all of the reservations to be attended only during office hours.
We could also want to add breaks for meals, rest times, service maintenance or other managerial reasons. We assume none.
Some visitations occur overnight, so we can decide to schedule visitations at any time of the day and night.
We could want to reschedule as few reservations as possible, or to have all visitations for the same client served together, one right after another, so that each client comes only once.
We could want to simplify: take the earliest reservation starting time as the beginning and then queue all the others right behind it, ordered first by their starting time and second (if there is more than one) by other criteria.
Other possible criteria are visitation duration, client id, alphabetical order by name, or any other priority scheme. For the sake of simplicity, we choose the plain vanilla reservation id (the table's master key).
25. Scheduling scheme chosen
These and other criteria can be combined in many ways, resulting in very different scheduling schemes. The choice among them is usually made according to our meta-knowledge of the problem's environment (be it a hospital, a supermarket, a computer's CPU…). This was also the case at the data clean-up stage.
Since the problem was submitted without any context, we are somewhat free to choose here. Our scheduling scheme is defined as follows:
The earliest reservation (the one with the lowest ‘id’, in case of a tie) will be scheduled first.
All others will follow without any time gaps, ordered by their starting time and, in case of conflict, by their reservation id.
26. Coding the scheme
As before, we use pseudocode to show a simple scheduling routine.
We assume that, once ordered, ROWS is indexed by position, so that ROWS[I+1] is the row that follows ROWS[I].
ROWS := SELECT ALL FROM RESERVATIONS  -- Loads data from the (cleaned) Reservations table
ORDER ROWS BY DATETIME_FROM, ID  -- Orders rows by starting time, breaking ties by the master key
FOR I FROM FIRST TO LAST-1 LOOP:  -- For every row but the last one, repeat with index 'I':
  TIMESPAN := ROWS[I+1].DATETIME_TO - ROWS[I+1].DATETIME_FROM  -- Calculates the duration of the next task
  ROWS[I+1].DATETIME_FROM := ROWS[I].DATETIME_TO  -- Sets the next task to start right after the previous one ends
  ROWS[I+1].DATETIME_TO := ROWS[I+1].DATETIME_FROM + TIMESPAN  -- Sets its termination time
END LOOP  -- End of loop
COMMIT_WRITE(ROWS, VISITATIONS)  -- Overwrites the VISITATIONS table with the result of this scheduling routine
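A runnable Python counterpart of this pseudocode might look as follows. It assumes the eight rows left after cleaning, with datetimes in “YYYY-MM-DD HH:MM” format, and applies the scheme chosen above: order by start time, then id; keep the first start; chain every other task back to back while preserving its duration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# (id, datetime_from, datetime_to) of the reservations that survived cleaning.
CLEANED = [
    (1, "2013-01-01 09:00", "2013-01-01 10:00"), (2, "2013-01-01 01:00", "2013-01-01 01:01"),
    (3, "2013-01-01 09:45", "2013-01-01 10:45"), (4, "2013-01-01 23:00", "2013-01-02 06:00"),
    (5, "2013-01-02 04:00", "2013-01-02 04:15"), (6, "2013-01-01 12:00", "2013-01-01 12:30"),
    (7, "2013-01-01 01:00", "2013-01-01 02:00"), (10, "2013-01-02 05:00", "2013-01-02 06:00"),
]


def schedule(rows):
    """Return (id, start, end) tuples scheduled back to back, ordered by
    requested start time and then by id."""
    parsed = sorted(
        ((rid, datetime.strptime(a, FMT), datetime.strptime(b, FMT)) for rid, a, b in rows),
        key=lambda r: (r[1], r[0]),
    )
    result = [parsed[0]]                 # the earliest reservation keeps its slot
    for rid, start, end in parsed[1:]:
        timespan = end - start           # duration of this task
        new_start = result[-1][2]        # starts when the previous one ends
        result.append((rid, new_start, new_start + timespan))
    return result


for rid, start, end in schedule(CLEANED):
    print(rid, start.strftime(FMT), end.strftime(FMT))
```

With this data the eight visitations run from 01:00 to 12:46 on 2013-01-01, in the order 2, 7, 1, 3, 6, 4, 5, 10. Note that the scheme can move a visitation earlier than requested (id 1 moves from 09:00 to 02:01); starting each task at the later of the previous end time and its own requested start would be one way to avoid that, at the cost of idle gaps.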
27. Reporting output
After scheduling, we code a reporting routine in the same fashion as before.
We assume VIS (for short) is to contain the final output from VISITATIONS.
VIS := SELECT ID, CLIENT_ID, NAME, DATETIME_FROM, DATETIME_TO  -- Loads several columns from the Visitations table
       FROM VISITATIONS
       ORDER BY CLIENT_ID, DATETIME_FROM  -- Orders rows by client and, within each client, by starting time
COMMIT_WRITE(VIS, FILE(".Output.csv", #CSV))  -- Writes the result of this query to a file in the comma-separated values format
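In Python, the reporting step could be sketched with the standard `csv` module. The rows below are assumed to be the output of the scheduling scheme applied to the cleaned data (times worked out by hand for this example). The report is sorted by client and then by scheduled start, and written to an in-memory buffer, which could be swapped for a real file opened with `open(path, "w", newline="")`.

```python
import csv
import io

# (id, client_id, name, datetime_from, datetime_to) after scheduling.
VISITATIONS = [
    (2, 4, "olivia groom-smith", "2013-01-01 01:00", "2013-01-01 01:01"),
    (7, 3, "natasha lunt", "2013-01-01 01:01", "2013-01-01 02:01"),
    (1, 1, "gary doades", "2013-01-01 02:01", "2013-01-01 03:01"),
    (3, 1, "gary doades", "2013-01-01 03:01", "2013-01-01 04:01"),
    (6, 1, "gary doades", "2013-01-01 04:01", "2013-01-01 04:31"),
    (4, 2, "richard ward", "2013-01-01 04:31", "2013-01-01 11:31"),
    (5, 2, "richard ward", "2013-01-01 11:31", "2013-01-01 11:46"),
    (10, 2, "richard ward", "2013-01-01 11:46", "2013-01-01 12:46"),
]


def write_report(rows, out):
    """Write the selected columns as CSV, ordered by client, then by start time."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "client_id", "name", "datetime_from", "datetime_to"])
    for row in sorted(rows, key=lambda r: (r[1], r[3])):
        writer.writerow(row)


buffer = io.StringIO()
write_report(VISITATIONS, buffer)
print(buffer.getvalue())
```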
28. ACID Compliant DBMS
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.
In the context of databases, a single logical operation on the data is called a transaction.
This approach has many advantages and only slight disadvantages when handling really huge databases (say, terabytes of data) in real-time environments. In those rare environments, a NoSQL approach might be preferred.
As the following slides show, Microsoft's SQL Server Express ensures ACID compliance.
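Atomicity, the “A” in ACID, can be demonstrated in a few lines of Python. This sketch uses SQLite through the standard `sqlite3` module rather than SQL Server: used as a context manager, a connection commits when the block succeeds and rolls back when it raises, so a transaction that fails halfway leaves no partial changes behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visitations (id INTEGER PRIMARY KEY, client_id INTEGER)")
conn.execute("INSERT INTO visitations VALUES (1, 1)")
conn.commit()

try:
    with conn:  # one transaction: commit on success, rollback on exception
        conn.execute("INSERT INTO visitations VALUES (2, 4)")  # succeeds on its own...
        conn.execute("INSERT INTO visitations VALUES (1, 3)")  # ...but this duplicate key fails
except sqlite3.IntegrityError as exc:
    print("transaction rolled back:", exc)

# Atomicity: row 2 was undone together with the failing insert.
count = conn.execute("SELECT COUNT(*) FROM visitations").fetchone()[0]
print(count)  # 1
```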
29. ACID Compliancy in MS SQL Server (I)
30. ACID Compliancy in MS SQL Server (II)
31. ACID Compliancy in MS SQL Server (III)
32. Database design: a bird's eye view
At this point, we again sketch the problem on paper to gain further insight before moving on to database creation and management issues.
The database is conceived as part of a reservation system that receives online reservation requests, processes them by scheduling according to the scheme, and produces a visitation table. It also allows each of the visitators (just one instance in our example), clients, reservations and visitations to be managed individually.
We expanded the basic functionality of the software by allowing more than one visiting agent, dubbed a “visitator”.
It will contain four tables: Visitators, Clients, Reservations and Visitations.
It will implement one “Reschedule” function and three procedures: Edit_clients, Edit_visitators and Edit_Reservations.
33. Database normalization
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.
Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.
The objective is to isolate data so that additions, deletions and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships.
The Normal Forms (NF) of relational database theory provide criteria for determining a table's degree of immunity against logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is.
For OLAP (Online Analytical Processing) applications, such as data mining tools, a lower normal form might be preferred, because these are primarily “read-only” databases that tend to extract accumulated historical data, whereas transaction-intensive applications will usually opt for a higher normal form.
For small problems like this one, usually only 1NF, 2NF or 3NF are used.
34. Database map: Table definition
The database will implement the following four tables: Visitators, Clients, Reservations and
Visitations
The tables contain the fields specified below. An asterisk (*) is added after the primary key
identifier for each of the tables.
VISITATORS: v_id (*), v_name
CLIENTS: client_id (*), name
RESERVATIONS: id (*), v_id, client_id, datetime_from, datetime_to
VISITATIONS: V_id (*), v_id, client_id, datetime_from, datetime_to, Rescheduled
NOTES:
The field for the client name has been moved out of the reservations table because, given the client_id, this field is redundant. A table has been created to contain all of the clients' names associated with their client_id.
The field for the client name has also been moved out of the visitations table for the same reason. If we need to print a report containing the visitations as scheduled, a query will be able to access the Clients table to retrieve that piece of data.
The visitator's name has been moved out of reservations for analogous reasons. A Visitators table has been created.
The visitator's identifier “v_id” has been added to the reservations and visitations tables so that one of several visitators can be chosen.
Rescheduled is a Boolean field that has been added to keep track of rescheduling operations. Any visitation that undergoes a change in any other field for reservation rescheduling purposes will be marked with a TRUE value, and with FALSE otherwise.
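A minimal sketch of these four tables as SQLite DDL, run from Python's `sqlite3` module, could look as follows. The column types, NOT NULL and foreign-key constraints are assumptions, not part of the original design. One hypothetical change: the Visitations key is called `vis_id` instead of “V_id”, because SQLite (like SQL Server under a case-insensitive collation) would treat “V_id” and “v_id” as the same column name. The final query shows how a report recovers the client name through the Clients table; the visitator name in the sample rows is made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE visitators (
    v_id   INTEGER PRIMARY KEY,
    v_name TEXT NOT NULL
);
CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE reservations (
    id            INTEGER PRIMARY KEY,
    v_id          INTEGER NOT NULL REFERENCES visitators (v_id),
    client_id     INTEGER NOT NULL REFERENCES clients (client_id),
    datetime_from TEXT NOT NULL,
    datetime_to   TEXT NOT NULL
);
CREATE TABLE visitations (
    vis_id        INTEGER PRIMARY KEY,  -- called "V_id" on the slide
    v_id          INTEGER NOT NULL REFERENCES visitators (v_id),
    client_id     INTEGER NOT NULL REFERENCES clients (client_id),
    datetime_from TEXT NOT NULL,
    datetime_to   TEXT NOT NULL,
    rescheduled   INTEGER NOT NULL DEFAULT 0 CHECK (rescheduled IN (0, 1))  -- Boolean
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

# Hypothetical sample rows: one visitator, one client, one scheduled visitation.
conn.execute("INSERT INTO visitators VALUES (1, 'visitator 1')")
conn.execute("INSERT INTO clients VALUES (1, 'gary doades')")
conn.execute("INSERT INTO visitations VALUES (1, 1, 1, '2013-01-01 02:01', '2013-01-01 03:01', 1)")

# The client's name is stored once, in Clients, and joined in for reports.
report = conn.execute("""
    SELECT c.name, v.datetime_from, v.datetime_to
    FROM visitations AS v JOIN clients AS c ON c.client_id = v.client_id""").fetchall()
print(report)  # [('gary doades', '2013-01-01 02:01', '2013-01-01 03:01')]
```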
35. Database map: Procedures and functions
The database will implement three procedures and one function that will be called from any of them.
The function “RESCHEDULE” will read the Reservations table and any other table needed, and will only write the Visitations table. Its purpose is to reschedule all rows according to the scheme previously defined.
There will be three procedures:
EDIT_CLIENTS: Reads and writes the Clients table. Writes the Reservations table. Finally, calls the Reschedule function. It is used to modify any information concerning a particular client instance, such as the name field, in all of the records. It is also used to remove a client with all of its reservations (and therefore its visitations).
EDIT_VISITATORS: Reads and writes the Visitators table. Writes the Reservations table. Finally, calls the Reschedule function. It is used to modify any information concerning a particular visitator instance, such as the name field, in all of the records. It is also used to remove a visitator with all of its reservations (and therefore its visitations).
EDIT_RESERVATIONS: Reads and writes the Reservations table. Finally, calls the Reschedule function. It is used to edit any piece of data concerning a reservation, such as the visitator, the client or the dates and times arranged. It is also used to delete a reservation.
NOTES:
Only the RESCHEDULE function can access the Visitations list, which is considered the single most valuable source of output reports from the program's execution.
Upon the deletion of some or all of the Reservations, some garbage data may remain stored in the Clients and Visitators tables. That is why we need specific procedures to edit those.
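The division of labour between RESCHEDULE and an edit procedure can be sketched in Python with SQLite. In this hypothetical version, `edit_reservations` deletes or re-times one reservation and then calls `reschedule`, the only code that writes Visitations. `reschedule` rebuilds that table with the back-to-back scheme, uses the reservation id as the visitation key, and sets `rescheduled` when a start time moved. For brevity the sketch assumes a single visitator (as in the example) and keeps only the columns needed; both are simplifications of the design above.

```python
import sqlite3
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"


def create_tables(conn):
    conn.executescript("""
        CREATE TABLE reservations (id INTEGER PRIMARY KEY, v_id INTEGER, client_id INTEGER,
                                   datetime_from TEXT, datetime_to TEXT);
        CREATE TABLE visitations (vis_id INTEGER PRIMARY KEY, v_id INTEGER, client_id INTEGER,
                                  datetime_from TEXT, datetime_to TEXT, rescheduled INTEGER);
    """)


def reschedule(conn):
    """RESCHEDULE: read Reservations and rebuild Visitations back to back."""
    rows = conn.execute("SELECT id, v_id, client_id, datetime_from, datetime_to "
                        "FROM reservations ORDER BY datetime_from, id").fetchall()
    conn.execute("DELETE FROM visitations")
    previous_end = None
    for res_id, v_id, client_id, start_txt, end_txt in rows:
        start = datetime.strptime(start_txt, FMT)
        end = datetime.strptime(end_txt, FMT)
        new_start = start if previous_end is None else previous_end
        new_end = new_start + (end - start)  # keep the original duration
        conn.execute("INSERT INTO visitations VALUES (?, ?, ?, ?, ?, ?)",
                     (res_id, v_id, client_id, new_start.strftime(FMT),
                      new_end.strftime(FMT), int(new_start != start)))
        previous_end = new_end


def edit_reservations(conn, res_id, new_times=None):
    """EDIT_RESERVATIONS: delete a reservation (no new_times) or change its
    (datetime_from, datetime_to), then call RESCHEDULE in the same transaction."""
    with conn:
        if new_times is None:
            conn.execute("DELETE FROM reservations WHERE id = ?", (res_id,))
        else:
            conn.execute("UPDATE reservations SET datetime_from = ?, datetime_to = ? "
                         "WHERE id = ?", (*new_times, res_id))
        reschedule(conn)


conn = sqlite3.connect(":memory:")
create_tables(conn)
with conn:
    # (id, client_id, datetime_from, datetime_to); v_id is fixed to 1.
    conn.executemany("INSERT INTO reservations VALUES (?, 1, ?, ?, ?)", [
        (1, 1, "2013-01-01 09:00", "2013-01-01 10:00"), (2, 4, "2013-01-01 01:00", "2013-01-01 01:01"),
        (3, 1, "2013-01-01 09:45", "2013-01-01 10:45"), (4, 2, "2013-01-01 23:00", "2013-01-02 06:00"),
        (5, 2, "2013-01-02 04:00", "2013-01-02 04:15"), (6, 1, "2013-01-01 12:00", "2013-01-01 12:30"),
        (7, 3, "2013-01-01 01:00", "2013-01-01 02:00"), (10, 2, "2013-01-02 05:00", "2013-01-02 06:00"),
    ])
    reschedule(conn)

edit_reservations(conn, 4)  # drop the overnight visit; everything after it moves up
last_end = conn.execute("SELECT MAX(datetime_to) FROM visitations").fetchone()[0]
print(last_end)  # 2013-01-01 05:46
```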
36. Database map, visually
Blue boxes represent tables, green disks represent procedures, and arrows represent data and operation flows.
37. References
“Microsoft SQL Server 2005 New Features” by Michael Oatley. McGraw-Hill/Osborne, 2005 (288 pages). ISBN: 0072227761.
“SQL Server 2000: Stored Procedure Programming” by Dejan Sunderic and Tom Woodhead. Osborne Database Professional's Library.
“Microsoft Excel 2007 VBA (Macros)”. Premier Training Limited (London).
“Macros Visual Basic para Excel” by José Pedro García Sabater. ROGLE – Universitat Politècnica de València.
“Microsoft SQL Server 2005 Express Edition for Dummies” by Robert Schneider. Wiley Publishing, Inc.
“Oracle PL/SQL by Example” by various authors. Pearson Education as Prentice Hall Professional Technical Reference.
38. Any questions?
alfonsodelafuenteruiz@yahoo.es
http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode
Please excuse any errata.
Thanks for your attention