Designing High Performance Web-Based
Computing Services to Promote Telemedicine
Database Management System
Ismail Hababeh, Issa Khalil, and Abdallah Khreishah
Abstract—Many web computing systems run real-time database services whose information changes continuously and
expands incrementally. In this context, web data services play a major role and drive significant improvements in monitoring and
controlling information truthfulness and data propagation. Currently, web telemedicine database services are of central importance
to distributed systems. However, the increasing complexity and rapid growth of real-world healthcare applications
place a heavy burden on database administrative staff. In this paper, we build integrated web data services that satisfy fast
response time for large-scale tele-health database management systems. Our focus is on database management with application
scenarios in dynamic telemedicine systems, to increase care admissions and decrease care difficulties such as distance, travel, and
time limitations. We propose a three-fold approach based on data fragmentation, database website clustering, and intelligent data
distribution. This approach reduces the amount of data migrated between websites during applications’ execution, achieves
cost-effective communications during applications’ processing, and improves applications’ response time and throughput. The proposed
approach is validated internally by measuring the impact of our computing service techniques on various performance features
such as communications cost, response time, and throughput. External validation is achieved by comparing the performance of our
approach to that of other techniques in the literature. The results show that our integrated approach significantly improves the
performance of web database systems and outperforms its counterparts.
Index Terms—Web telemedicine database systems (WTDS), database fragmentation, data distribution, site clustering
1 INTRODUCTION
The rapid growth and continuous change of real-world software applications have prompted researchers to propose several computing service techniques for more efficient and effective management of web telemedicine database systems (WTDS). Significant research progress has been made in the past few years to improve WTDS performance. In particular, databases, as a critical component of these systems, have attracted many researchers. The web plays an important role in enabling healthcare services like telemedicine to serve inaccessible areas with few medical resources. It offers easy, global access to patients’ data without in-person interaction and provides fast channels for consulting specialists in emergency situations. Different kinds of patient information, such as ECG, temperature, and heart rate, need to be accessed from various client devices in heterogeneous communication environments. WTDS enable high-quality continuous delivery of patient information wherever and whenever needed.
Several benefits can be achieved by using web telemedicine services, including medical consultation delivery, transportation cost savings, data storage savings, and mobile application support that overcomes obstacles related to performance (e.g., bandwidth, battery life, and storage), security (e.g., privacy and reliability), and environment (e.g., scalability, heterogeneity, and availability). The objectives of such services are to: (i) develop large applications that scale as scope and workload increase, (ii) achieve precise control and monitoring of medical data to attain high telemedicine database system performance, and (iii) provide a large archive of medical data records, accurate decision support systems, and trusted event-based notifications in typical clinical centers.
Recently, many researchers have focused on designing web medical database management systems that satisfy certain performance levels. Such performance is evaluated by measuring the amount of relevant and irrelevant data accessed and the amount of medical data transferred during transactions’ processing time. Several techniques have been proposed to improve telemedicine database performance, optimize medical data distribution, and control medical data proliferation. These techniques assume that high performance for such systems can be achieved by improving at least one of the web database management services, namely database fragmentation, data distribution, website clustering, distributed caching, and database scalability. However, the intractable time complexity of processing a large number of medical transactions and managing a huge number of communications makes the design of such methods a non-trivial task. Moreover, none of the
I. Hababeh is with the Faculty of Computer Engineering Information
Technology, German-Jordanian University, Amman, Jordan.
E-mail: Hababeh@gju.edu.jo.
I. Khalil is with the Qatar Computing Research Institute (QCRI), Qatar
Foundation, Doha, Qatar. E-mail: ikhalil@qf.org.qa.
A. Khreishah is with the Department of Electrical Computer Engineer-
ing, New Jersey Institute of Technology, 181 Waren Street, Newark, NJ
07102. E-mail: Abdallah@njit.edu.
Manuscript received 15 Feb. 2013; revised 17 Nov. 2013; accepted 7 Jan. 2014.
Date of publication 15 Jan. 2014; date of current version 6 Feb. 2015.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSC.2014.2300499
IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015 47
1939-1374 ß 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
existing methods considers the three-fold services together, which makes them impracticable in the field of web database systems. Additionally, using multiple medical services from different web database providers may not fit the needs of improving telemedicine database system performance. Furthermore, services from different web database providers may not be compatible, and in some cases they may increase processing time because of constraints on the network [1]. Finally, there has been a lack of tools that support the design, analysis, and cost-effective deployment of web telemedicine database systems [1].
Designing and developing fast, efficient, and reliable integrated techniques that can handle a huge number of medical transactions over a large number of web healthcare sites in near-optimal polynomial time is a key challenge in the area of WTDS. Data fragmentation, website clustering, and data allocation are the main components of WTDS, and they continue to pose great research challenges because the underlying optimization problems are NP-complete, so only near-optimal solutions are practical.
To improve the performance of medical distributed database systems, we incorporate data fragmentation, website clustering, and data distribution computing services together in a new web telemedicine database system approach. This new approach is intended to decrease data communication and increase system throughput, reliability, and data availability.
The decomposition of web telemedicine database relations into disjoint fragments allows database transactions to be executed concurrently and hence minimizes the total response time. Fragmentation typically increases the level of concurrency and, therefore, the system throughput. However, the benefits of generating disjoint telemedicine fragments cannot be realized unless these fragments are distributed over the websites in a way that reduces the communication cost of database transactions. Database disjoint fragments are initially distributed over logical clusters (groups of websites that satisfy a certain physical property, e.g., communications cost). Distributing disjoint fragments to the clusters where a positive allocation benefit is achieved, rather than allocating the fragments to all websites, has an important impact on database system throughput. This type of distribution reduces the number of communications required for query processing in terms of retrieval and update transactions, and thus has a significant impact on web telemedicine database system performance. Moreover, distributing disjoint fragments among the websites where they are needed most improves database system performance by minimizing the data transferred and accessed during execution, reducing storage overhead, and increasing availability and reliability, since multiple copies of the same data are allocated.
Database partitioning techniques aim at improving database system throughput by reducing the amount of irrelevant data packets (fragments) accessed and transferred among different websites. However, data fragmentation raises some difficulties, particularly when web telemedicine database applications have contradictory requirements that prevent the decomposition of a relation into mutually exclusive fragments. Applications whose views are defined on more than one fragment may suffer performance degradation: it might be necessary to retrieve data from two or more fragments and take their join, which is costly [31]. A data fragmentation technique describes how each fragment is derived from the database’s global relations. Three main classes of data fragmentation have been discussed in the literature: horizontal [2], [3], vertical [4], [5], and hybrid [6], [7]. Although there are various schemes describing data partitioning, few are known for the efficiency of their algorithms and the validity of their results [33].
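To make the correctness requirements behind disjoint fragmentation concrete, the following sketch is our own illustration (not an algorithm from this paper; all names are hypothetical): it horizontally fragments a toy patient relation by predicates and checks the classical completeness, reconstruction, and disjointness rules.

```python
# Illustrative only: horizontal fragmentation of a toy relation by
# predicates, plus a checker for the classical correctness rules.

def horizontal_fragments(relation, predicates):
    """One fragment per predicate; each fragment keeps the matching records."""
    return [[rec for rec in relation if pred(rec)] for pred in predicates]

def is_correct(relation, fragments):
    """Completeness + reconstruction: the union of the fragments equals the
    relation. Disjointness: no record belongs to two fragments."""
    union = [rec for frag in fragments for rec in frag]
    total = len(union) == len(relation)          # no loss, no overlap
    same = all(rec in relation for rec in union)
    no_dupes = len(union) == len({tuple(sorted(r.items())) for r in union})
    return total and same and no_dupes

patients = [
    {"id": 1, "dept": "cardiology"},
    {"id": 2, "dept": "radiology"},
    {"id": 3, "dept": "cardiology"},
]
frags = horizontal_fragments(
    patients,
    [lambda r: r["dept"] == "cardiology",
     lambda r: r["dept"] != "cardiology"],
)
assert is_correct(patients, frags)   # disjoint predicates pass all three rules
```

With overlapping predicates (e.g., one predicate that matches everything), the same checker rejects the fragmentation, which is exactly the failure mode the text attributes to [11].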
The clustering technique identifies groups of network sites in large web database systems and discovers better data distributions among them. It is considered an efficient method that plays a major role in reducing the amount of data transferred and accessed while processing database transactions. Accordingly, clustering techniques help eliminate extra communications costs between websites and thus enhance distributed database system performance [32]. However, assumptions about web communications and restrictions on the number of network sites make some clustering solutions impractical [16], [31]. Moreover, constraints on network connectivity and transaction processing time limit the applicability of the proposed solutions to a small number of clusters [9], [10].
Data distribution describes how the disjoint fragments are allocated among the web clusters and their respective sites in the database system. This process addresses the assignment of each data fragment to the distributed database websites [8], [17], [18], [21]. Data distribution techniques aim at improving distributed database system performance by reducing the number of database fragments transferred and accessed during execution. Additionally, data distribution techniques attempt to increase data availability, elevate database reliability, and reduce storage overhead [11], [27]. However, the restrictions on database retrieval and update frequencies in some data allocation methods may negatively affect the distribution of fragments over the websites [20].
In this work, we address the previous drawbacks and propose a three-fold approach that manages the web computing services required to promote telemedicine database system performance. The main contributions are:
- Develop a fragmentation computing service technique that splits telemedicine database relations into small disjoint fragments. This technique generates the minimum number of disjoint fragments to be allocated to the web servers in the data distribution phase, which in turn reduces the data transferred and accessed across different websites and accordingly reduces the communications cost.
- Introduce a high-speed clustering service technique that groups the web telemedicine database sites into sets of clusters according to their communications cost. This helps group the websites that are most suitable to be in one cluster, minimizing data allocation operations and thereby avoiding the allocation of redundant data.
- Propose a new computing service technique for telemedicine data allocation and redistribution based on transactions’ processing cost functions.
These functions guarantee the minimum communications cost among websites and hence accomplish better data distribution compared to allocating data to all websites evenly.
- Develop a user-friendly experimental tool that performs the services of telemedicine data fragmentation, website clustering, and fragment allocation, and assists database administrators in measuring WTDS performance.
- Integrate telemedicine database fragmentation, website clustering, and data fragment allocation into one scenario to accomplish the best possible web telemedicine system throughput in terms of concurrency, reliability, and data availability. We call this scenario the Integrated-Fragmentation-Clustering-Allocation (IFCA) approach. Fig. 1 depicts the architecture of the proposed telemedicine IFCA approach.
In Fig. 1, the data request is initiated from the telemedicine database system sites. The requested data are defined as SQL queries executed on the database relations to generate data set records. Some of these data records may overlap or even be redundant, which increases the I/O transactions’ processing time and thus the system communications overhead. To solve this problem, we execute the proposed fragmentation technique, which generates disjoint telemedicine fragments representing the minimum number of data records. The web telemedicine database sites are grouped into clusters by our clustering service technique in a phase prior to data allocation. The purpose of this clustering is to reduce the communications cost needed for data allocation. Accordingly, the proposed allocation service technique is applied to allocate the generated disjoint fragments to the clusters that show a positive allocation benefit. Then the fragments are allocated to the sites within the selected clusters. The database administrator is responsible for recovering any site failure in the WTDS.
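The fragment-cluster-allocate flow just described can be summarized as a three-phase pipeline. The sketch below is a minimal illustration under assumed helper names, a greedy grouping rule, and a toy cost model of our own; it is not the paper's implementation.

```python
# Minimal sketch of the IFCA flow (illustrative names and cost model).

def cluster_sites(sites, cost, cost_range):
    """Greedily group sites: a site joins the first cluster whose seed it can
    reach within cost_range (ms/byte); otherwise it seeds a new cluster."""
    clusters = []
    for site in sites:
        for cluster in clusters:
            if cost[(cluster[0], site)] <= cost_range:
                cluster.append(site)
                break
        else:
            clusters.append([site])
    return clusters

def allocate(fragments, clusters, benefit):
    """Allocate each fragment to every cluster with a positive allocation
    benefit; if none qualifies, keep one copy anyway for availability."""
    plan = {}
    for frag in fragments:
        chosen = [i for i in range(len(clusters)) if benefit(frag, i) > 0]
        if not chosen:
            chosen = [max(range(len(clusters)), key=lambda i: benefit(frag, i))]
        plan[frag] = chosen
    return plan

# Toy run: three sites, symmetric costs, two disjoint fragments.
sites = ["S1", "S2", "S3"]
cost = {("S1", "S2"): 2, ("S2", "S1"): 2,
        ("S1", "S3"): 9, ("S3", "S1"): 9,
        ("S2", "S3"): 8, ("S3", "S2"): 8}
clusters = cluster_sites(sites, cost, cost_range=5)   # [["S1", "S2"], ["S3"]]
benefit = lambda frag, i: 1 if (frag, i) in {("F1", 0), ("F2", 1)} else -1
plan = allocate(["F1", "F2"], clusters, benefit)      # {"F1": [0], "F2": [1]}
```

The availability fallback in `allocate` mirrors the requirement, stated later in Section 3.3, that every fragment be stored in at least one cluster even when no cluster shows a positive benefit.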
The remainder of the paper is organized as follows. Section 2 summarizes the related work. Basic concepts of the web telemedicine database settings and assumptions are discussed in Section 3. Telemedicine computation services and the estimation model are discussed in Section 4. Experimental results and performance evaluation are presented in Section 5. Finally, in Section 6, we draw conclusions and outline future work.
2 RELATED WORK
Many research works have attempted to improve the performance of distributed database systems. These works have mostly investigated the fragmentation, allocation, and sometimes clustering problems. In this section, we present the main contributions related to these problems and compare them with our proposed solutions.
2.1 Data Fragmentation
With respect to fragmentation, the unit of data distribution is a vital issue. A whole relation is not appropriate for distribution, as application views are usually subsets of relations [31]. Therefore, the locality of applications’ accesses is defined on those derived subsets. Hence it is important to divide the relation into smaller data fragments and consider them for distribution over the network sites. The authors in [8] considered each record in each database relation as a disjoint fragment subject to allocation among the distributed database sites. However, this method generates a large number of database fragments, causing a high communication cost for transmitting and processing them. In contrast, the authors in [11] considered the whole relation as a fragment; since not all records of a fragment have to be retrieved or updated, they proposed a selectivity matrix that indicates the percentage of a fragment accessed by each transaction. However, this research suffers from data redundancy and fragment overlap.
2.2 Clustering Websites
The clustering service technique identifies groups of networking sites and discovers interesting distributions among large web database systems. It is considered an efficient method that plays a major role in reducing the data transferred and accessed during transaction processing [9]. Moreover, grouping distributed network sites into clusters helps eliminate extra communication costs between the sites and thus enhances distributed database system performance by minimizing the communication costs required for processing transactions at run time.
In a web database system environment where the number of sites and the amount of data have grown enormously, the sites are required to manage these data and should provide data transparency to the users of the database. Moreover, to achieve a reliable database system, transactions should be executed very fast in a flexible, load-balanced database environment. When the number of sites in a web database system grows to a large scale, the problem of supporting high system performance under consistency and availability constraints becomes crucial. Different techniques could be developed for this purpose; one of them is website clustering.
Grouping websites into clusters reduces communications cost and thus enhances the performance of the web database system. However, clustering network sites is still an open problem, and finding the optimal solution is NP-complete [12]. Moreover, in a complex network where large numbers of sites are
Fig. 1. IFCA computing services architecture.
HABABEH ET AL.: DESIGNING HIGH PERFORMANCE WEB-BASED COMPUTING SERVICES TO PROMOTE TELEMEDICINE DATABASE... 49
connected to each other, a huge number of communications is required, which increases the system load and degrades its performance.
The authors in [13] proposed a hierarchical clustering algorithm that uses similarity upper approximation derived from a tolerance (similarity) relation and based on rough set theory, requiring no prior information about the data. The presented approach results in rough clusters in which an object may be a member of more than one cluster. Rough clustering can help researchers discover multiple needs and interests in a session by looking at the multiple clusters the session belongs to. However, carrying out rough clustering requires two additional inputs, namely an ordered value set for each attribute and a distance measure for clustering [14]. Clustering coefficients are needed in many approaches to quantify structural network properties. In [15], the authors proposed higher-order clustering coefficients, defined as probabilities that determine the shortest distance between any two nearest neighbors of a certain node when all paths crossing this node are neglected. The outcomes of this method indicate that the average shortest distance in a node’s neighborhood is smaller than the average distance across the network. However, independent constant values and the natural logarithm function are used in the shortest-distance approximation to determine the clustering mechanism, which results in generating a small number of clusters.
2.3 Data Allocation (Distribution)
Data allocation describes how the database fragments are distributed among the clusters and their respective sites in distributed database systems. This process addresses the assignment of network node(s) to each fragment [8]. However, finding an optimal data allocation is an NP-complete problem [12]. Distributing data fragments among database websites improves database system performance by minimizing the data transferred and accessed during execution, reducing storage overhead, and increasing availability and reliability where multiple copies of the same data are allocated.
Many data allocation algorithms are described in the literature; their efficiency is measured in terms of response time. The authors in [19] proposed an approach that handles fully replicated data allocation in database systems. In this approach, a database file is fully copied to all participating nodes through the master node. The approach distributes sequences across fragments with a round-robin strategy over an input set already ordered by size, so that each fragment holds about the same number of sequences and a similar number of characters. However, this replicated schema does not achieve any performance gain when the number of nodes increases, and when the number of input sequences is not known in advance, the replication model may not be the best solution and other fragmentation strategies have to be considered. In [20], the author addressed the fragment allocation problem in web database systems, presenting integer programming formulations for the non-redundant version of the problem and extending them to problems with both storage and processing capacity constraints. In this method, the constraints essentially state that there is exactly one copy of each fragment across all sites, which increases the risk of data inconsistency and unavailability in case of any site failure. Moreover, the fragment size is not addressed even though the storage capacity constraint is one of the major objectives of this approach. In addition, the retrieval and update frequencies are not considered in the formulations; they are assumed to be the same, which affects the distribution of fragments over the sites. Finally, this research is limited by the fact that none of the presented approaches has been implemented and tested on a real web database system.
A dynamic method for data fragmentation, allocation, and replication is proposed in [25]. Its objective is to minimize the cost of access, re-fragmentation, and reallocation. The DYFRAM algorithm examines accesses for each replica and evaluates possible re-fragmentations and reallocations based on recent history; it runs at given intervals, individually for each replica. However, data consistency and concurrency control are not considered in DYFRAM. Additionally, DYFRAM doesn’t guarantee data availability and system reliability when all sites have negative utility values. In [28], the authors present a horizontal fragmentation technique capable of taking a fragmentation decision at the initial stage and then allocating the fragments among the sites of a DDBMS. A modified matrix, MCRUD, is constructed by placing predicates on the attributes of a relation in rows and the applications of the DDBMS sites in columns. Attribute locality precedence (ALP), the importance value of an attribute with respect to the sites of the distributed database, is generated as a table from MCRUD. However, when all attributes have the same locality precedence, the same fragment has to be allocated to all sites, and huge data redundancy occurs. Moreover, the initial values of frequencies and weights don’t reflect the actual ones in real systems, which may affect the number of fragments and their allocation accordingly.
The authors in [29] presented a method for modeling distributed database fragmentation using UML 2.0 to improve application performance. This method is based on a probability distribution function in which the execution frequency of a transaction is estimated mainly by its most likely time. However, the most likely time is not determined in a way that distinguishes priorities between transactions. Furthermore, no performance evaluations are performed and no significant results are generated from this method. A database tool presented in [30] addresses the problem of designing DDBs in the context of the relational data model. Conceptual design, fragmentation issues, and the allocation problem are considered based on other methods in the literature. However, this tool doesn’t consider local optimization of the fragment allocation problem over the distributed network sites. In addition, many design parameters must be estimated and entered by designers, so different results may be generated for the same application case.
Our fragmentation approach circumvents the problems associated with the aforementioned studies by introducing a computing service technique that generates disjoint fragments, avoids data redundancy, and assumes that all records of a fragment are retrieved or updated by a transaction. With such a technique, less communication cost is required to transmit and process the disjoint database fragments. In addition, by applying the clustering service technique, the complexity of the allocation problem is significantly reduced: the intractable problem of fragment allocation over individual sites is transformed into fragment distribution among the clusters followed by replication among the related sites.
2.4 Commercial Outsourced Databases
This section surveys current commercial outsourced databases and compares them with our IFCA. Commercial outsourced databases support enormous data storage architectures that distribute storage and processing across several servers, and they can be used to address web database system performance and scalability requirements.
Amazon Dynamo [34] is a cloud distributed database system with high availability and scalability. In contrast to traditional relational distributed database management systems, Dynamo only supports database applications with simple read and write queries on data records. In Dynamo, however, availability is prioritized over consistency; accordingly, at certain times data consistency can’t be guaranteed. MongoDB [35] is an open-source, document-oriented cloud distributed database developed by the 10gen software company. MongoDB is a highly available and scalable system that supports search by fields in documents. However, MongoDB uses a special query optimizer that executes different query plans concurrently to make queries more efficient, which increases search time complexity.
Google BigTable [36] is a cloud distributed database system built on the Google file system and a highly available, persistent distributed lock service. The BigTable data model is designed as a distributed multidimensional (row key, column key, timestamp) sorted map, implemented in three parts: tablet servers, a client library, and a master server. However, because the failure of the master server results in the failure of the whole database system, another server usually backs up the master server. This incurs additional costs to the cloud distributed database system, in addition to the costs of operating and maintaining the new servers.
Due to their differences in priorities, architecture, data consistency, and search time complexity compared with our IFCA, the previous outsourced databases can’t optimize and secure telemedicine data in terms of data fragmentation, website clustering, and data distribution altogether, as our approach does. Table 1 compares our approach with the outsourced approaches in terms of integrity, reliability, communications overhead, manageability, data consistency, security, and performance evaluation.
3 TELEMEDICINE IFCA ASSUMPTIONS
AND DEFINITIONS
Incorporating database fragmentation, web database site clustering, and data fragment allocation computing services in one scenario distinguishes our approach from others. The functionality of such an approach depends on the settings, assumptions, and definitions that identify the WTDS implementation environment and guarantee its efficiency and continuity. Below is the description of the IFCA settings, assumptions, and definitions.

TABLE 1
Comparison between Existing Methods in the Literature and the Proposed Approach
3.1 Web Architecture and Communications
Assumptions
The telemedicine IFCA approach is designed to supply web database providers with computing services that can be implemented over multiple servers, where data storage, communication, and transaction processing are fully controlled, communication costs are symmetric, and patients’ information privacy and security are met. We assume fully connected sites on a heterogeneous web telemedicine network system with different bandwidths: 128 kbps, 512 kbps, or multiples thereof. In this environment, some servers execute the telemedicine queries triggered from different web database sites, a few servers run the database programs and perform the fragmentation-clustering-allocation computing services, and the remaining servers store the database fragments. Communications cost (ms/byte) is the cost of loading and processing data fragments between any two sites in the WTDS. To control and simplify the proposed web telemedicine communication system, we assume that communication costs between sites are symmetric and proportional to the distance between them; communication costs within the same site are neglected.
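The symmetry and zero intra-site cost assumptions can be encoded directly. This is a small sketch under our own naming, not part of the paper's model:

```python
# Sketch of the communications-cost assumptions: symmetric between sites,
# neglected within a site. Values are in ms/byte.

def make_cost_table(one_way):
    """Expand one-way entries into a symmetric cost table."""
    table = {}
    for (a, b), c in one_way.items():
        table[(a, b)] = c
        table[(b, a)] = c        # symmetry assumption
    return table

def comm_cost(table, a, b):
    return 0 if a == b else table[(a, b)]   # intra-site cost neglected
```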
3.2 Fragmentation and Clustering Assumptions
Telemedicine queries are triggered from web servers as transactions that determine the specific information to be extracted from the database. Transactions include, but are not limited to, read, write, update, and delete. To control the process of database fragmentation and achieve data consistency in the telemedicine database system, the IFCA fragmentation service technique partitions each database relation according to the Inclusion-Integration-Disjoint assumptions: the generated fragments must contain all records in the database relations, the original relation must be reconstructable from its fragments, and the fragments must be neither repeated nor intersecting. The logical clustering decision is defined as a logical value that specifies whether a website is included in or excluded from a certain cluster, based on the communications cost range. The communications cost range is defined as a value (ms/byte) that specifies how much time websites are allowed to take to transmit or receive their data in order to be considered in the same cluster; this value is determined by the telemedicine database administrator.
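One concrete reading of the logical clustering decision, sketched under our own naming (not the paper's code): a website is included in a cluster only if its communications cost to each current member stays within the administrator-chosen range.

```python
# Illustrative logical clustering decision: include a website in a cluster
# iff it can exchange data with every current member within the
# administrator-defined communications cost range (ms/byte).

def clustering_decision(cost, site, cluster, cost_range):
    """True -> include the site in the cluster; False -> exclude it."""
    return all(cost[(site, member)] <= cost_range for member in cluster)

cost = {("S3", "S1"): 4, ("S3", "S2"): 6}
included = clustering_decision(cost, "S3", ["S1"], cost_range=5)        # True
excluded = clustering_decision(cost, "S3", ["S1", "S2"], cost_range=5)  # False
```

Requiring the bound against every member is one possible policy; a cheaper variant could test only a cluster representative, at the price of looser intra-cluster costs.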
3.3 Fragments Allocation Assumptions
The allocation decision value (ADV) is defined as a logical value (1 or 0) that determines the allocation status of a fragment for a specific cluster. Fragments that achieve an ADV of 1 are considered for both allocation and replication. The advantage of this assumption is that more communications cost is saved, because each fragment is located in the same place where it is processed, hence improving WTDS performance. On the other hand, fragments with an ADV of 0 are considered for allocation only, in order to ensure data availability and fault tolerance in the WTDS; in this case, each fragment must be allocated to at least one cluster and to one site in that cluster. The ADV is computed by comparing the cost of allocating the fragment to a cluster with the cost of not allocating the fragment to that cluster. The allocation cost function is composed of the sub-cost functions required to perform the fragment transactions locally: the cost of local retrieval, the cost of local updates needed to maintain consistency among all copies of the fragment distributed over the websites, the cost of storage, and the cost of remote updates and remote communications (for remote clusters that do not hold the fragment but still need to perform transactions on it). The non-allocation cost function consists of the cost of local retrieval and the cost of the remote retrievals required to perform the fragment transactions remotely when the fragment is not allocated to the cluster.
4 TELEMEDICINE IFCA COMPUTATION SERVICES
AND ESTIMATION MODEL
In the following sections, we present our IFCA approach and provide mathematical models of its computation services.
4.1 Fragmentation Computing Service
To control the process of database fragmentation and main-
tain data consistency, the fragmentation technique parti-
tions each database relation into data set records that
guarantee data inclusion, integration and non-overlapping.
In a WTDS, neither complete relations nor individual attributes are suitable data units for distribution, especially for very large data volumes. Therefore, it is appropriate to use data fragments that are allocated to the WTDS sites. Data fragmentation is based on the data records generated by
executing the telemedicine SQL queries on the database
relations. The fragmentation process goes through two con-
secutive internal processes: (i) Overlapped and redundant
data records fragmentation and (ii) Non-overlapped data
records fragmentation.
The fragmentation service generates disjoint fragments
that represent the minimum number of data records to be
distributed over the websites by the data allocation service.
The proposed fragmentation Service architecture is
described through Input-Processing-Output phases
depicted in Fig. 2. Based on this fragmentation service, the
global database is partitioned into disjoint fragments. The
overlapped and redundant data records fragmentation
process is described in Table 2.
Fig. 2. Data fragmentation service architecture.
52 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015
In this algorithm, database fragmentation starts with any two random fragments (Fi, Fj) whose records intersect and proceeds as follows: if an intersection exists between the two fragments, three disjoint fragments are generated: Fk = Fi ∩ Fj, Fk+1 = Fi − Fk, and Fk+2 = Fj − Fk. The first contains the common records, the second contains the records in the first fragment but not in the second, and the third contains the records in the second fragment but not in the first.
The intersecting fragments are then removed from the fragments list. This process continues until no intersecting fragments or data set records remain for this telemedicine relation, and likewise for the other relations.
The non-overlapped data records fragmentation process is shown in Table 3. The fragments newly derived in Table 2, together with the fragments that never overlapped, form a totally disjoint set. The non-overlapped fragments that do not intersect with any other fragment in Table 2 are included in the final set of the relation's disjoint
fragments. For example, consider that the transactions trig-
gered from the telemedicine database sites hold queries
that require extracting records from different telemedicine
database relations. Let us assume three transactions for
patient relation (T1, T2, T3) on sites S1, S2, and S3 respec-
tively. T1 selects patients who were medicated by Doctor #1, T2 selects patients who took medicine #4, and T3 selects patients who paid more than $1,000. Data set 1 in the patient relation intersects with data set 2 in the same relation. This results in new disjoint fragments: F1, F2, and F3.
The data sets 1 and 2 are omitted. Fragment F1 and data set
3 in the same relation intersect. This results in the
new disjoint fragments F4, F5, and F6. Then F1 and the data
set 3 are omitted, and the next intersection is between F2
and F6. The result is new disjoint fragments; F7, F8, and F9.
Then F2 and F6 are omitted. The final disjoint fragments
generated from the transactions over patient relations are:
F3, F4, F5, F7, F8, and F9. Table 4 depicts the fragmentation
process over patient relation.
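The pairwise-intersection procedure of Table 2 can be sketched in a few lines, assuming fragments are modeled as sets of record identifiers (a simplified illustration; the function name and representation are ours, not the paper's):

```python
def disjoint_fragments(fragments):
    """Repeatedly split intersecting fragment pairs into disjoint pieces.

    Each fragment is a set of record identifiers. An intersecting pair
    (Fi, Fj) is replaced by Fi & Fj, Fi - Fj, and Fj - Fi, and the loop
    restarts, until no two fragments intersect (the Table 2 procedure,
    simplified).
    """
    frags = [set(f) for f in fragments if f]
    restart = True
    while restart:
        restart = False
        for i in range(len(frags)):
            for j in range(i + 1, len(frags)):
                common = frags[i] & frags[j]
                if common:
                    pieces = [common, frags[i] - common, frags[j] - common]
                    # drop the intersecting pair, keep the non-empty pieces
                    frags = [f for k, f in enumerate(frags) if k not in (i, j)]
                    frags += [p for p in pieces if p]
                    restart = True
                    break
            if restart:
                break
    return frags
```

Running it on three overlapping record sets, as in the patient-relation example, yields a pairwise-disjoint set of fragments whose union still covers all original records.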
4.2 Clustering Computing Service
The benefit of generating disjoint database fragments is not fully realized unless it enhances the performance of the WTDS. As the number of database sites grows large, supporting high system performance under consistency and availability constraints becomes crucial. Several techniques have been developed for this purpose; one of them consists of clustering websites. However, grouping sites into clusters is still an open problem whose optimal solution is NP-complete [12].
Therefore, developing a near-optimal solution for grouping
web database sites into clusters helps to eliminate the extra
communication costs between the sites during the process
of data allocation. Clustering web telemedicine database sites speeds up the process of data allocation by distributing the fragments only to the clusters where allocation yields a benefit, rather than distributing every fragment to all websites. Thus, the communication costs are minimized and the
TABLE 2
Overlapped and Redundant Data Records Fragmentation Algorithm
TABLE 3
Non-Overlapped Data Records Fragmentation Algorithm
TABLE 4
Fragmentation Process over Patient Relation
WTDS performance is improved. In this work, we introduce
a high speed clustering service based on the least average
communication cost between sites. The parameters used to
control the input/output computations for generating clus-
ters and determining the set of sites in each are computed
as follows:
Communications cost between sites: CC(Si, Sj) = data creation cost + data transmission cost between Si and Sj.
Communication cost range CCR (ms/byte), which is determined by the telemedicine database system administrator.
Clustering decision value (cdv):

cdv(Si, Sj) = 1 if CC(Si, Sj) ≤ CCR and i ≠ j; 0 if CC(Si, Sj) > CCR or i = j.    (1)
Accordingly, if cdv(Si, Sj) is equal to 1, then the sites Si,
Sj are assigned to one cluster, otherwise they are assigned
to different clusters. If site Si can be assigned to more than
one cluster, it will be considered for the cluster of the least
average communication cost. Based on this clustering ser-
vice, we develop the clustering algorithm shown in Table 5.
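As a rough sketch (not the authors' implementation), the cdv test of Equation (1) combined with the least-average-cost rule for sites that qualify for several clusters could be coded as:

```python
def cluster_sites(cc, ccr):
    """Group sites into clusters by communication-cost range.

    cc is a symmetric matrix of communication costs (ms/byte) between
    sites, and ccr is the cost range set by the administrator. A site
    joins a cluster only if its cost to every member is within ccr
    (cdv = 1); when several clusters qualify, the one with the least
    average communication cost to the site is chosen.
    """
    clusters = []
    for s in range(len(cc)):
        candidates = [c for c in clusters
                      if all(cc[s][t] <= ccr for t in c)]
        if candidates:
            best = min(candidates,
                       key=lambda c: sum(cc[s][t] for t in c) / len(c))
            best.append(s)
        else:
            clusters.append([s])  # site starts its own cluster
    return clusters
```

For instance, with cc = [[0, 1, 5], [1, 0, 5], [5, 5, 0]] and ccr = 2.5, sites 0 and 1 form one cluster and site 2 forms its own.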
The communications cost within and between clusters is
required for the fragment allocation phase where fragments
are allocated to web clusters and then to their respective
websites. The optimal way to compute this cost is to find the cheapest path between the clusters, which is an NP-complete problem [12]. Instead, we use the symmetric average communication cost of each cluster, which has been shown to be a fast, reliable, and efficient method for computing the fragment allocation and replication service in many heterogeneous web telemedicine database environments. To test the validity of our clustering service technique, and based on the assumptions on the communications costs between sites (symmetric between any two sites, zero within the same site, and proportional to the distance between the sites), we collected a real sample of communication costs (multiples of 0.125 ms/byte) between 12 websites (hospitals), presented in Table 6. We then apply the clustering service in our IFCA tool to the communication costs in Table 6 with a clustering range of 2.5.
Fig. 3 depicts the experimental results: the generated clusters and their respective sites. It can be inferred from the figure that sites 1 and 3 are located in their own clusters because no other site's communication cost with them falls within the communications cost range. Moreover, the number of clusters increases as the communication costs between sites increase or as the communication cost range becomes smaller.
4.3 Data Allocation and Replication
Data allocation techniques aim at distributing the database
fragments on the web database clusters and their respec-
tive sites. We introduce a heuristic fragment allocation and replication computing service to perform the process of fragment allocation in the WTDS. Initially, all fragments are subject to allocation to every cluster whose sites need them. If a fragment shows a positive allocation decision value (i.e., an allocation benefit greater
TABLE 5
Clustering Algorithm
TABLE 6
The Communication Costs between Websites
than zero) for a specific cluster, then the fragment is allocated to this cluster and tested for allocation at each of its
sites, otherwise the fragment is not allocated to this cluster.
This fragment is subsequently tested for replication in each
cluster of the WTDS. Accordingly, the fragment that shows
positive allocation decision value for any WTDS cluster
will be allocated at that cluster and then tested for alloca-
tion at its sites. Consequently, if the fragment shows a positive allocation decision value at any site of a cluster that itself shows a positive allocation decision value, then the fragment is allocated to that site; otherwise, it is not. This process is repeated for all sites in each cluster that shows a positive allocation decision value.
Fig. 4 illustrates the structure of our data allocation and
replication technique.
If a fragment shows a negative allocation decision value at all clusters, the fragment is allocated to the cluster that holds the least average communications cost, and then to the site that achieves the least communications cost with the other sites in that cluster. To better understand the computation of the query processing cost functions, a mathematical model is used to formulate these cost functions.
The allocation decision value ADV is computed as the logical result of a comparison between two compound cost functions: the cost of allocating the fragment to the cluster and the cost of not allocating the fragment to the same cluster. The cost of allocating the
fragment Fi issued by the transaction Tk to the cluster Cj,
denoted as CA(Tk, Fi, Cj), is defined as the sum of the fol-
lowing sub-costs: retrievals and updates issued by the
transaction Tk to the fragment Fi at cluster Cj, storage
occupied by the fragment Fi in the cluster Cj, and remote
updates and remote communications sent from remote
clusters. The mathematical model for each sub-cost func-
tion is detailed below:
Cost of local retrievals issued by the transaction Tk to the fragment Fi at cluster Cj. This cost is determined by the frequency and cost of retrievals. The frequency of retrievals represents the average number (≥0) of retrieval transactions that occur on each fragment at all sites of a certain cluster. The cost of retrievals is the average cost of the retrieval transactions that occur on each fragment at all sites of a specific cluster, which is set by the telemedicine database system administrator in time units (milliseconds, microseconds, etc.) per byte. The cost of local retrievals is computed as the product of the average cost of local retrievals over all sites (m) at cluster Cj and the average frequency of local retrievals issued by the transaction Tk to the fragment Fi over all sites at cluster Cj:

CLR(Tk, Fi, Cj) = (Σ_{q=1}^{m} CLR(Tk, Fi, Cj, Sq) / m) × (Σ_{q=1}^{m} FREQ_LR(Tk, Fi, Cj, Sq) / m).    (2)
Cost of local updates issued by the transaction Tk to the fragment Fi at cluster Cj. This cost is determined by the frequency and cost of updates. The frequency of updates is computed as the average number of update transactions that occur on each fragment at all sites of a certain cluster. The cost of updates is the average cost of the update transactions that occur on each fragment at all sites of a specific cluster, and is determined by the WTDS administrator in time units per byte. The cost of local updates is computed as the product of the average cost of local updates over all sites (m) at cluster Cj and the average frequency of local updates issued by the transaction Tk to the fragment Fi over all sites at cluster Cj:

CLU(Tk, Fi, Cj) = (Σ_{q=1}^{m} CLU(Tk, Fi, Cj, Sq) / m) × (Σ_{q=1}^{m} FREQ_LU(Tk, Fi, Cj, Sq) / m).    (3)

Fig. 3. Clustering telemedicine websites.
Fig. 4. Data allocation and replication technique.
Cost of storage occupied by the fragment Fi in the cluster Cj. This cost is determined by the storage cost and the fragment size. The storage cost is the cost of storing a data byte at a certain site in a specific cluster. The fragment size is the storage occupied by the fragment in bytes. The cost of storage is computed as the product of the average storage cost over all sites (m) at cluster Cj occupied by the fragment Fi and the fragment size:

SCP(Tk, Fi, Cj) = (Σ_{q=1}^{m} SCP(Tk, Fi, Cj, Sq) / m) × Fsize(Tk, Fi).    (4)
Cost of remote updates issued by the transaction Tk to the fragment Fi sent from all clusters (n) in the WTDS except the current cluster Cj. This cost is determined by the cost of local updates and the frequency of remote updates. The frequency of remote updates is the number of update transactions that occur on each fragment at remote web clusters in the WTDS. The cost of remote updates is the product of the average cost of local updates at each remote cluster Cq and the average frequency of remote updates issued by Tk to the fragment Fi at each remote cluster in the WTDS:

CRU(Tk, Fi, Cj) = Σ_{q=1, q≠j}^{n} CLU(Tk, Fi, Cq) × (Σ_{q=1, q≠j}^{n} Σ_{a=1}^{m} FREQ_RU(Tk, Fi, Cq, Sa) / m).    (5)
The update ratio is used in the evaluation of the cost of remote communications and represents the estimated percentage of update transactions in the web telemedicine database system. It is computed by dividing the number of local update frequencies at all WTDS sites by the total number of all local retrieval and update frequencies at all sites:

Uratio = Σ_{q=1}^{m} FREQ_LU(Tk, Fi, Cj, Sq) / (Σ_{q=1}^{m} FREQ_LR(Tk, Fi, Cj, Sq) + Σ_{q=1}^{m} FREQ_LU(Tk, Fi, Cj, Sq)).    (6)
Cost of remote communications issued for the transaction Tk to the fragment Fi sent from remote clusters (n) in the WTDS. This cost defines the cost required for updating the fragment from remote clusters. It is computed as the product of the average cost of remote communication, the average frequency of local updates, and the update ratio:

CRC(Tk, Fi, Cj) = (Σ_{q=1, q≠j}^{n} Σ_{a=1, b=1, a≠b}^{m} CRC(Tk, Fi, Cq, (Sa, Sb)) / m) × (Σ_{q=1}^{m} FREQ_LU(Tk, Fi, Cj, Sq) / m) × Uratio.    (7)
According to the definition of the cost of allocating fragments, and based on the previous allocation sub-costs, the cost of allocating a fragment to a certain cluster in the WTDS is computed as:

CA(Tk, Fi, Cj) = CLR(Tk, Fi, Cj) + CLU(Tk, Fi, Cj) + SCP(Tk, Fi, Cj) + CRU(Tk, Fi, Cj) + CRC(Tk, Fi, Cj).    (8)
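To illustrate, the average-cost-times-average-frequency pattern shared by Equations (2)-(4), and the total of Equation (8), can be expressed compactly (an illustrative sketch; the helper names are ours, not the paper's):

```python
def avg(values):
    """Average over the m sites of a cluster."""
    return sum(values) / len(values)

def local_retrieval_cost(unit_costs, frequencies):
    """Eq. (2): average local retrieval cost times average frequency."""
    return avg(unit_costs) * avg(frequencies)

def local_update_cost(unit_costs, frequencies):
    """Eq. (3): the same pattern applied to local updates."""
    return avg(unit_costs) * avg(frequencies)

def storage_cost(per_byte_costs, fragment_size):
    """Eq. (4): average per-byte storage cost times fragment size."""
    return avg(per_byte_costs) * fragment_size

def allocation_cost(clr, clu, scp, cru, crc):
    """Eq. (8): total cost of allocating the fragment to a cluster."""
    return clr + clu + scp + cru + crc
```

For a three-site cluster with retrieval costs [2, 4, 6] ms/byte and retrieval frequencies [1, 2, 3], the local retrieval cost is 4 × 2 = 8.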
To complete the ADV computation, the cost of not allocating a fragment to a certain cluster (i.e., the fragment being retrieved remotely) should be defined and computed.
The cost of not allocating the fragment Fi issued by the
transaction Tk to the cluster Cj, denoted CN(Tk, Fi, Cj), is
defined as the sum of the following sub-costs: cost of local
retrievals issued by the transaction Tk to the fragment Fi
at cluster Cj (computed in equation 2), and the cost of
remote retrievals:
The retrieval ratio is used in the evaluation of the cost of remote retrievals and represents the estimated percentage of retrieval transactions in the WTDS. It is computed by dividing the number of local retrieval frequencies at all WTDS sites by the total number of all local retrieval and update frequencies at all WTDS sites:

Rratio = Σ_{q=1}^{m} FREQ_LR(Tk, Fi, Cj, Sq) / (Σ_{q=1}^{m} FREQ_LR(Tk, Fi, Cj, Sq) + Σ_{q=1}^{m} FREQ_LU(Tk, Fi, Cj, Sq)).    (9)
Cost of remote retrievals issued by the transaction Tk to the fragment Fi sent from remote clusters (n) in the WTDS. The cost of remote retrievals is determined by the retrieval ratio, the retrieval frequency, and the communications cost between all clusters. This cost is computed as the product of the average cost of communications between all clusters, the average frequency of remote retrievals, and the retrieval ratio:

CRR(Tk, Fi, Cj) = (Σ_{q=1}^{n} Σ_{a=1, b=1, a≠b}^{m} CCC(Tk, Fi, Cq, (Sa, Sb)) / m) × (Σ_{q=1, q≠j}^{n} Σ_{a=1}^{m} FREQ_RR(Tk, Fi, Cq, Sa) / m) × Rratio.    (10)
Based on the previous not-allocation sub-costs, the cost of not allocating a fragment to a certain cluster in the WTDS is computed as:

CN(Tk, Fi, Cj) = CLR(Tk, Fi, Cj) + CRR(Tk, Fi, Cj).    (11)

Accordingly, the allocation decision value ADV determines the allocation status of the fragment at the cluster and its sites. ADV is computed as a logical value from the comparison between the cost of allocating the fragment to the cluster, CA(Tk, Fi, Cj), and the cost of not allocating the fragment to the same cluster, CN(Tk, Fi, Cj):

ADV(Tk, Fi, Cj) = 1 if CN(Tk, Fi, Cj) ≥ CA(Tk, Fi, Cj); 0 if CN(Tk, Fi, Cj) < CA(Tk, Fi, Cj).    (12)
Therefore, if CN(Tk, Fi, Cj) is greater than or equal to CA(Tk, Fi, Cj), then the allocation status ADV(Tk, Fi, Cj) shows a positive allocation benefit and the fragment is allocated to that cluster; otherwise, CN(Tk, Fi, Cj) is less than CA(Tk, Fi, Cj), the allocation status shows a non-positive allocation benefit, and the fragment is not allocated to that cluster. To formalize the fragment allocation procedure, we developed the allocation and replication algorithm shown in Table 7.
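The decision logic of Equation (12) together with the availability fallback described above can be sketched as follows (a simplified illustration; the cost callables stand in for Equations (8) and (11) and are hypothetical names):

```python
def adv(ca, cn):
    """Eq. (12): allocate (1) iff not allocating costs at least as much."""
    return 1 if cn >= ca else 0

def allocate_fragment(fragment, clusters, ca_cost, cn_cost, avg_comm):
    """Return the clusters the fragment is allocated (replicated) to.

    ca_cost(f, c) and cn_cost(f, c) stand for CA and CN of Eqs. (8)
    and (11); avg_comm(c) is the cluster's average communication cost,
    used as a fallback so every fragment lives in at least one cluster.
    """
    chosen = [c for c in clusters
              if adv(ca_cost(fragment, c), cn_cost(fragment, c)) == 1]
    if not chosen:
        # negative benefit everywhere: keep one copy for availability
        chosen = [min(clusters, key=avg_comm)]
    return chosen
```

With CA and CN given per cluster, the fragment is replicated wherever CN ≥ CA, and otherwise placed once in the cheapest cluster.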
Once the fragments are allocated and replicated at the web clusters, investigating the benefit of allocating the fragments at the sites of each cluster becomes crucial. To get better WTDS performance, fragments
should be allocated over the sites of clusters where a
clear allocation benefit is accomplished or where data
availability is required. The same allocation and replica-
tion algorithm will be applied in this case, taking into
consideration the costs between sites in each cluster. For example, consider fragment 1 for allocation in clusters 1, 2, and 3, respectively. Based on Equations (2)-(12), which determine the allocation decision value, Tables 8, 9, and 10 depict the calculations of this process.
The allocation decision value for fragment 1 will be com-
puted in the same way for the other clusters and for cluster
1 sites (5, 6, and 7) to determine which cluster/site can allo-
cate the fragment and which one will not.
4.4 Complexity of Computation
The time complexity of our approach is bounded by the time complexities of the following incorporated entities: defining fragments, clustering websites, allocating fragments, and computing average retrieval and update frequencies. The time complexity of each entity is computed as follows:
The complexity of defining fragments is O(R × (R.size() − 1)²), where R is the number of relations and R.size() is the average current number of records in each relation.
The complexity of clustering websites is O((N² − N) log₂ N), where N is the number of sites in the WTDS.
The complexity of fragment allocation is O(R.size() × R × S), where R.size() is the average current number of records in each relation, R is the number of relations, and S is the number of sites.
The complexity of computing average retrieval and update frequencies is O(F × S × A × A.size()), where F is the number of fragments in the database, S is the number of sites, A is the number of applications at each site, and A.size() is the average current number of records in each application.
Based on the complexity computations of these incorporated entities, the overall time complexity of the approach is bounded by O(R × (R.size() − 1)² + (N² − N) log₂ N + R.size() × R × S + F × S × A × A.size()).
5 EXPERIMENTAL RESULTS AND PERFORMANCE
EVALUATION
To analyze the behavior of the proposed computing service techniques, we developed an IFCA software tool that is used to validate the telemedicine database system performance. This tool is not only experimental but also supports knowledge extraction and decision making. In this
tool, we test the feasibility of the three proposed comput-
ing services’ techniques. The experimental results and the
performance evaluation of our approach are discussed in
the following sections.
5.1 Evaluation of Fragmentation Service
The internal validity of our fragmentation service technique
is tested on multiple relations of a WTDS. We apply the
fragmentation computing service in our IFCA tool on a set of data records obtained from eight different transactions (Table 11). Fig. 5 depicts the experimental results, which generate the disjoint fragments and their respective records.
Fig. 5 shows that our fragmentation computing service
satisfies the optimal solution of data fragmentation in
WTDS and generates the minimum number of fragments
for each relation. This helps in reducing the communica-
tions cost and the data records storage, hence improving
WTDS performance. We now evaluate the fragmentation
performance in terms of fragments storage size reduction.
Let Fper_r represent the fragmentation performance percentage of relation r, Rdrs_r the size of the relation's data records generated by the database queries, and Rfs_r the relation's final fragment data records size; then Fper_r is computed as:

Fper_r = (Rdrs_r − Rfs_r) / Rdrs_r.    (13)
For example, when the fragmentation is performed for
relation 1 on data records sets generated from the database
queries of total size 115 kb and a final number of fragments
of a total size 33 kb, then the storage size that will be
considered for this relation is reduced from 115 to 33 kb with no data redundancy. This helps improve system performance by reducing the data storage size by about 71 percent.
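The arithmetic of this example follows directly from Equation (13); a one-line check:

```python
def fragmentation_performance(records_size, fragments_size):
    """Eq. (13): fraction of storage saved by fragmentation."""
    return (records_size - fragments_size) / records_size

# relation 1 of the example: 115 kb of query records reduced to 33 kb
saving = fragmentation_performance(115, 33)
print(f"{saving:.0%}")  # prints "71%"
```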
TABLE 7
Fragments Allocation and Replication Algorithm
To externally validate our approach, we implemented
the two fragmentation techniques proposed by Ma et al. [8] and Huang and Chen [12], respectively, and compared their performance against our approach. Fig. 6 shows the number of fragments against the relation number for the three techniques. The results show that the methods in [8] and [12] produce more and larger fragments due to their fragmentation assumptions (see Section 2), while our computing service generates fewer and smaller fragments. Therefore, it
is clear that our fragmentation technique significantly out-
performs the approaches proposed by Ma et al. [8] and
Huang and Chen [12].
5.2 Evaluation of Clustering Service
To evaluate the performance accomplished by grouping
the telemedicine websites running under our clustering
service technique, we introduce a mathematical model to
calculate the performance gain in terms of the reduced
communication costs that can be saved from clustering
websites. The clustering performance gain is computed
as the result of the reduced costs of communications
divided by the sum of communications costs between
sites. The reduced communications costs are specifically
defined as the difference between the sum of costs that
are required for each site to communicate with remote
sites in the web system and the sum of costs that are required for each cluster to communicate with remote clusters. The performance evaluation of this mathematical model is expressed as follows: let CPE represent the clustering performance evaluation and let CC(Si, Sj)
TABLE 11
Computation of Fragment Allocation Decision Value
Fig. 5. Generating database fragments. Fig. 6. Database fragmentation performances.
TABLE 8
Cluster 1 Costs of Retrieval, Update, and Storage
TABLE 9
Average # of Retrieval, Update Frequencies, and Communication Costs
TABLE 10
Computation of Fragment Allocation Decision Value
represent the communication cost between any two sites Si and Sj in the cluster Ci, where i, j = 1, 2, 3, ..., n. Let CC(Ci, Cj) represent the communication cost between any two clusters Ci and Cj, where i, j = 1, 2, 3, ..., n. CPE can be defined as:
CPE = (Σ_{i=1}^{n} Σ_{j=1}^{n} CC(Si, Sj) − Σ_{i=1}^{n} Σ_{j=1}^{n} CC(Ci, Cj)) / Σ_{i=1}^{n} Σ_{j=1}^{n} CC(Si, Sj).    (14)
For example, consider a telemedicine database system simulated on 10 websites grouped in four clusters, where each site communicates with the other nine sites and each cluster communicates with the other three clusters, taking into consideration the communications within the cluster itself. For simplicity, we assume that the sum of communications costs between sites is 697.5 ms/byte and between clusters is 181.42 ms/byte. By applying CPE (Equation (14)), the communications cost is reduced by about 74 percent of the original communications cost.
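The figures in this example can be verified against Equation (14) directly:

```python
def clustering_performance(site_costs_sum, cluster_costs_sum):
    """Eq. (14): fraction of communication cost saved by clustering."""
    return (site_costs_sum - cluster_costs_sum) / site_costs_sum

# 10 sites in 4 clusters: 697.5 ms/byte site-to-site vs 181.42 cluster-to-cluster
gain = clustering_performance(697.5, 181.42)
print(f"{gain:.0%}")  # prints "74%"
```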
To externally validate our clustering service, we imple-
ment the clustering techniques by Kumar et al. [13] and
Franczak et al. [15] and compare them with our clustering
service in terms of the number of generated clusters
against the number of websites (Fig. 7). The results in Fig. 7 show that our clustering service generates more clusters for a smaller number of sites; hence it incurs less communication cost within the clusters. The other techniques, in contrast, generate fewer clusters for a large number of sites and thus incur more communication cost within the clusters. Fig. 7 also shows that the number of clusters increases with the number of network sites in [13] and in our approach. In contrast, the clustering approach in [15] generates fewer clusters because its clustering approximation function uses the natural logarithm. This in turn maximizes the number of sites in each cluster, which increases the communications cost.
5.3 Evaluation of Fragment Allocation Service
To evaluate our fragment allocation computing service, we use a set of disjoint fragments obtained from our fragmentation service, the set of clusters and their sites generated by our clustering service, and the average numbers of retrievals and updates at each site (Table 12a). We apply the allocation service in our IFCA software tool using the site costs of storage, retrieval, and update depicted in Table 12b. The results are shown in Fig. 8.
We propose the following mathematical model to evalu-
ate the performance of fragment allocation and replication
service. Let IAFS represent the initial size of allocated frag-
ments at site S, n represent the maximum number of sites in
the current cluster, FAFS represent the final size of allocated
fragments at site S, and APEC represents the fragment allo-
cation and replication performance evaluation at cluster C,
then APEC is computed as:
Fig. 7. Clustering performance comparisons.
TABLE 12
(a) Fragments Retrieval and Update Frequencies (b) Costs of Storage, Retrieval, and Update
APEC_C = (Σ_{s=1}^{n} IAFS_s − Σ_{s=1}^{n} FAFS_s) / Σ_{s=1}^{n} IAFS_s.    (15)
For example, assume that 35 fragment allocations with
a total size of 91 kb are initially allocated to the WTDS
clusters. The final fragment allocation using our approach
is 14 with a total size of 27 kb. Therefore, the storage gain of our allocation technique is about 70 percent, which in turn reduces the average fragment allocation computation time. Our proposed allocation technique is compared with the allocation methods of Ma et al. [8] and Menon [20]. These two approaches allocate large numbers of fragments to each site and hence consume more storage, which in turn increases the average fragment allocation computation time in each cluster. Fig. 9 illustrates the effectiveness of our fragment allocation and replication technique in terms of average computation time compared to the fragment allocation methods in [8] and [20]. The figure shows that our technique incurs the least average computation time required for fragment allocation. This is because our clustering technique produces more clusters with a small number of websites, which reduces the average fragment allocation computation time in each cluster. We infer from Fig. 9 that the average computation time of fragment allocation increases as the number of websites in the cluster increases. Most importantly, the figure shows that our technique outperforms the ones in [8] and [20].
5.4 Evaluation of Web Servers Load
and Network Delay
The telemedicine database system network workload
involves queries from a number of database users who
access the same database simultaneously. The amount of data required per second and the time in which the queries should be processed depend on the database servers running under such a network. For simplicity, 12 websites grouped in four clusters are considered for the network simulation to evaluate the performance of a telemedicine database system. The performance factors under evaluation are the server load and the network delay, which are simulated in OPNET [23].
The proposed network simulation for building the WTDS is represented by web nodes. Each node is a stack of two 3Com SuperStack II 1100 and two SuperStack II 3300 chassis (3C_SSII_1100_3300) with four slots (4s), 52 auto-sensing Ethernet ports (ae52), 48 Ethernet ports (e48), and 3 Gigabit Ethernet ports (ge3). The center node model is a 3Com switch, the periphery node model is an internet workstation (Sm_Int_wkstn), and the link model is 10BaseT. The center nodes are connected with internet servers (Sm_Int_server) by 10BaseT. The servers are also connected with a router (Cisco 2514, node 15) by 10BaseT. Fig. 10 depicts the network topology of the simulated WTDS, which consists of 12 website nodes (0, 1, 2, 3, 4, 9, 10, 11, 16, 17, 18, 20) and four clusters (6, 13, 14, and 22) connected with internet servers (7 and 8) that are in turn connected with a router (Cisco 2514, node 15) by 10BaseT.
Fig. 8. Fragments allocation at clusters.
Fig. 9. Fragments allocation after clustering.
Fig. 10. The WTDS network topology.
The following sections evaluate the effect of server
load and network delay on the distributed database
performance.
5.4.1 Server Load
The server load measures the traffic handled by a server
in bits per second. Fig. 11 shows a server load comparison
between our approach and the approaches of Kumar
et al. [13] and Fronczak et al. [15]. The servers are
denoted by the cluster legends C1, C2, C3, and C4,
respectively. The results of the comparison are drawn
from Figs. 11 and 12, and are summarized in Table 13.
Table 13 shows that the server load increases as the
number of represented nodes (sites) decreases; this is
because the load is distributed over the nodes, assuming
that all nodes have the same capacity. Conversely, the
server load decreases as the number of represented
nodes increases. The results indicate that the network
remains almost balanced throughout processing of the web
services, although some variations are expected because
the number of transactions processed on each node may
vary. Fig. 12 shows the maximum load (bits/sec) on the
servers' clusters for our proposed clustering technique
and for the clustering methods in [13] and [15]. The
results clearly show that our approach outperforms the
approaches in [13] and [15].
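The inverse relation reported in Table 13 can be sketched in a few lines. The model below (our own illustration, with an assumed cluster demand figure rather than the measured OPNET values) spreads a fixed cluster workload over the equal-capacity nodes a server represents, so the per-server load falls as the node count rises.

```python
# Illustrative model of the Table 13 relation: a fixed cluster workload is
# shared by the equal-capacity nodes a server represents, so per-server load
# is inversely proportional to the node count.  The demand figure is an
# assumption for illustration, not a measured value.

CLUSTER_DEMAND_BPS = 900_000  # assumed fixed demand per cluster (bits/sec)

def server_load_bps(nodes_represented: int,
                    demand_bps: float = CLUSTER_DEMAND_BPS) -> float:
    """Load left on the cluster server when `nodes_represented`
    equal-capacity nodes share a fixed cluster demand."""
    if nodes_represented < 1:
        raise ValueError("a cluster must contain at least one node")
    return demand_bps / nodes_represented

# More represented nodes -> lower server load, as Table 13 reports.
loads = [server_load_bps(n) for n in (1, 2, 3, 4)]
assert loads == sorted(loads, reverse=True)
```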
5.4.2 Network Delay
The network delay is the delay caused by the transaction
traffic on the web database system servers. It is defined
as the maximum time (in milliseconds) required for the
network system to reach the steady state. Fig. 13 shows
the network delay caused by the WTDS servers, represented
by the legends C1, C2, C3, and C4. The figure indicates
that the web database system reaches the steady state
after 0.065 milliseconds. It also shows that the network
delay is lower when the websites are distributed over
four servers than when all websites connect to one, two,
or three servers.
Fig. 14 shows the maximum network delay (in seconds)
caused by the web servers against the distance range for
our clustering technique and for the techniques in [13]
and [15]. Note that the network delay of our approach is
always lower than that of [13] and [15], owing to the
better clustering computations of our technique.
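The steady-state point read off Fig. 13 can be located programmatically from a delay trace: the earliest sample after which every later value stays within a tolerance band around the final level. The sketch below uses a synthetic trace (an assumption for illustration; the OPNET simulation produces the real one).

```python
# Locate the steady-state time in a warm-up delay trace: the first sample
# after which all later delays stay within a relative tolerance of the final
# value.  The trace below is synthetic, not OPNET output.

def steady_state_time(times, delays, tol=0.05):
    """Return the first time after which delay stays within `tol`
    (relative) of the final value, or None if it never settles."""
    final = delays[-1]
    for i in range(len(delays)):
        if all(abs(x - final) <= tol * abs(final) for x in delays[i:]):
            return times[i]
    return None

# Synthetic warm-up trace (milliseconds): delay decays toward ~0.040 ms.
times = [0.01 * k for k in range(1, 11)]
delays = [0.40, 0.21, 0.12, 0.075, 0.052, 0.043, 0.041, 0.040, 0.040, 0.040]
t_star = steady_state_time(times, delays)
```

With this trace the function reports the settling time at the seventh sample; a tighter `tol` pushes the reported steady-state point later.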
5.5 Threats to Validity
Threats to external validity limit the ability to generalize the
results of the experiments to industrial practice. To mitigate
such threats in the evaluation of our approach, we compared
each of our proposed computing-service techniques
(fragmentation, clustering, and data allocation) with two
similar techniques proposed by different groups of
researchers and well accepted by the database systems
community. The results show that our fragmentation,
clustering, and allocation techniques outperform their
counterparts in the literature.
6 CONCLUSION
In this work, we proposed a new approach to promote
WTDS performance. Our approach integrates three
enhanced computing-service techniques, namely database
fragmentation, network site clustering, and fragment
allocation. We developed these techniques to solve technical
challenges such as distributing data fragments among
multiple web servers, handling failures, and trading off
data availability against consistency. We also proposed an
estimation model to compute the communications cost,
which helps in finding cost-effective data allocation
solutions. The novelty of our approach lies in the integration
of web database site clustering as a new component of the process of
Fig. 11. WTDS servers’ load comparison.
Fig. 12. Servers’ Max. load for clustering techniques.
TABLE 13
WTDS Servers’ Load Comparison
62 IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015
WTDS design in order to improve performance and satisfy a
certain level of quality in web services.
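The communications-cost estimate mentioned above can be illustrated with a minimal sketch. The model below is our own simplification, not the paper's estimation model: the cost of an allocation is the sum, over (site, fragment) pairs, of access frequency times fragment size times the unit transfer cost between the site and the server holding the fragment. All names and numbers are illustrative assumptions.

```python
# Minimal sketch of a communications-cost estimate of the kind the
# allocation step minimizes.  All identifiers and figures are illustrative
# assumptions, not the paper's actual model or data.

def communications_cost(alloc, freq, size_kb, unit_cost):
    """alloc: fragment -> server; freq: (site, fragment) -> accesses/sec;
    size_kb: fragment -> KB per access; unit_cost: (site, server) -> cost/KB."""
    return sum(f * size_kb[frag] * unit_cost[(site, alloc[frag])]
               for (site, frag), f in freq.items())

# Two candidate allocations of one fragment over two servers.
freq = {("s1", "F1"): 10.0, ("s2", "F1"): 2.0}
size_kb = {"F1": 4.0}
unit_cost = {("s1", "srv1"): 0.0, ("s1", "srv2"): 1.0,
             ("s2", "srv1"): 1.0, ("s2", "srv2"): 0.0}

near_s1 = communications_cost({"F1": "srv1"}, freq, size_kb, unit_cost)
near_s2 = communications_cost({"F1": "srv2"}, freq, size_kb, unit_cost)
assert near_s1 < near_s2  # placing F1 near its heavier reader is cheaper
```

Minimizing such a sum over all candidate allocations is what makes placing a fragment close to its most frequent readers cost-effective.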
We perform both external and internal evaluation of
our integrated approach. In the internal evaluation, we
measure the impact of using our techniques on WTDS
and web service performance measures like communica-
tions cost, response time and throughput. In the external
evaluation, we compare the performance of our approach
to that of other techniques in the literature. The results
show that our integrated approach significantly improves
service-requirement satisfaction in web systems. This
conclusion nevertheless requires further investigation and
experiments. Therefore, as future work we plan to evaluate
our approach on larger-scale networks involving a large
number of sites over the cloud. We will consider applying
different types of clustering and introduce search-based
techniques to perform more intelligent data redistribution.
Finally, we intend to address the security concerns that
arise over distributed data fragments.
REFERENCES
[1] J.-C. Hsieh and M.-W. Hsu, “A Cloud Computing Based 12-Lead
ECG Telemedicine Service,” BMC Medical Informatics and Decision
Making, vol. 12, pp. 12-77, 2012.
[2] A. Tamhankar and S. Ram, “Database Fragmentation and Alloca-
tion: An Integrated Methodology and Case Study,” IEEE Trans.
Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 28,
no. 3, pp. 288-305, May 1998.
[3] L. Borzemski, “Optimal Partitioning of a Distributed Relational
Database for Multistage Decision-Making Support Systems,”
Cybernetics and Systems Research, vol. 2, no. 13, pp. 809-814, 1996.
[4] J. Son and M. Kim, “An Adaptable Vertical Partitioning Method in
Distributed Systems,” J. Systems and Software, vol. 73, no. 3,
pp. 551-561, 2004.
[5] S. Lim and Y. Ng, “Vertical Fragmentation and Allocation in Dis-
tributed Deductive Database Systems,” J. Information Systems,
vol. 22, no. 1, pp. 1-24, 1997.
[6] S. Agrawal, V. Narasayya, and B. Yang, “Integrating Vertical and
Horizontal Partitioning into Automated Physical Database
Design,” Proc. ACM SIGMOD Int’l Conf. Management of Data,
pp. 359-370, 2004.
[7] S. Navathe, K. Karlapalem, and R. Minyoung, “A Mixed Fragmen-
tation Methodology for Initial Distributed Database Design,”
J. Computer and Software Eng., vol. 3, no. 4, pp. 395-425, 1995.
[8] H. Ma, K.-D. Schewe, and Q. Wang, “Distribution Design for
Higher-Order Data Models,” Data and Knowledge Eng., vol. 60,
pp. 400-434, 2007.
[9] W. Yee, M. Donahoo, and S. Navathe, “A Framework for Server
Data Fragment Grouping to Improve Server Scalability in Inter-
mittently Synchronized Databases,” Proc. ACM Conf. Information
and Knowledge Management (CIKM), 2000.
[10] A. Jain, M. Murty, and P. Flynn, “Data Clustering: A Review,”
ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[11] Lepakshi Goud, “Achieving Availability, Elasticity and Reliability
of the Data Access in Cloud Computing,” Int’l J. Advanced Eng.
Sciences and Technologies, vol. 5, no. 2, pp. 150-155, 2011.
[12] Y. Huang and J. Chen, “Fragment Allocation in Distributed Data-
base Design,” J. Information Science and Eng., vol. 17, pp. 491-506,
2001.
[13] P. Kumar, P. Krishna, R. Bapi, and S. Kumar, “Rough Clustering
of Sequential Data,” Data and Knowledge Eng., vol. 63, pp. 183-199,
2007.
[14] K. Voges, N. Pope, and M. Brown, “Cluster Analysis of Marketing
Data Examining Online Shopping Orientation: A comparison of
K-means and Rough Clustering Approaches,” Heuristics and Opti-
mization for Knowledge Discovery, H.A. Abbass, R.A. Sarker, and C.
S. Newton, eds., pp. 207-224, Idea Group Publishing, 2002.
[15] A. Fronczak, J. Hołyst, M. Jedynak, and J. Sienkiewicz, “Higher
Order Clustering Coefficients in Barabási-Albert Networks,” Physica
A: Statistical Mechanics and Its Applications, vol. 316, no. 1-4,
pp. 688-694, 2002.
[16] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “Clustering Algo-
rithms and Validity Measures,” Proc. 13th Int’l Conf. Scientific and
Statistical Database Management (SSDBM), 2001.
[17] I. Ahmad, K. Karlapalem, Y.-K. Kwok, and S.-K. So, “Evolutionary
Algorithms for Allocating Data in Distributed Database Systems,”
Distributed and Parallel Databases, vol. 11, pp. 5-32, 2002.
[18] C. Danilowicz and N. Nguyen, “Consensus Methods for Solving
Inconsistency of Replicated Data in Distributed Systems,” Distrib-
uted and Parallel Databases, vol. 14, pp. 53-69, 2003.
[19] R. Costa and S. Lifschitz, “Database Allocation Strategies for Par-
allel BLAST Evaluation on Clusters,” Distributed and Parallel Data-
bases, vol. 13, pp. 99-127, 2003.
[20] S. Menon, “Allocating Fragments in Distributed Databases,” IEEE
Trans. Parallel and Distributed Systems, vol. 16, no. 7, pp. 577-585,
July 2005.
[21] N. Daudpota, “Five Steps to Construct a Model of Data Allocation
for Distributed Database Systems,” J. Intelligent Information Sys-
tems: Integrating Artificial Intelligence and Database Technologies,
vol. 11, no. 2, pp. 153-168, 1998.
[22] Microsoft SQL Server, http://www.microsoft.com/sql/editions/
express/default.mspx, 2012.
[23] MySQL 5.6, http://www.mysql.com/, 2013.
[24] OPNET IT Guru Academic Edition, OPNET Technologies, Inc.,
http://www.opnet.com/university_program/itguru_academic_edition/,
2003.
[25] J. Hauglid, N. Ryeng, and K. Norvag, “DYFRAM: Dynamic Frag-
mentation and Replica Management in Distributed Database Sys-
tems,” Distributed and Parallel Databases, vol. 28, pp. 157-185, 2010.
[26] Z. Wang, T. Li, N. Xiong, and Y. Pan, “A Novel Dynamic Network
Data Replication Scheme Based on Historical Access Record and
Proactive Deletion,” J. Supercomputing, vol. 62, pp. 227-250, Oct.
2011, DOI: 10.1007/s11227-011-0708-z.
[27] S. Khan and I. Ahmad, “Replicating Data Objects in Large Distrib-
uted Database Systems: An Axiomatic Game Theoretic Mecha-
nism Design Approach,” Distributed and Parallel Databases, vol. 28,
no. 2/3, pp. 187-218, 2010.
[28] S. Khan and L. Hoque, “A New Technique for Database Frag-
mentation in Distributed Systems,” Int’l J. Computer Applications,
vol. 5, no. 9, pp. 20-24, 2010.
[29] S. Jagannatha, M. Mrunalini, T. Kumar, and K. Kanth, “Modeling
of Mixed Fragmentation in Distributed Database Using UML 2.0,”
IPCSIT, vol. 2, pp. 190-194, 2011.
Fig. 13. The WTDS network delay.
Fig. 14. Max. Net. delay for clustering techniques.
[30] A. Morffi, C. Gonzalez, W. Lemahieu, and L. Gonzalez,
“SIADBDD: An Integrated Tool to Design Distributed Databases,”
Revista Facultad de Ingeniería Universidad de Antioquia, no. 47,
pp. 155-163, Mar. 2009.
[31] M.T. Özsu and P. Valduriez, Principles of Distributed Database
Systems, third ed., Springer, 2011.
[32] G. Mao, M. Gao, and W. Yao, “An Algorithm for Clustering XML
Data Stream Using Sliding Window,” Proc. the Third Int’l Conf.
Advances in Databases, Knowledge, and Data Applications, pp. 96-101,
2011.
[33] M.P. Paixão, L. Silva, and G. Elias, “Clustering Large-Scale, Dis-
tributed Software Component Repositories,” Proc. the Fourth Int’l
Conf. Advances in Databases, Knowledge, and Data Applications,
pp. 124-129, 2012.
[34] G. Decandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman,
A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels,
“Dynamo: Amazon’s Highly Available Key-Value Store,” Proc.
ACM Symp. Operating Systems Principles, pp. 205-220, 2007.
[35] MongoDB, http://en.wikipedia.org/wiki/MongoDB, Nov. 2013.
[36] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M.
Burrows, T. Chandra, A. Fikes, and R.E. Gruber, “Bigtable: A
Distributed Storage System for Structured Data,” Proc. Seventh
Symp. Operating Systems Design and Implementation, pp. 205-218,
2006.
Ismail Hababeh received the bachelor’s degree
in computer science from the University of
Jordan, the master’s degree in computer science
from Western Michigan University, and the PhD
degree in computer science from Leeds Metro-
politan University, United Kingdom. His research
areas of particular interest include, but are not
limited to the following: distributed databases,
cloud computing, network security, and systems
performance.
Issa Khalil received the BSc and MS degrees
from Jordan University of Science and Technol-
ogy in 1994 and 1996, and the PhD degree from
Purdue University, in 2007, all in computer engi-
neering. Immediately thereafter, he joined the
College of Information Technology (CIT) of the
United Arab Emirates University (UAEU) where
he was promoted to an associate professor in
September 2011. In June 2013, he joined Qatar
Computing Research Institute (QCRI) as a senior
scientist with the cyber security group. His
research interests span the areas of wireless and wire-line communica-
tion networks. He is especially interested in security, routing, and perfor-
mance of wireless sensor, ad hoc and mesh networks. His recent
research interests include malware analysis, advanced persistent
threats, and ICS/SCADA security. He served as the technical program
co-chair of the Sixth International Conference on Innovations in Informa-
tion Technology, and was appointed as a Technical Program Committee
member and a reviewer for many international conferences and journals.
In June 2011, he was granted the CIT outstanding professor award for
outstanding performance in research, teaching, and service.
Abdallah Khreishah received the BS degree
with honors from Jordan University of Science
and Technology in 2004. He received the MS and
PhD degrees in electrical and computer engi-
neering from Purdue University in 2006 and
2010, respectively. In Fall 2012, he joined the
ECE Department of New Jersey Institute of Tech-
nology as an assistant professor. His research
spans the areas of network coding, wireless net-
works, cloud computing, and network security.