NOVEL FUNCTIONAL DEPENDENCY APPROACH FOR STORAGE SPACE OPTIMISATION IN GREEN MICROBIAL DATA CENTERS
1. Problem statement
One prominent concern in establishing green data centers is
reducing the carbon footprint and operating costs (e.g. of cooling
systems) by reducing the amount of physical data storage required.
Scientific applications that rely on large data volumes require
physical data storage that is not only impractically large to
maintain but also drives up power consumption, as more electrical
power is needed to run the additional data servers and to cool
them.
The scale of the problem is reflected in a recent estimate that the
world’s data centers currently consume about 330 billion kWh of
electricity every year, which is almost equal to the entire
electricity demand of the UK [1]. In addition, power consumption
that exceeds 100 billion kWh generates approximately 40,568,000
tons of CO2 emissions [2,3,4]. Thus, in establishing successful
green data centers, adding more data servers is not an attractive
way of dealing with the storage space issue, as it leads to an
undesirable increase in power consumption and CO2 emissions.
Figure 1 illustrates the data servers and the cooling process in
Microsoft’s green data centers, both of which contribute to power
consumption.
Figure 1: Cooling process in Microsoft’s Green Data Center [5]
By optimising the database storage used to hold large data volumes,
the requirement for physical data storage can be reduced.
Nevertheless, studies on how to optimise storage space accurately
while taking into account knowledge of application semantics are
limited. The space optimisation techniques available to date (e.g.
data compression) are designed on the assumption that all data
within the database to be optimised can be exploited for space
optimisation.
Objectives
1. To design a proxy-based algorithm for storage space
optimisation.
2. To evaluate the accuracy of queries submitted against the
smaller, optimised database and the amount of space saved.
3. To approximate the correlation between a data center’s power
consumption and space saving.
Nurul A. Emran1, Hamidah Ibrahim2, Azah K. Muda1, Mohd N.M. Isa3
Universiti Teknikal Malaysia Melaka (UTeM)1, Universiti Putra Malaysia (UPM)2, Malaysia Genome Institute3
References
[1] G. Cook and J. Van Horn. How Dirty Is Your Data? A Look at the Energy Choices That
Power Cloud. Greenpeace International, 2011.
[2] V. Kumar. Algorithm for Constraints-Satisfaction Problems: A Survey. AI Magazine 13(1),
1992.
[3] K. Kang, S. Cohen, J. Hess, W. Novak and A. Peterson. Feature-Oriented Domain
Analysis (FODA) Feasibility Study, 1990.
[4] S. Hazelhurst. Scientific Computing Using Virtual High-Performance Computing: A Case
Study Using the Amazon Elastic Computing Cloud. Proceedings of the 2008 Annual
Research Conference of the South African Institute of Computer Scientists and Information
Technologists on IT Research in Developing Countries: Riding the Wave of Technology,
pages 94-103. ACM, 2008.
[5] Gregwid, Datacenter Architecture for Environmental Sustainability – “Green Datacenters”.
Technet Blogs.
http://blogs.technet.com/b/nymciblog/archive/2008/03/21/datacenter-architecture-for-environmental-sustainability-green-datacenters.aspx,
2008. [Online; accessed 25-January-2012].
[6] E. Lai. Oracle Pushes Compression as Cheaper Database Scale-Up Method.
Computerworld White Paper, 2008.
[7] C. Eaton. Compression Comparison to Oracle and Microsoft.
http://it.toolbox.com/blogs/db2luw/compression-comparison-to-oracle-and-microsoft-8871,
2006. [Online; accessed 25-January-2012].
[8] L. Freeman. Looking Beyond the Hype: Evaluating Data Deduplication Solutions.
http://www.techrepublic.com/whitepapers/looking-beyond-the-hype-evaluating-data-deduplication-solutions/1294015,
2007. [Online; accessed 25-January-2012].
[9] Emran, N.A., Abdullah, N. & Isa, M.N.M., 2013. Storage space optimisation for green data
center. In Procedia Engineering. pp. 483–490.
[10] Emran, N.A. et al., 2013. Reference Architectures to Measure Data Completeness
across Integrated Databases. In ACIIDS 2003 Part 1. Springer-Verlag Berlin Heidelberg, pp.
216–225.
[11] Emran, N.A., Embury, S. & Missier, P., 2014. Measuring Population-Based Completeness
for Single Nucleotide Polymorphism (SNP) Databases. In J. Sobecki, V. Boonjing, & S.
Chittayasothorn, eds. Advanced Approaches to Intelligent Information and Database
Systems. Cham: Springer International Publishing, pp. 173–182.
[12] Emran, N., Embury, S. & Missier, P., 2008. Model-driven component generation for
families of completeness. In 6th International Workshop on Quality in Databases and
Management of Uncertain Data, Very Large Databases (VLDB).
[13] Emran, N.A, (2015), “Data Completeness Measures” Advances in Intelligent Systems
and Computing, (ISSN 2194-5357), Springer.
Acknowledgement
The researchers would like to thank the Ministry of Higher Education,
Malaysia, for the financial assistance provided during the course of
this research. This research is registered under the research grant
with Vott Number: FRGS (RACE)/2012/FTMK/SG05/02/1 F00155
Literature Review
One way to reduce the storage space requirement is to optimise the
available database space. The need to optimise space is not new:
tools and techniques for this purpose have been offered by
enterprise data storage vendors (such as Oracle [5,6] and DB2 [7])
for about a decade. At the relational table level, data compression
tools, for example, remove repeated values to gain free space [6],
while data deduplication techniques remove duplicate records from the
table to gain storage space [8]. The idea behind these space
optimisation solutions is to exploit overlaps (of values or records)
within tables. Both techniques are performed at the level of whole
tables. A key (though often unstated) assumption behind them is that
all columns can be exploited for space optimisation. Because of this
assumption, knowledge of application semantics (i.e., how the columns
are used) is ignored and, as a consequence, data center providers
must bear unnecessary query processing overhead from the frequent
compression (and decompression) of heavily queried data.
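The two whole-table techniques described above can be illustrated with a minimal Python sketch. It is not the implementation used by any vendor cited here: `deduplicate`, `dictionary_encode` and `dictionary_decode` are hypothetical helper names, dictionary encoding is used as one simple form of repeated-value removal, and the sample records are invented for illustration.

```python
from __future__ import annotations

from typing import Hashable, Sequence


def deduplicate(rows: Sequence[tuple]) -> list[tuple]:
    """Remove exact duplicate records, keeping the first occurrence of each."""
    # A dict preserves insertion order, so the original row order is kept.
    return list(dict.fromkeys(rows))


def dictionary_encode(column: Sequence[Hashable]) -> tuple[list, list[int]]:
    """Store each distinct value once and replace the column by integer codes."""
    dictionary: list = []            # distinct values, in order of first appearance
    index: dict[Hashable, int] = {}  # value -> position in `dictionary`
    codes: list[int] = []            # one small integer per original cell
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary: Sequence, codes: Sequence[int]) -> list:
    """Rebuild the original column; this is the work a query must pay for."""
    return [dictionary[code] for code in codes]


# Hypothetical records (organism, strain); not taken from the case study.
records = [
    ("E. coli", "K-12"),
    ("E. coli", "K-12"),
    ("B. subtilis", "168"),
    ("E. coli", "O157"),
]

unique_records = deduplicate(records)
organisms = [organism for organism, _ in unique_records]
dictionary, codes = dictionary_encode(organisms)
```

Both helpers treat every column alike, which mirrors the assumption discussed above: a heavily queried column encoded this way has to go through `dictionary_decode` on each read.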
Conclusion
This study will conclude with recommendations on the suitability of
the proxy-based technique for optimising database space in a microbial
data center, which is chosen as a case study to support the
establishment of green data centers in the microbial domain.
Results & Discussion
[Figure: FDs accuracy, as a percentage (%), and G3 errors for the
proxy candidates. Data labels recovered from the chart: FD accuracy
(%) 100, 97, 58, 36, 0; G3 error 0, 0.03, 0.42, 0.64, 1.]
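The relationship between the G3 error and FD accuracy reported for the proxy candidates can be illustrated with a minimal Python sketch. It rests on two assumptions that the poster does not state: that the G3 error is the usual g3 measure for approximate functional dependencies (the minimum fraction of rows that must be removed for the dependency to hold), and that FD accuracy is (1 - g3) × 100, which is consistent with the plotted values. The function names, column names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Minimum fraction of rows to remove so that the FD lhs -> rhs holds."""
    if not rows:
        return 0.0
    # For each distinct lhs value, count how often each rhs value occurs.
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x][y] += 1
    # In each group only the rows with the most frequent rhs value can stay.
    kept = sum(max(counter.values()) for counter in groups.values())
    return (len(rows) - kept) / len(rows)


def fd_accuracy(rows, lhs, rhs):
    """FD accuracy in percent, assumed here to be (1 - g3) * 100."""
    return (1.0 - g3_error(rows, lhs, rhs)) * 100.0


# Hypothetical rows; "accession" is tested as a proxy for "organism".
rows = [
    {"accession": "A1", "organism": "E. coli"},
    {"accession": "A1", "organism": "E. coli"},
    {"accession": "A1", "organism": "E. coli K-12"},
    {"accession": "A2", "organism": "B. subtilis"},
]

error = g3_error(rows, ["accession"], ["organism"])
accuracy = fd_accuracy(rows, ["accession"], ["organism"])
```

In this example one of the four rows violates the dependency, so removing it makes `accession -> organism` hold exactly.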