SlideShare a Scribd company logo
1 of 11
Download to read offline
Energy Systems Research Unit Email: esru@strath.ac.uk
Dept. of Mechanical and Aerospace Engineering Tel: +44 (0)141 548 2314
75 Montrose Street, Glasgow G1 1XJ http://www. strath.ac.uk/esru
The University of Strathclyde is a charitable body, registered in Scotland, number SC015263
ESRU occasional paper
Survey of a matrix of computing hardware and
compilation influences on the deployment of ESP-r and
EnergyPlus
Dr. Jon W. Hand
Energy Systems Research Unit
13 October 2015
1
SURVEY OF A MATRIX OF HARDWARE AND COMPILATION INFLUENCES ON1
THE DEPLOYMENT OF ESP-r AND ENERGYPLUS2
3
Dr. Jon W. Hand1
4
1
Energy Systems Research Unit, University of Strathclyde, Glasgow, Scotland5
6
13 October 20157
8
9
10
Contents'11
!12
ABSTRACT .............................................................................................................................................................2!13
INTRODUCTION ....................................................................................................................................................2!14
HARDWARE ISSUES.............................................................................................................................................2!15
Disk types and size ...............................................................................................................................................2!16
Computer memory................................................................................................................................................3!17
Virtual (cache) memory........................................................................................................................................3!18
Processor type.......................................................................................................................................................3!19
Computational platforms considered....................................................................................................................4!20
COMPILER DIRECTIVES......................................................................................................................................4!21
The test models.....................................................................................................................................................5!22
Building models for ESP-r & EnergyPlus............................................................................................................6!23
Data recovery timings...........................................................................................................................................7!24
EnergyPlus models ...............................................................................................................................................8!25
CONCLUSION.........................................................................................................................................................8!26
ACKNOWLEDGEMENT........................................................................................................................................9!27
REFERENCES .........................................................................................................................................................9!28
29
30
31
32
33
2
1
ABSTRACT
This is an interim report from an ongoing
investigation of the relative contribution of various
hardware and compiler options on the efficacy of
specific ESP-r and EnergyPlus simulation tasks. The
paper draws on a range of techniques and
methodologies developed to port ESP-r to ARM
platforms such as the Raspberry Pi and extends them
across a range of traditional and emerging computing
platforms. Adapting to the hardware limits of ultra
low cost computers exposed a number of issues
related to disk access, disk type, memory, number of
computing cores, compiler options as well as the use
of virtual machines for a range of tool development
and simulation tasks. Among other findings is that
CFD convergence times can be reduced from 41s to
less than 10s, annual multi-domain assessments from
950s to 280s and non-optimal memory and virtual
computing can extend data recovery times by more
than a factor of 10.
INTRODUCTION
In 2012 the Raspberry Pi (www.raspberrypi.org) was
introduced, primarily as a vehicle to address a lack of
programming skills in UK schools. Its combination
of price and computational power proved attractive to
a far wider audience, including the 'maker'
community. This spawned competitors such as the
BeagleBone Black (beagleboard.org) and an
ecosystem of related add-on devices as well as a
community of developers testing the bounds of this
new class of device which were typically distributed
with Linux.
Few perceived such lightweight platforms might
support numerical simulation. However, in a
historical context, the development of simulation
tools and many classic numerical studies were
accomplished with even more constrained
computational resources At the eSim conference in
2014 the author presented the results of an initial port
of ESP-r to the original Raspberry Pi and
BeagleBone Black ARM-based computers and
observations of their use as software development
platforms as well as for carrying out various
performance assessment goals for different user
types.
From the subset of simulation tasks that the first
generation supported, subsequent ARM-based
computers, for example the Odroid-U3 from Korea
(www.hardkernel.com), have include multiple cores,
more memory and faster disk access. The user
experience gap between a conventional desktop
computer and the $70 Odoid-U3 is surprisingly
modest. For example, simultaneously editing of a
3200 surface ESP-r model while running a CFD
assessment and an Octave (an open source equivalent
to MatLab) turbine blade analysis session does not
saturate the Odroid's resources. However, less
constrained hardware is only part of the story.
The author observed that particular adjustments to
the numerical source code and to the compiling tool
chain resulted in significant improvements in the
build process, subsequent user interactions and the
run times for assessments. The magnitude of
improvement was dependent on the specifics of the
hardware, the complexity of the model and the nature
of the assessments carried out.
This paper assesses whether the techniques explored
during the ARM study are applicable in a broader
context of numerical tools, machine configurations
and ordering of simulation work tasks. Both ESP-r
(www.esru.strath.ac.uk/publications.htm ) and
EnergyPlus (apps1.eere.energy.gov/buildings/
energyplus/) are used to test this idea. As many of
the constraints noted in ARM platforms are found in
older laptops and workstations the study also assess
the extent of improvement for such computers as
well as for computers which fit the conventional
definition of a numerical workstation. The author
observed that the details of computer hardware, e.g.
type of disk, provision of physical memory and
virtual memory and processor type had an impact on
the time it took to carry out specific simulation tasks
on models of different complexity.
HARDWARE ISSUES
Disk types and size
Most ARM single board computers (SBC) rely on
SDHC cards for disk storage as well as swap space.
SDHC cards were not designed for operating systems
and disk I/O is substantially constrained. Some SBC
and tablets make use of eMMC storage that are mid-
way between SDHC cards and rotational drives in
terms of speed. Conventional computers often
include slower rotational drives rather than SSD
drives. SBC are typically paired with 8GB or 16GB
SDHC cards or eMMC for reasons of cost and this
constrains the space available for the build process
(EnergyPlus requires ~4GB to build) as well as the
space available for simulation files and performance
prediction files.
Another class of drive are the virtual drives
implemented within virtual computers. Tests indicate
that the overheads involved result in disk access
roughly half the speed of the computers native drive.
Disk I/O associated with lots of small files or with
random access files can be a magnitude slower than
sequential reads and writes reported in benchmarks.
ESP-r models may consist of scores of small files and
performance predictions for each domain are held in
3
sequentially written binary files with data recovery
mostly involving random access. Conversly,
EnergyPlus is reading few files and sequentially
writing ASCII files and optionally a SQL database.
Benchmarking tools such as (a) Blackmagick disk
speed test on OSX and (b) CrystalDiskMark on
Windows use various disk I/O tests for sequential
and random access. Table 1 shows their reporting
across a range of devices. This is only somewhat
indicative of the mix of simulation tool data recovery
tasks which have been assessed.
Table 1 Typical disk I/O speeds MB/s
Drive Type Sequential (a) Random (b)
write read write read
SDHC class 10 10-20 20 1.6 5.3
eMMC Odroid 15 55
USB 2 stick 2-8 18 0.03 5.4
USB 3 stick 15-20 60 0.6 4.1
USB 3 rotational 60 65 1.9 0.5
Old 2.5" rotational 40 45
USB 3 SSD 110 245 1.2 1.6
Network drive 110 110 1.2 1.6
T61 SSD 110 130 33 22
T61 Rotational 60 60 0.3 0.4
Dell 7010
rotational
90 77 1.5 1.2
Dell 755 rotational 84 87
Dell 755 virtualbox 42 47
Macbook Air SSD 222 600 99 20
Many simulation tasks are disk-bound. ESP-r has a
number of user choices which impact the size of the
results files created: the extent of performance data
written (i.e. save level), the period of the assessment
and the building and system time step used. An ESP-
r model at the limits of geometric complexity running
several months at a one minute building time step
and systems time step of one second time step might
generate upwards of 40 GB. For example an annual
15 minute time step 1890s villa model constrained
performance file is 354MB and the extensive file is
4.27GB. EnergyPlus also incudes optional directives
which may constrain the number of entities which are
reported on or, for example omit or constrain SQL
outputs..
Computer memory
Constrained resource computers often run Linux
because of the memory footprint of the operating
system. With ~512MB RAM there is still ~400MB
available. One of the initial challenges of porting
ESP-r was to create a suite of executables that could
run in their usual combinations within the available
memory. Rather than purge multi-domain
functionality the route taken was to constrain model
complexity via alternative sets of header files for
small and standard deployments. Small deployments
are targeted at low resource computers, but can be
advantageous when running assessments within
virtual computers.
Memory is also an issue during the build process,
some compile options require substantially more
memory and may drive the whole process into virtual
memory. The build process on ESP-r essentially
doubled in speed when low resource linking
commands are used. EnergyPlus is hungry for
memory and disk space during the build process and
was rarely successful if there is less than ~800MB
RAM and 4GB of free disk space available.
Given sufficient memory, operating systems use
memory as a buffer for the disk I/O associated with
simulations and subsequent data-mining. There is a
considerable speedup if free memory is greater than
the size of the files being written. It is also the case
that where multiple assessments need to be carried
out it is often much faster to:
simulate_a extract_a simulate_b extract_b
rather than
simulate_a simulate_b extract_a extract_b.
Thus, critical adjustments to the scope of assessment
or the ordering of tasks can reduce the penalty for
data extraction across most platforms.
Virtual (cache) memory
Virtual memory via a swap file on the disk is used
when physical memory is depleted. If slow disks are
combined with limited memory then swap is
increasingly used and performance degrades. This
was evident in the initial porting of ESP-r to the
Raspberry Pi.
Tools such as ESP-r are composed of many modules,
for example the ESP-r project manager (prj) will
invoke the simulation engine (bps) and later the
results analysis module (res). Although usually not
an issue in conventional deployments, with
constrained memory simulation executables need to
be constrained in size to avoid running in swap. The
approach taken in ESP-r is to have alternative header
files that support different levels of model
complexity. This is also helpful when running
simulation within virtual environments or where
separate processor cores are used for parallel
assessments.
Processor type
ESP-r has had a long history of deployment on
different computer platforms and operating systems.
One of the challenges of the study was to build
EnergyPlus on ARM. Although most users perceive
EnergyPlus as a simulation engine, it runs within the
context of a set of pre-processing and post-processing
utilities. The simulation engine is relatively
straightforward to compile from the Fortran source,
however the utilities (e.g. ) in the standard 8.1
distributions are pre-compiled executables. It proved
4
difficult to compile the complete set of utilities from
scratch for use on ARM. Eventually the 8.2
EnergyPlus source was used.
Computational platforms considered
The context of the study is a range of computers
spanning ultra low cost computers, tablets,
legacycomputers as well as conventional
workstations. The mix of hardware configurations
allowed many of the hardware sensitivities to be
explored. For this study, most comparisons are done
with computers configured to run Linux or emulating
Linux. With minor variants, the command syntax,
form of user interaction, benchmarking tools,
operating system resource requirements and support
for scripting are roughly consistent. The list below
summarises the computer platforms in terms of .
Name, computer type, CPU, CPU speed, RAM,
Operating system, compiler, swop space and epoch.
1. Dell 7010, desktop, 4x Intel i5-3470 @3.2 GHz,
8GB ram, Ubuntu 14.04.1 LTS. Linux 3.11.0,
GCC 4.8.2, Cache 8061 Mb, 2012
2. Macbook Air, laptop, 2x Intel i5 @ 1.3 GHz,
4GB RAM, OSX 10.8.3, GCC 4.7, Cache 4GB,
SSD, 2013
3. Dell 755 2x, desktop, Intel Core 2 Duo E6550 @
2.33GHz 3.91GB RAM, Mint 16, GCC 4.8.1,
Cache 4049 MB, 2007
4. Dell 755 virtualbox 1.9GB memory 1 processor,
WattOS, GCC 4.7.2
5. IBM thinkpad T61, laptop, Intel Core2 2.0 GHz
2GB RAM Linux Mint, GCC 4.8, SSD, 2007
6. IBM thinkpad T61, laptop, Intel Core2 2.0 GHz
2GB RAM Linux Mint, GCC 4.8
7. EeePC901, netbook, Atom N270, 1.6GHz,1GB
RAM, WattOS, GCC 4.7.2 , Cache 512 Mb,
SSD, 2008
8. Odroid, SBC, ARMv7l, 2GB RAM, Ubuntu
14.04 GCC 4.8.2 , Cache 0 Mb, eMMC, 2014
9. HUDL, tablet, ARMv7l 1GB RAM, Debian 7.7,
GCC 4.6.3, Cache 0 Mb, eMMC, 2013
10. IBM T23, laptop, P-III 1.2Ghz 1GB RAM,
Vector Linux, GCC 4.5.2, Cache 1024Mb,2003
11. Raspberry Pi 2 SBC, ARMv7l, 762MB RAM,
Debian 7.8, GCC 4.6.3, Cache 921 Mb, SDHC,
2015
12. BBB, SBC, ARMv7l, 507GB RAM, Linux
3.8.13, ? GCC 4.7.2 , Cache 820 Mb, SDHC,
2013
13. Raspberry Pi, SBC, ARMv6l, 481MB RAM,
Debian 7.2, GCC 4.6.3, Cache 921 Mb, SDHC,
2012
Standard Linux Hardinfo benchmarks
<sourceforge.net/projects/hardinfo.berlios> are
shown in Table 2 (ordered by the FFT benchmark).
In the table VB denotes a virtual computer. Notice
the impact of running a virtual computer. The newer
ARM processors are in the same magnitude as older
laptops for single core performance. However they
tend to perform rather better for simulation tasks than
the benchmarks indicate.
Table 2 Typical benchmarks
Computer CPU
Blowfish
Crypto-
hash
Fabo-
nacci
FFT
Dell 7010 2.5 712.7 1.4 0.7
Dell 755 7.4 187.0 3.7 3.5
Dell 755 VB 14.6 89.7 3.8 7.1
IBM T61 8.7 161.1 4.1 3.9
IBM T23 37.5 32.6 8.9 23.1
EeePC901 20.6 41.2 7.5 28.5
HUDL 30.3 35.0 7.5 29.5
Odroid 28.4 37.0 6.9 28.8
Raspberry
Pi 2
55.5 17.2 14.5 69.9
BBB 47.6 - 14.7 74.3
Raspberry
Pi (orig)
73.8 - 21.7 119.
4
In practice, both EnergyPlus and ESP-r executables
run on a single core. ESP-r is a suite of applications
so there are usually more than one application active
and thus a second core is useful. This study found
that some sequential and most parallel invocations of
simulation assessments were disk-bound.
COMPILER DIRECTIVES
The impact of options in the compilation tool chain
have rarely been discussed within the simulation
community. Some tools which are normally compiled
from source, such as Radiance default to directives
for speed of execution. The standard distribution of
EnergyPlus is also optimised for speed. ESP-r,
which is distributed as source, the Install script
included no optimisation directives before is study.
The compiler tool chain is based on GNU GCC so
similar optimisations to be applied to both ESP-r and
EnergyPlus. The optimisation directives
(gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html )
for the GCC compiler suite are:
1. -O0 (no optimisation, fastest compile time,
useful for debug sessions),
2. -O1 (attempts to reduce code size and execution
time with least impact on compilation time, more
memory required),
3. -O2 (additional attempt to reduce execution
time, increased compilation time and memory
required)
4. -O3 (additional in-line function and loop
optimisation, even longer compile time)
To quantify the impact of alternative build directives,
both ESP-r and EnergyPlus were re-compiled using
the O0, O1, O2 and O3options. For low resource
computers, the small version header files were used.
Table 3 shows ESP-r build times. For most platforms
there is are significant time impacts from O0 to O1
5
but little between O2 and O3. The full install
included building the databases and configuring 280
training and validation models. Virtual box (VB) are
Linux running under the host Linux. Rotational (rot)
and SSD are also listed separately.
Table 3 Build time for ESP-r
Computer Install type -O0 -O1 -O2 -O3
Dell 7010 full 6m5s 13m34s 17m48 18m16s
Mac Air full 8m38s 12m22s 17m5s 18m39s
Dell 755 full 11m6s 28m2s 37m53s 39m17s
Dell 755 VB full 18m22s? 24m28s? 30m59s 36m36s
IBM T61 rot full 12m12s
IBM T61 SSD full 11m39s 30m03s 39m50s 41m51s
IBM T61 SSD
W7 MSYS2
full 27m58s 33m19s 53m35
IBM T23 full 51m14s 91m 113m 140m
Odroid full 34m48s 61m 92m 98m
EeePC 901 full 39m15s 63m 84m 91m
Raspberry Pi 2 full 60m8s 125m 170m 177m
Raspberry Pi full 106m 329m 478m -
Clearly for development work the -O0 option has
benefits, especially given the time involved for low-
resource computers. The subsequent discussion about
simulation task times will clearly demonstrate the
benefit of distributing -O1 or -O2 executables across
all machine types, especially older and low resources
computers.
The details of EnergyPlus builds is shown below for
two platforms where EnergyPlus and all the utilities
were compiled from scratch. For OSX and Linux on
Intel the standard distribution utilities were used with
separate EnergyPlus versions compiled with -O0 -O1
-O2 -O3.
• Dell 7010 make install -O0 47m27s
• Raspberry Pi 2 make install -O0 231m24s
The test models
Models were selected for ESP-r and EnergyPlus
representing different levels of complexity as well as
exercising different solvers. For CFD assessments
the suite of CFD benchmarks developed by Ian
Beausoleil-Morrison (2000) have been used. The four
cases highlighted are:
1. basic.dfd 960 cells, 1 inlet, 1 outlet, no
blockages k-e pressure and velocity solved ~260
iterations
2. porous.dfd 960 cells, 1 inlet, 1 outlet 18
blockages k-e pressure and velocity solved ~400
iterations
3. bi-cg.dfd 24360 cells, 4 inlets, 4 outlets k-e solve
pressure velocity ~2000 iterations
Figure 1 shows a variant with an wall inlet, ceiling
extract and a grid of internal blockages.
Figure 1 CFD domain with internal blockages
CFD is numerically intensive with little disk I/O.
Looking at CFD performance across the matrix of
hardware and software optimisation in Table 4, it is
clearly possible to use resource-constrained
computers in combination with high optimisation.
ARM has a big step in performance between O0 &
O1, moderate gains with O2 while O3 is marginal.
OSX has a big step in performance from O1 & O2
and marginal gains from O3. Linux on Intel has a big
step from O0 to O1 and little or no improvement with
O2 & O3. When running on a virtual computer a -O1
or O2 will perform roughly in line with un-optimised
software on the host computer. The Raspberry Pi 2
optimises better than the BBB, perhaps because of
differences in the ARM chip implementation.
6
Table 4 CFD performance matrix (seconds)
basic.dfd -O0 -O1 -O2 -O3
Dell 7010 1.3 <1 <1 <1
Dell 7010 VB 2.2 1.3 <1 <1
MacAir 2.3 2.2 <1 <1
Dell 755 2.8 1.3 1.2 1.2
Dell 755 VB 3.6 2.3 1.8 -
IBM T61 2.7 1.3 1.2 1.1
IBM T61 W7 3.5 1.7 1.6 -
IBM T23 10.1 7.1 5.5 7.0
Odroid 12.7 4.5 4.0 4.1
HUDL 14.5 4.7 4.2 -
Rasp Pi2 30.6 12.0 9.7 9.6
BBB 30.3 16.9 17.0 15.8
Rasp Pi 51.6 25.9 - 61.7
porous.dfd -O0 -O1 -O2 -O3
Dell 7010 4.0 1.5 1.5 1.5
Dell 7010 VB 5.6 2.6 2.1 -
Air 6.5 6.2 2.2 2.1
Dell 755 8.2 3.3 3.2 3.1
Dell 755 VB 8.8 4.9 4.4 -
IBM T61 8.7 3.3 3.2 3.6
IBM T61 W7 11.2s 4.2 4.0 -
IBM T23 29.0 15.6 13.6 15.0
Odroid 37.1 11.7 9.6 -
HUDL 41.7 10.0 10.1 10.0
BBB 79.1 33.7 33.7 32.6
Rasp Pi2 84.2 26.7 23.1 22.9
Rasp Pi 127.0 55.6 - 38.4
Bi_cg.dfd -O0 -O1 -O2 -O3
Dell 7010 229 127 127 122
Dell 7010 VB 357 227 227- -
Air 410.6 391.0 215 205
Dell 755 599.5 357 338 332
Dell 755 VB 670 574 565 -
IBM T61 544 351 332 326
IMB T61 W7 734 486 480 -
IBM T23 2378 2202 1742 2279
Odroid 2284 897 758 -
Rasp Pi2 - - - 2750
Building models for ESP-r & EnergyPlus
Student scale models are characterised by the cellular
office (Figure 2). ESP-r includes a dozen variants of
this model exploring various simulation facilities. For
EnergyPlus, the standard Supermarket.idf was used.
Assessment tasks include different periods and are
reported in Table 5. For example, a spring week for
initial model calibration and reality checks. Every
platforms could carry out the task takes less than 30
seconds. A January-February assessment explores the
distribution of peak and typical demands and all
platforms could carry out this task in less than one
minute. The four month summer assessment
highlights the benefits of optimisation. Lastly, we see
that annual assessment are problematic for low
resource machines without at least an -O1
optimisation. Four cores and single cores produce
similar timings for sequential tasks. Although the Pi
can run 4 simultaneous simulations, disk access is the
bottleneck.
Figure 2 Student scale model
A simple model from the point of view of
EnergyPlus is Supermarket.idf. For this study it was
adapted for an annual assessment at four time steps
per hour. The difference in performance between the
two Dell computers is substantial with the Odroid
ARM computer approaching the performance of the
older Dell more than the difference in performance
between the older Dell and the Odroid for ESP-r
assessments.
EnergyPlus simple model performance predictions
• Dell 7010 -O3 compile 0m34s
• Dell 755 -O3 compile 4m10s
• Odroid -O3 compile 5m56s
• Raspberry Pi 2 -O3 compile 10m10s
Table 5
Cellular office performance matrix (seconds)
Dell 7010
Period -O0 -O1 -O2 -O3
one week <1 <1 <1 <1
summer 4.1 2.1 2.0 -
annual 11.7 5.7 5.6 -
Air
one week <1 <1 <1 <1
summer 5.8 6.3 2.6 -
annual 16.4 7.3 6.8 -
Dell 755
one week 1.2 <1 <1 <1
summer 7.7 4.2 3.9 3.8
annual 21.9 11.6 10.4 10.2
T61 rotational
one week 1.3 <1 <1 <1
summer 8.2 4.7 4.4 4.4
annual 23.4 12.9 11.8 11.8
7
T61 SSD
one week 1.2 <1 <1 <1
summer 8.0 4.3 3.9 3.8
annual 22.8 11.7 10.6 10.4
Odroid
one week - 2.9 2.8 2.7
summer - 15.5 14.3 14.6
annual - 42.9 40.0 40.9
Raspberry Pi 2
one week 5.7 5.9 5.5 5.4
summer 82.3 38.6 35.3 34.2
annual 247.5 108.7 102.4 97.4
BBB
one week 14.1 9.0 8.9 8.9
summer 126.9 77.7 76.2 79.7
The second building model tested is an 1890s Stone
Villa (Figure 3) which has typically been used to
assess refurbishment options and its geometric and
compositional detail reflects this. This model
includes 13 thermal zones and 432 surfaces. The
composition includes a mix of lightweight and heavy
entities (outer walls with 600mm of various stone
types). There is considerable diversity of room use
throughout the day and for different day types.
Figure 3 Moderate complexity model
The ESP-r model was exported from ESP-r as a V7.2
IDF file and then upgraded via the usual utilities to
an 8.2 IDF. Two variants were created, a base case
using conduction transfer functions at the same time
step as ESP-r used and the other uses the finite
difference solver at 20 time step per hour. The finite
difference solver directives would be roughly
analogous to the finite volumes used in ESP-r.
In this case, the size of the results files can become
an issue. ESP-r supports multiple save levels - save 4
which includes a full energy balance at all zones and
all surfaces and save 2 which does not include the
energy balance for surfaces. For example the stone
simi ESP-r model a one week assessment is 6.9MB
and 82.4MB, a two month assessment is 57MB and
693MB, summer 118MB and 1.43GB and an annual
assessment is 354MB and 4.28 GB.
The performance matrix for ESP-r is shown in Table
6. The entries marked s2 are for constrained
performance data and s4 include a full energy
balance. Of interest is the improvement across all
platforms, especially as model complexity increases
and for the summer or annual assessments. All other
factors being consistent Linux annual run time
reductions from 950s to 230s and OSX run time
reductions from 744s to 181s has generated a number
of oh-my-goodness reactions from users. It was
possible for an optimised Dell 755 ESP-r to surpass a
newer but un-optimised Dell 7010 ESP-r. Indeed a
fully optimised Raspberry Pi 2 approached the
Lenovo T61 un-optimised performance.
Data recovery timings
The extraction of data from ESP-r results files is not
particularly sensitive to the level of build
optimisation. Rather it depends on the nature of the
disk drive, the available memory and the extent of
the results file being scanned. The SDHC cards in
several of the SBC were seen to be especially slow
for scanning large results files. Similarly, constrained
memory prevented data recovery from memory
buffer and forced disk reads for several of the cases.
The Table 7 shows timings including runs which
were impacted by lack of free memory (*)
Table 6 Stone villa performance matrix (seconds)
Dell 7010
Period -O0 -O1 -O2 -O3
s2 week 29.8 10.5 9 -
s2 summer 164 57 51 -
s2 annual 448 164 139 -
s4 summer 167 56 54
s4 annual 459 164 148
Dell 755
s2 summer 347 128 102 103
s2 annual 953 352 281 280
s4 summer 355 133 108 107
s4 annual 974 368 296 294
Dell 755 Virtual Box WattOS
s2 summer 371 162 163 162
s2 annual 1010 531 465
s4 summer 340 199 145
s4 annual - - - -
Macbook Air
s2 week 47 14 12 12
s2 summer 258 78 66 66
s2 annual 774 209 181 181
s4 summer 263 81 70 73
s4 annual 723 226 194 203
T61 rotational
s2 week 67 28.4 23.6 23.2
s2 summer 369 157 131 130
s2 annual 1010 430 351 361
8
s4 summer 375 163 146 137
s4 annual 1031 444 395 368
T61 SSD W7
s2 week 93 41 30
s2 summer 497 198 150
s2 annual 1388 533 430
Raspberry Pi 2
s2 week 704 270 220 209
s2 summer 3925 - 1190 1159
s2 annual 10803 - 3217 3218
s4 week 707 - 219 213
s4 summer 3943 - 1225 1238
Table 7 Data recovery matrix (elapsed seconds)
Computer one
week
Jan-
feb
summer annual
Cellular office model
Dell 7010 <1 <1 1 2
Mac Air <1 1 2 4
Dell 755 <1 1.5 2.2 4-5
T61 (rot) 1 1.5 2.5 4-6
T61 SSD 1 1.8 2.9 5-7
Odroid 2.4 4s 6 12
Rasp Pi 2 5 8 12 47
BBB 5 12 19 49
Stone villa model (constrained performance data)
Dell 7010 <1 3 5 8
Mac Air 1 7 13 37
Dell 755 1.5 6 13 36
Dell 755VB 3 9 17 52
T61 (rot) 2 7 14 41
T61 SSD 2 6 9 26
Odroid 6 25 49 140
Rasp Pi 2 9 28 52 140
BBB 9 29 54 150
Stone villa model (with full energy balance)
Dell 7010 1 5 10 15
Mac Air 2 13 32 390*
Dell 755 2 12 18 68
Dell 755VB 4 47 189*
T61 (rot) 3 14 35 247*
T61 SSD 3 19 43 369*
Odroid 4-9 50 104 -
Raspb Pi 2 9-10 41-48 180* -
BBB 10-11 141* 297* -
EnergyPlus models
The Supermarket.idf model, like the many of the
example models distributed with EnergyPlus, can be
used for calibration assessments or non-annual
assessments on the platforms studied.
The impact of compiler optimisations on conduction
transfer and finite difference assessments is shown
below for annual EnergyPlus 8.2 runs of the Stone
villa. The tool-chain optimisation improvements for
EnergyPlus are roughly in line with the pattern seen
with the ESP-r build process.
EnergyPlus annual run timings:
Dell 7010 -O1 2m48s with conduction transfer
Dell 7010 -O3 0m32s with conduction transfer
Dell 7010 -O1 59m20s with finite difference solution
Dell 7010 -O3 10m46s with finite difference solution
Dell 755 -O0 compile 9m55s
Dell 755 -O1 compile 4m41s
Dell 755 -O2 compile 3m32s
Dell 755 -O0 compile, finite difference 192m8s
Dell 755 -O1 compile, finite difference 102m58s
Dell 755 -O2 compile, finite difference 76m30s
Mac Air EnergyPlus 8.1 standard distribution 1m7s
Raspberry Pi 2 -O1 30m4s with conduction transfer
Raspberry Pi 2 -O3 12m34s with conduction transfer
Raspberry Pi 2 -O1 591m55s with finite difference
Raspberry Pi 2 -O3 194m28s with finite difference
It is unclear why the Dell 755 is so much less suited
to EnergyPlus finite difference production work. For
models which make use of the finite difference solver
low resource computers have a distinct disadvantage.
The GCC optimisations yield the expected pattern in
run time changes for both Fortran and C++.
However, the -O3 optimisation with GCC delivers a
less performance than the compiler used by the
EnergyPlus development team.
CONCLUSION
A matrix of computer hardware and software options
have been tested against a range of ESP-r and
EnergyPlus simulation models and for a range of
simulation tasks. Timings for numerical tasks and
performance recovery tasks have been reported.
What is more difficult to quantify in terms of timings
are user interactions associated with creating and
evolving models. Typically, these tasks require a
fraction of the available computing resource and here
the user experience of using low resource and older
hardware is less marked than standard numerical
benchmarks would suggest. The use of compiler
optimisation directives removes most of the latency
in the drawing of wire-frames and in the navigation
of models. Creating and evolving with models of
moderate complexity on all but the most constrained
of platforms would likely be acceptable to many
practitioners. Optimised software and hardware has
thus been seen to expand options for deploying
simulation to non-traditional platforms
9
It has been seen that ARM processors support a
higher degree of optimisation followed by Intel
Linux and then Windows 7. For ESP-r it makes sense
to adopt at least the -O1 level for software to be
distributed. On Intel computers optimisation beyond -
O1 produces marginal improvements for a massive
increase in build times. Developers building
EnergyPlus may choose to debug with –O0 but
should remember to rebuild with –O3 for
distribution.
Without any hardware changes, a 2007 Dell 755 with
optimised software was seen to perform similarly to
an un-optimised 2012 Dell 7010 for a number of
simulation tasks. Similarly, optimised software can
make up for much of the numerical inefficiency in
the use of virtual machines. A Raspberry Pi 2 has
been used for student projects essentially without
comment if the software is fully optimised and
instructions avoid the generation of large files and
extensive data mining tasks.
There are clear indications that critical adjustments to
the scope of assessments and simulation work flow
can improve production tasks by avoiding the use of
virtual memory and ensuring that data recovery can
make use of reads from the memory buffer rather
than disk.
The study provides evidence that careful selection of
refurbishment options for legacy hardware can
extend their life considerably. For example, the
combination of a replacement SSD and optimally
compiled software increased user productivity on
2006 laptops and workstations.
Legacy hardware that can no longer run Windows
XP can usually be repurposed as Linux computers
capable of a number of simulation related work tasks.
For example, a re-configuredNetbook which was
seen to be in line with the better ARM SBC for
browsing models and checking details while visiting
sites for consulting projects.
Although ESP-r is natively hosted on Windows
platforms, tests show that, on the same hardware
ESP-r is roughly 30% slower on Windows 7. This
might be because of OS resource requirements and it
might be due to inefficient use of virtual memory.
This suggests that there may be additional
optimisation techniques to be explored for the
Windows platform.
The full matrix of computers/ compilers/ operating
system variants and observations is being compiled
and extended for a journal paper.
ACKNOWLEDGEMENT
Some of the computers used in this study were
sourced within the University of Strathclyde. Critical
advise on compiling EnergyPlus came from Linda
Laurie.
REFERENCES
Beausoleil-Morrison, I. 2000. The adaptive coupling
of heat and airflow modelling within dynamic
whole-building simulation, Glasgow, University
of Strathclyde.
Hand J. 2014. Opportunities and constraints in the
use of simulation on low cost ARM-based
computers. eSim Conference, Ottawa Canada.
Hand, J. 2015. Strategies for deploying virtual
representations of the built environment,
Glasgow Scotland.
10

More Related Content

Similar to survey_of_matrix_for_simulation

Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentEricsson
 
Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...
Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...
Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...IJCNCJournal
 
PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...
PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...
PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...IJCNCJournal
 
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERSPERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERSijdpsjournal
 
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERSPERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERSijdpsjournal
 
Hard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State DriveHard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State DriveDac Khue Nguyen
 
Analysis of Multicore Performance Degradation of Scientific Applications
Analysis of Multicore Performance Degradation of Scientific ApplicationsAnalysis of Multicore Performance Degradation of Scientific Applications
Analysis of Multicore Performance Degradation of Scientific ApplicationsJames McGalliard
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
Top 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisTop 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisNomanSiddiqui41
 
Dynamic Simulation of Chemical Kinetics in Microcontroller
Dynamic Simulation of Chemical Kinetics in MicrocontrollerDynamic Simulation of Chemical Kinetics in Microcontroller
Dynamic Simulation of Chemical Kinetics in MicrocontrollerIJERA Editor
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluationGIORGOS STAMELOS
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsIJMER
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformYao Yao
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ijdpsjournal
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ijdpsjournal
 
I understand that physics and hardware emmaded on the use of finete .pdf
I understand that physics and hardware emmaded on the use of finete .pdfI understand that physics and hardware emmaded on the use of finete .pdf
I understand that physics and hardware emmaded on the use of finete .pdfanil0878
 

Similar to survey_of_matrix_for_simulation (20)

Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environment
 
Benchmarking Mobile Storage
Benchmarking Mobile StorageBenchmarking Mobile Storage
Benchmarking Mobile Storage
 
Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...
Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...
Performance Evaluation of the KVM Hypervisor Running on Arm-Based Single-Boar...
 
PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...
PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...
PERFORMANCE EVALUATION OF THE KVM HYPERVISOR RUNNING ON ARM-BASED SINGLE-BOAR...
 
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERSPERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
 
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERSPERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
PERFORMANCE AND ENERGY-EFFICIENCY ASPECTS OF CLUSTERS OF SINGLE BOARD COMPUTERS
 
Hard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State DriveHard Disk Drive versus Solid State Drive
Hard Disk Drive versus Solid State Drive
 
Analysis of Multicore Performance Degradation of Scientific Applications
Analysis of Multicore Performance Degradation of Scientific ApplicationsAnalysis of Multicore Performance Degradation of Scientific Applications
Analysis of Multicore Performance Degradation of Scientific Applications
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
Top 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisTop 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & Analysis
 
Dynamic Simulation of Chemical Kinetics in Microcontroller
Dynamic Simulation of Chemical Kinetics in MicrocontrollerDynamic Simulation of Chemical Kinetics in Microcontroller
Dynamic Simulation of Chemical Kinetics in Microcontroller
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluation
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous Platforms
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Blue Gene
Blue GeneBlue Gene
Blue Gene
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
I understand that physics and hardware emmaded on the use of finete .pdf
I understand that physics and hardware emmaded on the use of finete .pdfI understand that physics and hardware emmaded on the use of finete .pdf
I understand that physics and hardware emmaded on the use of finete .pdf
 

survey_of_matrix_for_simulation

  • 1. Energy Systems Research Unit Email: esru@strath.ac.uk Dept. of Mechanical and Aerospace Engineering Tel: +44 (0)141 548 2314 75 Montrose Street, Glasgow G1 1XJ http://www. strath.ac.uk/esru The University of Strathclyde is a charitable body, registered in Scotland, number SC015263 ESRU occasional paper Survey of a matrix of computing hardware and compilation influences on the deployment of ESP-r and EnergyPlus Dr. Jon W. Hand Energy Systems Research Unit 13 October 2015
  • 2. 1 SURVEY OF A MATRIX OF HARDWARE AND COMPILATION INFLUENCES ON1 THE DEPLOYMENT OF ESP-r AND ENERGYPLUS2 3 Dr. Jon W. Hand1 4 1 Energy Systems Research Unit, University of Strathclyde, Glasgow, Scotland5 6 13 October 20157 8 9 10 Contents'11 !12 ABSTRACT .............................................................................................................................................................2!13 INTRODUCTION ....................................................................................................................................................2!14 HARDWARE ISSUES.............................................................................................................................................2!15 Disk types and size ...............................................................................................................................................2!16 Computer memory................................................................................................................................................3!17 Virtual (cache) memory........................................................................................................................................3!18 Processor type.......................................................................................................................................................3!19 Computational platforms considered....................................................................................................................4!20 COMPILER DIRECTIVES......................................................................................................................................4!21 The test models.....................................................................................................................................................5!22 Building models for ESP-r & EnergyPlus............................................................................................................6!23 Data recovery timings...........................................................................................................................................7!24 EnergyPlus models ...............................................................................................................................................8!25 CONCLUSION.........................................................................................................................................................8!26 ACKNOWLEDGEMENT........................................................................................................................................9!27 REFERENCES .........................................................................................................................................................9!28 29 30 31 32 33
  • 3. 2 1 ABSTRACT This is an interim report from an ongoing investigation of the relative contribution of various hardware and compiler options on the efficacy of specific ESP-r and EnergyPlus simulation tasks. The paper draws on a range of techniques and methodologies developed to port ESP-r to ARM platforms such as the Raspberry Pi and extends them across a range of traditional and emerging computing platforms. Adapting to the hardware limits of ultra low cost computers exposed a number of issues related to disk access, disk type, memory, number of computing cores, compiler options as well as the use of virtual machines for a range of tool development and simulation tasks. Among other findings is that CFD convergence times can be reduced from 41s to less than 10s, annual multi-domain assessments from 950s to 280s and non-optimal memory and virtual computing can extend data recovery times by more than a factor of 10. INTRODUCTION In 2012 the Raspberry Pi (www.raspberrypi.org) was introduced, primarily as a vehicle to address a lack of programming skills in UK schools. Its combination of price and computational power proved attractive to a far wider audience, including the 'maker' community. This spawned competitors such as the BeagleBone Black (beagleboard.org) and an ecosystem of related add-on devices as well as a community of developers testing the bounds of this new class of device which were typically distributed with Linux. Few perceived such lightweight platforms might support numerical simulation. However, in a historical context, the development of simulation tools and many classic numerical studies were accomplished with even more constrained computational resources At the eSim conference in 2014 the author presented the results of an initial port of ESP-r to the original Raspberry Pi and BeagleBone Black ARM-based computers and observations of their use as software development platforms as well as for carrying out various performance assessment goals for different user types. From the subset of simulation tasks that the first generation supported, subsequent ARM-based computers, for example the Odroid-U3 from Korea (www.hardkernel.com), have include multiple cores, more memory and faster disk access. The user experience gap between a conventional desktop computer and the $70 Odoid-U3 is surprisingly modest. For example, simultaneously editing of a 3200 surface ESP-r model while running a CFD assessment and an Octave (an open source equivalent to MatLab) turbine blade analysis session does not saturate the Odroid's resources. However, less constrained hardware is only part of the story. The author observed that particular adjustments to the numerical source code and to the compiling tool chain resulted in significant improvements in the build process, subsequent user interactions and the run times for assessments. The magnitude of improvement was dependent on the specifics of the hardware, the complexity of the model and the nature of the assessments carried out. This paper assesses whether the techniques explored during the ARM study are applicable in a broader context of numerical tools, machine configurations and ordering of simulation work tasks. Both ESP-r (www.esru.strath.ac.uk/publications.htm ) and EnergyPlus (apps1.eere.energy.gov/buildings/ energyplus/) are used to test this idea. As many of the constraints noted in ARM platforms are found in older laptops and workstations the study also assess the extent of improvement for such computers as well as for computers which fit the conventional definition of a numerical workstation. The author observed that the details of computer hardware, e.g. type of disk, provision of physical memory and virtual memory and processor type had an impact on the time it took to carry out specific simulation tasks on models of different complexity. HARDWARE ISSUES Disk types and size Most ARM single board computers (SBC) rely on SDHC cards for disk storage as well as swap space. SDHC cards were not designed for operating systems and disk I/O is substantially constrained. Some SBC and tablets make use of eMMC storage that are mid- way between SDHC cards and rotational drives in terms of speed. Conventional computers often include slower rotational drives rather than SSD drives. SBC are typically paired with 8GB or 16GB SDHC cards or eMMC for reasons of cost and this constrains the space available for the build process (EnergyPlus requires ~4GB to build) as well as the space available for simulation files and performance prediction files. Another class of drive are the virtual drives implemented within virtual computers. Tests indicate that the overheads involved result in disk access roughly half the speed of the computers native drive. Disk I/O associated with lots of small files or with random access files can be a magnitude slower than sequential reads and writes reported in benchmarks. ESP-r models may consist of scores of small files and performance predictions for each domain are held in
  • 4. 3 sequentially written binary files with data recovery mostly involving random access. Conversly, EnergyPlus is reading few files and sequentially writing ASCII files and optionally a SQL database. Benchmarking tools such as (a) Blackmagick disk speed test on OSX and (b) CrystalDiskMark on Windows use various disk I/O tests for sequential and random access. Table 1 shows their reporting across a range of devices. This is only somewhat indicative of the mix of simulation tool data recovery tasks which have been assessed. Table 1 Typical disk I/O speeds MB/s Drive Type Sequential (a) Random (b) write read write read SDHC class 10 10-20 20 1.6 5.3 eMMC Odroid 15 55 USB 2 stick 2-8 18 0.03 5.4 USB 3 stick 15-20 60 0.6 4.1 USB 3 rotational 60 65 1.9 0.5 Old 2.5" rotational 40 45 USB 3 SSD 110 245 1.2 1.6 Network drive 110 110 1.2 1.6 T61 SSD 110 130 33 22 T61 Rotational 60 60 0.3 0.4 Dell 7010 rotational 90 77 1.5 1.2 Dell 755 rotational 84 87 Dell 755 virtualbox 42 47 Macbook Air SSD 222 600 99 20 Many simulation tasks are disk-bound. ESP-r has a number of user choices which impact the size of the results files created: the extent of performance data written (i.e. save level), the period of the assessment and the building and system time step used. An ESP- r model at the limits of geometric complexity running several months at a one minute building time step and systems time step of one second time step might generate upwards of 40 GB. For example an annual 15 minute time step 1890s villa model constrained performance file is 354MB and the extensive file is 4.27GB. EnergyPlus also incudes optional directives which may constrain the number of entities which are reported on or, for example omit or constrain SQL outputs.. Computer memory Constrained resource computers often run Linux because of the memory footprint of the operating system. With ~512MB RAM there is still ~400MB available. One of the initial challenges of porting ESP-r was to create a suite of executables that could run in their usual combinations within the available memory. Rather than purge multi-domain functionality the route taken was to constrain model complexity via alternative sets of header files for small and standard deployments. Small deployments are targeted at low resource computers, but can be advantageous when running assessments within virtual computers. Memory is also an issue during the build process, some compile options require substantially more memory and may drive the whole process into virtual memory. The build process on ESP-r essentially doubled in speed when low resource linking commands are used. EnergyPlus is hungry for memory and disk space during the build process and was rarely successful if there is less than ~800MB RAM and 4GB of free disk space available. Given sufficient memory, operating systems use memory as a buffer for the disk I/O associated with simulations and subsequent data-mining. There is a considerable speedup if free memory is greater than the size of the files being written. It is also the case that where multiple assessments need to be carried out it is often much faster to: simulate_a extract_a simulate_b extract_b rather than simulate_a simulate_b extract_a extract_b. Thus, critical adjustments to the scope of assessment or the ordering of tasks can reduce the penalty for data extraction across most platforms. Virtual (cache) memory Virtual memory via a swap file on the disk is used when physical memory is depleted. If slow disks are combined with limited memory then swap is increasingly used and performance degrades. This was evident in the initial porting of ESP-r to the Raspberry Pi. Tools such as ESP-r are composed of many modules, for example the ESP-r project manager (prj) will invoke the simulation engine (bps) and later the results analysis module (res). Although usually not an issue in conventional deployments, with constrained memory simulation executables need to be constrained in size to avoid running in swap. The approach taken in ESP-r is to have alternative header files that support different levels of model complexity. This is also helpful when running simulation within virtual environments or where separate processor cores are used for parallel assessments. Processor type ESP-r has had a long history of deployment on different computer platforms and operating systems. One of the challenges of the study was to build EnergyPlus on ARM. Although most users perceive EnergyPlus as a simulation engine, it runs within the context of a set of pre-processing and post-processing utilities. The simulation engine is relatively straightforward to compile from the Fortran source, however the utilities (e.g. ) in the standard 8.1 distributions are pre-compiled executables. It proved
  • 5. 4 difficult to compile the complete set of utilities from scratch for use on ARM. Eventually the 8.2 EnergyPlus source was used. Computational platforms considered The context of the study is a range of computers spanning ultra low cost computers, tablets, legacycomputers as well as conventional workstations. The mix of hardware configurations allowed many of the hardware sensitivities to be explored. For this study, most comparisons are done with computers configured to run Linux or emulating Linux. With minor variants, the command syntax, form of user interaction, benchmarking tools, operating system resource requirements and support for scripting are roughly consistent. The list below summarises the computer platforms in terms of . Name, computer type, CPU, CPU speed, RAM, Operating system, compiler, swop space and epoch. 1. Dell 7010, desktop, 4x Intel i5-3470 @3.2 GHz, 8GB ram, Ubuntu 14.04.1 LTS. Linux 3.11.0, GCC 4.8.2, Cache 8061 Mb, 2012 2. Macbook Air, laptop, 2x Intel i5 @ 1.3 GHz, 4GB RAM, OSX 10.8.3, GCC 4.7, Cache 4GB, SSD, 2013 3. Dell 755 2x, desktop, Intel Core 2 Duo E6550 @ 2.33GHz 3.91GB RAM, Mint 16, GCC 4.8.1, Cache 4049 MB, 2007 4. Dell 755 virtualbox 1.9GB memory 1 processor, WattOS, GCC 4.7.2 5. IBM thinkpad T61, laptop, Intel Core2 2.0 GHz 2GB RAM Linux Mint, GCC 4.8, SSD, 2007 6. IBM thinkpad T61, laptop, Intel Core2 2.0 GHz 2GB RAM Linux Mint, GCC 4.8 7. EeePC901, netbook, Atom N270, 1.6GHz,1GB RAM, WattOS, GCC 4.7.2 , Cache 512 Mb, SSD, 2008 8. Odroid, SBC, ARMv7l, 2GB RAM, Ubuntu 14.04 GCC 4.8.2 , Cache 0 Mb, eMMC, 2014 9. HUDL, tablet, ARMv7l 1GB RAM, Debian 7.7, GCC 4.6.3, Cache 0 Mb, eMMC, 2013 10. IBM T23, laptop, P-III 1.2Ghz 1GB RAM, Vector Linux, GCC 4.5.2, Cache 1024Mb,2003 11. Raspberry Pi 2 SBC, ARMv7l, 762MB RAM, Debian 7.8, GCC 4.6.3, Cache 921 Mb, SDHC, 2015 12. BBB, SBC, ARMv7l, 507GB RAM, Linux 3.8.13, ? GCC 4.7.2 , Cache 820 Mb, SDHC, 2013 13. Raspberry Pi, SBC, ARMv6l, 481MB RAM, Debian 7.2, GCC 4.6.3, Cache 921 Mb, SDHC, 2012 Standard Linux Hardinfo benchmarks <sourceforge.net/projects/hardinfo.berlios> are shown in Table 2 (ordered by the FFT benchmark). In the table VB denotes a virtual computer. Notice the impact of running a virtual computer. The newer ARM processors are in the same magnitude as older laptops for single core performance. However they tend to perform rather better for simulation tasks than the benchmarks indicate. Table 2 Typical benchmarks Computer CPU Blowfish Crypto- hash Fabo- nacci FFT Dell 7010 2.5 712.7 1.4 0.7 Dell 755 7.4 187.0 3.7 3.5 Dell 755 VB 14.6 89.7 3.8 7.1 IBM T61 8.7 161.1 4.1 3.9 IBM T23 37.5 32.6 8.9 23.1 EeePC901 20.6 41.2 7.5 28.5 HUDL 30.3 35.0 7.5 29.5 Odroid 28.4 37.0 6.9 28.8 Raspberry Pi 2 55.5 17.2 14.5 69.9 BBB 47.6 - 14.7 74.3 Raspberry Pi (orig) 73.8 - 21.7 119. 4 In practice, both EnergyPlus and ESP-r executables run on a single core. ESP-r is a suite of applications so there are usually more than one application active and thus a second core is useful. This study found that some sequential and most parallel invocations of simulation assessments were disk-bound. COMPILER DIRECTIVES The impact of options in the compilation tool chain have rarely been discussed within the simulation community. Some tools which are normally compiled from source, such as Radiance default to directives for speed of execution. The standard distribution of EnergyPlus is also optimised for speed. ESP-r, which is distributed as source, the Install script included no optimisation directives before is study. The compiler tool chain is based on GNU GCC so similar optimisations to be applied to both ESP-r and EnergyPlus. The optimisation directives (gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html ) for the GCC compiler suite are: 1. -O0 (no optimisation, fastest compile time, useful for debug sessions), 2. -O1 (attempts to reduce code size and execution time with least impact on compilation time, more memory required), 3. -O2 (additional attempt to reduce execution time, increased compilation time and memory required) 4. -O3 (additional in-line function and loop optimisation, even longer compile time) To quantify the impact of alternative build directives, both ESP-r and EnergyPlus were re-compiled using the O0, O1, O2 and O3options. For low resource computers, the small version header files were used. Table 3 shows ESP-r build times. For most platforms there is are significant time impacts from O0 to O1
  • 6. 5 but little between O2 and O3. The full install included building the databases and configuring 280 training and validation models. Virtual box (VB) are Linux running under the host Linux. Rotational (rot) and SSD are also listed separately. Table 3 Build time for ESP-r Computer Install type -O0 -O1 -O2 -O3 Dell 7010 full 6m5s 13m34s 17m48 18m16s Mac Air full 8m38s 12m22s 17m5s 18m39s Dell 755 full 11m6s 28m2s 37m53s 39m17s Dell 755 VB full 18m22s? 24m28s? 30m59s 36m36s IBM T61 rot full 12m12s IBM T61 SSD full 11m39s 30m03s 39m50s 41m51s IBM T61 SSD W7 MSYS2 full 27m58s 33m19s 53m35 IBM T23 full 51m14s 91m 113m 140m Odroid full 34m48s 61m 92m 98m EeePC 901 full 39m15s 63m 84m 91m Raspberry Pi 2 full 60m8s 125m 170m 177m Raspberry Pi full 106m 329m 478m - Clearly for development work the -O0 option has benefits, especially given the time involved for low- resource computers. The subsequent discussion about simulation task times will clearly demonstrate the benefit of distributing -O1 or -O2 executables across all machine types, especially older and low resources computers. The details of EnergyPlus builds is shown below for two platforms where EnergyPlus and all the utilities were compiled from scratch. For OSX and Linux on Intel the standard distribution utilities were used with separate EnergyPlus versions compiled with -O0 -O1 -O2 -O3. • Dell 7010 make install -O0 47m27s • Raspberry Pi 2 make install -O0 231m24s The test models Models were selected for ESP-r and EnergyPlus representing different levels of complexity as well as exercising different solvers. For CFD assessments the suite of CFD benchmarks developed by Ian Beausoleil-Morrison (2000) have been used. The four cases highlighted are: 1. basic.dfd 960 cells, 1 inlet, 1 outlet, no blockages k-e pressure and velocity solved ~260 iterations 2. porous.dfd 960 cells, 1 inlet, 1 outlet 18 blockages k-e pressure and velocity solved ~400 iterations 3. bi-cg.dfd 24360 cells, 4 inlets, 4 outlets k-e solve pressure velocity ~2000 iterations Figure 1 shows a variant with an wall inlet, ceiling extract and a grid of internal blockages. Figure 1 CFD domain with internal blockages CFD is numerically intensive with little disk I/O. Looking at CFD performance across the matrix of hardware and software optimisation in Table 4, it is clearly possible to use resource-constrained computers in combination with high optimisation. ARM has a big step in performance between O0 & O1, moderate gains with O2 while O3 is marginal. OSX has a big step in performance from O1 & O2 and marginal gains from O3. Linux on Intel has a big step from O0 to O1 and little or no improvement with O2 & O3. When running on a virtual computer a -O1 or O2 will perform roughly in line with un-optimised software on the host computer. The Raspberry Pi 2 optimises better than the BBB, perhaps because of differences in the ARM chip implementation.
  • 7. 6 Table 4 CFD performance matrix (seconds) basic.dfd -O0 -O1 -O2 -O3 Dell 7010 1.3 <1 <1 <1 Dell 7010 VB 2.2 1.3 <1 <1 MacAir 2.3 2.2 <1 <1 Dell 755 2.8 1.3 1.2 1.2 Dell 755 VB 3.6 2.3 1.8 - IBM T61 2.7 1.3 1.2 1.1 IBM T61 W7 3.5 1.7 1.6 - IBM T23 10.1 7.1 5.5 7.0 Odroid 12.7 4.5 4.0 4.1 HUDL 14.5 4.7 4.2 - Rasp Pi2 30.6 12.0 9.7 9.6 BBB 30.3 16.9 17.0 15.8 Rasp Pi 51.6 25.9 - 61.7 porous.dfd -O0 -O1 -O2 -O3 Dell 7010 4.0 1.5 1.5 1.5 Dell 7010 VB 5.6 2.6 2.1 - Air 6.5 6.2 2.2 2.1 Dell 755 8.2 3.3 3.2 3.1 Dell 755 VB 8.8 4.9 4.4 - IBM T61 8.7 3.3 3.2 3.6 IBM T61 W7 11.2s 4.2 4.0 - IBM T23 29.0 15.6 13.6 15.0 Odroid 37.1 11.7 9.6 - HUDL 41.7 10.0 10.1 10.0 BBB 79.1 33.7 33.7 32.6 Rasp Pi2 84.2 26.7 23.1 22.9 Rasp Pi 127.0 55.6 - 38.4 Bi_cg.dfd -O0 -O1 -O2 -O3 Dell 7010 229 127 127 122 Dell 7010 VB 357 227 227- - Air 410.6 391.0 215 205 Dell 755 599.5 357 338 332 Dell 755 VB 670 574 565 - IBM T61 544 351 332 326 IMB T61 W7 734 486 480 - IBM T23 2378 2202 1742 2279 Odroid 2284 897 758 - Rasp Pi2 - - - 2750 Building models for ESP-r & EnergyPlus Student scale models are characterised by the cellular office (Figure 2). ESP-r includes a dozen variants of this model exploring various simulation facilities. For EnergyPlus, the standard Supermarket.idf was used. Assessment tasks include different periods and are reported in Table 5. For example, a spring week for initial model calibration and reality checks. Every platforms could carry out the task takes less than 30 seconds. A January-February assessment explores the distribution of peak and typical demands and all platforms could carry out this task in less than one minute. The four month summer assessment highlights the benefits of optimisation. Lastly, we see that annual assessment are problematic for low resource machines without at least an -O1 optimisation. Four cores and single cores produce similar timings for sequential tasks. Although the Pi can run 4 simultaneous simulations, disk access is the bottleneck. Figure 2 Student scale model A simple model from the point of view of EnergyPlus is Supermarket.idf. For this study it was adapted for an annual assessment at four time steps per hour. The difference in performance between the two Dell computers is substantial with the Odroid ARM computer approaching the performance of the older Dell more than the difference in performance between the older Dell and the Odroid for ESP-r assessments. EnergyPlus simple model performance predictions • Dell 7010 -O3 compile 0m34s • Dell 755 -O3 compile 4m10s • Odroid -O3 compile 5m56s • Raspberry Pi 2 -O3 compile 10m10s Table 5 Cellular office performance matrix (seconds) Dell 7010 Period -O0 -O1 -O2 -O3 one week <1 <1 <1 <1 summer 4.1 2.1 2.0 - annual 11.7 5.7 5.6 - Air one week <1 <1 <1 <1 summer 5.8 6.3 2.6 - annual 16.4 7.3 6.8 - Dell 755 one week 1.2 <1 <1 <1 summer 7.7 4.2 3.9 3.8 annual 21.9 11.6 10.4 10.2 T61 rotational one week 1.3 <1 <1 <1 summer 8.2 4.7 4.4 4.4 annual 23.4 12.9 11.8 11.8
  • 8. 7 T61 SSD one week 1.2 <1 <1 <1 summer 8.0 4.3 3.9 3.8 annual 22.8 11.7 10.6 10.4 Odroid one week - 2.9 2.8 2.7 summer - 15.5 14.3 14.6 annual - 42.9 40.0 40.9 Raspberry Pi 2 one week 5.7 5.9 5.5 5.4 summer 82.3 38.6 35.3 34.2 annual 247.5 108.7 102.4 97.4 BBB one week 14.1 9.0 8.9 8.9 summer 126.9 77.7 76.2 79.7 The second building model tested is an 1890s Stone Villa (Figure 3) which has typically been used to assess refurbishment options and its geometric and compositional detail reflects this. This model includes 13 thermal zones and 432 surfaces. The composition includes a mix of lightweight and heavy entities (outer walls with 600mm of various stone types). There is considerable diversity of room use throughout the day and for different day types. Figure 3 Moderate complexity model The ESP-r model was exported from ESP-r as a V7.2 IDF file and then upgraded via the usual utilities to an 8.2 IDF. Two variants were created, a base case using conduction transfer functions at the same time step as ESP-r used and the other uses the finite difference solver at 20 time step per hour. The finite difference solver directives would be roughly analogous to the finite volumes used in ESP-r. In this case, the size of the results files can become an issue. ESP-r supports multiple save levels - save 4 which includes a full energy balance at all zones and all surfaces and save 2 which does not include the energy balance for surfaces. For example the stone simi ESP-r model a one week assessment is 6.9MB and 82.4MB, a two month assessment is 57MB and 693MB, summer 118MB and 1.43GB and an annual assessment is 354MB and 4.28 GB. The performance matrix for ESP-r is shown in Table 6. The entries marked s2 are for constrained performance data and s4 include a full energy balance. Of interest is the improvement across all platforms, especially as model complexity increases and for the summer or annual assessments. All other factors being consistent Linux annual run time reductions from 950s to 230s and OSX run time reductions from 744s to 181s has generated a number of oh-my-goodness reactions from users. It was possible for an optimised Dell 755 ESP-r to surpass a newer but un-optimised Dell 7010 ESP-r. Indeed a fully optimised Raspberry Pi 2 approached the Lenovo T61 un-optimised performance. Data recovery timings The extraction of data from ESP-r results files is not particularly sensitive to the level of build optimisation. Rather it depends on the nature of the disk drive, the available memory and the extent of the results file being scanned. The SDHC cards in several of the SBC were seen to be especially slow for scanning large results files. Similarly, constrained memory prevented data recovery from memory buffer and forced disk reads for several of the cases. The Table 7 shows timings including runs which were impacted by lack of free memory (*) Table 6 Stone villa performance matrix (seconds) Dell 7010 Period -O0 -O1 -O2 -O3 s2 week 29.8 10.5 9 - s2 summer 164 57 51 - s2 annual 448 164 139 - s4 summer 167 56 54 s4 annual 459 164 148 Dell 755 s2 summer 347 128 102 103 s2 annual 953 352 281 280 s4 summer 355 133 108 107 s4 annual 974 368 296 294 Dell 755 Virtual Box WattOS s2 summer 371 162 163 162 s2 annual 1010 531 465 s4 summer 340 199 145 s4 annual - - - - Macbook Air s2 week 47 14 12 12 s2 summer 258 78 66 66 s2 annual 774 209 181 181 s4 summer 263 81 70 73 s4 annual 723 226 194 203 T61 rotational s2 week 67 28.4 23.6 23.2 s2 summer 369 157 131 130 s2 annual 1010 430 351 361
  • 9. 8 s4 summer 375 163 146 137 s4 annual 1031 444 395 368 T61 SSD W7 s2 week 93 41 30 s2 summer 497 198 150 s2 annual 1388 533 430 Raspberry Pi 2 s2 week 704 270 220 209 s2 summer 3925 - 1190 1159 s2 annual 10803 - 3217 3218 s4 week 707 - 219 213 s4 summer 3943 - 1225 1238 Table 7 Data recovery matrix (elapsed seconds) Computer one week Jan- feb summer annual Cellular office model Dell 7010 <1 <1 1 2 Mac Air <1 1 2 4 Dell 755 <1 1.5 2.2 4-5 T61 (rot) 1 1.5 2.5 4-6 T61 SSD 1 1.8 2.9 5-7 Odroid 2.4 4s 6 12 Rasp Pi 2 5 8 12 47 BBB 5 12 19 49 Stone villa model (constrained performance data) Dell 7010 <1 3 5 8 Mac Air 1 7 13 37 Dell 755 1.5 6 13 36 Dell 755VB 3 9 17 52 T61 (rot) 2 7 14 41 T61 SSD 2 6 9 26 Odroid 6 25 49 140 Rasp Pi 2 9 28 52 140 BBB 9 29 54 150 Stone villa model (with full energy balance) Dell 7010 1 5 10 15 Mac Air 2 13 32 390* Dell 755 2 12 18 68 Dell 755VB 4 47 189* T61 (rot) 3 14 35 247* T61 SSD 3 19 43 369* Odroid 4-9 50 104 - Raspb Pi 2 9-10 41-48 180* - BBB 10-11 141* 297* - EnergyPlus models The Supermarket.idf model, like the many of the example models distributed with EnergyPlus, can be used for calibration assessments or non-annual assessments on the platforms studied. The impact of compiler optimisations on conduction transfer and finite difference assessments is shown below for annual EnergyPlus 8.2 runs of the Stone villa. The tool-chain optimisation improvements for EnergyPlus are roughly in line with the pattern seen with the ESP-r build process. EnergyPlus annual run timings: Dell 7010 -O1 2m48s with conduction transfer Dell 7010 -O3 0m32s with conduction transfer Dell 7010 -O1 59m20s with finite difference solution Dell 7010 -O3 10m46s with finite difference solution Dell 755 -O0 compile 9m55s Dell 755 -O1 compile 4m41s Dell 755 -O2 compile 3m32s Dell 755 -O0 compile, finite difference 192m8s Dell 755 -O1 compile, finite difference 102m58s Dell 755 -O2 compile, finite difference 76m30s Mac Air EnergyPlus 8.1 standard distribution 1m7s Raspberry Pi 2 -O1 30m4s with conduction transfer Raspberry Pi 2 -O3 12m34s with conduction transfer Raspberry Pi 2 -O1 591m55s with finite difference Raspberry Pi 2 -O3 194m28s with finite difference It is unclear why the Dell 755 is so much less suited to EnergyPlus finite difference production work. For models which make use of the finite difference solver low resource computers have a distinct disadvantage. The GCC optimisations yield the expected pattern in run time changes for both Fortran and C++. However, the -O3 optimisation with GCC delivers a less performance than the compiler used by the EnergyPlus development team. CONCLUSION A matrix of computer hardware and software options have been tested against a range of ESP-r and EnergyPlus simulation models and for a range of simulation tasks. Timings for numerical tasks and performance recovery tasks have been reported. What is more difficult to quantify in terms of timings are user interactions associated with creating and evolving models. Typically, these tasks require a fraction of the available computing resource and here the user experience of using low resource and older hardware is less marked than standard numerical benchmarks would suggest. The use of compiler optimisation directives removes most of the latency in the drawing of wire-frames and in the navigation of models. Creating and evolving with models of moderate complexity on all but the most constrained of platforms would likely be acceptable to many practitioners. Optimised software and hardware has thus been seen to expand options for deploying simulation to non-traditional platforms
  • 10. 9 It has been seen that ARM processors support a higher degree of optimisation followed by Intel Linux and then Windows 7. For ESP-r it makes sense to adopt at least the -O1 level for software to be distributed. On Intel computers optimisation beyond - O1 produces marginal improvements for a massive increase in build times. Developers building EnergyPlus may choose to debug with –O0 but should remember to rebuild with –O3 for distribution. Without any hardware changes, a 2007 Dell 755 with optimised software was seen to perform similarly to an un-optimised 2012 Dell 7010 for a number of simulation tasks. Similarly, optimised software can make up for much of the numerical inefficiency in the use of virtual machines. A Raspberry Pi 2 has been used for student projects essentially without comment if the software is fully optimised and instructions avoid the generation of large files and extensive data mining tasks. There are clear indications that critical adjustments to the scope of assessments and simulation work flow can improve production tasks by avoiding the use of virtual memory and ensuring that data recovery can make use of reads from the memory buffer rather than disk. The study provides evidence that careful selection of refurbishment options for legacy hardware can extend their life considerably. For example, the combination of a replacement SSD and optimally compiled software increased user productivity on 2006 laptops and workstations. Legacy hardware that can no longer run Windows XP can usually be repurposed as Linux computers capable of a number of simulation related work tasks. For example, a re-configuredNetbook which was seen to be in line with the better ARM SBC for browsing models and checking details while visiting sites for consulting projects. Although ESP-r is natively hosted on Windows platforms, tests show that, on the same hardware ESP-r is roughly 30% slower on Windows 7. This might be because of OS resource requirements and it might be due to inefficient use of virtual memory. This suggests that there may be additional optimisation techniques to be explored for the Windows platform. The full matrix of computers/ compilers/ operating system variants and observations is being compiled and extended for a journal paper. ACKNOWLEDGEMENT Some of the computers used in this study were sourced within the University of Strathclyde. Critical advise on compiling EnergyPlus came from Linda Laurie. REFERENCES Beausoleil-Morrison, I. 2000. The adaptive coupling of heat and airflow modelling within dynamic whole-building simulation, Glasgow, University of Strathclyde. Hand J. 2014. Opportunities and constraints in the use of simulation on low cost ARM-based computers. eSim Conference, Ottawa Canada. Hand, J. 2015. Strategies for deploying virtual representations of the built environment, Glasgow Scotland.
  • 11. 10