1. Energy Systems Research Unit Email: esru@strath.ac.uk
Dept. of Mechanical and Aerospace Engineering Tel: +44 (0)141 548 2314
75 Montrose Street, Glasgow G1 1XJ http://www.strath.ac.uk/esru
The University of Strathclyde is a charitable body, registered in Scotland, number SC015263
ESRU occasional paper
Survey of a matrix of computing hardware and
compilation influences on the deployment of ESP-r and
EnergyPlus
Dr. Jon W. Hand
Energy Systems Research Unit
13 October 2015
Contents

ABSTRACT
INTRODUCTION
HARDWARE ISSUES
   Disk types and size
   Computer memory
   Virtual (cache) memory
   Processor type
   Computational platforms considered
COMPILER DIRECTIVES
   The test models
   Building models for ESP-r & EnergyPlus
   Data recovery timings
   EnergyPlus models
CONCLUSION
ACKNOWLEDGEMENT
REFERENCES
ABSTRACT
This is an interim report from an ongoing
investigation of the relative contribution of various
hardware and compiler options on the efficacy of
specific ESP-r and EnergyPlus simulation tasks. The
paper draws on a range of techniques and
methodologies developed to port ESP-r to ARM
platforms such as the Raspberry Pi and extends them
across a range of traditional and emerging computing
platforms. Adapting to the hardware limits of ultra
low cost computers exposed a number of issues
related to disk access, disk type, memory, number of
computing cores, compiler options as well as the use
of virtual machines for a range of tool development
and simulation tasks. Among other findings is that
CFD convergence times can be reduced from 41s to
less than 10s, annual multi-domain assessments from
950s to 280s and non-optimal memory and virtual
computing can extend data recovery times by more
than a factor of 10.
INTRODUCTION
In 2012 the Raspberry Pi (www.raspberrypi.org) was
introduced, primarily as a vehicle to address a lack of
programming skills in UK schools. Its combination
of price and computational power proved attractive to
a far wider audience, including the 'maker'
community. This spawned competitors such as the
BeagleBone Black (beagleboard.org) and an
ecosystem of related add-on devices as well as a
community of developers testing the bounds of this
new class of device, typically distributed with
Linux.
Few perceived such lightweight platforms might
support numerical simulation. However, in a
historical context, the development of simulation
tools and many classic numerical studies were
accomplished with even more constrained
computational resources. At the eSim conference in
2014 the author presented the results of an initial port
of ESP-r to the original Raspberry Pi and
BeagleBone Black ARM-based computers and
observations of their use as software development
platforms as well as for carrying out various
performance assessment goals for different user
types.
From the subset of simulation tasks that the first
generation supported, subsequent ARM-based
computers, for example the Odroid-U3 from Korea
(www.hardkernel.com), have included multiple cores,
more memory and faster disk access. The user
experience gap between a conventional desktop
computer and the $70 Odroid-U3 is surprisingly
modest. For example, simultaneous editing of a
3200 surface ESP-r model while running a CFD
assessment and an Octave (an open source equivalent
to MatLab) turbine blade analysis session does not
saturate the Odroid's resources. However, less
constrained hardware is only part of the story.
The author observed that particular adjustments to
the numerical source code and to the compiling tool
chain resulted in significant improvements in the
build process, subsequent user interactions and the
run times for assessments. The magnitude of
improvement was dependent on the specifics of the
hardware, the complexity of the model and the nature
of the assessments carried out.
This paper assesses whether the techniques explored
during the ARM study are applicable in a broader
context of numerical tools, machine configurations
and ordering of simulation work tasks. Both ESP-r
(www.esru.strath.ac.uk/publications.htm ) and
EnergyPlus (apps1.eere.energy.gov/buildings/
energyplus/) are used to test this idea. As many of
the constraints noted in ARM platforms are found in
older laptops and workstations, the study also assesses
the extent of improvement for such computers as
well as for computers which fit the conventional
definition of a numerical workstation. The author
observed that the details of computer hardware, e.g.
type of disk, provision of physical memory and
virtual memory and processor type had an impact on
the time it took to carry out specific simulation tasks
on models of different complexity.
HARDWARE ISSUES
Disk types and size
Most ARM single board computers (SBC) rely on
SDHC cards for disk storage as well as swap space.
SDHC cards were not designed for operating systems
and disk I/O is substantially constrained. Some SBC
and tablets make use of eMMC storage, which is mid-
way between SDHC cards and rotational drives in
terms of speed. Conventional computers often
include slower rotational drives rather than SSD
drives. SBC are typically paired with 8GB or 16GB
SDHC cards or eMMC for reasons of cost and this
constrains the space available for the build process
(EnergyPlus requires ~4GB to build) as well as the
space available for simulation files and performance
prediction files.
Another class of drive is the virtual drive
implemented within virtual computers. Tests indicate
that the overheads involved result in disk access at
roughly half the speed of the computer's native drive.
Disk I/O associated with many small files or with
random access can be an order of magnitude slower than
the sequential reads and writes reported in benchmarks.
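The gap between sequential and random throughput can be probed with a short script. The sketch below is illustrative only: unlike the benchmarks in Table 1 it reads through the OS buffer cache rather than bypassing it, and the file and block sizes are arbitrary choices.

```python
import os
import random
import tempfile
import time

def disk_probe(size_mb=8, block=4096):
    """Write a scratch file, then compare sequential and random
    block-read throughput in MB/s (cache-inflated)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    nblocks = size_mb * 1024 * 1024 // block
    chunk = os.urandom(block)
    with os.fdopen(fd, "wb") as f:
        for _ in range(nblocks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())

    def rate(offsets):
        # Time reading one block at each offset, in the order given.
        with open(path, "rb") as f:
            t0 = time.perf_counter()
            for off in offsets:
                f.seek(off)
                f.read(block)
            return (len(offsets) * block / 1e6) / (time.perf_counter() - t0)

    sequential = rate([i * block for i in range(nblocks)])
    random_acc = rate([random.randrange(nblocks) * block
                       for _ in range(nblocks)])
    os.remove(path)
    return sequential, random_acc
```

On SDHC media the random figure collapses relative to the sequential one; on an SSD the two are much closer.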
ESP-r models may consist of scores of small files and
performance predictions for each domain are held in
sequentially written binary files with data recovery
mostly involving random access. Conversely,
EnergyPlus is reading few files and sequentially
writing ASCII files and optionally a SQL database.
Benchmarking tools such as (a) Blackmagic disk
speed test on OSX and (b) CrystalDiskMark on
Windows use various disk I/O tests for sequential
and random access. Table 1 shows their reporting
across a range of devices. This is only somewhat
indicative of the mix of simulation tool data recovery
tasks which have been assessed.
Table 1 Typical disk I/O speeds MB/s

Drive type            Sequential (a)    Random (b)
                      write    read     write    read
SDHC class 10         10-20    20       1.6      5.3
eMMC Odroid           15       55       -        -
USB 2 stick           2-8      18       0.03     5.4
USB 3 stick           15-20    60       0.6      4.1
USB 3 rotational      60       65       1.9      0.5
Old 2.5" rotational   40       45       -        -
USB 3 SSD             110      245      1.2      1.6
Network drive         110      110      1.2      1.6
T61 SSD               110      130      33       22
T61 rotational        60       60       0.3      0.4
Dell 7010 rotational  90       77       1.5      1.2
Dell 755 rotational   84       87       -        -
Dell 755 virtualbox   42       47       -        -
Macbook Air SSD       222      600      99       20
Many simulation tasks are disk-bound. ESP-r has a
number of user choices which impact the size of the
results files created: the extent of performance data
written (i.e. save level), the period of the assessment
and the building and system time step used. An ESP-r
model at the limits of geometric complexity, running
several months at a one minute building time step
and a one second system time step, might
generate upwards of 40 GB. For example, for an annual
15 minute time step 1890s villa model the constrained
performance file is 354MB and the extensive file is
4.27GB. EnergyPlus also includes optional directives
which may constrain the number of entities which are
reported on or, for example, omit or constrain SQL
outputs.
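As a first-order sanity check on disk requirements, results file size scales roughly with the number of time step records written. The helper below is an illustrative estimate only, calibrated against the villa figures quoted above; actual sizes also depend on the save level and the number of zones and surfaces.

```python
def results_size_mb(reference_mb, ref_step_min, new_step_min):
    """Linear scaling with the number of building time steps:
    halving the time step roughly doubles the records written."""
    steps_ratio = ref_step_min / new_step_min
    return reference_mb * steps_ratio

# The 1890s villa constrained file is 354MB for an annual run at a
# 15 minute step; at a 5 minute step this estimates 354 * 3 = 1062MB.
```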
Computer memory
Constrained resource computers often run Linux
because of the memory footprint of the operating
system. With ~512MB RAM there is still ~400MB
available. One of the initial challenges of porting
ESP-r was to create a suite of executables that could
run in their usual combinations within the available
memory. Rather than purge multi-domain
functionality the route taken was to constrain model
complexity via alternative sets of header files for
small and standard deployments. Small deployments
are targeted at low resource computers, but can be
advantageous when running assessments within
virtual computers.
Memory is also an issue during the build process:
some compile options require substantially more
memory and may drive the whole process into virtual
memory. The ESP-r build process essentially
doubled in speed when low resource linking
commands were used. EnergyPlus is hungry for
memory and disk space during the build process and
was rarely successful with less than ~800MB
RAM and 4GB of free disk space available.
Given sufficient memory, operating systems use
memory as a buffer for the disk I/O associated with
simulations and subsequent data-mining. There is a
considerable speedup if free memory is greater than
the size of the files being written. It is also the case
that where multiple assessments need to be carried
out it is often much faster to:
simulate_a extract_a simulate_b extract_b
rather than
simulate_a simulate_b extract_a extract_b.
Thus, critical adjustments to the scope of assessment
or the ordering of tasks can reduce the penalty for
data extraction across most platforms.
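The interleaved ordering above can be scripted directly. The sketch below assumes each task is expressed as a command line; real ESP-r invocations (e.g. of bps and res with model-specific arguments) would be substituted for the placeholder commands.

```python
import subprocess

def run_paired(tasks):
    """Run each (simulate, extract) pair back to back, so that the
    extraction reads the freshly written results file from the OS
    buffer cache rather than from disk."""
    outputs = []
    for simulate, extract in tasks:
        for cmd in (simulate, extract):
            done = subprocess.run(cmd, check=True,
                                  capture_output=True, text=True)
            outputs.append(done.stdout.strip())
    return outputs
```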
Virtual (cache) memory
Virtual memory via a swap file on the disk is used
when physical memory is depleted. If slow disks are
combined with limited memory then swap is
increasingly used and performance degrades. This
was evident in the initial porting of ESP-r to the
Raspberry Pi.
Tools such as ESP-r are composed of many modules,
for example the ESP-r project manager (prj) will
invoke the simulation engine (bps) and later the
results analysis module (res). Although usually not
an issue in conventional deployments, with
constrained memory simulation executables need to
be constrained in size to avoid running in swap. The
approach taken in ESP-r is to have alternative header
files that support different levels of model
complexity. This is also helpful when running
simulation within virtual environments or where
separate processor cores are used for parallel
assessments.
Processor type
ESP-r has had a long history of deployment on
different computer platforms and operating systems.
One of the challenges of the study was to build
EnergyPlus on ARM. Although most users perceive
EnergyPlus as a simulation engine, it runs within the
context of a set of pre-processing and post-processing
utilities. The simulation engine is relatively
straightforward to compile from the Fortran source;
however, the utilities in the standard 8.1
distributions are pre-compiled executables. It proved
difficult to compile the complete set of utilities from
scratch for use on ARM. Eventually the 8.2
EnergyPlus source was used.
Computational platforms considered
The context of the study is a range of computers
spanning ultra low cost computers, tablets, and
legacy computers as well as conventional
workstations. The mix of hardware configurations
allowed many of the hardware sensitivities to be
explored. For this study, most comparisons are done
with computers configured to run Linux or emulating
Linux. With minor variants, the command syntax,
form of user interaction, benchmarking tools,
operating system resource requirements and support
for scripting are roughly consistent. The list below
summarises the computer platforms in terms of:
name, computer type, CPU, CPU speed, RAM,
operating system, compiler, swap space and epoch.
1. Dell 7010, desktop, 4x Intel i5-3470 @3.2 GHz,
8GB ram, Ubuntu 14.04.1 LTS. Linux 3.11.0,
GCC 4.8.2, Cache 8061 Mb, 2012
2. Macbook Air, laptop, 2x Intel i5 @ 1.3 GHz,
4GB RAM, OSX 10.8.3, GCC 4.7, Cache 4GB,
SSD, 2013
3. Dell 755 2x, desktop, Intel Core 2 Duo E6550 @
2.33GHz 3.91GB RAM, Mint 16, GCC 4.8.1,
Cache 4049 MB, 2007
4. Dell 755 virtualbox 1.9GB memory 1 processor,
WattOS, GCC 4.7.2
5. IBM thinkpad T61, laptop, Intel Core2 2.0 GHz
2GB RAM Linux Mint, GCC 4.8, SSD, 2007
6. IBM thinkpad T61, laptop, Intel Core2 2.0 GHz
2GB RAM Linux Mint, GCC 4.8
7. EeePC901, netbook, Atom N270, 1.6GHz,1GB
RAM, WattOS, GCC 4.7.2 , Cache 512 Mb,
SSD, 2008
8. Odroid, SBC, ARMv7l, 2GB RAM, Ubuntu
14.04 GCC 4.8.2 , Cache 0 Mb, eMMC, 2014
9. HUDL, tablet, ARMv7l 1GB RAM, Debian 7.7,
GCC 4.6.3, Cache 0 Mb, eMMC, 2013
10. IBM T23, laptop, P-III 1.2Ghz 1GB RAM,
Vector Linux, GCC 4.5.2, Cache 1024Mb,2003
11. Raspberry Pi 2 SBC, ARMv7l, 762MB RAM,
Debian 7.8, GCC 4.6.3, Cache 921 Mb, SDHC,
2015
12. BBB, SBC, ARMv7l, 507MB RAM, Linux
3.8.13, GCC 4.7.2, Cache 820 Mb, SDHC,
2013
13. Raspberry Pi, SBC, ARMv6l, 481MB RAM,
Debian 7.2, GCC 4.6.3, Cache 921 Mb, SDHC,
2012
Standard Linux Hardinfo benchmarks
<sourceforge.net/projects/hardinfo.berlios> are
shown in Table 2 (ordered by the FFT benchmark).
In the table VB denotes a virtual computer. Notice
the impact of running a virtual computer. The newer
ARM processors are of the same magnitude as older
laptops for single core performance. However they
tend to perform rather better for simulation tasks than
the benchmarks indicate.
Table 2 Typical benchmarks

Computer             CPU Blowfish  Cryptohash  Fibonacci  FFT
Dell 7010            2.5           712.7       1.4        0.7
Dell 755             7.4           187.0       3.7        3.5
Dell 755 VB          14.6          89.7        3.8        7.1
IBM T61              8.7           161.1       4.1        3.9
IBM T23              37.5          32.6        8.9        23.1
EeePC901             20.6          41.2        7.5        28.5
HUDL                 30.3          35.0        7.5        29.5
Odroid               28.4          37.0        6.9        28.8
Raspberry Pi 2       55.5          17.2        14.5       69.9
BBB                  47.6          -           14.7       74.3
Raspberry Pi (orig)  73.8          -           21.7       119.4
In practice, both EnergyPlus and ESP-r executables
run on a single core. ESP-r is a suite of applications,
so there is usually more than one application active
and thus a second core is useful. This study found
that some sequential and most parallel invocations of
simulation assessments were disk-bound.
COMPILER DIRECTIVES
The impact of options in the compilation tool chain
has rarely been discussed within the simulation
community. Some tools which are normally compiled
from source, such as Radiance, default to directives
for speed of execution. The standard distribution of
EnergyPlus is also optimised for speed. For ESP-r,
which is distributed as source, the Install script
included no optimisation directives before this study.
The compiler tool chain is based on GNU GCC so
similar optimisations can be applied to both ESP-r and
EnergyPlus. The optimisation directives
(gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)
for the GCC compiler suite are:
1. -O0 (no optimisation, fastest compile time,
useful for debug sessions),
2. -O1 (attempts to reduce code size and execution
time with least impact on compilation time, more
memory required),
3. -O2 (additional attempt to reduce execution
time, increased compilation time and memory
required)
4. -O3 (additional in-line function and loop
optimisation, even longer compile time)
To quantify the impact of alternative build directives,
both ESP-r and EnergyPlus were re-compiled using
the -O0, -O1, -O2 and -O3 options. For low resource
computers, the small version header files were used.
Table 3 shows ESP-r build times. For most platforms
there are significant time impacts from -O0 to -O1
but little between -O2 and -O3. The full install
included building the databases and configuring 280
training and validation models. Virtual box (VB)
entries are Linux running under a host Linux.
Rotational (rot) and SSD drives are also listed separately.
Table 3 Build time for ESP-r

Computer             Install type  -O0      -O1      -O2     -O3
Dell 7010            full          6m5s     13m34s   17m48s  18m16s
Mac Air              full          8m38s    12m22s   17m5s   18m39s
Dell 755             full          11m6s    28m2s    37m53s  39m17s
Dell 755 VB          full          18m22s?  24m28s?  30m59s  36m36s
IBM T61 rot          full          12m12s   -        -       -
IBM T61 SSD          full          11m39s   30m03s   39m50s  41m51s
IBM T61 SSD W7 MSYS2 full          27m58s   33m19s   53m35s  -
IBM T23              full          51m14s   91m      113m    140m
Odroid               full          34m48s   61m      92m     98m
EeePC 901            full          39m15s   63m      84m     91m
Raspberry Pi 2       full          60m8s    125m     170m    177m
Raspberry Pi         full          106m     329m     478m    -
Clearly for development work the -O0 option has
benefits, especially given the time involved for low-
resource computers. The subsequent discussion about
simulation task times will clearly demonstrate the
benefit of distributing -O1 or -O2 executables across
all machine types, especially older and low resource
computers.
The details of EnergyPlus builds are shown below for
two platforms where EnergyPlus and all the utilities
were compiled from scratch. For OSX and Linux on
Intel the standard distribution utilities were used with
separate EnergyPlus versions compiled with -O0 -O1
-O2 -O3.
• Dell 7010 make install -O0 47m27s
• Raspberry Pi 2 make install -O0 231m24s
The test models
Models were selected for ESP-r and EnergyPlus
representing different levels of complexity as well as
exercising different solvers. For CFD assessments
the suite of CFD benchmarks developed by Ian
Beausoleil-Morrison (2000) have been used. The
cases highlighted are:
1. basic.dfd: 960 cells, 1 inlet, 1 outlet, no
blockages; k-e, pressure and velocity solved, ~260
iterations
2. porous.dfd: 960 cells, 1 inlet, 1 outlet, 18
blockages; k-e, pressure and velocity solved, ~400
iterations
3. bi-cg.dfd: 24360 cells, 4 inlets, 4 outlets; k-e,
pressure and velocity solved, ~2000 iterations
Figure 1 shows a variant with a wall inlet, ceiling
extract and a grid of internal blockages.
Figure 1 CFD domain with internal blockages
CFD is numerically intensive with little disk I/O.
Looking at CFD performance across the matrix of
hardware and software optimisation in Table 4, it is
clearly possible to use resource-constrained
computers in combination with high optimisation.
ARM has a big step in performance between -O0 and
-O1, moderate gains with -O2, while -O3 is marginal.
OSX has a big step in performance from -O1 to -O2
and marginal gains from -O3. Linux on Intel has a big
step from -O0 to -O1 and little or no improvement with
-O2 and -O3. When running on a virtual computer a -O1
or O2 will perform roughly in line with un-optimised
software on the host computer. The Raspberry Pi 2
optimises better than the BBB, perhaps because of
differences in the ARM chip implementation.
Table 4 CFD performance matrix (seconds)

basic.dfd        -O0     -O1     -O2    -O3
Dell 7010        1.3     <1      <1     <1
Dell 7010 VB     2.2     1.3     <1     <1
Mac Air          2.3     2.2     <1     <1
Dell 755         2.8     1.3     1.2    1.2
Dell 755 VB      3.6     2.3     1.8    -
IBM T61          2.7     1.3     1.2    1.1
IBM T61 W7       3.5     1.7     1.6    -
IBM T23          10.1    7.1     5.5    7.0
Odroid           12.7    4.5     4.0    4.1
HUDL             14.5    4.7     4.2    -
Rasp Pi 2        30.6    12.0    9.7    9.6
BBB              30.3    16.9    17.0   15.8
Rasp Pi          51.6    25.9    -      61.7

porous.dfd       -O0     -O1     -O2    -O3
Dell 7010        4.0     1.5     1.5    1.5
Dell 7010 VB     5.6     2.6     2.1    -
Mac Air          6.5     6.2     2.2    2.1
Dell 755         8.2     3.3     3.2    3.1
Dell 755 VB      8.8     4.9     4.4    -
IBM T61          8.7     3.3     3.2    3.6
IBM T61 W7       11.2    4.2     4.0    -
IBM T23          29.0    15.6    13.6   15.0
Odroid           37.1    11.7    9.6    -
HUDL             41.7    10.0    10.1   10.0
BBB              79.1    33.7    33.7   32.6
Rasp Pi 2        84.2    26.7    23.1   22.9
Rasp Pi          127.0   55.6    -      38.4

bi-cg.dfd        -O0     -O1     -O2    -O3
Dell 7010        229     127     127    122
Dell 7010 VB     357     227     227    -
Mac Air          410.6   391.0   215    205
Dell 755         599.5   357     338    332
Dell 755 VB      670     574     565    -
IBM T61          544     351     332    326
IBM T61 W7       734     486     480    -
IBM T23          2378    2202    1742   2279
Odroid           2284    897     758    -
Rasp Pi 2        -       -       -      2750
Building models for ESP-r & EnergyPlus
Student scale models are characterised by the cellular
office (Figure 2). ESP-r includes a dozen variants of
this model exploring various simulation facilities. For
EnergyPlus, the standard Supermarket.idf was used.
Assessment tasks include different periods and are
reported in Table 5. For example, a spring week is
used for initial model calibration and reality checks;
every platform could carry out this task in less than 30
seconds. A January-February assessment explores the
distribution of peak and typical demands and all
platforms could carry out this task in less than one
minute. The four month summer assessment
highlights the benefits of optimisation. Lastly, we see
that annual assessments are problematic for low
resource machines without at least an -O1
optimisation. Four cores and single cores produce
similar timings for sequential tasks. Although the Pi
can run 4 simultaneous simulations, disk access is the
bottleneck.
Figure 2 Student scale model
A simple model from the point of view of
EnergyPlus is Supermarket.idf. For this study it was
adapted for an annual assessment at four time steps
per hour. The difference in performance between the
two Dell computers is substantial, and the Odroid
ARM computer comes closer to the performance of
the older Dell for EnergyPlus than it does for ESP-r
assessments.
EnergyPlus simple model performance predictions:
• Dell 7010 -O3 compile 0m34s
• Dell 755 -O3 compile 4m10s
• Odroid -O3 compile 5m56s
• Raspberry Pi 2 -O3 compile 10m10s
Table 5 Cellular office performance matrix (seconds)

Dell 7010
Period      -O0     -O1     -O2     -O3
one week    <1      <1      <1      <1
summer      4.1     2.1     2.0     -
annual      11.7    5.7     5.6     -

Mac Air
one week    <1      <1      <1      <1
summer      5.8     6.3     2.6     -
annual      16.4    7.3     6.8     -

Dell 755
one week    1.2     <1      <1      <1
summer      7.7     4.2     3.9     3.8
annual      21.9    11.6    10.4    10.2

T61 rotational
one week    1.3     <1      <1      <1
summer      8.2     4.7     4.4     4.4
annual      23.4    12.9    11.8    11.8

T61 SSD
one week    1.2     <1      <1      <1
summer      8.0     4.3     3.9     3.8
annual      22.8    11.7    10.6    10.4

Odroid
one week    -       2.9     2.8     2.7
summer      -       15.5    14.3    14.6
annual      -       42.9    40.0    40.9

Raspberry Pi 2
one week    5.7     5.9     5.5     5.4
summer      82.3    38.6    35.3    34.2
annual      247.5   108.7   102.4   97.4

BBB
one week    14.1    9.0     8.9     8.9
summer      126.9   77.7    76.2    79.7
The second building model tested is an 1890s Stone
Villa (Figure 3) which has typically been used to
assess refurbishment options and its geometric and
compositional detail reflects this. This model
includes 13 thermal zones and 432 surfaces. The
composition includes a mix of lightweight and heavy
entities (outer walls with 600mm of various stone
types). There is considerable diversity of room use
throughout the day and for different day types.
Figure 3 Moderate complexity model
The ESP-r model was exported from ESP-r as a V7.2
IDF file and then upgraded via the usual utilities to
an 8.2 IDF. Two variants were created: a base case
using conduction transfer functions at the same time
step as ESP-r and another using the finite
difference solver at 20 time steps per hour. The finite
difference solver directives would be roughly
analogous to the finite volumes used in ESP-r.
In this case, the size of the results files can become
an issue. ESP-r supports multiple save levels: save 4,
which includes a full energy balance at all zones and
all surfaces, and save 2, which does not include the
energy balance for surfaces. For example, for the stone
villa ESP-r model a one week assessment is 6.9MB
and 82.4MB respectively, a two month assessment is 57MB and
693MB, summer 118MB and 1.43GB and an annual
assessment is 354MB and 4.28GB.
The performance matrix for ESP-r is shown in Table
6. The entries marked s2 are for constrained
performance data and s4 include a full energy
balance. Of interest is the improvement across all
platforms, especially as model complexity increases
and for the summer or annual assessments. All other
factors being consistent, Linux annual run time
reductions from 950s to 280s and OSX run time
reductions from 774s to 181s have generated a number
of oh-my-goodness reactions from users.
possible for an optimised Dell 755 ESP-r to surpass a
newer but un-optimised Dell 7010 ESP-r. Indeed a
fully optimised Raspberry Pi 2 approached the
Lenovo T61 un-optimised performance.
Data recovery timings
The extraction of data from ESP-r results files is not
particularly sensitive to the level of build
optimisation. Rather it depends on the nature of the
disk drive, the available memory and the extent of
the results file being scanned. The SDHC cards in
several of the SBC were seen to be especially slow
for scanning large results files. Similarly, constrained
memory prevented data recovery from memory
buffer and forced disk reads for several of the cases.
Table 7 shows timings, including runs which
were impacted by a lack of free memory (*).
Table 6 Stone villa performance matrix (seconds)

Dell 7010
Period      -O0     -O1     -O2     -O3
s2 week     29.8    10.5    9       -
s2 summer   164     57      51      -
s2 annual   448     164     139     -
s4 summer   167     56      54      -
s4 annual   459     164     148     -

Dell 755
s2 summer   347     128     102     103
s2 annual   953     352     281     280
s4 summer   355     133     108     107
s4 annual   974     368     296     294

Dell 755 Virtual Box WattOS
s2 summer   371     162     163     162
s2 annual   1010    531     465     -
s4 summer   340     199     145     -
s4 annual   -       -       -       -

Macbook Air
s2 week     47      14      12      12
s2 summer   258     78      66      66
s2 annual   774     209     181     181
s4 summer   263     81      70      73
s4 annual   723     226     194     203

T61 rotational
s2 week     67      28.4    23.6    23.2
s2 summer   369     157     131     130
s2 annual   1010    430     351     361
s4 summer   375     163     146     137
s4 annual   1031    444     395     368

T61 SSD W7
s2 week     93      41      30      -
s2 summer   497     198     150     -
s2 annual   1388    533     430     -

Raspberry Pi 2
s2 week     704     270     220     209
s2 summer   3925    -       1190    1159
s2 annual   10803   -       3217    3218
s4 week     707     -       219     213
s4 summer   3943    -       1225    1238
Table 7 Data recovery matrix (elapsed seconds)

Computer      one week  Jan-Feb  summer  annual
Cellular office model
Dell 7010     <1        <1       1       2
Mac Air       <1        1        2       4
Dell 755      <1        1.5      2.2     4-5
T61 (rot)     1         1.5      2.5     4-6
T61 SSD       1         1.8      2.9     5-7
Odroid        2.4       4        6       12
Rasp Pi 2     5         8        12      47
BBB           5         12       19      49
Stone villa model (constrained performance data)
Dell 7010     <1        3        5       8
Mac Air       1         7        13      37
Dell 755      1.5       6        13      36
Dell 755 VB   3         9        17      52
T61 (rot)     2         7        14      41
T61 SSD       2         6        9       26
Odroid        6         25       49      140
Rasp Pi 2     9         28       52      140
BBB           9         29       54      150
Stone villa model (with full energy balance)
Dell 7010     1         5        10      15
Mac Air       2         13       32      390*
Dell 755      2         12       18      68
Dell 755 VB   4         47       189*    -
T61 (rot)     3         14       35      247*
T61 SSD       3         19       43      369*
Odroid        4-9       50       104     -
Rasp Pi 2     9-10      41-48    180*    -
BBB           10-11     141*     297*    -
EnergyPlus models
The Supermarket.idf model, like many of the
example models distributed with EnergyPlus, can be
used for calibration assessments or non-annual
assessments on the platforms studied.
The impact of compiler optimisations on conduction
transfer and finite difference assessments is shown
below for annual EnergyPlus 8.2 runs of the Stone
villa. The tool-chain optimisation improvements for
EnergyPlus are roughly in line with the pattern seen
with the ESP-r build process.
EnergyPlus annual run timings:
• Dell 7010 -O1 2m48s with conduction transfer
• Dell 7010 -O3 0m32s with conduction transfer
• Dell 7010 -O1 59m20s with finite difference solution
• Dell 7010 -O3 10m46s with finite difference solution
• Dell 755 -O0 compile 9m55s
• Dell 755 -O1 compile 4m41s
• Dell 755 -O2 compile 3m32s
• Dell 755 -O0 compile, finite difference 192m8s
• Dell 755 -O1 compile, finite difference 102m58s
• Dell 755 -O2 compile, finite difference 76m30s
• Mac Air EnergyPlus 8.1 standard distribution 1m7s
• Raspberry Pi 2 -O1 30m4s with conduction transfer
• Raspberry Pi 2 -O3 12m34s with conduction transfer
• Raspberry Pi 2 -O1 591m55s with finite difference
• Raspberry Pi 2 -O3 194m28s with finite difference
It is unclear why the Dell 755 is so much less suited
to EnergyPlus finite difference production work. For
models which make use of the finite difference solver
low resource computers have a distinct disadvantage.
The GCC optimisations yield the expected pattern in
run time changes for both Fortran and C++.
However, the -O3 optimisation with GCC delivers
less performance than the compiler used by the
EnergyPlus development team.
CONCLUSION
A matrix of computer hardware and software options
has been tested against a range of ESP-r and
EnergyPlus simulation models and for a range of
simulation tasks. Timings for numerical tasks and
performance recovery tasks have been reported.
What is more difficult to quantify in terms of timings
are user interactions associated with creating and
evolving models. Typically, these tasks require a
fraction of the available computing resource and here
the impact on the user experience of low resource and
older hardware is less marked than standard numerical
benchmarks would suggest. The use of compiler
optimisation directives removes most of the latency
in the drawing of wire-frames and in the navigation
of models. Creating and evolving models of
moderate complexity on all but the most constrained
of platforms would likely be acceptable to many
practitioners. Optimised software and hardware have
thus been seen to expand options for deploying
simulation to non-traditional platforms.
It has been seen that ARM processors support a
higher degree of optimisation followed by Intel
Linux and then Windows 7. For ESP-r it makes sense
to adopt at least the -O1 level for software to be
distributed. On Intel computers optimisation beyond
-O1 produces marginal improvements for a massive
increase in build times. Developers building
EnergyPlus may choose to debug with -O0 but
should remember to rebuild with -O3 for
distribution.
Without any hardware changes, a 2007 Dell 755 with
optimised software was seen to perform similarly to
an un-optimised 2012 Dell 7010 for a number of
simulation tasks. Similarly, optimised software can
make up for much of the numerical inefficiency in
the use of virtual machines. A Raspberry Pi 2 has
been used for student projects essentially without
comment if the software is fully optimised and
instructions avoid the generation of large files and
extensive data mining tasks.
There are clear indications that critical adjustments to
the scope of assessments and simulation work flow
can improve production tasks by avoiding the use of
virtual memory and ensuring that data recovery can
make use of reads from the memory buffer rather
than disk.
The study provides evidence that careful selection of
refurbishment options for legacy hardware can
extend their life considerably. For example, the
combination of a replacement SSD and optimally
compiled software increased user productivity on
2006 laptops and workstations.
Legacy hardware that can no longer run Windows
XP can usually be repurposed as Linux computers
capable of a number of simulation related work tasks.
For example, a re-configured Netbook was seen
to be in line with the better ARM SBC for
browsing models and checking details while visiting
sites for consulting projects.
Although ESP-r is natively hosted on Windows
platforms, tests show that, on the same hardware,
ESP-r is roughly 30% slower on Windows 7. This
might be because of OS resource requirements or it
might be due to inefficient use of virtual memory.
This suggests that there may be additional
optimisation techniques to be explored for the
Windows platform.
The full matrix of computers/ compilers/ operating
system variants and observations is being compiled
and extended for a journal paper.
ACKNOWLEDGEMENT
Some of the computers used in this study were
sourced within the University of Strathclyde. Critical
advice on compiling EnergyPlus came from Linda
Lawrie.
REFERENCES
Beausoleil-Morrison, I. 2000. The adaptive coupling
of heat and airflow modelling within dynamic
whole-building simulation. University of
Strathclyde, Glasgow.
Hand, J. 2014. Opportunities and constraints in the
use of simulation on low cost ARM-based
computers. eSim Conference, Ottawa, Canada.
Hand, J. 2015. Strategies for deploying virtual
representations of the built environment.
Glasgow, Scotland.