A shared-filesystem-memory approach for running IDA in parallel over informal computer clusters
1. National Technical University of Athens
School of Civil Engineering
A shared-filesystem-memory approach for
running IDA in parallel over informal
computer clusters
D. Vamvatsikos
National Technical University of Athens
…and friends….
OpenSees Days Europe (EOSD 2017), Porto, June 19-20, 2017
2. Introduction
• Seismic performance evaluation. How?
• Incremental Dynamic Analysis is accurate but slow
– Multiple nonlinear dynamic analyses, Multiple records
– 1 CPU → needs patience!
• Can we run it faster?
– Use N CPUs.
– Parallelize each nonlinear analysis? OpenSeesMP
– Parallelize IDA? Easier, repetitive, more efficient
3. LA9 building
• 9 stories, 1 basement, T1 = 2.3s
• 2D model with internal gravity frame, P-Delta, beam-hinges
4. The IDA tracing problem
• High variability → many records
• Complex shapes → many runs
• Focus: 20 recs, 12 runs each
• One i5 core = 40 hours
• Only for academic use!
5. How to run IDA in parallel?
• Each record is completely independent from the others
– Distribute single-record IDA tasks
– Use up to 20 cores, pure linear efficiency
• Within each record, the runs are not independent
– Distribute single runs → need a new tracing algorithm
– Up to 20×12 = 240 cores (ideally)
– Dependencies → runs may be wasted
6. Target Application Environment
• MATLAB + OpenSees
• Informal cluster of dissimilar processors
– Multiple cores per physical processor
– New and old PCs
– Unreliable networks
– Incompetent system admins
– Random PC deaths (Win10 reboots, student magic….)
• Need resilience over an unreliable “cluster”
7. Scheme 1: Distribute records
• Shared memory approach
– Everything stored in a file server (shared directory)
– One job = One .mat file
– Race conditions? Use directories as lockfiles
• Master / slave model
– Master assigns tasks (records)
– Slaves run task, return one IDA curve per record
– Master assigns tasks to itself → max efficiency
[Diagram: master CPU sends records to slave CPUs and receives IDA curves; the master also assigns records to itself]
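The directory-as-lockfile trick works because directory creation is atomic even over a shared network filesystem. The actual tool is written in MATLAB; a minimal Python sketch of the same idea (function names are illustrative, not from the toolbox):

```python
import os

def try_claim(shared_dir, job_name):
    """Try to claim a job by creating a lock *directory*.
    os.mkdir either fully succeeds or fully fails, so exactly one
    worker on the shared filesystem can win the claim for each job."""
    lock = os.path.join(shared_dir, job_name + ".lock")
    try:
        os.mkdir(lock)
        return True            # we now own this job
    except FileExistsError:
        return False           # another worker claimed it first

def release_claim(shared_dir, job_name):
    """Remove the lock directory once the job's result is written."""
    os.rmdir(os.path.join(shared_dir, job_name + ".lock"))
```

Any worker that gets `False` simply moves on to the next unclaimed job, so no explicit messaging between CPUs is needed.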
8. Scheme 1: Coarse-grained
• Pros
– Negligible communication overhead
– Uses existing IDA tracing algorithms (hunt & fill)
– Easy programming
• Cons
– Low scalability
– Records/CPUs not an integer → some CPUs idle,
e.g. 20 records on 3 CPUs → two CPUs run 7 records each, one runs only 6
9. Scheme 2: Distribute records, then runs
• Master / slave-1 / slave-2 model
– Master assigns tasks (records)
– Slaves-1 assign single runs to slaves-2
– Master & slaves-1 assign runs to selves
[Diagram: master CPU sends records to slave-1 CPUs and receives IDA curves; each slave-1 CPU sends run IMs to slave-2 CPUs and receives EDPs; master and slave-1 CPUs also assign records/runs to themselves]
10. Scheme 2: Medium-grained
• Pros
– Still low communication overhead
– Excellent scalability
– Almost linear efficiency
– Minimal CPU idling
• Cons
– Tougher to program: Dynamic Allocation of Tasks
– Slave cores compete for jobs (first-come, first-served)
– Needs new IDA tracing algorithm
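Dynamic allocation of tasks reduces to each slave polling the shared directory, claiming one pending job at a time via the atomic-lock trick. A Python sketch of the idea (the `.job`/`.out` naming, lock-directory claim, and `STOP` sentinel are illustrative assumptions, not the actual MATLAB routines):

```python
import os
import time

def worker_loop(shared_dir, run_task, poll=0.1):
    """Slave loop (sketch): scan the shared directory for pending *.job
    files, claim one by atomically creating a lock directory, run it,
    write its result, and repeat until no jobs remain and STOP exists."""
    done = []
    while True:
        names = sorted(os.listdir(shared_dir))
        jobs = [n for n in names if n.endswith(".job")]
        if not jobs:
            if "STOP" in names:            # master signals shutdown
                return done
            time.sleep(poll)               # nothing to do yet; poll again
            continue
        for name in jobs:
            try:
                os.mkdir(os.path.join(shared_dir, name + ".lock"))
            except FileExistsError:
                continue                   # another core won the race
            result = run_task(name)        # e.g. one nonlinear dynamic run
            with open(os.path.join(shared_dir, name + ".out"), "w") as f:
                f.write(repr(result))
            os.remove(os.path.join(shared_dir, name))  # job consumed
            done.append(name)
            break                          # rescan for freshly posted jobs
```

Because every worker races on the same lock directories, first-come-first-served scheduling falls out for free, at the cost of some polling traffic on the file server.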
11. Serial Hunt & Fill IDA tracing
• Hunt-up (large steps)
• Bisect to find collapse
• Fill the gaps going down
• Hunt-up and bisect are unpredictable due to collapse
[Figure: IDA curve with hunt-up, bisect, and fill-down runs marked]
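The three phases above can be sketched in simplified Python; the step sizes and tolerances are made-up defaults, and `converges(im)` stands in for a full nonlinear dynamic run at intensity measure `im`:

```python
def hunt_and_fill(converges, im0=0.1, step0=0.1, step_inc=0.05,
                  gap_tol=0.1, n_runs=12):
    """Serial hunt & fill IDA tracing (sketch) for one record.
    converges(im): True if the dynamic run at intensity im converged."""
    runs = []
    # --- hunt-up: increasingly large IM steps until the first collapse ---
    im, step, highest_ok = im0, step0, 0.0
    while converges(im):
        runs.append(im)
        highest_ok = im
        im, step = im + step, step + step_inc   # accelerate the hunt
    runs.append(im)                 # first non-converged (collapsing) run
    lowest_fail = im
    # --- bisect: close in on the flatline (collapse capacity) ---
    while len(runs) < n_runs and lowest_fail - highest_ok > gap_tol:
        mid = 0.5 * (highest_ok + lowest_fail)
        runs.append(mid)
        if converges(mid):
            highest_ok = mid
        else:
            lowest_fail = mid
    # --- fill-down: bisect the largest gap between converged runs ---
    while len(runs) < n_runs:
        ok = sorted([0.0] + [r for r in runs if r <= highest_ok])
        _, a, b = max((b - a, a, b) for a, b in zip(ok, ok[1:]))
        runs.append(0.5 * (a + b))
    return runs
```

Note how the run count of hunt-up and bisect depends on where collapse happens, which is exactly why the serial algorithm is hard to parallelize naively.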
12. Parallel Hunt & Fill IDA tracing
• 3 CPUs → more non-converged runs
• Fill-in is still efficient
• 14 runs instead of 12
• More CPUs in hunt-up or bisect → more waste!
• Still, more efficient than a stepping algorithm
13. Example: 20 records, 3 identical CPUs
• Parallel hunt&fill achieves near linear performance (2.96/3 = 99%)
• While the last 2 records were running, CPU 3 was alternately helping CPUs 1 and 2
15. How is this done in software?
• Master script:
[anls]=runIDA_NDpx(anls,trace,fupdate,sharedmem,runmode);
• Slave script
[icase]=runIDA_NDpx_slave(sharedmem,masterflag,...,imultcpu);
• One record subscript
[icase]=runIDA_NDpx_slave_onerec(sharedmem,...,imultcpu);
• One run subscript
[icase]=runIDA_NDpx_slave_onerun(sharedmem,...,imultcpu);
• Subscripts are run based on availability of jobs
• Argument notes (from the slide callouts):
– anls, trace: analysis and tracing parameters
– fupdate: Tcl file updating info for all runs & models
– sharedmem: shared-memory IP address
– runmode: 0 = partition records, 1 = partition runs
– imultcpu: assigns a number to each core
– masterflag: set to 1 if the master self-assigned the run
16. Application: Model parameter uncertainties
• Monte-Carlo based
– Use full IDA (Ibarra, Dolsek, Vamva & Frag)
– Approximate IDA
• Response Surface (Liel et al)
• SPO2IDA (Frag & Vamva) + …
• Moment-Estimation based
– FOSM (Ibarra, Liel et al, Lee & Mosalam, Vamva & Frag)
– PEM (Vamva & Frag) + …
17. Beam point-hinge model
• Allows hardening & softening with pinching loops
• Residual plateau terminates at ultimate rotation → true hinge
• Means: ah = 10%, μc = 3, ac = -50%, r = 50%, μu = 6, aMy = 1
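As a worked illustration of those mean parameters, the normalized backbone (M/My versus ductility μ) can be pieced together; this is a hypothetical sketch, not the actual OpenSees material, and it assumes ah and ac are slopes per unit ductility relative to the elastic stiffness, with r the residual strength ratio:

```python
def backbone(mu, ah=0.10, mu_c=3.0, ac=-0.50, r=0.50, mu_u=6.0):
    """Normalized moment M/My at ductility mu for a point-hinge backbone:
    elastic -> hardening (slope ah) -> softening (slope ac) ->
    residual plateau (r) -> zero beyond the ultimate ductility mu_u."""
    if mu > mu_u:
        return 0.0                             # true hinge past ultimate
    if mu <= 1.0:
        return mu                              # elastic branch, M = mu * My
    if mu <= mu_c:
        return 1.0 + ah * (mu - 1.0)           # hardening at slope ah
    peak = 1.0 + ah * (mu_c - 1.0)             # capping strength at mu_c
    return max(peak + ac * (mu - mu_c), r)     # softening, floored at r
```

With the stated means this peaks at 1.2·My at μ = 3, softens down to the 0.5·My plateau, and drops to zero past μ = 6.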
18. “Mean” model IDA curves
• 30 records, 12 runs each
• P-Delta and beam hinging → clear flatlines
• Summarization into fractiles
• Median collapse capacity
• Large dispersion
20. Let’s try to improve
• Progressive application of LHS
– Initial small sample
– Double the size by adding samples at each step
– Stop when accuracy is adequate
• Allow sampling on a record-by-record basis
– Old full multi-record IDA for each parameter sample
– New single-record IDA for each sample
– Recycle records if not enough
– Not new ideas, e.g. Schotanus & Franchin
21. Progressive LHS (1)
• Start with Npoints ≥ Nvars
• Say 4 samples for 2 vars
• Run analysis
• Double the samples → 8
23. Progressive LHS (2)
• Run analysis
• Double the samples → 16
• Double again → 32
• Stop, e.g. when the dispersion is “stable”
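The doubling step can be sketched as follows: with n existing samples, each variable's [0, 1) range is re-divided into 2n strata; every old point occupies exactly one of them, so n new points fill the empty strata and the union stays a valid Latin hypercube. A minimal Python illustration (not the actual implementation, which also handles the correlation structure):

```python
import random

def lhs(n, n_vars, rng=None):
    """Plain Latin Hypercube Sample of n points on [0, 1)^n_vars."""
    rng = rng or random.Random()
    cols = []
    for _ in range(n_vars):
        col = [(i + rng.random()) / n for i in range(n)]  # one per stratum
        rng.shuffle(col)                                  # random pairing
        cols.append(col)
    return [[cols[v][i] for v in range(n_vars)] for i in range(n)]

def double_lhs(points, rng=None):
    """Add len(points) new points so the union is an LHS of twice the size."""
    rng = rng or random.Random()
    n, n_vars = len(points), len(points[0])
    m = 2 * n                                   # new, finer stratification
    new_cols = []
    for v in range(n_vars):
        occupied = {int(p[v] * m) for p in points}
        col = [(s + rng.random()) / m           # sample only empty strata
               for s in range(m) if s not in occupied]
        rng.shuffle(col)
        new_cols.append(col)
    return points + [[new_cols[v][i] for v in range(n_vars)]
                     for i in range(n)]
```

Each generation doubles the sample (4 → 8 → 16 → 32 …) without discarding any already-analysed runs, which is what makes the stopping rule cheap to apply.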
24. Good but not perfect
• Pros
– Excellent scalability
– Removes problem of a priori definition of sample size
– Handles a large number of r.v.’s: insignificant ones disappear!
– Better coverage of sample space (like orthogonal LHS)
• Cons
– Cannot use the fast Iman-Conover algorithm for correlation
– Prefer genetic algorithms, e.g. Charmpis & Panteli (2004)
– These may be slower but more accurate
26. Good but not perfect (again)
• Pros
– Handles a large number of random vars (> 500)
– Can vary the incident angle (more records are better, though)
– Place influential vars first to better capture correlation
• Cons
– Cannot use some “fast-IDA” techniques (e.g. Liel et al. response surface, Azarbakht-Dolsek priority lists, or SPO2IDA)
– Cannot distinguish epistemic from aleatory (Do we care?)
27. Convergence
• Stable after 4th generation
• 160 samples
• Stable after 5th generation
• 320 samples
• Note 270 random vars!
28. Compare with “mean” model
• Medians differ!
• Conservative bias due to
correlation structure
• Betas are similar
• In contrast with other results
• Still needs work!
29. Some concluding remarks
• Is this worth it?
– Why not OpenSeesSP / OpenSeesMP?
– Why not HTCondor or similar?
– Why Matlab and not Python?
• Where to go now?
– If found useful, easily ported to Python
– Works for MSA, IDA, Cloud, any strategy
– You can use this now: Will release by end of summer!