A shared-filesystem-memory approach for running IDA in parallel over informal computer clusters
1. National Technical University of Athens
School of Civil Engineering
A shared-filesystem-memory approach for
running IDA in parallel over informal
computer clusters
D. Vamvatsikos
National Technical University of Athens
…and friends….
OpenSees Days Europe (EOSD 2017), Porto, June 19-20, 2017
2. Introduction
• Seismic performance evaluation. How?
• Incremental Dynamic Analysis is accurate but slow
– Multiple nonlinear dynamic analyses, Multiple records
– 1 CPU → needs patience!
• Can we run it faster?
– Use N CPUs.
– Parallelize each nonlinear analysis? OpenSeesMP
– Parallelize IDA? Easier, repetitive, more efficient
3. LA9 building
• 9 stories, 1 basement, T1 = 2.3s
• 2D model with internal gravity frame, P-Delta, beam-hinges
4. The IDA tracing problem
• High variability → many records
• Complex shapes → many runs
• Focus: 20 recs, 12 runs each
• One i5 core = 40 hours
• Only for academic use!
5. How to run IDA in parallel?
• Each record is completely independent from the others
– Distribute single-record IDA tasks
– Use up to 20 cores, pure linear efficiency
• Within each record, the runs are not independent
– Distribute single runs → need a new tracing algorithm
– Up to 20×12 = 240 cores (ideally)
– Dependencies → runs may be wasted
6. Target Application Environment
• MATLAB + OpenSees
• Informal cluster of dissimilar processors
– Multiple cores per physical processor
– New and old PCs
– Unreliable networks
– Incompetent system admins
– Random PC deaths (Win10 reboots, student magic….)
• Need resilience over an unreliable “cluster”
7. Scheme 1: Distribute records
• Shared memory approach
– Everything stored in a file server (shared directory)
– One job = One .mat file
– Race conditions? Use directories as lockfiles
• Master / slave model
– Master assigns tasks (records)
– Slaves run task, return one IDA curve per record
– Master assigns tasks to itself → max efficiency
[Diagram: master CPU sends records to slave CPUs and receives IDA curves; the master also assigns records to itself]
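The directory-as-lockfile trick works because directory creation is atomic even over a shared network filesystem. The actual tool is written in MATLAB; a minimal Python sketch of the same idea (function names are illustrative, not from the toolbox):

```python
import os

def try_claim(shared_dir, job_name):
    """Try to claim a job by creating a lock *directory*.
    os.mkdir either fully succeeds or fully fails, so exactly one
    worker on the shared filesystem can win the claim for each job."""
    lock = os.path.join(shared_dir, job_name + ".lock")
    try:
        os.mkdir(lock)
        return True            # we now own this job
    except FileExistsError:
        return False           # another worker claimed it first

def release_claim(shared_dir, job_name):
    """Remove the lock directory once the job's result is written."""
    os.rmdir(os.path.join(shared_dir, job_name + ".lock"))
```

Any worker that gets `False` simply moves on to the next unclaimed job, so no explicit messaging between CPUs is needed.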
8. Scheme 1: Coarse-grained
• Pros
– Negligible communication overhead
– Uses existing IDA tracing algorithms (hunt & fill)
– Easy programming
• Cons
– Low scalability
– Records/CPUs not an integer → some CPUs idle,
e.g. 20 records on 3 CPUs → two CPUs run 7 records each, one runs only 6
9. Scheme 2: Distribute records, then runs
• Master / slave-1 / slave-2 model
– Master assigns tasks (records)
– Slaves-1 assign single runs to slaves-2
– Master & slaves-1 assign runs to selves
[Diagram: master CPU sends records to slave-1 CPUs and receives IDA curves; each slave-1 CPU sends run IMs to slave-2 CPUs and receives EDPs; master and slave-1 CPUs also assign records/runs to themselves]
10. Scheme 2: Medium-grained
• Pros
– Still low communication overhead
– Excellent scalability
– Almost linear efficiency
– Minimal CPU idling
• Cons
– Tougher to program: Dynamic Allocation of Tasks
– Slave cores compete for jobs (first-come, first-served)
– Needs new IDA tracing algorithm
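Dynamic allocation of tasks reduces to each slave polling the shared directory, claiming one pending job at a time via the atomic-lock trick. A Python sketch of the idea (the `.job`/`.out` naming, lock-directory claim, and `STOP` sentinel are illustrative assumptions, not the actual MATLAB routines):

```python
import os
import time

def worker_loop(shared_dir, run_task, poll=0.1):
    """Slave loop (sketch): scan the shared directory for pending *.job
    files, claim one by atomically creating a lock directory, run it,
    write its result, and repeat until no jobs remain and STOP exists."""
    done = []
    while True:
        names = sorted(os.listdir(shared_dir))
        jobs = [n for n in names if n.endswith(".job")]
        if not jobs:
            if "STOP" in names:            # master signals shutdown
                return done
            time.sleep(poll)               # nothing to do yet; poll again
            continue
        for name in jobs:
            try:
                os.mkdir(os.path.join(shared_dir, name + ".lock"))
            except FileExistsError:
                continue                   # another core won the race
            result = run_task(name)        # e.g. one nonlinear dynamic run
            with open(os.path.join(shared_dir, name + ".out"), "w") as f:
                f.write(repr(result))
            os.remove(os.path.join(shared_dir, name))  # job consumed
            done.append(name)
            break                          # rescan for freshly posted jobs
```

Because every worker races on the same lock directories, first-come-first-served scheduling falls out for free, at the cost of some polling traffic on the file server.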
11. Serial Hunt & Fill IDA tracing
• Hunt-up (large steps)
• Bisect to find collapse
• Fill the gaps going down
• Hunt-up and bisect are unpredictable due to collapse
[Figure: IDA curve with hunt-up, bisect, and fill-down runs marked]
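The three phases above can be sketched in simplified Python; the step sizes and tolerances are made-up defaults, and `converges(im)` stands in for a full nonlinear dynamic run at intensity measure `im`:

```python
def hunt_and_fill(converges, im0=0.1, step0=0.1, step_inc=0.05,
                  gap_tol=0.1, n_runs=12):
    """Serial hunt & fill IDA tracing (sketch) for one record.
    converges(im): True if the dynamic run at intensity im converged."""
    runs = []
    # --- hunt-up: increasingly large IM steps until the first collapse ---
    im, step, highest_ok = im0, step0, 0.0
    while converges(im):
        runs.append(im)
        highest_ok = im
        im, step = im + step, step + step_inc   # accelerate the hunt
    runs.append(im)                 # first non-converged (collapsing) run
    lowest_fail = im
    # --- bisect: close in on the flatline (collapse capacity) ---
    while len(runs) < n_runs and lowest_fail - highest_ok > gap_tol:
        mid = 0.5 * (highest_ok + lowest_fail)
        runs.append(mid)
        if converges(mid):
            highest_ok = mid
        else:
            lowest_fail = mid
    # --- fill-down: bisect the largest gap between converged runs ---
    while len(runs) < n_runs:
        ok = sorted([0.0] + [r for r in runs if r <= highest_ok])
        _, a, b = max((b - a, a, b) for a, b in zip(ok, ok[1:]))
        runs.append(0.5 * (a + b))
    return runs
```

Note how the run count of hunt-up and bisect depends on where collapse happens, which is exactly why the serial algorithm is hard to parallelize naively.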
12. Parallel Hunt & Fill IDA tracing
• 3 CPUs → more non-converged runs
• Fill-in is still efficient
• 14 runs instead of 12
• More CPUs in hunt-up or bisect → more waste!
• Still, more efficient than a stepping algorithm
13. Example: 20 records, 3 identical CPUs
• Parallel hunt&fill achieves near linear performance (2.96/3 = 99%)
• While the last 2 records were running, CPU 3 was alternately helping CPUs 1 and 2
15. How is this done in software?
• Master script:
[anls]=runIDA_NDpx(anls,trace,fupdate,sharedmem,runmode);
• Slave script
[icase]=runIDA_NDpx_slave(sharedmem,masterflag,...,imultcpu);
• One record subscript
[icase]=runIDA_NDpx_slave_onerec(sharedmem,...,imultcpu);
• One run subscript
[icase]=runIDA_NDpx_slave_onerun(sharedmem,...,imultcpu);
• Subscripts are run based on availability of jobs
• Argument notes (from the slide callouts):
– anls, trace: analysis and tracing parameters
– fupdate: Tcl file updating info for all runs & models
– sharedmem: shared-memory IP address
– runmode: 0 = partition records, 1 = partition runs
– imultcpu: assigns a number to each core
– masterflag: set to 1 if the master self-assigned the run
16. Application: Model parameter uncertainties
• Monte-Carlo based
– Use full IDA (Ibarra, Dolsek, Vamva & Frag)
– Approximate IDA
• Response Surface (Liel et al)
• SPO2IDA (Frag & Vamva) + …
• Moment-Estimation based
– FOSM (Ibarra, Liel et al, Lee & Mosalam, Vamva & Frag)
– PEM (Vamva & Frag) + …
17. Beam point-hinge model
• Allows hardening & softening with pinching loops
• Residual plateau terminates at ultimate rotation → true hinge
• Means: ah = 10%, μc = 3, ac = -50%, r = 50%, μu = 6, aMy = 1
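As a worked illustration of those mean parameters, the normalized backbone (M/My versus ductility μ) can be pieced together; this is a hypothetical sketch, not the actual OpenSees material, and it assumes ah and ac are slopes per unit ductility relative to the elastic stiffness, with r the residual strength ratio:

```python
def backbone(mu, ah=0.10, mu_c=3.0, ac=-0.50, r=0.50, mu_u=6.0):
    """Normalized moment M/My at ductility mu for a point-hinge backbone:
    elastic -> hardening (slope ah) -> softening (slope ac) ->
    residual plateau (r) -> zero beyond the ultimate ductility mu_u."""
    if mu > mu_u:
        return 0.0                             # true hinge past ultimate
    if mu <= 1.0:
        return mu                              # elastic branch, M = mu * My
    if mu <= mu_c:
        return 1.0 + ah * (mu - 1.0)           # hardening at slope ah
    peak = 1.0 + ah * (mu_c - 1.0)             # capping strength at mu_c
    return max(peak + ac * (mu - mu_c), r)     # softening, floored at r
```

With the stated means this peaks at 1.2·My at μ = 3, softens down to the 0.5·My plateau, and drops to zero past μ = 6.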
18. “Mean” model IDA curves
• 30 records, 12 runs each
• P-Delta and beam hinging → clear flatlines
• Summarization into fractiles
• Median collapse capacity
• Large dispersion
20. Let’s try to improve
• Progressive application of LHS
– Initial small sample
– Double the size by adding samples at each step
– Stop when accuracy is adequate
• Allow sampling on a record-by-record basis
– Old full multi-record IDA for each parameter sample
– New single-record IDA for each sample
– Recycle records if not enough
– Not new ideas, e.g. Schotanus & Franchin
21. Progressive LHS (1)
• Start with Npoints ≥ Nvars
• Say 4 samples for 2 vars
• Run analysis
• Double the samples → 8
23. Progressive LHS (2)
• Run analysis
• Double the samples → 16
• Double again → 32
• Stop, e.g. when the dispersion is “stable”
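The doubling step can be sketched as follows: with n existing samples, each variable's [0, 1) range is re-divided into 2n strata; every old point occupies exactly one of them, so n new points fill the empty strata and the union stays a valid Latin hypercube. A minimal Python illustration (not the actual implementation, which also handles the correlation structure):

```python
import random

def lhs(n, n_vars, rng=None):
    """Plain Latin Hypercube Sample of n points on [0, 1)^n_vars."""
    rng = rng or random.Random()
    cols = []
    for _ in range(n_vars):
        col = [(i + rng.random()) / n for i in range(n)]  # one per stratum
        rng.shuffle(col)                                  # random pairing
        cols.append(col)
    return [[cols[v][i] for v in range(n_vars)] for i in range(n)]

def double_lhs(points, rng=None):
    """Add len(points) new points so the union is an LHS of twice the size."""
    rng = rng or random.Random()
    n, n_vars = len(points), len(points[0])
    m = 2 * n                                   # new, finer stratification
    new_cols = []
    for v in range(n_vars):
        occupied = {int(p[v] * m) for p in points}
        col = [(s + rng.random()) / m           # sample only empty strata
               for s in range(m) if s not in occupied]
        rng.shuffle(col)
        new_cols.append(col)
    return points + [[new_cols[v][i] for v in range(n_vars)]
                     for i in range(n)]
```

Each generation doubles the sample (4 → 8 → 16 → 32 …) without discarding any already-analysed runs, which is what makes the stopping rule cheap to apply.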
24. Good but not perfect
• Pros
– Excellent scalability
– Removes problem of a priori definition of sample size
– Handles a large number of r.v.’s: insignificant ones disappear!
– Better coverage of sample space (like orthogonal LHS)
• Cons
– Cannot use the fast Iman-Conover algorithm for correlation
– Prefer genetic algorithms, e.g. Charmpis & Panteli (2004)
– These may be slower but more accurate
26. Good but not perfect (again)
• Pros
– Handles a large number of random vars (> 500)
– Can vary the incident angle (more records are better, though)
– Place influential vars first to better capture correlation
• Cons
– Cannot use some “fast-IDA” techniques (e.g. Liel et al. response surface, Azarbakht-Dolsek priority lists, or SPO2IDA)
– Cannot distinguish epistemic from aleatory (Do we care?)
27. Convergence
• Stable after 4th generation
• 160 samples
• Stable after 5th generation
• 320 samples
• Note 270 random vars!
28. Compare with “mean” model
• Medians differ!
• Conservative bias due to
correlation structure
• Betas are similar
• In contrast with other results
• Still needs work!
29. Some concluding remarks
• Is this worth it?
– Why not OpenSeesSP / OpenSeesMP?
– Why not HTCondor or similar?
– Why Matlab and not Python?
• Where to go now?
– If found useful, easily ported to Python
– Works for MSA, IDA, Cloud, any strategy
– You can use this now: Will release by end of summer!