SlideShare a Scribd company logo
Hierarchical Resilience for Structured AMR
Anshu Dubey, H. Fujita, Z. Rubenstein, B. Van Straalen and A. Chien
SIAM-CSE 2015
1
n+ 2
t
10 2
levellevellevel
t
t
sync sync
sync
n+1
n
t
refinement
level
 Block Structured Local Refinement
 Refined regions are organized into logically-rectangular patches
 Refined grids are dynamically created and destroyed
 Refinement is performed in time as well as in space
 The depth of hierarchy depends upon the range of scales
 Mostly solve hyperbolics, some elliptic and some PIC
The Basics of AMR
Resiliency Through Rollback
This talk is mostly from the perspective of hyperbolics
Rollback and Recovery
The first step is to look at the whole application
 Traditionally - checkpoint the entire state and reconstruct for restart
 Application determined state for transparent reconstruction
 Less data to save than system based checkpoint
 There have always been other reasons for checkpointing
 Batch queue limits
 Steering the simulation
 Experimenting with parameter tweaking
 Our first step was to utilize this capability to continue execution in
face of resource degradation
 Restart on a fewer number of MPI ranks
 We partnered with GVR which saves and versions the state as
dictated by the application
 Incorporate restart without an abort
GVR – Global View Resilience
 Two key ideas
 Multi-version, multi-stream distributed arrays: preserves application critical data,
enables flexible recovery from complex errors with fine-grain, localized manner
 Open resilience: cross-layer partnership for error handling, with unified error
handling interface
Our next step was to see if we can exploit the natural hierarchy of granularity in
AMR for differentiated localized recovery and quantify the costs
Exploiting Granularities and Encapsulations
A Level provides coarsest granularity in the framework
Convention – level below is coarser, level above is finer
A level knows
 All boxes at the level and their layout
 The mapping of boxes to processors
 Exchanges at the same level
 Fine coarse boundaries with the level below
 Local time step and the time step of the level below
Each level defines data on unions of rectangles
 Manages communications within itself
 During regridding carries out redistribution of boxes within
the level
 Also the reconstructs data into newly formed boxes from
the level below
 A level can do its own restart
GDS_level_0 GDS_level_2GDS_level_1
Mapping a level to an Array
Level n
Level n+1
Level n+2
P0
P1
P2
P3
P4
P5
P6
If we know that a level has no boxes on a processor, that level need not initiate
a recovery
Level Based Recovery
Once an array has been constructed for a level, version
increment is sufficient to save the state the next time
One wrinkle – versioning applies only as long as
there is no regridding
For regridding current array has to be destroyed and
new one constructed
This is one of the todo’s for GVR – allow
expansion and contraction during versioning
In general versioning is cheaper than constructing a
new array
We save the Box layout, the data for each box, and in
the metadata part a pointer to where the data of
each box lies
Exploiting Granularities and Encapsulations
The next layer of granularity is box within a level
 Box with its surrounding halo of ghost cells is a completely defined
computational region
 Iterators go through a list of local boxes and apply the same
operator to them
 All boxes of one level have same dx and same dt
 All boxes mapped to a processor may not be at the same level
 Different boxes on the same processor have different dt and
the operator maybe applied to them a different number of
times.
The way we construct arrays gives us box level granularity for
free
Failure Scenarios and Recovery Modes
We already discussed the node failure
 Data corruption, recovery at one level
 Read in the metadata to get the pointer to the data of the corrupted
patch
 Read in the data for corrupted patch
 Roll back to the beginning of the level’s time step and re-compute
 Meta-data corruption: read in the meta-data
 Roll-back may not be needed
 Data corruption – recovery at multiple levels
 Cascading effect- recovery may be needed in all the finer levels
(may not always be needed though)
 All finer levels roll-back to the starting timestep of the affected level
 May get costlier as affected level gets coarser
Still cheaper than global recovery because of no regridding

More Related Content

Viewers also liked

Animaciã³n tecnologia final
Animaciã³n tecnologia finalAnimaciã³n tecnologia final
Animaciã³n tecnologia final
Silvia Carmona
 
Medium presentation
Medium presentationMedium presentation
Medium presentation
Alex Stephen
 
презентація ман
презентація манпрезентація ман
презентація манnjhujdbwz
 
POSmedia_kampan roku 2012
POSmedia_kampan roku 2012POSmedia_kampan roku 2012
POSmedia_kampan roku 2012Marie Machatova
 
Steven N. Volpe resume
Steven N. Volpe resumeSteven N. Volpe resume
Steven N. Volpe resumeSteven Volpe
 
Myers_SIAMCSE15
Myers_SIAMCSE15Myers_SIAMCSE15
Myers_SIAMCSE15
Karen Pao
 
The top 5 hearing aid myths exposed
The top 5 hearing aid myths exposed   The top 5 hearing aid myths exposed
The top 5 hearing aid myths exposed
Bright Audiology
 
З досвіду роботи вчителя української мови та літератури
З досвіду роботи вчителя української мови та літературиЗ досвіду роботи вчителя української мови та літератури
З досвіду роботи вчителя української мови та літератури
njhujdbwz
 
Микола Вороний 10кл.
Микола Вороний 10кл.Микола Вороний 10кл.
Микола Вороний 10кл.
njhujdbwz
 
Medium Company Pres - EECS441 W15
Medium Company Pres - EECS441 W15Medium Company Pres - EECS441 W15
Medium Company Pres - EECS441 W15Alex Stephen
 
16158 євген маланюк
16158 євген маланюк16158 євген маланюк
16158 євген маланюк
njhujdbwz
 

Viewers also liked (12)

Animaciã³n tecnologia final
Animaciã³n tecnologia finalAnimaciã³n tecnologia final
Animaciã³n tecnologia final
 
Medium presentation
Medium presentationMedium presentation
Medium presentation
 
презентація ман
презентація манпрезентація ман
презентація ман
 
POSmedia_kampan roku 2012
POSmedia_kampan roku 2012POSmedia_kampan roku 2012
POSmedia_kampan roku 2012
 
Steven N. Volpe resume
Steven N. Volpe resumeSteven N. Volpe resume
Steven N. Volpe resume
 
Myers_SIAMCSE15
Myers_SIAMCSE15Myers_SIAMCSE15
Myers_SIAMCSE15
 
The top 5 hearing aid myths exposed
The top 5 hearing aid myths exposed   The top 5 hearing aid myths exposed
The top 5 hearing aid myths exposed
 
Gaurav Resume
Gaurav ResumeGaurav Resume
Gaurav Resume
 
З досвіду роботи вчителя української мови та літератури
З досвіду роботи вчителя української мови та літературиЗ досвіду роботи вчителя української мови та літератури
З досвіду роботи вчителя української мови та літератури
 
Микола Вороний 10кл.
Микола Вороний 10кл.Микола Вороний 10кл.
Микола Вороний 10кл.
 
Medium Company Pres - EECS441 W15
Medium Company Pres - EECS441 W15Medium Company Pres - EECS441 W15
Medium Company Pres - EECS441 W15
 
16158 євген маланюк
16158 євген маланюк16158 євген маланюк
16158 євген маланюк
 

Similar to Dubey_SIAMCSE15

MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEAravind NC
 
Dynamo.ppt
Dynamo.pptDynamo.ppt
Dynamo.ppt
ksjk1
 
Dynamo.ppt
Dynamo.pptDynamo.ppt
Dynamo.ppt
kaja56
 
[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)
Susang Kim
 
Concepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, AttentionConcepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, Attention
SaumyaMundra3
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
Geoffrey Fox
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
Yousef Fadila
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat
 
deep CNN vs conventional ML
deep CNN vs conventional MLdeep CNN vs conventional ML
deep CNN vs conventional ML
Chao Han chaohan@vt.edu
 
deeplearning
deeplearningdeeplearning
deeplearning
huda2018
 
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarksVauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarksMumtaz Hannah Vauhkonen
 
Scalability and Reliability in the Cloud
Scalability and Reliability in the CloudScalability and Reliability in the Cloud
Scalability and Reliability in the Cloud
gmthomps
 
Multilayer & Back propagation algorithm
Multilayer & Back propagation algorithmMultilayer & Back propagation algorithm
Multilayer & Back propagation algorithm
swapnac12
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
Data Con LA
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
IJERA Editor
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
Iffat Anjum
 
Container independent failover framework
Container independent failover frameworkContainer independent failover framework
Container independent failover frameworktelestax
 
Container Independent failover framework - Mobicents Summit 2011
Container Independent failover framework - Mobicents Summit 2011Container Independent failover framework - Mobicents Summit 2011
Container Independent failover framework - Mobicents Summit 2011telestax
 

Similar to Dubey_SIAMCSE15 (20)

MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DEMC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
MC0085 – Advanced Operating Systems - Master of Computer Science - MCA - SMU DE
 
Dynamo.ppt
Dynamo.pptDynamo.ppt
Dynamo.ppt
 
Dynamo.ppt
Dynamo.pptDynamo.ppt
Dynamo.ppt
 
[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)
 
Concepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, AttentionConcepts of Temporal CNN, Recurrent Neural Network, Attention
Concepts of Temporal CNN, Recurrent Neural Network, Attention
 
Os
OsOs
Os
 
Reginf pldi3
Reginf pldi3Reginf pldi3
Reginf pldi3
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
deep CNN vs conventional ML
deep CNN vs conventional MLdeep CNN vs conventional ML
deep CNN vs conventional ML
 
deeplearning
deeplearningdeeplearning
deeplearning
 
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarksVauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
 
Scalability and Reliability in the Cloud
Scalability and Reliability in the CloudScalability and Reliability in the Cloud
Scalability and Reliability in the Cloud
 
Multilayer & Back propagation algorithm
Multilayer & Back propagation algorithmMultilayer & Back propagation algorithm
Multilayer & Back propagation algorithm
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
Bh36352357
Bh36352357Bh36352357
Bh36352357
 
Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2Lecture 15 run timeenvironment_2
Lecture 15 run timeenvironment_2
 
Container independent failover framework
Container independent failover frameworkContainer independent failover framework
Container independent failover framework
 
Container Independent failover framework - Mobicents Summit 2011
Container Independent failover framework - Mobicents Summit 2011Container Independent failover framework - Mobicents Summit 2011
Container Independent failover framework - Mobicents Summit 2011
 

More from Karen Pao

LupoPasini_SIAMCSE15
LupoPasini_SIAMCSE15LupoPasini_SIAMCSE15
LupoPasini_SIAMCSE15
Karen Pao
 
Druinsky_SIAMCSE15
Druinsky_SIAMCSE15Druinsky_SIAMCSE15
Druinsky_SIAMCSE15
Karen Pao
 
Barker_SIAMCSE15
Barker_SIAMCSE15Barker_SIAMCSE15
Barker_SIAMCSE15
Karen Pao
 
Adams_SIAMCSE15
Adams_SIAMCSE15Adams_SIAMCSE15
Adams_SIAMCSE15
Karen Pao
 
Austin_SIAMCSE15
Austin_SIAMCSE15Austin_SIAMCSE15
Austin_SIAMCSE15
Karen Pao
 
Slattery_SIAMCSE15
Slattery_SIAMCSE15Slattery_SIAMCSE15
Slattery_SIAMCSE15
Karen Pao
 
Loffeld_SIAMCSE15
Loffeld_SIAMCSE15Loffeld_SIAMCSE15
Loffeld_SIAMCSE15
Karen Pao
 

More from Karen Pao (7)

LupoPasini_SIAMCSE15
LupoPasini_SIAMCSE15LupoPasini_SIAMCSE15
LupoPasini_SIAMCSE15
 
Druinsky_SIAMCSE15
Druinsky_SIAMCSE15Druinsky_SIAMCSE15
Druinsky_SIAMCSE15
 
Barker_SIAMCSE15
Barker_SIAMCSE15Barker_SIAMCSE15
Barker_SIAMCSE15
 
Adams_SIAMCSE15
Adams_SIAMCSE15Adams_SIAMCSE15
Adams_SIAMCSE15
 
Austin_SIAMCSE15
Austin_SIAMCSE15Austin_SIAMCSE15
Austin_SIAMCSE15
 
Slattery_SIAMCSE15
Slattery_SIAMCSE15Slattery_SIAMCSE15
Slattery_SIAMCSE15
 
Loffeld_SIAMCSE15
Loffeld_SIAMCSE15Loffeld_SIAMCSE15
Loffeld_SIAMCSE15
 

Recently uploaded

NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
RASHMI M G
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 

Recently uploaded (20)

NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 

Dubey_SIAMCSE15

  • 1. Hierarchical Resilience for Structured AMR Anshu Dubey, H. Fujita, Z. Rubenstein, B. Van Straalen and A. Chien SIAM-CSE 2015
  • 2. 1 n+ 2 t 10 2 levellevellevel t t sync sync sync n+1 n t refinement level  Block Structured Local Refinement  Refined regions are organized into logically-rectangular patches  Refined grids are dynamically created and destroyed  Refinement is performed in time as well as in space  The depth of hierarchy depends upon the range of scales  Mostly solve hyperbolics, some elliptic and some PIC The Basics of AMR Resiliency Through Rollback This talk is mostly from the perspective of hyperbolics
  • 3. Rollback and Recovery The first step is to look at the whole application  Traditionally - checkpoint the entire state and reconstruct for restart  Application determined state for transparent reconstruction  Less data to save than system based checkpoint  There have always been other reasons for checkpointing  Batch queue limits  Steering the simulation  Experimenting with parameter tweaking  Our first step was to utilize this capability to continue execution in face of resource degradation  Restart on a fewer number of MPI ranks  We partnered with GVR which saves and versions the state as dictated by the application  Incorporate restart without an abort
  • 4. GVR – Global View Resilience  Two key ideas  Multi-version, multi-stream distributed arrays: preserves application critical data, enables flexible recovery from complex errors with fine-grain, localized manner  Open resilience: cross-layer partnership for error handling, with unified error handling interface Our next step was to see if we can exploit the natural hierarchy of granularity in AMR for differentiated localized recovery and quantify the costs
  • 5. Exploiting Granularities and Encapsulations A Level provides coarsest granularity in the framework Convention – level below is coarser, level above is finer A level knows  All boxes at the level and their layout  The mapping of boxes to processors  Exchanges at the same level  Fine coarse boundaries with the level below  Local time step and the time step of the level below Each level defines data on unions of rectangles  Manages communications within itself  During regridding carries out redistribution of boxes within the level  Also the reconstructs data into newly formed boxes from the level below  A level can do its own restart
  • 7. Level n Level n+1 Level n+2 P0 P1 P2 P3 P4 P5 P6 If we know that a level has no boxes on a processor, that level need not initiate a recovery
  • 8. Level Based Recovery Once an array has been constructed for a level, version increment is sufficient to save the state the next time One wrinkle – versioning applies only as long as there is no regridding For regridding current array has to be destroyed and new one constructed This is one of the todo’s for GVR – allow expansion and contraction during versioning In general versioning is cheaper than constructing a new array We save the Box layout, the data for each box, and in the metadata part a pointer to where the data of each box lies
  • 9. Exploiting Granularities and Encapsulations The next layer of granularity is box within a level  Box with its surrounding halo of ghost cells is a completely defined computational region  Iterators go through a list of local boxes and apply the same operator to them  All boxes of one level have same dx and same dt  All boxes mapped to a processor may not be at the same level  Different boxes on the same processor have different dt and the operator maybe applied to them a different number of times. The way we construct arrays gives us box level granularity for free
  • 10. Failure Scenarios and Recovery Modes We already discussed the node failure  Data corruption, recovery at one level  Read in the metadata to get the pointer to the data of the corrupted patch  Read in the data for corrupted patch  Roll back to the beginning of the level’s time step and re-compute  Meta-data corruption: read in the meta-data  Roll-back may not be needed  Data corruption – recovery at multiple levels  Cascading effect- recovery may be needed in all the finer levels (may not always be needed though)  All finer levels roll-back to the starting timestep of the affected level  May get costlier as affected level gets coarser Still cheaper than global recovery because of no regridding