Silent error resilience in
numerical time-stepping schemes
Austin Benson
arbenson@stanford.edu
Stanford University
ICME Colloquium, Jan. 26 2015
Joint work with
Sven Schmit, Stanford
Rob Schreiber, HP Labs
code + data: http://stanford.edu/~arbenson/silent.html
paper: Intl. J. of High Performance Computing Applications, 2014
1
 Computer systems are getting bigger and more complicated.
 Software systems are getting bigger and more complicated.
 Pushing energy limits.
 Things break.
2
What breaks?
 Hardware wears out
 Bit flips from cosmic rays
 Data races and other software bugs
 Firmware bugs
Silent errors are errors in application state that
have escaped low-level error detection.
3
What can we do?
 Checkpoint/restart: Occasionally save state of
system. If error is detected, restart.
Does not scale. How to detect errors?
 Other ABFT (algorithm-based fault tolerance): Clever techniques that address these
issues for particular algorithms.
 This work: Error detection for iterative
computation in general, numerical time-stepping
schemes in particular.
4
Spot the error!
5
At time step 120, a single entry of the right-hand side in the
Crank-Nicolson and Backward Euler linear solves was multiplied by 0.995.
6
General algorithm:
 “Base method” generates sequence B1, B2, …
 “Auxiliary method” generates sequence A1, A2, …
 If Di = ||Bi – Ai|| is abnormal, possible error
7
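A minimal sketch of the general algorithm above, in plain Python. The names `max_norm_diff` and `flag_abnormal_steps` are hypothetical, and the "abnormal" test used here (D_i exceeding a fixed multiple of D_{i-1}) is only a placeholder assumption, not the detector from the talk or the paper.

```python
def max_norm_diff(b, a):
    """Infinity norm of the difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(b, a))


def flag_abnormal_steps(base_seq, aux_seq, ratio=10.0):
    """Return indices i where D_i = ||B_i - A_i|| jumps past ratio * D_{i-1}.

    The ratio test is a stand-in assumption for "abnormal".
    """
    flagged = []
    prev = None
    for i, (b, a) in enumerate(zip(base_seq, aux_seq)):
        d = max_norm_diff(b, a)
        if prev is not None and prev > 0 and d > ratio * prev:
            flagged.append(i)
        prev = d
    return flagged


# Toy sequences: the two methods agree to about 0.01 except at index 2.
base = [[1.0, 2.0], [1.1, 2.1], [1.2, 2.2], [1.3, 2.3]]
aux = [[1.01, 2.0], [1.11, 2.1], [1.7, 2.2], [1.31, 2.3]]
flagged = flag_abnormal_steps(base, aux)
print(flagged)
```

The base and auxiliary sequences are passed in as plain lists here, so the same check could wrap any pair of methods that produce comparable states.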
Base method:
high-order numerical integration scheme:
Runge-Kutta 5
Auxiliary method:
lower-order scheme: Runge-Kutta 4
Difference:
Di = ||Bi – Ai||
Re-purposing an old idea for step-size control
[Fehlberg, 1969], [Dormand and Prince, 1980]
8
Key idea: re-use data
RK 1/2 scheme for u’ = f(t, u)
Second-order scheme has local error O(h^3)
No extra function evaluations.
Provides O(h^2) check.
9
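The RK 1/2 pairing can be sketched for a scalar ODE as follows. This follows the formulas in the Editor's Notes (midpoint rule as the base method, forward Euler as the auxiliary method, both sharing k1); the function names and the test problem u' = -u are illustrative assumptions.

```python
import math


def rk12_step(f, t, u, h):
    """One step of the second-order base method plus the first-order check.

    k1 is evaluated once and reused by both methods, so the check costs
    no extra function evaluations.
    """
    k1 = f(t, u)
    u_base = u + h * f(t + h / 2, u + h * k1 / 2)  # midpoint rule (order 2)
    u_aux = u + h * k1                             # forward Euler (order 1)
    return u_base, abs(u_base - u_aux)


def integrate(f, t0, u0, h, n_steps):
    """Advance with the base method and record the difference D at each step."""
    t, u, diffs = t0, u0, []
    for _ in range(n_steps):
        u, d = rk12_step(f, t, u, h)
        diffs.append(d)
        t += h
    return u, diffs


# Assumed test problem: u' = -u, u(0) = 1, integrated to t = 1.
u_h, d_h = integrate(lambda t, u: -u, 0.0, 1.0, 0.1, 10)
u_h2, d_h2 = integrate(lambda t, u: -u, 0.0, 1.0, 0.05, 20)
print(d_h[0] / d_h2[0])  # halving h should shrink D by about a factor of 4
```

For this test problem the first-step difference is exactly h^2 / 2 in exact arithmetic, which is the O(h^2) behaviour the slide refers to.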
Key idea: re-use data
Implicit solve that is stable (e.g., Backward Euler)
Explicit solve checks (e.g., Forward Euler)
It is OK that the explicit solve may be unstable. (Why?)
10
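A sketch of an explicit check on an implicit scheme for the 1D heat equation, following the structure A U^B_{n+1} = U^B_n, U^A_{n+1} = B U^B_n from the Editor's Notes. Zero Dirichlet boundaries, the sine initial condition, the grid, and all function names are assumptions made for illustration. One plausible answer to the slide's "(Why?)" is that the explicit step is restarted from the base state at every step, so any instability it has cannot compound over time; the example uses r = 1, above the usual forward Euler stability limit of 1/2, to illustrate this.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system with constant coefficients."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def backward_euler_step(u, r):
    """Base method: solve (I - r L) u_new = u, L the second-difference matrix."""
    return solve_tridiagonal(-r, 1.0 + 2.0 * r, -r, u)


def forward_euler_step(u, r):
    """Auxiliary method: u_new = (I + r L) u, with zero boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(u[i] + r * (left - 2.0 * u[i] + right))
    return out


def run(u0, r, n_steps):
    """Advance with Backward Euler; check each step with one Forward Euler step."""
    u, diffs = list(u0), []
    for _ in range(n_steps):
        u_aux = forward_euler_step(u, r)  # always restarted from the base state
        u = backward_euler_step(u, r)
        diffs.append(max(abs(a - b) for a, b in zip(u, u_aux)))
    return u, diffs


n = 9
dx = 1.0 / (n + 1)
u0 = [math.sin(math.pi * (i + 1) * dx) for i in range(n)]
u, diffs = run(u0, r=1.0, n_steps=50)  # r = dt / dx^2 for unit diffusivity
print(diffs[0], diffs[-1])
```

In this run the recorded differences shrink smoothly along with the solution even though r exceeds the explicit stability limit.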
 Backward/Forward Euler
 Richardson/Crank-Nicolson
 Runge-Kutta 1/2, 2/3, 4/5
 Adams-Bashforth linear multistep method 2/3, 4/5
 Explicit check on implicit scheme
 Extrapolation
Lots of these checks for
numerical time-stepping algorithms…
11
Exercise in step detection (change point detection)
Algorithmic details in the paper. Main parameters:
Relative jump
Variance change
12
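The two main parameters can be sketched from the definitions in the Editor's Notes: the relative jump J_{n+1} = (D_{n+1} - D_n) / D_n and the ratio V_{n+1} of variances over two sliding windows of p + 1 values. How the two quantities are combined into a decision is not stated on the slide; requiring both to exceed a threshold, and the threshold values themselves, are assumptions of this sketch, as are the function names.

```python
import statistics


def relative_jump(d, k):
    """J_k = (D_k - D_{k-1}) / D_{k-1}."""
    return (d[k] - d[k - 1]) / d[k - 1]


def variance_ratio(d, k, p):
    """V_k = Var(D_{k-p}, ..., D_k) / Var(D_{k-p-1}, ..., D_{k-1})."""
    num = statistics.pvariance(d[k - p:k + 1])
    den = statistics.pvariance(d[k - p - 1:k])
    return num / den if den > 0 else float("inf")


def flag_steps(d, p, jump_tol, var_tol):
    """Flag index k when both J_k and V_k exceed their (assumed) thresholds."""
    flagged = []
    for k in range(p + 1, len(d)):
        if relative_jump(d, k) > jump_tol and variance_ratio(d, k, p) > var_tol:
            flagged.append(k)
    return flagged


# Slowly decaying differences with one spike at index 6.
d = [1.00, 0.98, 0.96, 0.94, 0.92, 0.90, 1.80, 0.86, 0.84]
flagged = flag_steps(d, p=4, jump_tol=0.5, var_tol=10.0)
print(flagged)
```

Both windows hold the same number of samples, so using the population variance or the sample variance gives the same ratio.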
Experimental setup:
 Solve heat equation for T time steps and
artificially inject error at one time step.
 Do this many times with different
types of errors.
 True positive rate:
#(real errors detected) / #(trials)
 False positive rate:
#(non-errors “detected”) / #(time steps)
13
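The two rates could be tallied as in the sketch below. It assumes each trial is recorded as a pair (injected time step, set of flagged time steps), that an error counts as detected only if the injected step itself is flagged, and that #(time steps) means the total over all trials. None of these conventions are stated on the slide.

```python
def rates(trials, n_steps):
    """Return (true positive rate, false positive rate) over a list of trials.

    trials: list of (injected_step, flagged_steps) pairs.
    n_steps: number of time steps T in each trial.
    """
    detected = 0
    false_flags = 0
    for injected, flagged in trials:
        if injected in flagged:
            detected += 1
        false_flags += sum(1 for s in flagged if s != injected)
    tpr = detected / len(trials)
    fpr = false_flags / (n_steps * len(trials))
    return tpr, fpr


# Four hypothetical trials of 200 steps each.
trials = [(120, {120}), (50, {50, 73}), (10, set()), (199, {199})]
tpr, fpr = rates(trials, n_steps=200)
print(tpr, fpr)
```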
Are large errors easier to detect?
Local truncation error (LTE)-normalized error
Output when no fault is injected.
Output when fault is injected.
14
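The LTE-normalized error from the Editor's Notes, L_i = ||B_i - \hat{B}_i|| / ||\hat{B}_i - \hat{A}_i||, can be computed as sketched here. Reading the hatted quantities as the outputs of the fault-free run and using the infinity norm are assumptions of this sketch; the function names are hypothetical.

```python
def max_norm(v):
    """Infinity norm of a vector."""
    return max(abs(x) for x in v)


def lte_normalized_error(b_fault, b_clean, a_clean):
    """Difference caused by the fault, in units of the fault-free base/aux gap."""
    num = max_norm([x - y for x, y in zip(b_fault, b_clean)])
    den = max_norm([x - y for x, y in zip(b_clean, a_clean)])
    return num / den


# Toy values: the fault moves the base output by 1.0, and the fault-free
# base and auxiliary outputs differ by 0.5.
ratio = lte_normalized_error([1.0, 3.0], [1.0, 2.0], [1.0, 2.5])
print(ratio)
```

A value well above 1 means the fault perturbs the output by more than the fault-free gap between the two methods, the gap that the Editor's Notes treat as approximating the local truncation error.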
Error injection:
Multiply single entry of RHS
in linear solves by
z ~ N(1, 5e-5) at a single
time step
15
Error injection:
Multiply q(x, t) at one
discrete x by z ~ N(1, 0.1)
at a single time step
16
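Both injection experiments multiply one entry by a random factor z drawn around 1. A minimal sketch follows, assuming the second argument of N(1, ·) is a standard deviation (the slides do not say whether it is a standard deviation or a variance). `inject_fault` is a hypothetical helper.

```python
import random


def inject_fault(vec, index, spread, rng):
    """Return a copy of vec with vec[index] multiplied by z ~ N(1, spread).

    spread is passed to rng.gauss as the standard deviation (an assumption).
    """
    z = rng.gauss(1.0, spread)
    out = list(vec)
    out[index] *= z
    return out, z


rng = random.Random(0)  # fixed seed so the experiment is repeatable
rhs = [1.0, 2.0, 3.0, 4.0]
faulty, z = inject_fault(rhs, 2, 5e-5, rng)
print(z, faulty)
```

The same helper covers the second experiment by passing the sampled values of q(x, t) and a spread of 0.1.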
Takeaways
17
 We have a general framework for detecting silent errors.
 Numerical integration is our central application.
 We detect large errors more easily.
 Not too many false positives.
 How many silent errors are there? How worried should we be?
 Do we need systems solutions or algorithmic solutions? Both?
 “Defense in depth” is good. But how easy are ABFT methods to
incorporate into existing solvers?
Resilience: what do we need to discuss?
18
Tardy error detection
20


Editor's Notes

  1. \begin{align*}
     & u_t = \frac{1}{100}u_{xx} + 0.1\left(\sin(2\pi t) + \cos(2\pi x)\right) \\
     & t \in [0, 2], \quad x \in [0, 1] \\
     & u(x, 0) = x(x-1) \\
     & \Delta x = 1 / 160, \quad \Delta t = 1 / 100
     \end{align*}
  2. \begin{align*}
     & \textcolor{blue}{k_1^{B}} = f(t_n, u_n^{B}) \\
     & u^{B}_{n+1} = u_n^{B} + hf\left(t_n + h/2, u_n^{B} + h\textcolor{blue}{k_1^{B}}/2\right) \\
     & u_{n+1}^{A} = u_n^{B} + h\textcolor{blue}{k_1^{B}} \\
     & D_{n+1} = \| u_{n+1}^{A} - u_{n+1}^{B} \|
     \end{align*}
  3. \begin{align*}
     & AU^{B}_{n+1} = \textcolor{blue}{U^{B}_{n}} \\
     & U^{A}_{n+1} = B\textcolor{blue}{U^{B}_{n}} \\
     & D_{n+1} = \| U^{B}_{n+1} - U^{A}_{n+1} \|
     \end{align*}
  4. \begin{align*}
     & D_{n+1} = \| B_{n+1} - A_{n+1} \|_{\infty} \\
     & J_{n+1} = \frac{D_{n+1} - D_n}{D_n} \\
     & V_{n+1} = \frac{\text{Var}(D_{n-p+1}, \ldots, D_{n+1})}{\text{Var}(D_{n-p}, \ldots, D_{n})}
     \end{align*}
  5. \[
     L_i = \frac{\| B_i - \hat{B}_i \|}{\| \hat{B}_i - \hat{A}_i \|} \approx \frac{\text{Difference caused by error}}{\text{local truncation error}}
     \]
  6. $u_t = 0.001u_{xx} + (1 - \sqrt{1 - 4(t - t^2)}) / (2 - 2t)$, \quad $u(x, 0) = 6|x - 1/2| - 3$
  7. $u_t = 0.01u_{xx} + q(x, t)$, \quad $q(x, t) = xe^{-t/2}$, \quad $u(x, 0) = 4x(x-1)(x-2)$