•  Built around two Virtex-4 FPGAs:
...
Operating conditions:
* The PowerPC ...
Control FPGA, DDR-SDRAM ...
6. Simulation of SEUs on the LEON3: CEU fault-injection environment
Fa...
6. Simulation of SEUs on the LEON3: Experiment flowchart
Computer Super...
6. Simulation of SEUs on the LEON3: Preliminary results: target = ...
6. Simulation of SEUs on the LEON3: SW modifications
Using of modulo operato...
6. Simulation of SEUs on the LEON3: SEU injections on the modified version
Targe...
6. Simulation of SEUs on the LEON3: Triple Modular Redundancy (TMR)
Core 1
Core 2
C...
6. Simulation of SEUs on the LEON3: Three-core fault injection results. Target: regi...
# Runs    # Errors       # Timeouts     # Convergences
100000    85 (0.085%)    15 (0.015%)    1825 (1.825%)
Results of fault injection on thre...
6. Simulation of SEUs on the LEON3: Three-core fault injection results. Target: all ...
7. Conclusions and future work
•  The sensitivity to SEUs of a self-converging algorithm was studied
•  Fa...
Acknowledgements
•  Dr. Francisco Javier Franco Peláez (UCM)
•  Dr. Juan Antonio Clemente ...
THANK YOU FOR YOUR ATTENTION!
TIME FOR QUESTIONS
Study of the robustness of self-convergent algorithms against SEUs


Lecture given by Dr. Raoul Velazco on 16 March 2015 as part of the postgraduate lecture series of the Facultad de Informática.


  1. Study of the robustness of self-convergent algorithms against SEUs
     Dr. Raoul Velazco
     TIMA Laboratory, «ARIS» group, Grenoble, France, http://tima.imag.fr
     PRiSME Laboratory, «SYSCOM» group, University of Versailles Saint Quentin les Yvelines, France, http://www.prism.uvsq.fr/
  2. Universidad Complutense de Madrid - 16th March 2015
     Outline
     1. Radiation effects in ICs
     2. The Self-Stabilizing Algorithm
     3. SEUs in processor-based applications
     4. The LEON3 processor
     5. The ASTERICS test platform
     6. Simulation of SEUs on the LEON3
     7. Conclusions
  4. 1. Radiation effects in ICs: context
     • Aerospace electronic systems operate in a radiation environment
     • Charged particles come from three main sources: Van Allen belts, cosmic rays and solar flares
     Figure labels: cosmic rays; protons from solar flares
  5. 1. Radiation effects in ICs: context
     • Microelectronic technology is constantly changing:
       – higher density,
       – faster devices,
       – lower power.
     • These trends increase the devices' vulnerability to the effects of radiation (not only in nuclear and space environments)
     • In some applications, no failure is allowed
     • Advanced technologies are potentially sensitive to the effects of atmospheric neutrons
     • Space agencies favor the use of COTS technologies
  6. 1. Radiation effects in ICs: types of faults
     Radiation and Electronic Devices
     • Accumulated: Displacement, T.I.D.
     • Single particle: S.E.E.
  7. 1. Radiation effects in ICs: description of SEE
     What you always wanted to know about Single Event Effects (SEEs)
     • What are they? One of the results of the interaction between radiation and electronic devices
     • How do they act? By creating free charge in the silicon bulk that, in practice, behaves as a short-lived but intense current pulse
     • What are the ultimate consequences? From simple bit flips or noise-like signals to the physical destruction of the device
  8. 1. Radiation effects in ICs: description of SEEs
     The Physical Mechanism
     The incident particle generates a dense track of electron-hole pairs, and this ionization causes a transient current pulse if the strike occurs near a sensitive volume
     Figure label: charge collection volume
  9. 1. Radiation effects in ICs: classification of SEE
     • Single Event Upset (SEU): change of data in memory cells
     • Multiple Bit Upset (MBU): several simultaneous SEUs
     • Single Event Transient (SET): peaks in combinational ICs
     • Single Event Latch-up (SEL): parasitic thyristor trigger
     • Single Event Functional Interruption (SEFI): phenomena in critical parts
     • And others…
     Hard errors and soft errors
 10. 1. Radiation effects in ICs: description of SEE
     Some Useful Definitions
     • Cross section (σ): $\sigma_{DEV} = \dfrac{N_{EVENTS}}{\text{Particle fluence}}$
     • Linear Energy Transfer (LET)
     • Soft Error Rate (SER): probability of an error under usual conditions
     • FIT: typical unit of SER → probability of 1 error every $10^9$ h
       E.g., 180-nm SRAM: 1000-3000 FIT/Mb
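As a rough illustration of the definitions above, the following C sketch turns them into arithmetic. The function names and the 16 Mb memory size used in the note below are assumptions made for the example; only the 1000 FIT/Mb figure (the lower bound quoted for a 180-nm SRAM) comes from the slide.

```c
/* Cross section: observed events divided by the particle fluence
   (particles/cm^2). The result is in cm^2 per device. */
double cross_section_cm2(double n_events, double fluence_per_cm2)
{
    return n_events / fluence_per_cm2;
}

/* Total FIT of a memory, given its FIT/Mb figure and its size in Mb. */
double memory_fit(double fit_per_mb, double size_mb)
{
    return fit_per_mb * size_mb;
}

/* 1 FIT = 1 error per 1e9 device-hours, so the mean time between
   errors (in hours) is 1e9 divided by the FIT value. */
double fit_to_mtbf_hours(double fit)
{
    return 1e9 / fit;
}
```

For a hypothetical 16 Mb memory at 1000 FIT/Mb, `memory_fit(1000.0, 16.0)` gives 16000 FIT and `fit_to_mtbf_hours(16000.0)` gives 62500 h, that is, roughly seven years between errors on average.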
 11. 1. Radiation effects in ICs: sources of SEEs
     Usually, SEEs have been associated with space missions because of the absence of the atmospheric shield…
     Figure labels: cosmic rays; protons from solar flares
     Unfortunately, our quiet oasis seems to be vanishing since the enemy is knocking on the door…
     • Alpha particles from vestigial U or Th traces
     • Atmospheric neutrons and other cosmic rays
 12. 1. Radiation effects in ICs: sources of SEEs
     Alpha Particles
     Sometimes, they appeared without warning and, after some months and a lot of money spent, the source was detected*.
     • In 1978, Intel had to stop a factory because its water was extracted from a nearby river that, upstream, passed too close to an old uranium mine.
     * J. F. Ziegler and H. Puchner, “SER – History, Trends and Challenges. A Guide for Designing with Memory ICs”, Cypress Semiconductor, USA, 2004.
 14. 1. Radiation effects in ICs: sources of SEEs
     Alpha Particles
     Sometimes, they appeared without warning and, after some months and a lot of money spent, the source was detected*.
     • In 1986, IBM detected a high rate of useless devices and traced it to the phosphoric acid, whose bottles had been cleaned with a ²¹⁰Po deionizer gadget… hundreds of km away.
     * J. F. Ziegler and H. Puchner, “SER – History, Trends and Challenges. A Guide for Designing with Memory ICs”, Cypress Semiconductor, USA, 2004.
 15. 1. Radiation effects in ICs: sources of SEEs
     Alpha Particles
     Sometimes, they appeared without warning and, after some months and a lot of money spent, the source was detected*.
     • In 1992, the problem came from phosphorus obtained from the droppings of bats living in caves with traces of Th and U.
     * J. F. Ziegler and H. Puchner, “SER – History, Trends and Challenges. A Guide for Designing with Memory ICs”, Cypress Semiconductor, USA, 2004.
 16. 1. Radiation effects in ICs: sources of SEEs
     Alpha Particles
     But sometimes, we are a little naive…
     • Solder balls are usually made from Sn and Pb, which come from minerals that may contain uranium and thorium traces.
     Nevertheless, the designer forgets this detail and places the solder balls too close to critical nodes!
 17. 1. Radiation effects in ICs: sources of SEEs
     Alpha Particles
     • Fortunately, they are easily controlled by following some simple rules during the manufacturing process.
     But, sometimes, the enemy strikes back!
     In 2005, a figure of $2\cdot10^6$ FIT/Mbit was observed in the SRAMs attached to pacemakers where:
     • the package had been removed for cosmetic reasons and the solder balls had not been previously purified*.
     Fortunately, nobody died (we cross our fingers).
     * J. Wilkinson, IEEE Trans. Dev. Mat. Reliab., 5 (3), pp. 428-433, 2005
 18. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays
     Usually, they have been a headache for the designers of electronics on board space missions…
     Here are some of their practical jokes*…
     • Cassini Mission (1997): some information was lost because of MBUs.
     • Deep Space 1: an SEU caused a solar panel to stop opening out.
     • Mars Odyssey (2001): two weeks after the launch, alarms went off because of some errors later attributed to an SEU.
     • GPS satellite network: one of the satellites is out of service, probably because of a latch-up.
     * B. E. Pritchard, IEEE NSREC 2002 Data Workshop Proceedings, pp. 7-17, 2002
 19. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays
     A nice example… The birth of a star, picture taken by the Hubble Telescope
     Don't you notice something odd in the picture?
 20. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays at Ground Level
     • The highest flux is reached between 15 and 20 km of altitude.
     • Less than 1% of this particle rain reaches sea level.
     • The composition has also changed…
     • Basically, neutrons, muons and some pions
     Usually, the neutron flux is referenced to that of New York City, whose value is (apparently) only 15 n/cm²/h
     • This value depends on the altitude (approximately ×10 every 3 km until saturation at 15-20 km).
     • And also on latitude: the nearer the Poles, the higher the rate.
     • South Atlantic Anomaly (SAA), close to Argentina.
     • 1.5 m of concrete reduces the flux by half.
     What a weak foe. Should we really be afraid of it?
 21. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays at Ground Level
     Perhaps we may believe that we are in a safe shelter, but…
     – 1992: the PERFORM system, used by airplanes to manage the take-off manoeuvre, had to be suddenly replaced because of SEUs in its SRAMs*.
     – 1998: a study reported that, every day, 1 out of 10000 SRAMs attached to pacemakers underwent bit flips**. This factor was 300 times higher if the patient had taken a transoceanic flight.
     * J. Olsen, IEEE Trans. Nucl. Sci., 1993, 40, 74-77
     ** P. D. Bradley, IEEE Trans. Nucl. Sci., 45 (6), 2829-2940
 22. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays at Ground Level
     – The call of the Thousand (2000): Sun Unix server systems crashed in dozens of places all over the USA because of SEUs in their cache memory, costing several million dollars*.
     – In 2003, the elections in Belgium were held simultaneously in the traditional way and electronically. A difference of 4096 votes was found. Experts explained this difference as the consequence of an SEU**.
     – 2005: after 102 days, the ASC Q Cluster supercomputer showed 7170 errors in its 81-Gb cache memory, 243 of which led to a crash of the programs or the operating system***.
     * Forbes, 2000
     ** Chantal Enguehard, Jean-Didier Graton. Electronic Voting: the Devil is in the Details. 2008. hal-00274635
     *** K. W. Harris, IEEE Trans. Dev. Mat. Reliab., 2005, 5, 336-342
 24. ALWAYS DAMNING THE PROGRAM DEVELOPER? PERHAPS IT MIGHT HAVE BEEN AN SEU!!!
 25. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays at Ground Level
     Why are these exotic phenomena appearing at lower and lower altitudes?
     The present trend is to minimise the typical layout length. This has helped to decrease the sensitive volume, but the critical charge decreases as well. The most pessimistic simulations show a rock bottom at 130-180 nm, and a sudden increase is expected for more advanced technologies.
     T. Granlund, IEEE Trans. Nuc. Sci., 2003, 50, 2065-2068
 26. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays at Ground Level
     In any case, everybody agrees on an increasing error rate in the whole system… and on the increasing sensitivity of combinational logic devices.
     * R. Baumann, IEEE Trans. Dev. Mat. Reliab., 2005, 5, 305-316
 27. 1. Radiation effects in ICs: sources of SEEs
     Cosmic Rays at Ground Level
     Can this background be worse?
     Yes, it can. Some details may increase the neutron sensitivity.
     – Power supply values: the lower, the more likely the SEUs
     – Operating frequency: SEUs are more dangerous while the system is reading or writing.
     – Presence of boron: there is an isotope of boron, ¹⁰B, able to trap low-energy thermal neutrons and release an energetic alpha particle:
       $^{10}_{5}\mathrm{B} + ^{1}_{0}n \rightarrow ^{4}_{2}\alpha + ^{7}_{3}\mathrm{Li}$
     – Altitude
 29. 2. Self-stabilizing algorithms
     • Self-stabilizing algorithms are used for communications between computer or sensor networks
       They are supposed to have fault-tolerant capabilities
     • Are they robust with respect to soft errors?
       The ASTERICS test platform was used to simulate SEUs by HW/SW means
       SEU fault-injection experiments were performed on the LEON3 while executing a self-converging application
     • Final goal: identify sensitive resources and explore SW fault-tolerance solutions for the self-stabilizing algorithm
 30. 2. Self-stabilizing algorithms
     • Defined by Edsger Dijkstra in 1974
     • It is a property of distributed systems: when the system is wrongly initialized or perturbed, it can automatically go back to correct operation in a finite number of calculation steps
     • Applications:
       – in «theoretical computing science», in domains where human intervention to restart a system after a failure is impossible
       – in computer networks and sensor networks, as well as in critical systems such as satellites.
     «Testing shows the presence, not the absence, of bugs!» Edsger Dijkstra
 31. 2. Self-stabilizing distributed algorithms
     • Idea: a fault can put the system in any arbitrary state
     • From any state, the system resumes normal behavior and remains in it
     • Defined by:
       – Convergence: the system eventually reaches normal behavior
       – Closure: when no fault occurs, the system behaves in the intended manner
 32. 2. Self-stabilizing algorithms behaviour
 33. 2. Self-stabilizing algorithms: Self-convergence
     • A fault leads to an arbitrary state
     • The algorithm gives a correct answer:
       – if the error does not occur too close to the end (e.g. just before return)
       – if the error does not modify the data
 34. 2. Self-stabilizing algorithms: Distributed Shortest Paths in a graph
     • Given:
       – a weighted graph G defined by its matrix (an array) and its size (an integer)
     • Computes:
       – shortest paths from any node i to node 0
     • Mimics the behavior of a distributed self-stabilizing algorithm
 35. 2. Self-stabilizing algorithms: Distributed Shortest Paths in a graph (cont'd)
     • Any node i knows
       – its distance l_ij to any neighbor j
     • Node 0 knows it is the sink
       – so its distance to itself is 0, and the shortest path is to remain on 0
       – once no computation can modify d, d_i is the distance from i to 0 and next_i is the next step on the shortest path from i to 0.
         if (i = 0)
           d_i := 0
           next_i := 0
         else
           d_i := min{l_ij + d_j}
           next_i := argmin{l_ij + d_j}   // with j neighbor of i
         endif
     «The shortest path in a graph is never the one we think, it can come from nowhere and, most of the time, it does not exist» Edsger Dijkstra
 36. 2. Self-stabilizing algorithms: Self-convergent shortest paths
     T = N×N matrix; D = N×1 matrix
     Matrix T represents a graph. Nodes i and j are connected by an edge of length T(i,j)
     The distance between node i and node 0 is D_i = min(T_ij + D_j)
         b=c=1;
         while(b||c) {
           c=b;
           b=0;
           D[0]=0;
           for(i=1; i<N; i++) {
             m = VERY_LARGE;   /* any value larger than every path length */
             for(j = 0; j<N; j++) {
               if(m>=D[j]+T[N*i+j]) m=D[j]+T[N*i+j];
             }
             if(D[i]!=m) b=1;
             D[i]=m;
           }
         }
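The loop on the slide can be wrapped into a compilable function, sketched here under a few assumptions that are not on the slide: the function name `shortest_paths`, the choice of `INT_MAX / 2` for `VERY_LARGE`, the use of `VERY_LARGE` to mark missing edges (including the diagonal), and the precondition that every entry of `D` lies in `[0, VERY_LARGE]` so that the additions cannot overflow.

```c
#include <limits.h>

/* Assumed "infinity": two of them can be added without overflowing int. */
#define VERY_LARGE (INT_MAX / 2)

/* Self-convergent shortest paths to node 0 (sketch).
   T is an n*n row-major matrix: T[n*i + j] is the length of edge (i, j),
   or VERY_LARGE when i and j are not connected (diagonal included).
   D holds the distance estimates and may start with arbitrary contents
   in [0, VERY_LARGE]. Returns the number of sweeps performed. */
int shortest_paths(const int *T, int *D, int n)
{
    int b = 1, c = 1, sweeps = 0;

    /* Stop only after two consecutive sweeps that change nothing. */
    while (b || c) {
        c = b;
        b = 0;
        D[0] = 0;                       /* the sink always resets itself */
        for (int i = 1; i < n; i++) {
            int m = VERY_LARGE;
            for (int j = 0; j < n; j++) {
                int candidate = D[j] + T[n * i + j];
                if (m >= candidate)
                    m = candidate;
            }
            if (D[i] != m)
                b = 1;                  /* something changed in this sweep */
            D[i] = m;
        }
        sweeps++;
    }
    return sweeps;
}
```

Starting `D` from garbage values is the interesting case: on a small four-node graph with edges 0-1 (length 1), 1-2 (length 2), 0-2 (length 5) and 2-3 (length 1), the function drives an initial `D = {7, 0, 100, 2}` to the distances `{0, 1, 3, 4}`.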
 38. 3. SEUs in processor-based applications
     • The first studies on SEUs were done at the end of the 1960s
     • They strictly considered space applications
     • ICs issued from advanced manufacturing processes are sensitive to thermal neutrons present in the Earth's atmosphere, even at ground level
     • Processors and memories embed a significant number of SEU targets
     • Applications for which soft errors may have critical consequences must be evaluated with respect to SEUs
 39. 3. SEUs in processor-based applications: The CEU method
     • Presented for the first time in 2000*
     • Estimates the number of particles required to obtain an observable event in an application by combining fault injection and accelerated test results
     • Provides data on the system's sensitivity at an early stage of the development
     • How to do that?
       1. Calculate the probability for a fault to provoke an error in the application: $\tau_{INJ} = \dfrac{\#\,\text{application errors}}{\#\,\text{injected faults}}$
       2. Obtain the static cross-section (literature or measurements): $\sigma_{SEU} = \dfrac{\#\,\text{errors in the configuration memory}}{\text{fluence}}$
       3. Obtain the system error rate: $\tau_{PRED} = \sigma_{SEU} \cdot \tau_{INJ}$
     * R. Velazco, S. Rezgui, R. Ecoffet, “Predicting Error Rate for Microprocessor-Based Digital Architectures through C.E.U. (Code Emulating Upsets) Injection”, IEEE Transactions on Nuclear Science, Vol. 47, No. 6, Dec. 2000, pp. 2405-2411.
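The three steps of the CEU method amount to two ratios and one product, which the following C sketch spells out. The function names are assumptions made for the example, and the numbers quoted in the note below are invented for illustration; they are not results from the talk.

```c
/* Step 1: tau_INJ, the fraction of injected faults that produced
   an application error. */
double tau_inj(unsigned long application_errors, unsigned long injected_faults)
{
    return (double)application_errors / (double)injected_faults;
}

/* Step 2: sigma_SEU, the static cross section: upsets counted under
   the beam divided by the fluence (particles/cm^2). Result in cm^2. */
double sigma_seu(unsigned long upsets, double fluence_per_cm2)
{
    return (double)upsets / fluence_per_cm2;
}

/* Step 3: tau_PRED = sigma_SEU * tau_INJ, the predicted error cross
   section of the application (cm^2). */
double tau_pred(double sigma, double tau)
{
    return sigma * tau;
}
```

With made-up inputs of 50 application errors for 1000 injected faults and 200 upsets for a fluence of 1e10 particles/cm², `tau_inj` gives 0.05, `sigma_seu` gives 2e-8 cm² and `tau_pred` gives 1e-9 cm².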
 40. 3. SEUs in processor-based applications: The CEU method
     Fault injection mechanism:
     • Faults are injected using an external interrupt of the processor
     • The bit flip is performed on the target using the instruction set
     => The accuracy of the method depends on the number of accessible memory elements compared to the total number of memory cells embedded in the DUT
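The core of such an injection is a single-bit XOR on the chosen target. The C sketch below shows only that step on an ordinary memory word; the function name `inject_bit_flip` is hypothetical, and the interrupt handling and target selection that the real CEU code performs are not shown because the slides do not describe them.

```c
#include <stdint.h>

/* Flip one bit of a 32-bit word, emulating an SEU in that cell.
   `bit` is taken modulo 32 so that the shift is always defined.
   In a CEU-style setup this would run inside the interrupt handler;
   here the target is simply a word in memory. */
void inject_bit_flip(volatile uint32_t *target, unsigned bit)
{
    *target ^= (UINT32_C(1) << (bit & 31u));
}
```

Calling the function twice with the same arguments restores the original value, which is convenient when checking an injection campaign against a fault-free reference run.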
 41. 3. SEUs in processor-based applications: The CEU method
     • Can be applied to any processor:
       – in HW version
       – implemented in an FPGA
     • SEU targets are memory cells accessible through the instruction set:
       – Registers
       – Special function registers (SP, PC, …)
       – Internal SRAM
       – Cache memory
       – …
     • CEU codes strongly depend on the studied processor's architecture and instruction set
  43. 43. Universidad  Complutense  de  Madrid  -­‐  16th  march  2015   43   4.  LEON3  processor   Generalities: LEON3 is a synthesizable VHDL model Ÿ  32-bit processor compliant with the SPARC V8 architecture Main features: Ÿ  7-stage pipeline Ÿ  High-performance, fully pipelined IEEE-754 FPU Ÿ  Separate instruction and data cache (Harvard architecture) Ÿ  AMBA-2.0 AHB bus interface Ÿ  Symmetric Multi-processor support (SMP) Ÿ  Up to 125 MHz in FPGA and 400 MHz on 0.13 µm ASIC technologies Ÿ  Fault-tolerant and SEU-proof version available for space applications Ÿ  High Performance: 1.4 DMIPS/MHz, 1.8 CoreMark/MHz (gcc -4.1.2) Ÿ  Free: http://www.gaisler.com/
44. 4. LEON3 processor: interfaces and peripherals
45. 4. LEON3 processor: specific features
• The LEON3 processor does not have a single Stack Pointer (SP) register as typical processors do
• The LEON3 is organized around a system of 8 "windows". Each window provides a separate register environment
• A function call or an interrupt triggers a window switch
• The input registers of window Wn become the output registers of window Wn+1, and Wn+1 receives a new set of local and out registers
• Each window has its own stack pointer, stored in o6 (an out register)
46. 4. LEON3 processor: Register file
• 136 general-purpose registers: 8 global registers + 128 window registers
• Only 32 are accessible at any time by an instruction:
  - 8 global registers (g0 to g7)
  - 24 window registers:
    8 in registers (i0 to i7)
    8 local registers (l0 to l7)
    8 out registers (o0 to o7)
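The register counts above can be checked with a small model of the windowed register file. This is a sketch only: the flat layout chosen here (each window owns 8 outs and 8 locals, and its ins alias the outs of the next window) is an illustrative assumption and not necessarily how the LEON3 register file is physically organized. Register numbering follows SPARC V8 (r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins).

```c
#define NWINDOWS      8   /* number of register windows, as stated on the slide */
#define NGLOBALS      8   /* g0..g7 */
#define REGS_PER_WIN 16   /* 8 outs + 8 locals owned by each window; ins are shared */

/* Total number of general-purpose registers in the register file. */
unsigned regfile_size(void)
{
    return NGLOBALS + NWINDOWS * REGS_PER_WIN;
}

/* Map architectural register r (0..31), as seen from window `cwp`,
   to an index in a flat register file (illustrative layout, see lead-in). */
unsigned physical_index(unsigned cwp, unsigned r)
{
    unsigned w = cwp % NWINDOWS;

    r %= 32;
    if (r < 8)                                   /* globals: shared by all windows */
        return r;
    if (r < 16)                                  /* outs of window w */
        return NGLOBALS + w * REGS_PER_WIN + (r - 8);
    if (r < 24)                                  /* locals of window w */
        return NGLOBALS + w * REGS_PER_WIN + 8 + (r - 16);
    /* ins of window w alias the outs of window w+1 (with wrap-around) */
    return NGLOBALS + ((w + 1) % NWINDOWS) * REGS_PER_WIN + (r - 24);
}
```

With 8 windows, regfile_size() gives 8 + 8 x 16 = 136, and i6 of one window maps to the same cell as o6 of the next, which is why a single bit flip in that cell affects two windows.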
47. 4. LEON3 processor: accessible SEU targets
Processor control registers:
* Processor State Register (PSR)
* Current Window Pointer (CWP)
* Window Invalid Mask (WIM)
* Program Counters (PC & nPC)
User application registers and memories:
* Register file: 136 general-purpose registers (8 global registers + 128 window registers). Program Counter (PC) and next Program Counter (nPC) are special registers in the interrupt window
* Data and instruction caches: both are configurable (associativity, size, ...)
  Our data cache is 1Kb, direct mapped
  Our instruction cache is 1Kb, direct mapped
48. 4. LEON3 processor: Accessible and non-accessible registers
[Figure: LEON3 integer unit, with elements marked as accessible or non-accessible using the instruction set]
50. 5. ASTERICS (Advanced System for the TEst under Radiation of IC and Systems)
• Built around two Virtex-4 FPGAs:
  - Control FPGA: XC4VFX60
  - Chipset FPGA: XC4VLX40
• The PowerPC embedded in the Control FPGA is used to control the tester
• Up to 1GB of DDR-SDRAM for the Control FPGA
• Compact Flash memory used to store the FPGA configuration and the PowerPC instruction code
• Up to 180 IOs available for connecting the Device Under Test (DUT) to the tester via a high-speed connector
• The DUT can access 32Mb of SRAM memory and 512Mb of DDR-SDRAM
• The configuration of the Chipset FPGA is managed by the Control FPGA
• Tester remotely controlled via a 10/100/1000 Ethernet link
51. 5. ASTERICS characteristics
Operating conditions:
* The PowerPC embedded in the Control FPGA runs at 300MHz
* DUT frequency up to 200MHz
* Available IO voltages: 3.3V, 2.5V, 1.8V, 1.5V, 1.2V
Typical target DUTs (Devices Under Test):
* Advanced digital processors up to 64 bits
* Memories (SRAM, DRAM, etc.)
* Mixed analog/digital circuits (ADC, DAC, SoC, ...)
* MEMS (potential upgrade depending on the specs)
52. 5. ASTERICS characteristics
[Figure: ASTERICS board, with labels: Control FPGA, DDR-SDRAM for the PowerPC, Ethernet link, DUT connector, Chipset FPGA, DUT DDR-SDRAM, DUT SRAM]
54. 6. Simulation of SEUs on the LEON3: CEU fault-injection environment
Fault injection mechanism:
• Faults are injected using an external interrupt of the processor
• The bit-flip target is selected using the instruction set
Experimental results can be used to predict the application error rate:
• The accuracy of the error-rate prediction method depends on the number of accessible memory elements compared to the total number of memory cells embedded in the DUT
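One common way to express such a prediction, given here as an assumption since the slide does not state the formula, combines the fraction of injected faults that produce an application error with the cross section σ of the device measured under radiation. The symbols τ_inj and τ_SEU are introduced for illustration only.

```latex
\[
\tau_{\mathrm{inj}} = \frac{N_{\mathrm{errors}}}{N_{\mathrm{injected\ faults}}},
\qquad
\tau_{\mathrm{SEU}} \approx \sigma_{\mathrm{SEU}} \cdot \tau_{\mathrm{inj}},
\qquad
\sigma_{\mathrm{SEU}} = \frac{N_{\mathrm{events}}}{\text{particle fluence}}
\]
```

Under this reading, τ_inj comes from the fault-injection campaign alone, while σ_SEU requires a radiation test. Memory cells that cannot be reached through the instruction set contribute to σ_SEU but not to τ_inj, which is why the accuracy depends on the accessible fraction.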
55. 6. Simulation of SEUs on the LEON3: CEU fault-injection environment
• Hardware setup: PC + ASTERICS + power supply
• No DUT board: the Chipset FPGA is used as the DUT
• ASTERICS memory: LEON3 code & data
• Functions embedded in the Chipset FPGA:
  - Shared-memory controller (allows access by the CP and by the LEON3)
  - Supervisor (controls the experiment, the LEON3 and its peripherals)
• LEON3 application: a benchmark self-stabilizing algorithm
[Figure: block diagram, with labels: Comm. FPGA, LEON3 + Peripherals, Shared-memory controller, Supervisor, Memory, Ethernet link, ASTERICS, Chipset FPGA, Power supply]
56. 6. Simulation of SEUs on the LEON3: CEU fault-injection environment
• Store the injection vectors: instant, target, register, bit mask
• Start the execution of the LEON3 application
• Generate the interrupt according to the instant vector
• Detect the normal end of the application
• Compare the obtained results with the expected results and count the errors
• Deal with timeouts. There are 3 types of timeout:
  - Boot timeout: when the boot sequence does not finish
  - ASTERICS timeout: when the running application does not finish
  - Computer timeout: when the supervisor does not work properly or the ASTERICS stops responding
[Figure: timeline, with labels: Fault injection, Expected end, ASTERICS timeout, Computer timeout, Boot timeout]
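The steps above can be sketched as a data structure plus a classification routine. Everything here is a hypothetical illustration: the field names and widths of injection_vector, the run_status flags, and the order in which the timeouts are checked are assumptions, not the actual supervisor implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* One injection vector (field names and widths are assumptions). */
struct injection_vector {
    uint32_t instant;    /* when the interrupt is raised */
    uint8_t  target;     /* which resource is hit (e.g. register file, cache) */
    uint16_t reg;        /* register or word index within the target */
    uint32_t bit_mask;   /* bits to flip in the selected word */
};

enum run_outcome {
    RUN_OK,
    RUN_RESULT_ERROR,
    RUN_BOOT_TIMEOUT,
    RUN_ASTERICS_TIMEOUT,
    RUN_COMPUTER_TIMEOUT
};

/* What the host knows at the end of one run (hypothetical flags). */
struct run_status {
    bool     tester_responding;  /* supervisor / ASTERICS still answers */
    bool     boot_done;          /* boot sequence finished */
    bool     app_finished;       /* application ended before the run limit */
    unsigned error_count;        /* mismatches against the reference results */
};

/* Count the words that differ between obtained and expected results. */
unsigned count_errors(const uint32_t *obtained, const uint32_t *expected, unsigned n)
{
    unsigned k, errors = 0;

    for (k = 0; k < n; k++)
        if (obtained[k] != expected[k])
            errors++;
    return errors;
}

/* Classify one run; the outermost failure wins (assumed precedence). */
enum run_outcome classify_run(const struct run_status *s)
{
    if (!s->tester_responding) return RUN_COMPUTER_TIMEOUT;
    if (!s->boot_done)         return RUN_BOOT_TIMEOUT;
    if (!s->app_finished)      return RUN_ASTERICS_TIMEOUT;
    return s->error_count ? RUN_RESULT_ERROR : RUN_OK;
}
```

Checking the computer timeout first reflects the idea that nothing else can be trusted once the tester stops responding; a different precedence would be equally easy to encode.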
57. 6. Simulation of SEUs on the LEON3: Experiment flowchart
[Flowchart with three lanes: Computer, Supervisor, LEON3. Step labels: Initialize shared memory; Generate injection vectors; Store injection vectors; Send Init. Memory command; Application run; Generate interrupt; Fault injection routine; Detect end of execution or generate timeout; Send Read Memory command; Send results; Compare results with reference]
Fault injection rate: 1 SEU / 2 sec
58. 6. Simulation of SEUs on the LEON3: Preliminary results: target = register file
Self-converging algorithm:
    /* N = 16; T is an N x N matrix; D is an N x 1 matrix */
    b = c = 1;
    while (b || c) {
        c = b;
        b = 0;
        D[0] = 0;
        for (i = 1; i < N; i++) {
            m = BIGNUMBER;
            for (j = 0; j < N; j++) {
                if (m >= D[j] + T[N*i + j])
                    m = D[j] + T[N*i + j];
            }
            if (D[i] != m)
                b = 1;      /* a value changed: another pass is needed */
            D[i] = m;
        }
    }
Preliminary results of fault injection experiments:
    Test #   Inj. faults   Result errors    Timeouts         Silent   Run limit
    1        130577        204 (0.15 %)     32143 (24.6 %)   219      1.5
    2        199550        324 (0.16 %)     49478 (24.8 %)   384      1.5
    3        15068         1709 (11.3 %)    992 (6.6 %)      28       5
    4        14264         1614 (11.3 %)    900 (6.3 %)      0        8
    5        8007          887 (11.07 %)    508 (6.3 %)      17       16
Sensitivity of the program variables:
    Variable   Observed errors       Recoverable
    i          timeouts              yes
    j          timeouts              yes
    m          errors and timeouts   yes
    D          errors and timeouts   no
    T          errors and timeouts   no
    b          timeouts              yes
    c          timeouts              yes
• Tests 1 and 2 detected very few result errors but a high number of timeouts
• The self-converging algorithm requires more than 1.5 x 336 ms (the nominal time) to converge
• Tests 3, 4 and 5 proved that timeouts masked result errors
=> a suitable timeout limit is higher than 5 times the nominal execution time
59. 6. Simulation of SEUs on the LEON3: SW modifications
• Use the modulo operator "%" when indexing an array, i.e.
    m=D[j%16]+T[((N*(i%16))+(j%16))%256];
• Assign every variable to a specific register of the register file by using the following C declaration (goal: reduce the number of registers used):
    register unsigned int variable asm ("register name");
• Initialize the variables b and c with an 8-bit number instead of "1", to avoid a bit flip making them equal to "0"
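A minimal sketch of how the modulo indexing and the wider flag value might look together is given below. It is not the authors' source code: the function name converge_hardened, the flag value 0xFF, the value of BIGNUMBER and the pass counter are assumptions made for illustration. The register-pinning declaration is left out because the asm ("...") form is a GCC extension whose register names are target-specific.

```c
#define N 16
#define BIGNUMBER 1000000u  /* assumed sentinel, larger than any real path cost */
#define FLAG_SET  0xFFu     /* assumed 8-bit "true": one bit flip cannot zero it */

/* Runs the self-converging loop on D (N entries) with cost matrix T (N*N entries).
   Returns the number of passes of the outer loop (added for illustration). */
unsigned converge_hardened(const unsigned T[N * N], unsigned D[N])
{
    unsigned b = FLAG_SET, c = FLAG_SET;
    unsigned i, j, m, cand, passes = 0;

    while (b || c) {
        c = b;
        b = 0;
        D[0] = 0;
        for (i = 1; i < N; i++) {
            m = BIGNUMBER;
            for (j = 0; j < N; j++) {
                /* modulo keeps every access in bounds even if i or j is corrupted */
                cand = D[j % 16] + T[((N * (i % 16)) + (j % 16)) % 256];
                if (m >= cand)
                    m = cand;
            }
            if (D[i % 16] != m)
                b = FLAG_SET;   /* a value changed: another pass is needed */
            D[i % 16] = m;
        }
        passes++;
    }
    return passes;
}
```

Starting from arbitrary contents of D, the loop recomputes every entry from its neighbours, so on a fault-free run the result does not depend on the initial values. The modulo operations do not correct a corrupted index; they only prevent out-of-range memory accesses, leaving the correction to later passes.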
60. 6. Simulation of SEUs on the LEON3: SEU injections on the modified version. Target = register file
• The running limit is set to 5 times the time required for the application to finish execution without fault injection
• Erroneous results decrease from 11.3% to 4.45%
• Timeouts decrease from 6.6% to 2.6%
Results of fault injection on the modified source code:
    # runs   # errors      # timeouts   # converges
    8000     356 (4.45%)   208 (2.6%)   2972 (37.15%)
61. 6. Simulation of SEUs on the LEON3: SEU injections on the modified version. Target: other resources
Results of fault injection in new resources:
    Zone              # of runs   # of errors    # of timeouts
    Inst. cache       12174       107 (0.88%)    385 (3.16%)
    Data cache        12348       547 (4.42%)    0 (0%)
    Multi-resources   88410       2196 (2.48%)   1415 (1.6%)
• Data and instruction caches are also very sensitive to SEUs. Both can be accessed by the CEU through the load and store instructions
• A fault injection campaign was performed on each of the caches while the LEON3 executed the modified algorithm
• The last campaign was performed on all the resources at the same time (2075 registers of 32 bits each):
  - Register file
  - PC and nPC
  - Instruction cache
  - Data cache
• The running limit was set to 5
62. 6. Simulation of SEUs on the LEON3: Triple Modular Redundancy (TMR)
[Figure: Core 1, Core 2 and Core 3 feed a TMR block with three outputs: Error, Timeout, Converge]
• A TMR was emulated: 3 LEON3 cores simultaneously executing the same self-convergent algorithm
• The comparison was done in the external PC
• SEUs can hit one, two or three cores in a single simulation
• The executable is the modified self-convergent algorithm
• The TMR results will be:
  - Error: if there are two errors, or one error and a timeout
  - Timeout: if two timeouts occur
  - Converge: if the self-converging algorithm converges in at least one of the cores, with a correct result
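The Error and Timeout rules listed above can be written as a small voter. This is a hedged sketch, not the comparison code run on the external PC: the enum names and tmr_vote are hypothetical, and the precedence chosen when the rules overlap (two timeouts plus one error is reported as a timeout) is an assumption. The Converge category, which tracks cores that recovered on their own after being hit, is not modeled here.

```c
enum core_result { CORE_OK, CORE_ERROR, CORE_TIMEOUT };
enum tmr_result  { TMR_OK, TMR_ERROR, TMR_TIMEOUT };

/* Combine the outcomes of the three cores into one TMR outcome. */
enum tmr_result tmr_vote(const enum core_result cores[3])
{
    unsigned errors = 0, timeouts = 0, k;

    for (k = 0; k < 3; k++) {
        if (cores[k] == CORE_ERROR)
            errors++;
        else if (cores[k] == CORE_TIMEOUT)
            timeouts++;
    }

    if (errors + timeouts < 2)
        return TMR_OK;        /* at least two cores agree on the correct result */
    if (timeouts >= 2)
        return TMR_TIMEOUT;   /* two timeouts (assumed to take precedence) */
    return TMR_ERROR;         /* two errors, or one error and one timeout */
}
```

A single faulty core is always masked in this model, which matches the usual motivation for TMR: only faults that hit two or more cores in the same run can reach the output.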
63. 6. Simulation of SEUs on the LEON3: Three-core fault injection results. Target: register file
• The running limit is set to 5 times the time required for the application to finish execution without fault injection
• In 17.73% of the simulations the self-converging algorithm converges to correct results
• The error rate decreases from 4.45% to 0.64%
• Timeouts decrease from 2.6% to 0.18%
Results of fault injection on the three-core processor:
    # runs   # errors      # timeouts   # converges
    42543    276 (0.64%)   77 (0.18%)   7543 (17.73%)
64. 6. Simulation of SEUs on the LEON3: Three-core fault injection results. Target: all resources
Results of fault injection on three cores for all resources:
    # runs   # of errors   # of timeouts   # of converges
    100000   85 (0.085%)   15 (0.015%)     1825 (1.825%)
• The running limit is set to 5
• In 1.825% of the simulations the self-converging algorithm converges to correct results
• Erroneous results decrease from 2.48% to 0.085%
• Timeouts decrease from 1.6% to 0.015%
65. 6. Simulation of SEUs on the LEON3: Three-core fault injection results. Target: all resources
[Charts: distribution of errors on all resources]
• 48 double SEUs. Categories: 1DC/1IC, 2DC, 1DC/1RF, 1DC/1nPC, 1DC/1PC. Values: 26, 18, 2, 1, 1
• 37 triple SEUs. Categories: 2IC/1DC, 2DC/1PC, 2DC/1IC, 3IC, 3DC, 1DC/1IC/1PC, 2DC/1RF. Values: 9, 1, 15, 1, 9, 1, 1
67. 7. Conclusions and future work
• The sensitivity to SEUs of a self-converging algorithm was studied
• Fault injection experiments were performed on a benchmark self-converging program executed by a LEON3 processor implemented on an FPGA
• The CEU (Code Emulated Upsets) approach was adopted to perform the SEU fault injection experiments, using the ASTERICS test platform
• The results obtained show the fault tolerance and the "Achilles' heels" of the studied program
• Different versions were explored. The one implementing a TMR was immune to SEUs and quite robust with respect to MBUs. SEUs were not injected in the voter
• In future work, new versions of self-converging algorithms will be implemented in a Network on Chip to perform radiation ground testing
68. Acknowledgements
• Dr. Francisco Javier Franco Peláez (UCM)
• Dr. Juan Antonio Clemente (UCM)
• Dr. Devan Sohier (Prisme, Univ. de Versailles)
• Dr. Alain Bui (Prisme, Univ. de Versailles)
• Dr. Greicy Costa (TIMA Lab.)
69. THANK YOU FOR YOUR ATTENTION! TIME FOR QUESTIONS
