Documente Academic
Documente Profesional
Documente Cultură
! ! ! !
Hardware redundancy
add extra hardware for detection or tolerating faults add extra software for detection and possibly tolerating faults extra information, i.e. codes extra time for performing tasks for fault tolerance
Software redundancy
Information redundancy
Time redundancy
Page: 1
Fault Tolerance
! ! ! !
Page: 2
Error Detection
!
ideal check
determined solely from specification complete, correct check should be independent from system
acceptable check
cost reasonable check, e.g. monitor rate of change performed by system on system components e.g. power-up diagnostics
diagnostics
Page: 3
Damage Confinement
! ! ! !
error might propagate and spread identify boundaries to state beyond which no information exchange has occurred dynamically => hard statically => e.g. fire wall
Page: 4
Error Recovery
!
backward recovery
requires checkpoints
most frequently used recovery overhead try to make state error-free need accurate assessment of damage highly application-dependent
forward recovery
Page: 5
Fault Treatment
! !
on-line, no manual intervention, (automatic) dynamic system reconfiguration spare (hot or cold)
Page: 6
Fault Coverage
!
! !
recovery implies that the system as a whole is operational this does not imply that a repair occurred e.g. duplex system with benign fault can recover to continue operation on one non-faulty processor
Page: 7
Hardware Redundancy
!
Passive (static)
uses fault masking to hide occurrence of fault no action from the system is required e.g. voting uses comparison for detection and/or diagnoses remove faulty hardware from system => reconfiguration combine both approaches masking until diagnostic complete expensive, but better to achieve higher reliability
Active (dynamic)
Hybrid
Page: 8
parallelism
TMR (Triple Modular Redundancy) Voter: is single point of failure. V could be very simple, but who guards the guard?
Page: 9
Replicate voters V V V
Restoring Organ: since it produces 3 correct outputs even if one input is faulty. eliminate single point of failure
Page: 10
Page: 11
Voting
!
I1 I2 I3
2007 A.W. Krings
Flux Summing
! !
Inherent property of closed loop control system If one module becomes faulty, remaining modules compensate automatically.
Page: 13
Out C Agree
Page: 14
Page: 15
Stand-by-sparing
cold spares
Page: 16
Johnson 1989
Page: 17
duplication combined with compare & spare 2 modules are always on-line 2-of-N switch pairs are often combined
Page: 18
Johnson 1989
Page: 19
N active + S spare modules (off-line) voting and comparison replace erroneous module from spare pool maintains N constant uses N-of-(N+S) switch hybrid solution => N = 4 passive solution => N = 5
Page: 20
Johnson 1989
Page: 21
Self-purging NMR
Page: 22
Johnson 1989
Page: 23
Triple-Duplex
redundant self checking each node is really 2 modules + comparator self-disable in event of error simulate benign behavior triple-triplex used in Boeing 777 primary flight computer
Page: 24
Page: 25