

EWSD
Reliability

Contents:
- Reliability concept
- Reliability analysis
  - Component reliability
  - Spare parts
  - Hardware reliability analysis
  - Software reliability estimation
- Reliability data
  - Total system downtime

Telecommunication systems have to work 24 hours a day, 365 days a year, with negligible downtime. They therefore require an extremely high level of availability and reliability. EWSD has been designed with stringent requirements for reliability and quality of service (QoS).

[Figure: Reliability assurance over the product life cycle. Concept & design: performance requirements, "rough" reliability analysis, design decisions (e.g. redundancy). Design & development: detailed system specification; reliability analysis covering reliability, availability and maintainability; design review. Test: function and system tests, stress tests, maintainability tests, verification of model parameters and data collection. Field: operation and maintenance (scheduled maintenance, unscheduled maintenance, repair) with continuous data collection. All phases feed reliability evaluations in the Quality Information System.]

Reliability assurance program

Questions of reliability play an important role in development. A reliability assurance program was put in place in order to achieve and maintain such a high level of reliability. This program ensures quality and reliability by continuous monitoring from the beginning of development, through modifications, and during field operation. Rough reliability analyses are performed in the concept and design phase to verify the system design and to support system design decisions. During design and development, detailed reliability analyses are performed in order to verify the system design, to optimize the maintenance strategies, and to provide customers with reliability models and results. During the test phase, reliability data are collected and reliability-related tests are performed. These tests are used to check the related functions and to verify the model parameters used. A Quality Information System (QUIS) is established in order to monitor, measure and understand the reliability and quality performance in operation. For this purpose, quality and reliability data are collected from systems worldwide and evaluated continuously. EWSD network nodes are in service in more than 100 countries (e.g. USA, Brazil, China, Indonesia, Egypt, South Africa, Portugal, Germany) and with more than 300 telephone administrations. EWSD is well known all over the world for its high reliability and high availability. In the USA, for example, EWSD is the network node with the best total downtime performance based on 1998 FCC ARMIS data.

Reliability objectives

Reliability measurements are defined to provide a high standard of service reliability. The reliability measurements and their objectives consider two aspects:
- the subscriber's/user's point of view requires high reliability and availability of services
- the service provider's point of view is to carry out limited maintenance and repair work
Major reliability measurements are, for example, the total system downtime, the line or trunk downtime, the premature release probability, and the incorrect charging probability, but also the number of maintenance actions or the circuit pack return rate. The different reliability measurements are calculated theoretically by means of reliability modeling, and they are also recorded and evaluated in operation. Typical in-service objectives:
In-service performance (all causes of failure):
- Total system downtime: 3 minutes per year
- Single termination downtime: 30 minutes per year
- SS7 link downtime: 82 minutes per year
- Premature release probability: 2 × 10⁻⁵
- Repair actions (hardware): 5 per 1,000 ports and year

The theoretically calculated reliability measurements are used primarily to evaluate the expected reliability of the system and to give the service provider an idea of the reliability to be expected. We have defined essential requirements that cover most of the known requirements from Telcordia/Bellcore, ITU, and specific customers.

Reliability concept
The reliability concept comprises:
- Hardware
- Software

Hardware

In designing the system, particular attention was paid to achieving the highest possible reliability and availability. This is achieved by full redundancy of all central hardware components of the system. The coordination processor (CP) works as an n+1 redundant multiprocessor and all other central units are duplicated:
- Common memory (CMY)
- Input/output processors (IOP)
- Disks
- Message buffer (MB)
- Switching network (SN)
- Signaling system network control (SSNC)
- Central clock generator (CCG)
The digital line unit (DLU) is internally duplicated and operates according to the load-sharing principle.

All units are monitored by safeguarding programs to ensure that errors are dealt with immediately without impairing operation of the system. Hardware faults are detected and localized automatically. When a faulty unit has been localized, it is disconnected and an alarm is sent. The standby unit of redundant equipment is put into service by fast service recovery functions.

Coordination processor 113E (CP113E)

The CP113E is a multiprocessor system. All critical units are duplicated to improve system availability. This ensures that an outage of only one unit (a single error) will never cause an outage of the CP113E. To achieve this, the faulty component is immediately localized and replaced by a redundant component as soon as it has been isolated. In many cases, outage of more than one component can also be tolerated without causing outage of the CP. For example, the following devices are duplicated:
- Base processor (BAP)
- Input/output control (IOC)
- Common memory (CMY)
- ATM bridge processor (AMP)
Pool redundancy guarantees sufficient availability of the call processors (CAP). Critical peripheral devices (e.g. magnetic disk device, MDD) are also duplicated. They are connected to different IOCs by different input/output processors (IOP). In the same manner, the periphery is connected to the CP by one or more pairs of IOPs for the message buffer (MB). All BAPs, CAPs, IOCs and AMPs are connected to the two CMYs. One of the two BAPs is BAP-master and the other is BAP-spare. If the BAP-master becomes unavailable, the current BAP-spare automatically becomes BAP-master. The BAP-spare usually functions as a CAP and is part of the n+1 pool redundancy of the CAPs. IOC and IOP pairs work in load-sharing mode. If one component of these pairs becomes unavailable, its function is performed by the remaining redundant component without any effect on service. Two AMPs are provided as ATM bridge processors (AMPC) to the SSNC via optical fiber interfaces. They work in active/standby redundancy. Central parts of the processor boards and the CMY are internally duplicated in order to detect failures safely and immediately.

[Figure: EWSD system overview: DLU, LTG, SN, RSU, HTI, MB, CP, CCG, SSNC and NetManager]

[Figure: CP113E redundancy architecture: BAP0 (master) and BAP1 (slave) in master/slave redundancy; CAP0...CAPn in n+1 pool redundancy; AMP0/AMP1 and CMY0/CMY1 in active/standby redundancy; IOC0/IOC1 and the IOP:MB0...IOP:MB7 pairs in load-sharing redundancy; links to the SSNC]

Message buffer (MB) and switching network (SN)

The message buffer (MB) is fully duplicated. The channels are through-connected on both switching network (SN) sides by the associated message buffers MB0 and MB1. As regards the control information to the SN, both MBs operate in active/active mode. The message channels to the LTGs are connected via both SN halves. Although each MB can process the entire data flow to the LTGs, the MBs operate in load-sharing mode. If one MB becomes unavailable, the MB that is still active takes over all the traffic to the LTGs. The SN is also fully duplicated. One SN half is active and the other is on hot standby. All connections are set up in parallel and on the same path in both SN halves. If an error occurs in the active SN half, the CP initiates a changeover to the standby SN half.

[Figure: Redundancy structure SN/MB: SN0 and SN1 (each with SNMUX 0...15 and SNMAT) in active/hot-standby redundancy; MB0 and MB1, with their IOP:MB pairs and links to the SSNC and LTGs, in active/active load-sharing redundancy]

Digital line unit (DLU) and line/trunk group (LTG)

The DLU provides the interface for all subscriber lines to the EWSD system. To meet a high standard of reliability, all central parts of the DLU (DLUC) are duplicated, except the subscriber line modules (SLM) and circuits used for external alarms and line testing.

Both DLUCs operate in load-sharing mode. If one DLUC or a connected LTG fails, calls established via the failed DLUC are lost and the call handling capacity is reduced. The subscribers have to re-establish their calls, which are then routed via the DLUC that is still in operation. Because of the high reliability of the LTG hardware, the reliability requirements for trunk terminations can be met without redundancy. Nevertheless, it is advisable to distribute the trunks of a trunk group over several LTGs in order to provide maximum service availability, as the sketch below illustrates.
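To see why distribution helps, the following sketch (illustrative Python, not from the original document) compares the probability of a total trunk-group outage when the trunks are concentrated on one LTG with the case where they are spread over several, assuming independent LTG failures and the LTGP downtime figure from the trunk tables later in this document.

```python
# Illustrative sketch (assumptions, not from the document): probability of
# a total trunk-group outage when the group is carried by one LTG versus
# several LTGs, assuming independent LTG failures.

MIN_PER_YEAR = 365.25 * 24 * 60                # about 525,960 minutes per year

ltg_downtime_min_per_year = 1.872              # LTGP downtime (see trunk tables)
u = ltg_downtime_min_per_year / MIN_PER_YEAR   # unavailability of one LTG

for n_ltgs in (1, 2, 4):
    # The whole group is down only if every carrying LTG is down at once.
    print(f"group spread over {n_ltgs} LTG(s): "
          f"P(total group outage) ~ {u ** n_ltgs:.3e}")
```

Under these assumptions, even spreading the group over two LTGs reduces the probability of a simultaneous total outage by several orders of magnitude.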

[Figure: DLU/LTG redundancy: the SLMs are non-redundant; DLUC0 and DLUC1 with their connected LTGs work in load-sharing redundancy toward SN0/SN1; trunks terminate on non-redundant LTGs]

Signaling system network control (SSNC)

The redundant structure of the SSNC ensures that no single failure in the central parts will cause a loss of signaling traffic. The SSNC consists of two duplicated active/active ATM switching planes, ASN0 and ASN1. Connected to the ATM switching network (ASN), and communicating via it, are:
- Main processor for administration and maintenance (MP:OAM): a duplicated MP:OAM running in micro-synchronous active/hot-standby redundancy with two redundant magnetic disk devices, MDD0 and MDD1 (active/active redundancy), a magneto-optical disk, MOD (not redundant), a local alarm interface (ALI), and a LAN interface
- Main processor for signaling manager (MP:SM): a duplicated MP:SM for the SS7 signaling manager running in micro-synchronous active/hot-standby redundancy
- Main processor for signaling link termination (MP:SLT): several duplicated MP:SLTs running in micro-synchronous active/hot-standby redundancy
- Main processor for global title translation (MP:GTT): several duplicated MP:GTTs running in micro-synchronous active/hot-standby redundancy
- Main processor for number portability (MP:NP): several duplicated MP:NPs running in micro-synchronous active/hot-standby redundancy
- Main processor for IP interface (MP:IP): several duplicated MP:IPs running in micro-synchronous active/hot-standby redundancy
- Up to two cross-linked fiber optic links to the MBD
- One cross-linked fiber optic link to the CP (AMP)
- Line interface card (LIC): several pairs of E1 LICs working in active/standby redundancy

[Figure: SSNC redundancy structure: duplicated ATM switching planes ASN0/ASN1 with LIC pairs for the SS7 links; duplicated MP:SLT, MP:GTT, MP:NP, MP:IP, MP:SM and MP:OAM; links to the AMP and the MBD]

The MP:OAM is the central operation, administration and maintenance (OAM) processor of the system, with redundant active/active MDDs for software versions and semi-permanent data. An MOD is provided without redundancy for upgrade purposes and for storing snapshots of the signaling system network control (SSNC) system. The alarm interface module (ALI) shows up to 16 external alarms locally. The signaling system No. 7 (SS7) is managed via the MP:SM. Several MP:SLTs, MP:NPs and MP:GTTs are provided. For SEP traffic they communicate directly, via the optical fiber interfaces of the message buffer (MB), with the LTGs connected to DLUs.

Clock distribution

The central clock generator (CCG) generates the clock for the EWSD system, synchronizes it to the externally applied reference frequencies, and distributes it to the subsequent equipment. The clock pulse is distributed in four levels:
I. MB:GCG
II. SN:GCG
III. LTG:GCG
IV. DLU:GCG
Each level consists of one or more parallel clock generators which receive the clock of the next-higher level as a reference clock. The CCG is duplicated. One CCG is active and the other is on standby, in synchronism with the clock of the active CCG. Each CCG is supplied with two external reference clocks. If the active CCG or both reference clocks fail, changeover to the standby CCG takes place immediately without loss of synchronization. The SSNC has a separate clock distribution system which is also completely redundant. Its inputs are connected to the primary CCG.

[Figure: Clock distribution: CCG0 and CCG1 in active/standby redundancy with external reference clocks R1...R4, feeding the GCGs of MB0/MB1 (level I), SN0/SN1 (level II), the LTGs (level III) and the DLUs (level IV); the SSNC (ASN0/ASN1, MPU0/MPU1, LIC0/LIC1) has its own fully redundant ACCG clock distribution connected to the primary CCG]

Software

In terms of adaptability, system modification and system expansion, the EWSD software is modular in structure and functional in organization. An essential quality attribute for the software is software reliability. Software reliability comprises the following aspects:
- technical correctness, completeness
- consistency, integrity, error prevention
- protection against failure, minimization of error propagation
- error neutralization mechanism (recovery)
- analysis and correction of software errors
- robustness against overload
Technical correctness and completeness are achieved by means of inspections, reviews and tests. Thus the whole development process, from system definition to field operation, is controlled by a quality management system (ISO 9000). Consistency, integrity and error prevention are achieved, for example, by the following measures:
- file protection against unauthorized access
- periodic consistency checks for data
- special security measures to prevent multiple access to data
- special security measures for data modification
- validity and consistency checks for data transferred at the interfaces
- checksum procedures for monitoring data and program code (illustrated below)
- corrective audits of critical data
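As a rough illustration of the checksum measure in the list above, the following Python sketch shows periodic checksum monitoring of a protected region. The CRC-32 choice, the region contents and the audit hook are assumptions for illustration, not EWSD internals.

```python
# Rough illustration (not EWSD code) of checksum monitoring: a reference
# checksum is recorded for a protected code/data region and periodically
# re-verified; a mismatch would trigger a corrective audit or recovery.
import zlib

code_region = b"\x55\xaa" * 1024            # stand-in for a code/data segment
reference = zlib.crc32(code_region)         # recorded at initialization

def audit(region: bytes, reference: int) -> bool:
    """Periodic safeguarding check: recompute and compare the checksum."""
    if zlib.crc32(region) != reference:
        print("checksum mismatch: trigger corrective audit / recovery")
        return False
    return True

print(audit(code_region, reference))        # True while the region is intact
```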
Protection against failure and minimization of error propagation are achieved by the following measures, for example:
- division of program code and data into separate link modules, which are likewise stored in separate memory areas
- memory protection for program code and for semi-permanent data
- duplication of system files and user files
- monitoring the real-time response of programs
- monitoring system performance
The aim of recovery is to neutralize an error in such a way that switching operation is either not impaired at all or only slightly. Central and peripheral recovery are divided into recovery levels for this purpose. The individual recovery levels initiate specific recovery actions, which quickly and effectively restore the system to service.

CP recovery levels:

New start
- System function (CP): initialization of processes, stack, process data, heap
- Customer effect: calls and connections maintained; some calls in setup released; call-charge data retained

Initial start
- System function (CP): system reset; CP load from disk
- Periphery: LTGs loaded
- Customer effect: switched connections released; nailed-up connections re-established; call-charge data retained

Basic operation
- System function (CP): initialization of vital processes, stack, data; non-essential processes inhibited; non-essential OA&M functions inhibited
- Customer effect: depends on the recovery level initiating basic operation (new start or initial start)

Initial start with last generation
- System function (CP): reload last generation from disk
- Periphery: LTGs loaded
- Customer effect: switched connections released; nailed-up connections re-established; call-charge data restored

SSNC recovery levels:

Local recovery of an MP platform (FULLREC, code only)
- System function (MP): initialization of processes, stack, process data, heap
- Customer effect: any failed MP:SM links are restored; nailed-up connections re-established

Local recovery of an MP platform (code & data)
- System function (MP): platform reset; MP load from disk
- Periphery: LIC or ACCG is loaded (code & data)
- Customer effect: any failed MP:SM links are restored; nailed-up connections re-established

Basic operation
- System function (MP): initialization of vital processes, stack, data; non-essential processes inhibited; non-essential OA&M functions inhibited
- Customer effect: depends on the recovery level initiating basic operation (local recovery with code only or with code and data)

System-wide recovery (LOADREC2)
- System function (MP): system reset; MP:SA load from disk
- Periphery: all ACCGs and LICs are loaded (code & data)
- Customer effect: all links are interrupted and restored; nailed-up connections re-established

Initial start with last generation
- System function (MP): reload last generation from disk
- Periphery: all ACCGs and LICs are loaded (code & data)
- Customer effect: all links are interrupted and restored; nailed-up connections re-established

Most of the software errors occurring on the CP or MPs are neutralized by the affected process itself if possible. The few remaining software errors are neutralized by the first recovery level (new start of all processes not relevant for call processing on the CP). To ensure that the called recovery level clears an error completely, the system supervises the run time of the recovery, checks whether further errors occur while recovery is running, and checks whether errors occur again after recovery within a supervision time. If one of the checks indicates that the called recovery level was not successful, the next-higher recovery level is started. If even the initial start recovery level is not successful, basic call processing is started. This system status, known as "basic operation", allows a reduced process set to be activated, which guarantees that the basic call processing functions are maintained. In this way, errors in areas of software that are not concerned with call processing are masked out. Fault symptom files for problem analysis and a remote-controlled software correction system are used for analysis and correction of software errors.
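The escalation logic just described can be summarized as follows; this Python sketch is illustrative only, with hypothetical run-time limits and supervision windows, and a stub in place of the real error supervision.

```python
# Illustrative sketch (not EWSD code) of supervised recovery escalation.
# Level names follow the CP recovery table above; the run-time limit, the
# supervision window, and the error watcher are hypothetical stubs.
import time

LEVELS = ["new start", "initial start", "initial start with last generation"]

def run_recovery(level: str) -> bool:
    """Placeholder for the recovery actions of one level."""
    print(f"running recovery level: {level}")
    return level == "initial start"        # pretend only this level succeeds

def no_new_errors_within(seconds: float) -> bool:
    """Stub: would watch the error log for the supervision time."""
    return True

def supervise(level: str, max_runtime_s: float, quiet_period_s: float) -> bool:
    """A level counts as successful only if it finishes in time and no
    further error occurs within the supervision time afterwards."""
    start = time.monotonic()
    ok = run_recovery(level)
    if not ok or time.monotonic() - start > max_runtime_s:
        return False
    return no_new_errors_within(quiet_period_s)

for level in LEVELS:
    if supervise(level, max_runtime_s=60, quiet_period_s=300):
        print(f"system restored by: {level}")
        break
else:
    # No level succeeded: fall back to basic call processing only.
    print("entering basic operation")
```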

Robustness against overload is achieved by means of an overload protection procedure. The overload protection procedures use a step-by-step load rejection strategy. The procedure is designed so that it can differentiate between short-term load peaks, which may be tolerated without any overload protection measure, and long-term overloads.
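A step-by-step load rejection strategy of this kind might look like the following sketch; the smoothing constant, threshold and number of rejection steps are assumptions chosen for the demonstration, not the actual EWSD overload parameters.

```python
# Illustrative sketch (not the actual EWSD procedure) of step-by-step load
# rejection: short peaks are smoothed away, sustained overload sheds load
# one step at a time, and load is re-admitted stepwise as it subsides.

class OverloadControl:
    def __init__(self, threshold: float = 0.8, alpha: float = 0.5):
        self.smoothed = 0.0         # smoothed processor occupancy, 0..1
        self.threshold = threshold  # occupancy above which load is rejected
        self.alpha = alpha          # smoothing constant; filters short peaks
        self.reject_step = 0        # 0 = accept all new calls, 10 = reject all

    def update(self, occupancy: float) -> int:
        # Exponential smoothing: a single short peak barely moves the average.
        self.smoothed = self.alpha * occupancy + (1 - self.alpha) * self.smoothed
        if self.smoothed > self.threshold:
            self.reject_step = min(self.reject_step + 1, 10)  # shed more load
        elif self.reject_step > 0:
            self.reject_step -= 1                             # recover stepwise
        return self.reject_step

ctrl = OverloadControl()
for load in [0.5, 1.0, 0.6, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5]:
    print(f"occupancy {load:.1f} -> rejection step {ctrl.update(load)}")
```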

Reliability analysis
The reliability analysis comprises:
- Component reliability
- Spare parts
- Hardware reliability analysis
- Software reliability estimation

Component reliability

Overall component reliability is based on the reliability of the various items of hardware (resistors, capacitors, ICs, etc.). The failure rates of the components used are calculated on the basis of the Siemens norm SN29500. SN29500 contains failure rates of the components for reference conditions, and methods for considering the dependence of the failure rates on operating conditions. SN29500 complies with IEC 1709, "Electronic components - Reliability - Reference conditions for failure rates and stress models for conversion". The basis for SN29500 is the worldwide field experience with Siemens products, detailed service and repair statistics, component tests, etc. The mean failure rate of circuit boards is in the range of 2,000 to 6,000 FIT (failures in 10⁹ hours), corresponding to an MTBF of roughly 60 down to 20 years.
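These FIT figures translate into MTBF values as follows (a minimal sketch; continuous 24-hour operation assumed):

```python
# Minimal sketch of the FIT-to-MTBF conversion used above.
# 1 FIT = 1 failure per 1e9 device-hours; MTBF in years assumes
# continuous (24 h/day) operation.

HOURS_PER_YEAR = 8766          # average year of 365.25 days

def fit_to_mtbf_years(fit: float) -> float:
    return 1e9 / (fit * HOURS_PER_YEAR)

for fit in (2000, 6000):
    print(f"{fit} FIT -> MTBF of about {fit_to_mtbf_years(fit):.0f} years")
# 2000 FIT -> ~57 years; 6000 FIT -> ~19 years, matching the 60-to-20-year range.
```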

Spare parts Network nodes are systems in which component failures and thus also device failures can be expected and which therefore require corrective maintenance. This gives rise to a certain demand for spare parts and the need to maintain stocks of such parts (in this case spare modules) either by repairing faulty modules or by ordering new ones from time to time from the manufacturer. The failure rates of the individual modules and the number of modules installed can be used to calculate the probability of a certain number of module failures occurring within a particular period.

The cumulative Poisson distribution is used for calculating the required number of spare modules. Essential customer-specific parameters for this calculation are the required service continuity probability and the turnaround time, which is defined as the interval between the time when a replacement is ordered and the time when the replacement is received. The spare parts requirements are calculated individually for each project.

[Figure: Required spare modules versus module failure rate (1,000 to 5,000 FIT) for 100 to 5,000 installed modules, at 99.9% service continuity probability and a turnaround time of one month]
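The spare-parts dimensioning described above can be sketched as follows; the module count, failure rate and turnaround time in the example are illustrative, not project data.

```python
# Minimal sketch of the spare-parts calculation: find the smallest stock s
# such that the probability of at most s failures during the turnaround
# time meets the service continuity objective (cumulative Poisson).
from math import exp

def required_spares(fit: float, n_modules: int, turnaround_h: float,
                    continuity: float = 0.999) -> int:
    # Expected failures in the turnaround window (Poisson mean).
    lam = fit * 1e-9 * n_modules * turnaround_h
    s, term, cum = 0, exp(-lam), exp(-lam)
    while cum < continuity:                 # cumulative Poisson P(X <= s)
        s += 1
        term *= lam / s
        cum += term
    return s

# Example: 1,000 modules of 3,000 FIT each, one-month turnaround (~730 h).
print(required_spares(fit=3000, n_modules=1000, turnaround_h=730))
```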

Hardware reliability analysis

Reliability analysis and modeling are an integral part of the development process. Reliability block diagrams and state transition diagrams are used for hardware reliability modeling. The models consider all aspects of the system that affect its reliability, for example the ability of the system to detect faults, the ability to identify a faulty unit and isolate it, or the frequency of periodic diagnosis. Hardware failure rates of all components are predicted at an early stage of the development process. All predictions are based on the Siemens norm SN29500 for component failure rate calculations. The figure shows the simplified reliability block diagram relating to the total system downtime. Reliability block diagrams of this kind are created for each specified reliability measurement, such as total system downtime, single termination outage, or SS7 link downtime.

[Figure: Simplified reliability block diagram for total system downtime: series chain of CP113E incl. clock (CCG, BAP, CMY, AMP, IOC/IOP, MB), SN/MB (SN), and SSNC]

The individual subsystems are modeled by Markov modeling techniques. The reliability model of a subsystem (Markov model) shows all failure, detection, recovery and repair actions relevant for the reliability of the subsystem. Examples are:
- failure in both sides of the system
- uncovered faults
- failure on the active system side during repair of the other side
- non-redundant operation of the system due to periodic automatic diagnostics of the redundant side, and the effect of a failure during this period
A minimal numerical sketch of such a model follows.
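The sketch below solves the steady state of a three-state Markov model of a duplicated unit. The rates and the coverage value are invented for illustration; the real EWSD models contain many more states, as the figure further below indicates.

```python
# Minimal sketch (not the EWSD model) of a Markov availability model for a
# duplicated unit: states 2 = both sides up, 1 = simplex, 0 = down.
# lam = failure rate per side, mu = repair rate, c = coverage probability.
# Rates are per hour and purely illustrative.
import numpy as np

lam = 1 / 50_000      # one failure per ~5.7 years per side
mu = 1 / 2            # MTTR = 2 h
c = 0.99              # probability a failure is detected and covered

# Generator matrix Q, rows = from-state (order: 2, 1, 0).
Q = np.array([
    [-2 * lam,  2 * lam * c,  2 * lam * (1 - c)],  # covered vs uncovered failure
    [mu,       -(mu + lam),   lam],                # repair back, or 2nd failure
    [0.0,       mu,          -mu],                 # repair restores one side
])

# Steady state: pi @ Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

downtime_min_per_year = pi[2] * 365.25 * 24 * 60
print(f"P(down) = {pi[2]:.2e}  ->  {downtime_min_per_year:.4f} min/year")
```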

The reliability analysis finishes with the computation of the defined reliability measurements and verification of the measurements against the requirements. Additionally, the effect of the choice of parameter values on the resulting reliability measurements is analyzed in order to determine optimized system parameters. Examples are the minimum necessary fault detection probability or the optimized period for periodic diagnosis of redundant components. During system integration testing, dedicated test steps are used to verify the reliability structure and the correctness of the reliability models. For this purpose, hardware faults are inserted to study how the system behaves when a hardware fault occurs. Parameters used in the hardware reliability models, such as switchover performance or fault detection probability, are evaluated statistically.

[Figure: Example of a typical Markov model for a redundant system, with states for normal operation of the redundant units, routine diagnosis, simplex operation after detected and undetected failures, remote and on-site repair (including travel to the site), and down states (both sides failed, uncovered fault, second failure during repair). Symbols: λc = failure rate of common parts, λm = failure rate of minor faults, repair = repair rate, travel = travel rate, routine = routine diagnosis rate, dia = diagnosis rate, c = coverage probability, d = detection probability, r = remote repair probability]


Software reliability estimation

The reliability of software used in telecommunication networks is a crucial determinant of network reliability. Software reliability estimation is an important element of software reliability management. In particular, software reliability estimations guide the system testing process and decisions on the release of software. Software errors are errors of logic, not of equipment; it is therefore possible to achieve 100% reliability for small programs. The average size of program modules in the software does not as a rule exceed 1,000 statements, so they can be regarded as small modules.

Software reliability models, which assume that the cumulative number of software errors found grows toward an asymptotic value along an exponential saturation curve, are applied to evaluate software reliability and quality. With the aid of the software reliability model it is possible to estimate the number of errors in a software product and to estimate the testing time and testing resources required to reach a predefined quality level. A minimal numerical sketch follows.
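The sketch assumes the common exponential form μ(t) = a(1 − e^(−bt)); the parameter values and the intensity objective are illustrative, not project data, and in practice a and b are fitted to the weekly test reports mentioned below.

```python
# Minimal sketch of an exponential software reliability growth model:
# cumulative errors mu(t) = a * (1 - exp(-b t)) approach the asymptote a,
# and the failure intensity mu'(t) falls toward the release objective.
from math import exp, log

a = 500.0          # assumed total number of errors in the release
b = 0.004          # assumed error detection rate per test hour
lam_obj = 0.05     # failure intensity objective (errors per test hour)

def mu(t: float) -> float:
    """Expected cumulative errors found after t test hours."""
    return a * (1 - exp(-b * t))

def intensity(t: float) -> float:
    """Failure intensity mu'(t) = a * b * exp(-b t)."""
    return a * b * exp(-b * t)

# Test time needed so that intensity(t) <= lam_obj:
t_release = log(a * b / lam_obj) / b
print(f"release after ~{t_release:.0f} test hours; "
      f"errors found by then: {mu(t_release):.0f} of an estimated {a:.0f}")
```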

[Figure: Software reliability growth during the test phase: cumulative failures found to date, cumulative fixed failures and cumulative fixed Prio 1 failures versus test time; the failure intensity curve falling toward the failure intensity objective indicates the additional test time required. Quality criterion for milestone B600: no uncorrected Prio 1 errors outstanding.]

A software failure is defined as a departure of program operation from the specification, caused by a software problem. Because of the different characteristics and effects of software errors, and because of the error-tolerant software architecture, only a vanishingly small proportion of software errors affect the reliability of the system. The downtime due to software errors essentially depends on the frequency and duration of the recovery levels affecting the service capability of the switch. For the estimation of the software reliability of a new system version, detailed recovery statistics from versions in the field worldwide, recovery statistics from the test bed, and recovery runtime estimations and measurements are used. Worldwide statistics show that the share of software-related failures in the total system downtime is approximately 1 min/year on average.

Software reliability increases with testing time as error corrections are made in response to failures. During the test phase, weekly progress reports on error finding and error fixing activities are provided. These reports are used as measurements of software quality during the test phase. The measurements are compared with defined software quality objectives at given milestones.


Reliability data
Total system downtime

The total system downtime amounts on average to less than 1 hour in 20 years (3 min/year) for hardware and software failures. This corresponds to an overall system availability of more than 99.9994 percent.

Hardware: the MTBF of the system due to hardware faults has been calculated at more than 600 years. The mean accumulated downtime is calculated in the range of 0.01 to 0.05 min/year, depending on the assumed repair time.
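For reference, the quoted downtime converts to availability as follows (a minimal sketch):

```python
# Quick check of the availability figure quoted above: 3 minutes of
# downtime per year expressed as an availability percentage.
MIN_PER_YEAR = 365.25 * 24 * 60          # about 525,960 minutes

downtime_min_per_year = 3.0
availability = 1 - downtime_min_per_year / MIN_PER_YEAR
print(f"availability = {availability:.6%}")  # -> 99.999430%, i.e. > 99.9994%
```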

Software: Field performance measurements show that less than one software error in 3 years requires a recovery level affecting the service capability of the system longer than 30 seconds. In this case the service capability can be restored in approx. 3 minutes on average. Thus the share of software-related failure in the total system downtime is less than 1 min/year on average.

[Figure: Total outage block diagram with the large-configuration contributions: CCG (MTBF 33,848 years; downtime 0.0009 min/year), CP113E (4,672 years; 0.0034 min/year), SN/MB (6,371 years; 0.0026 min/year), SSNC (1,044 years; 0.0039 min/year)]

MTBF_total = 1 / Σ(1/MTBF_i) = 736 years
Total system downtime = Σ Downtime_i = 0.0177 min/year
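The combination rule above can be reproduced in a few lines; this sketch uses the MTTR = 0.5 h small-configuration values from the table below.

```python
# Minimal sketch of the series combination: subsystem MTBFs combine
# reciprocally, downtimes simply add. Values are the MTTR = 0.5 h
# "small configuration" figures from the table below.
subsystems = {             # MTBF [years], downtime [min/year]
    "CCG":    (33_848, 0.0009),
    "CP113E": (5_136,  0.0030),
    "SN/MB":  (3_384,  0.0050),
    "SSNC":   (1_030,  0.0088),
}

mtbf_total = 1 / sum(1 / m for m, _ in subsystems.values())
downtime_total = sum(d for _, d in subsystems.values())
print(f"MTBF_total ~ {mtbf_total:.0f} years, "
      f"downtime = {downtime_total:.4f} min/year")   # -> ~671 years, 0.0177
```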

Total system downtime per subsystem (MTBF in years, downtime in min/year; "small"/"large" denote the exchange configuration):

                   MTTR = 0.5 h         MTTR = 2 h           MTTR = 4 h
                   (0 h travel,         (1.5 h travel,       (3 h travel,
                   0.5 h repair)        0.5 h repair)        1 h repair)
                   MTBF    Downtime     MTBF    Downtime     MTBF    Downtime
CCG                33 848  0.0009       33 511  0.0036       33 073  0.0072
CP113E (small)      5 136  0.0030        5 059  0.0052        4 941  0.0089
CP113E (large)      4 672  0.0034        4 600  0.0059        4 488  0.0101
SN/MB (small)       3 384  0.0050        3 229  0.0093        3 035  0.0184
SN/MB (large)       6 371  0.0026        6 209  0.0045        6 006  0.0080
SSNC (small)        1 030  0.0088        1 020  0.0107        1 006  0.0155
SSNC (large)        1 044  0.0039        1 031  0.0059        1 013  0.0113
Total (small)         671  0.0177          659  0.0288          643  0.0500
Total (large)         736  0.0099          726  0.0163          711  0.0294


Trunk downtime

The unavailability encountered by an individual trunk depends on failure of the central equipment as described above and on failures in the peripheral equipment. The estimates show that the mean accumulated intrinsic downtime (MAIDT) for an individual termination will be less than 15 minutes per year for hardware and software faults. Thus the relevant ITU recommendation (Q.541), requiring less than 30 minutes per year, is met comfortably. Due to the full redundancy of all central equipment, the unavailability of an individual termination is determined by the non-redundant parts (LTG).

Hardware: the MTBF for an individual trunk due to hardware faults has been calculated at more than 23 years. The mean accumulated downtime is calculated in the range of 2 to 14 min/year, depending on the assumed repair time and on the type of LTG used (LTGN or LTGP). Software: field performance measurements show that fewer than 0.5 software errors per year and LTG require a recovery level causing a service interruption of the directly connected trunks, lasting about 2 minutes on average.

[Figure: Trunk outage block diagram: LTGP with LTG access to SN/MB. Contributions: LTGP (MTBF 16 years; downtime 1.872 min/year), SN/MB LTG access (1,718 years; 0.011 min/year), remaining system (671 years; 0.018 min/year)]

MTBF_total = 1 / Σ(1/MTBF_i) = 16 years
Trunk downtime = Σ Downtime_i = 1.901 min/year

Trunk downtime per contribution (MTBF in years, downtime in min/year):

                        MTTR = 0.5 h       MTTR = 2 h         MTTR = 4 h
                        (0 h travel,       (1.5 h travel,     (3 h travel,
                        0.5 h repair)      0.5 h repair)      1 h repair)
                        MTBF   Downtime    MTBF   Downtime    MTBF   Downtime
System (large)           671   0.018        659   0.029        643   0.050
SN/MB (LTG access)     1 718   0.011      1 384   0.032      1 091   0.095
LTGP (4 x LTG func.)      16   1.872         16   6.444         16   12.886
LTGN                      31   1.392         31   3.648         31   6.789
Trunk (LTGP)              16   1.901         15   6.505         15   13.031
Trunk (LTGN)              29   1.421         29   3.709         29   6.934


Subscriber line downtime

The unavailability encountered by an individual subscriber line depends on failure of the central equipment as described above and on failures in the peripheral equipment. The estimates show that the mean accumulated intrinsic downtime (MAIDT) for an individual termination will be less than 15 minutes per year for hardware and software faults. Thus the relevant ITU recommendation (Q.541), requiring less than 30 minutes per year, is met comfortably.

Due to the full redundancy of all central equipment, the unavailability of an individual termination is determined by the non-redundant parts (SLM). Hardware: The MTBF for an individual subscriber line has been calculated at more than 5 years. The mean accumulated downtime is calculated in the range of 3 to 13 min/year, depending on the assumed repair time. Software: The probability that both DLU controls or the associated LTGs will fail at the same time is negligible.

[Figure: Subscriber line outage block diagram: SLMA and DLUG with LTGP access to SN/MB. Contributions: SLMA (MTBF 19 years; downtime 1.575 min/year), DLUG-LTGP (39 years; 0.766 min/year), SN/MB LTG access (1,718 years; 0.011 min/year), remaining system (671 years; 0.018 min/year)]

MTBF_total = 1 / Σ(1/MTBF_i) = 13 years
Subscriber line downtime = Σ Downtime_i = 2.362 min/year

Subscriber line downtime per contribution (MTBF in years, downtime in min/year):

                              MTTR = 0.5 h      MTTR = 2 h        MTTR = 4 h
                              (0 h travel,      (1.5 h travel,    (3 h travel,
                              0.5 h repair)     0.5 h repair)     1 h repair)
                              MTBF   Downtime   MTBF   Downtime   MTBF   Downtime
System (large)                 671   0.018       659   0.029       643   0.050
SN/MB (LTG access)           1 718   0.011     1 384   0.032     1 091   0.095
DLUG-LTGP (incl.
load-sharing failure modes)     39   0.766        39   3.066        39   6.133
SLMA (32 lines)                 19   1.575        19   2.912        19   5.823
SLMD (16 lines)                 14   2.078        14   4.686        14   9.372
Analog line                     13   2.362        12   6.026        12   12.081
ISDN line                       10   2.865        10   7.801        10   15.630

Copyright (C) Siemens AG 2001
Issued by the Information and Communications Group, Hofmannstraße 51, D-81359 München
Technical modifications possible. Technical specifications and features are binding only insofar as they are specifically and expressly agreed upon in a written contract.
Order Number: A30828-X1160-P200-1-7618
Visit our website at: http://www.siemens.com
